Development process improvement/2014-01-22/Notes
Inter-team Collaboration
- Product review doesn't always happen
- Getting security review can take a long time
- WMF product should be consulted on some shellbugs
Task/Bug/Story tracking
- Keeping non-Bugzilla tracking systems (Mingle/Trello) synced with Bugzilla is hard
- Sometimes shellbug requests bypass Bugzilla
- How do we know that a shellbug request has consensus?
Deployment software and configuration (e.g. scap, sartoris)
- Security patches don't always get reapplied when extensions are redeployed
- Beta cluster can be broken by a production config change
- External software dependencies keep some software from riding the train
Deployment process/cultural norms
- People sometimes merge wmfconfig changes without deploying
- Some teams/products don't ride the train
- Error apathy: lots of known bugs that nobody is fixing ("Meh, that error is always there" or "ignore it")
- Time between merge and release branch cut can be anywhere from one minute to one week
- "Minor" changes deployed outside windows
- Sometimes people deploy during reserved deploy windows that they don't own
- Need for backwards compatibility with schema changes limits velocity
- Instrumentation is not sufficient for continuous deployment
- Bug fixes don't roll out quickly enough
- Unit test coverage is inadequate across features and projects.
- Browser/full stack tests are effective, but we rely on them too much
- No facility for pre-merge full stack tests
- Browser tests are slow (and always will be, even at their fastest)
- We don't test integration across repos at branch cut time (extensions with core, config with extensions; not an easy task)
- Could run browser tests on branch cut. Integration/API tests would be useful.
- Labs configuration is not like production
- Setting up a complex wiki environment in Labs is often manual/difficult
- Can't easily run automated browser tests against Vagrant. Improvements are in progress: https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
- Bootstrapping a wiki on Vagrant isn't automated
- No official Vagrant maintainer
- Gerrit's workflow is "not like GitHub"
- "Most" things ride the train
- But lots of things go as Lightning deploys
- Is it broken in prod?
- Is it going to break prod?
- And then there is Parsoid...
- Block commit from production unless a related commit is in production (from Core or Extension)
- Has bitten Cirrus on more than one occasion; primarily on the old branch
- Would be nice to automate a "-2 until other change merges" workflow (used by VE)
- Backports suffer from the same or a similar problem, and it is possibly exacerbated there
- Integrate browser tests with Jenkins (CI is working on this; browser tests being slow is a problem)
- Replace (most of) lightning deploys with a task force of rotating deployers that gathers bug fixes and deploys them during a daily window
- Hopefully makes nominating things for fast deployments more egalitarian
- Visual regression testing. We have spiked this using Sikuli, but the value seems low, at least for now.
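
The idea noted above of blocking a commit from production until a related commit is in production (the "-2 until other change merges" workflow) could be sketched roughly as follows. This is only an illustration: the `Change` record, its `depends_on` field, and the function names are hypothetical and do not correspond to any Gerrit, Jenkins, or scap API. It assumes each change declares the IDs of the changes it depends on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical record for a change awaiting deployment."""
    change_id: str
    repo: str
    depends_on: tuple = ()  # IDs of changes that must reach production first


def blocking_dependencies(change, deployed_ids):
    """Return the declared dependencies that are not yet in production."""
    return [dep for dep in change.depends_on if dep not in deployed_ids]


def may_deploy(change, deployed_ids):
    """A change may go out only when none of its dependencies are missing."""
    return not blocking_dependencies(change, deployed_ids)


# Example: an extension change that needs a core change deployed first.
core_fix = Change("core-1", "mediawiki/core")
ext_fix = Change("ext-2", "mediawiki/extensions/Example", depends_on=("core-1",))

deployed = set()
print(may_deploy(ext_fix, deployed))   # blocked: core-1 is not in production
deployed.add(core_fix.change_id)
print(may_deploy(ext_fix, deployed))   # allowed once core-1 is deployed
```

The same check would apply to backports if the set of deployed IDs were tracked per release branch rather than globally.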