Development process improvement/2014-01-22/Notes
Pain Points
Inter-team Collaboration
- Product review doesn't always happen
- Getting security review can take a long time
- WMF product should be consulted on some shellbugs
Task/Bug/Story tracking
- Keeping non-Bugzilla tracking systems (Mingle/Trello) synced with Bugzilla is hard
- Sometimes shellbug requests bypass Bugzilla
- How do we know that a shellbug request has consensus?
Deployment software and configuration (e.g. scap, sartoris)
- Security patches don't always get reapplied when extensions are redeployed
- Beta cluster can be broken by a production config change
- External software dependencies keep some software from riding the train
Deployment process/cultural norms
- People sometimes merge wmfconfig changes without deploying
- Some teams/products don't ride the train
- Error apathy: lots of known bugs that nobody is fixing ("Meh. That error is always there" or "ignore it")
- Time between merge and release branch cut can range from one minute to one week
- "Minor" changes deployed outside windows
- Sometimes people deploy during reserved deploy windows that they don't own
- Need for backwards compatibility with schema changes limits velocity
- Instrumentation is not sufficient for continuous deployment
- Bug fixes don't roll out quickly enough
(automated) Testing
- Unit test coverage is inadequate across features and projects.
- Browser/full stack tests are effective, but we rely on them too much
- Our "test pyramid" is upside-down: http://martinfowler.com/bliki/TestPyramid.html
- No facility for pre-merge full stack tests
- Browser tests are slow (and always will be, even at their fastest)
- Cloudbees is flaky, and there are many other known problems with browser tests. See: https://www.mediawiki.org/wiki/Browser_testing/architecture
- We don't test integration across repos at branch cut time (extensions with core, config with extensions; not an easy task)
- Could run browser tests on branch cut. Integration/API tests would be useful.
- Labs configuration is not like production
- Setting up a complex wiki environment in Labs is often manual/difficult
- Can't easily run automated browser tests against Vagrant. Improvements are in progress: https://bugzilla.wikimedia.org/show_bug.cgi?id=58939
- Bootstrapping a wiki on Vagrant isn't automated
Other
- No official Vagrant maintainer
- Gerrit's workflow is "not like github"
Deploy Train
- "Most" things ride the train
- But lots of things go as Lightning deploys
- Is it broken in prod?
- Is it going to break prod?
- And then there is Parsoid...
Wants
- Block commit from production unless a related commit is in production (from Core or Extension)
- Has bitten Cirrus on more than one occasion; primarily on the old branch
- Would be nice to automate a "-2 until other change merges" workflow (used by VE)
- Backports suffer from the same/similar problem, and it's possibly exacerbated
- Integrate browser tests with Jenkins (CI is working on this; browser tests being slow is a problem)
- Replace (most of) lightning deploys with a task force of rotating deployers that gathers bug fixes and deploys them during a daily window
- Hopefully makes nominating things for fast deployments more egalitarian
- Visual regression testing. We have spiked this using Sikuli but the value seems low for now at least.
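One way to picture the "-2 until other change merges" and "block unless a related commit is in production" wants above is a small dependency gate. This is only a sketch under stated assumptions: the `Change` record, the `State` values, and both function names are hypothetical, and nothing here calls Gerrit, Jenkins, or any deployment tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class State(Enum):
    """Hypothetical lifecycle of a change, in increasing order of progress."""
    OPEN = 0
    MERGED = 1
    DEPLOYED = 2


@dataclass
class Change:
    """Hypothetical record for one change; not a real Gerrit object."""
    change_id: str
    repo: str
    state: State = State.OPEN
    depends_on: List[str] = field(default_factory=list)


def blocking_dependencies(
    change: Change,
    changes: Dict[str, Change],
    required: State = State.DEPLOYED,
) -> List[str]:
    """Return the ids of dependencies that have not yet reached `required`.

    A dependency that cannot be found is treated as blocking, on the
    assumption that failing closed is safer than deploying blind.
    """
    blockers = []
    for dep_id in change.depends_on:
        dep = changes.get(dep_id)
        if dep is None or dep.state.value < required.value:
            blockers.append(dep_id)
    return blockers


def review_vote(change: Change, changes: Dict[str, Change]) -> int:
    """Return -2 while any dependency is unmerged, otherwise 0 (no objection)."""
    return -2 if blocking_dependencies(change, changes, State.MERGED) else 0
```

With an extension change that lists a core change in `depends_on`, `review_vote` stays at -2 until the core change is merged, and `blocking_dependencies` (with its default `required=State.DEPLOYED`) keeps reporting the core change until it is actually in production. The same check could apply to backports by recording the dependency per branch.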
Investigate
- Many people mentioned Etsy's work. There is detailed information in these blog posts, circa 2011:
- http://codeascraft.com/2011/04/20/divide-and-concur/ Note the division of test types. Note the PHP code base.
- http://codeascraft.com/2011/10/11/did-you-try-it-before-you-committed/ Note that Etsy's 'try' server is modeled on Mozilla's 'try' server
- Mozilla's 'try' server: http://rhelmer.org/blog/buildbot-try-support
- https://wiki.mozilla.org/ReleaseEngineering/TryServer