Wikimedia Release Engineering Team/Quarterly review, August 2014

Date: September 3rd | Time: 17:00 UTC | Slides | Notes

Topics: Deploy process and pipeline; browser, manual, and unit testing; everything from development to deployment.

Team

Who

  • Lead: Greg
  • Team: Antoine, Chris, Dan, Mukunda, Rummana, Sam, Zeljko

Big picture

Release Engineering is where our code quality efforts can be amplified. When we do things well, we start to see more responsive development with higher quality code. That is our mission.

What we want to accomplish:

  • More appreciation of, response to, and creation of tests in development
  • Better monitoring and reporting out of our development and deployment processes, especially test environments and pre-deployment
  • Reduce time between code being merged and being deployed
  • Provide information about software quality in a way that informs development and release decisions
  • Help WMF Engineering learn and adapt from experience

Team roles

  • Phabricator: Mukunda
  • Deployment tooling: Sam, Mukunda
  • Jenkins/Zuul: Antoine, Zeljko
  • Beta cluster development/maintenance: Antoine
  • Automated browser tests: Chris M, Zeljko
  • Manual browser testing: Rummana, +1
  • Vagrant: Dan

Previous Quarter Review

Deployment tooling

  • Done - Process through all (useful) pain points from the Dev/Deploy review session
  • Ongoing - Integrate HHVM support into our deployment systems
  • Done - Start the scap(py) & trebuchet integration conversation (stretch goal)

Beta cluster

  • Done - Support HHVM in Beta Cluster
  • Ongoing - Swift cluster in beta (stretch goal)

MediaWiki Release

  • Done - Successfully support the release of MediaWiki 1.23
  • Done - Kick off/complete second RFP
  • Ongoing - Investigate and create useful release/deployment metrics visualizations (stretch goal)

Browser tests

  • Ongoing - Use tags to run builds appropriate to released versions (e.g. don't run master build on test2wiki)
  • Done - Retire Cloudbees Jenkins instance
  • Done - Integrate WMF Jenkins with new WMF SauceLabs account
  • Done - Use API to create test data at runtime more widely
    • Used by MobileFrontend
    • Used by VisualEditor
    • Used by smoke tests
  • Done - Add browser tests to new repos
    • GettingStarted

Hiring

Next Quarter

Phabricator

Mukunda

  • Migration from Bugzilla completed
    • Be an example early adopter of features
  • Migration from Trello/Mingle started
  • Migration from Gerrit completed (pending unforeseen issues)

metrics

Number of teams migrated to Phabricator vs. number of teams currently using Trello/Mingle

Deployment tooling

Sam, Mukunda

  • scap(py) & trebuchet integration
  • increasing bus factor (important due to new hires/team changes)

Jenkins

Antoine, Zeljko

  • Jenkins performance improvements
    • performance is suffering as Jenkins controls more and more tasks, causing many false failures
    • provisioning more slaves is one part of the fix
  • maintenance and new test infrastructure requests (ongoing)

Beta cluster

Antoine, Dan, Sam

  • Add new services (-oids)
  • Swift cluster (remove NFS)
  • Beta Cluster monitoring (baseline)
  • Yet Another Cluster

metrics

  • Real data and graphs from monitoring services

Browser tests

Chris, Zeljko, Dan

  • Workshops/trainings in lieu of one-to-one pair programming
  • Improved "best practices" and "getting started" documentation
  • Continued pairing with WMF Engineering teams
  • Begin pairing with the Flow team
  • Environment abstraction layer in mediawiki-selenium to allow for less fragile and more advanced step definitions

metrics

  • Tracking state of browser tests before Thursday branch cut
  • Days since last green build, per Jenkins job
  • Note that the biggest factor in false failures today is poor performance from Jenkins (see above)
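
The "days since last green build, per Jenkins job" metric could be computed along these lines. This is only a sketch: the build records, job names, and the days_since_last_green helper are hypothetical, and it assumes build results have already been exported as (job, result, finish time) tuples, with "SUCCESS" as the result string for a green build.

```python
from datetime import datetime, timezone


def days_since_last_green(builds, now):
    """Map each job name to whole days since its latest SUCCESS build.

    Jobs with no successful build in the data map to None.
    """
    latest_green = {}
    for job, result, finished_at in builds:
        # Register every job, even if it has never been green.
        latest_green.setdefault(job, None)
        if result == "SUCCESS":
            prev = latest_green[job]
            if prev is None or finished_at > prev:
                latest_green[job] = finished_at
    return {
        job: None if green is None else (now - green).days
        for job, green in latest_green.items()
    }


# Hypothetical sample data, not real build history.
BUILDS = [
    ("browsertests-Example-A", "SUCCESS",
     datetime(2014, 8, 25, 12, 0, tzinfo=timezone.utc)),
    ("browsertests-Example-A", "FAILURE",
     datetime(2014, 9, 2, 12, 0, tzinfo=timezone.utc)),
    ("browsertests-Example-B", "FAILURE",
     datetime(2014, 9, 1, 12, 0, tzinfo=timezone.utc)),
]
NOW = datetime(2014, 9, 3, 17, 0, tzinfo=timezone.utc)

if __name__ == "__main__":
    for job, days in sorted(days_since_last_green(BUILDS, NOW).items()):
        print(job, "never green" if days is None else f"{days} days")
```

Reporting None for a job that has never been green, rather than 0 or a large number, keeps such jobs visible instead of folding them into the average.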

Vagrant

Dan

  • Wrap up pairing with MobileFrontend
  • Better browser test runner, e.g. "vagrant run-browser-tests"
  • Investigate creating shareable Vagrant- or Docker-based test environments
  • Optimize memory-hungry services running in the Vagrant VM (reduce base memory usage)

metrics

  • Qualitative survey of WMF teams on their use of Vagrant
  • Number/percentage of WMF production-deployed extensions available in Vagrant
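
The number/percentage metric above amounts to a set comparison. A minimal sketch, assuming two lists of extension names are available (one for production, one for Vagrant); the vagrant_coverage helper and the extension names are placeholders, not real deployment data:

```python
def vagrant_coverage(deployed, in_vagrant):
    """Return (count covered, percent covered, sorted missing extensions).

    Only extensions deployed in production count toward the metric;
    extensions that exist solely in Vagrant are ignored.
    """
    deployed = set(deployed)
    covered = deployed & set(in_vagrant)
    percent = 100.0 * len(covered) / len(deployed) if deployed else 0.0
    return len(covered), percent, sorted(deployed - covered)


# Placeholder names for illustration only.
DEPLOYED = ["ExtA", "ExtB", "ExtC", "ExtD"]
IN_VAGRANT = ["ExtA", "ExtB", "ExtOnlyInVagrant"]

if __name__ == "__main__":
    count, percent, missing = vagrant_coverage(DEPLOYED, IN_VAGRANT)
    print(f"{count} of {len(DEPLOYED)} ({percent:.1f}%) available; missing: {missing}")
```

Returning the list of missing extensions alongside the percentage makes the metric actionable, since it shows which extensions still need Vagrant support.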

Hiring

Greg, Chris

Questions

from notes

Action Items

from notes:

  • add value statement with each goal in next quarter's presentation
  • create RFC around deployment systems
  • Rob/Greg/Rummana/Chris team discussion about scripted testing, especially with new QA Tester starting
  • revisit/discuss role of scripted testing vs exploratory testing at next quarterly review
  • Greg to bring up use case for bare metal test cluster