Wikimedia Release Engineering Team/Checkin archive/20151207





How to do it:

  • Dec 4th: Greg - disconnected, leaving Thursday evening, returning Sunday :)
  • Dec 14-Jan 1: Greg - vacation (3 weeks, will be checking email)
  • Dec 22-29: Chad - Christmas (will be reachable by e-mail, will have laptop in case of emergencies)
  • Dec 23-25: Tyler - Hopeful, probable, Christmas in Kansas!
  • Dec 24-Jan 3: Dan - Holidays
  • Dec 24-30: Antoine - Holidays (bringing laptop - ring phone as needed)
  • Dec 24: mukunda - holiday
  • Dec 25: US HOLIDAY - Christmas Day
  • Dec 28: mukunda - holiday
  • Dec 31: mukunda - holiday
  • Jan 1: US HOLIDAY - New Year's Day
  • Jan 4 - 8: WikiDev16 + All Hands
  • Jan 16-18: Chad - another music festival
  • Jan 18: US HOLIDAY - Martin Luther King Day
  • Feb 15: US HOLIDAY - President's Day
  • May 17-(?): Dan - paternity leave :D
    • PO Box for pastries? - Antoine
  • May 30: US HOLIDAY - Memorial Day
  • June-ish: Chad - EDC
  • August: France holiday - because French. :)

Team Business


Actions from last meeting

  • TODO - Antoine + Mukunda: should sit down and talk CI/Harbormaster/Nodepool
    • Mukunda just needs to find time to finally test out Harbormaster triggering Jenkins jobs.
      • This works fine; still need to get Jenkins to report back to Harbormaster.
  • TODO - Greg: look into reasons for spikes of info-level log spam
    • RunJobs logging changes? See engineering-l/ops-l?
    • yes....
  • TODO - No One Yet: investigate the behavior of Carbon aggregation for stats older than 1 month
    • ACTION: Antoine to create a task





Q3 Goals

  • Goals timeline:
    • December 3: Group goal scoped and drafted on for Technology team.
    • December 10: Group goal + all individual team *drafts completed* on; discuss at Infra+Tech group and identify dependencies.
    • December 17: individual team goals + group goal *finalized* on; discuss at Monthly Eng Staff.



New vs Maint time spent



  • Rotating Deploys
    • Tyler proposes rotating the responsibility for cutting the branch and running the deployment train.
    • Tyler to cut the branch and deploy on Tuesday 12/08 (group0: 1.27.0-wmf.7 -> 1.27.0-wmf.8)

Scrum of Scrums

Blocked on us:

Project Updates


CI Scaling

Quarterly Goal: "CI cluster responds to spike in queued builds by starting and registering additional jenkins slaves" -
  • Jobs cleanup
  • Nova scheduler went wild on Sunday around midnight UTC. Nodepool could no longer spawn instances. Fixed by ops (restarted some OpenStack process)
  • Small Nodepool and Zuul upgrades coming in (speedup related)
  • Jenkins security upgrade

Beta Cluster

  • Team: might have to watch slow query logs / strict errors etc
  • Labs lost DNS aliasing on Monday for a few hours
  • EventBus <-- new extension
  • DB outage followup
    • Potentially move the Beta Cluster DB to dedicated hardware?
    • Set up a Beta Cluster-specific Tendril?
    • ACTION: Antoine to check in with Jaime on what to do next.
    • Goal: monitor slow queries before they land in prod

Deployment Cabal

Quarterly Goal: "Migrate all Service team owned services and MW deploys to scap3" -
  • (Tyler) Deployed AQS on beta
  • Scap worked flawlessly
  • Mathoid to follow, straightforward
    • the simpler 'oids have a shared puppet module
  • TODO: Jenkins job to wrap around scap3 deploy on beta

Diff[usion|erential] migration

Quarterly Goal:
  • Redirect of Gerrit project name -> Phabricator canonical URL with callsign
  • A couple of patches by Paladox are pending; they highlight some issues with Phabricator

Developer Tooling (MW-Vagrant, MW-Selenium, etc.)


Other Work

  • Gerrit 2.12 -
    • Antoine: current web UI is gone. Not sure whether it is worth upgrading.
      • It is. The UI isn't that different and we get stability/security fixes. We're not all going to be using Phabricator in less time than it'll take to upgrade.
        • I am all for keeping up with upstream. I am worried about the community drama related to the new UI :-((( I (Antoine) don't really care about the UI :D
          • #nodrama. Seriously, people will live with it if they don't like it. It's freaking gerrit, their UI has always sucked.
            • Makes sense :-}
              • Security is going to be my go-to answer if anyone complains. 2.8.x is long past EOL and we're probably vulnerable to more than one issue.
                • ^ that, always the safe answer