Wikimedia Platform Engineering/MediaWiki Core Team/Check-ins/20131112

who: Brad, CSteipp, Nik, Antoine, Ori, Rob, Tim, Greg

RFC process

DevOps sprint

Graphite Puppet module merged; provisioned on tungsten (professor replacement). Distributes load over four carbon-cache instances so should resolve the issue observed on professor (single instance bound to one core by Python GIL). Still need to copy whisper files and set up udpprofiler or replacement.
rsync running with --compress (faster, we hope; CDB compresses well) and --delay-updates (more atomic: updated files are moved into place together at the end of the transfer)
Updating the branch & config repositories on tin should !log (to SAL) the post-update commit SHA1; Sam noticed a bug though.
Ryan working on trebuchet patches, in-progress
- including the per-apache-generated l10n cache

Performance work

Mostly Graphite (see DevOps item #1)
Designed experiment w/Aaron Halfaker for evaluating impact of module storage on page load time. (Assign 0.1% of visitors to experiment, divided equally into control / experiment groups. Will log page load timing from both groups, but module storage will only be enabled for experiment group.)
Schema: https://meta.wikimedia.org/wiki/Schema:ModuleStorage Change: gerrit:94840
More ULS perf troubleshooting (bugzilla:56856)
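
Sketch of the module storage experiment bucketing described above, for illustration only. Assumptions (not from the meeting): a stable visitor identifier is available, it is hashed with SHA-256 to get a uniform value, and the names assign_group, module_storage_enabled and should_log_timing are hypothetical. The real change is in gerrit:94840 and may work quite differently.

```python
import hashlib

# 0.1% of visitors are assigned to the experiment (figure from the notes above).
SAMPLE_RATE = 0.001


def assign_group(visitor_id, sample_rate=SAMPLE_RATE):
    """Return 'control', 'experiment', or None if the visitor is not sampled.

    Hypothetical helper: hashes the visitor id so that the same visitor
    always lands in the same bucket.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u >= sample_rate:
        return None  # not part of the experiment at all
    # Split the sampled slice equally into the two groups.
    return "control" if u < sample_rate / 2 else "experiment"


def module_storage_enabled(group):
    """Module storage is only switched on for the experiment group."""
    return group == "experiment"


def should_log_timing(group):
    """Page load timing is logged for both groups, control and experiment."""
    return group is not None
```

Hashing rather than drawing a random number per page view keeps a visitor in the same group across requests, which matters here because module storage only pays off on repeat views.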

Search

Having trouble with runJobs.php run from the web process
Must switch how we calculate page weight (SQL too slow)
Deploying (now) to nlwiki
Noise in the logs lately. Specific fixes merged. General fixes scheduled.
bugzilla:56968 We think we’re doing two parses on updates in the web process
Product working on an overall direction for search. Design coming up with mockups for how it could work.

Zuul upgrade

PDF generation replacement

Bug escalation

Errors & deployments (Tabling this as discussion item if there’s time.)

Proposal: block deployments when errors or fatals appear in prod until those errors go away
Be deliberately naive, taking severity at face value
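
Sketch of what the "deliberately naive" gate in the proposal could look like, as a starting point for the discussion. Assumptions (not from the meeting): recent production log entries are available as records with a severity field, and the names LogEntry, BLOCKING_SEVERITIES and deployment_allowed are hypothetical.

```python
from dataclasses import dataclass

# Severities that block a deployment, taken at face value (no triage, no allowlist).
BLOCKING_SEVERITIES = {"error", "fatal"}


@dataclass
class LogEntry:
    """Hypothetical shape of one production log record."""
    severity: str
    message: str


def deployment_allowed(recent_entries):
    """Return True only if no recent entry is an error or a fatal."""
    return not any(
        entry.severity.lower() in BLOCKING_SEVERITIES for entry in recent_entries
    )
```

For example, deployment_allowed([LogEntry("warning", "slow query")]) returns True, while adding a single LogEntry("fatal", ...) to the list makes it return False until that entry ages out of the window being checked.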