Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, October 2014/Notes

The following are notes from the Quarterly Review meeting with the Wikimedia Foundation's MediaWiki Core team, October 2, 2014, 3:00PM - 4:30PM PDT.

Meeting overview page: https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Quarterly_review,_October_2014

Attending

Present (in the office): Aaron Schulz, Gabriel Wicke, Chad Horohoe, Tomasz Finc, Rachel diCerbo, Toby Negrin, Ori Livneh, Erik Moeller, Damon Sicore, Tilman Bayer (taking minutes), Rob Lanphier, Lila Tretikov, Dan Garry; participating remotely: Greg Grossmeier, Brad Jorsch, Nikolas Everett, Tim Starling, Chris Steipp, Chris McMahon, Arthur Richards, Bryan Davis

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Presentation slides from the meeting


RobLa:
Welcome
this will be a somewhat abbreviated version of the planned quarterly review

slide 2

This is the team, and what they worked on since July

slide 3

Intro round, with everyone describing what they have been working on this q:
Ori: perf engineer, worked on HHVM
Tim: HHVM + architecture
Aaron: worked on HHVM until recently
ChrisM: consumer of everything this team does ;)
ChrisS: Security, CentralAuth/SUL Finalization reviews
Dan: PM mainly on mobile apps, but also some platform stuff like SUL finalization
Bryan: SUL finalization, IEG support, Beta Labs, Vagrant
RobLa: Kunal (not here today) did a ton of work on SUL finalization
Brad: SecurePoll, API stuff, and random bugs, code review
Chad: worked on Search for >1y with Nik, and various other things that came up
Nik: Search, with Chad
Arthur: team practices, will do workshop with core team in a few weeks
Greg: release team manager, works closely with platform team

slide 4

RobLa: will put bulk of energy into library infrastructure
SecurePoll
Search: once that's fully deployed, it will free up Chad a bit

slide 5

Bigger projects from Q1:

slide 6

Ori: historically, addressed performance with lots of caching
that masked perf deficiencies [at least for logged-out users]
(shows https://people.wikimedia.org/~ori/metrics/pngs/anons-vs-editors.png ) penalty of 300-500 milliseconds for logging in :(
(shows [1]): it's actually diverging further
This motivated strong emphasis on improving perf of web app
even though 97[?]% of views are cached, the remaining 3-5% include editing activity, i.e. the lifeblood of the site
Dan: basically, anything interactive

slide 7

Ori: coupled PHP version update with Ubuntu version update, carried out successfully
previously, Ops managed everything but MW, now more shared ownership
Tim got credited in HHVM release notes for one particularly relevant upstream contribution
Ori: (shows PHP5 vs. HHVM Editor performance graph https://grafana.wikimedia.org/#/dashboard/db/Edit%20Performance )
Lila: so this is basically before vs. after? yes
this is fantastic
very impressed, Ori deserves huge props for stepping up, getting everyone on board
Ori: still substantial effort to convert remaining servers to HHVM - handled by Ops team
decided to get big chunk of improvements first, with some work later:
JIT compilers suffer from slow startup time, have to analyze code first
HHVM can do that in advance, one can tell it to not constantly check file on disk [for changes]
[still need to do such configuration work]
Lila: what is the timeline for this?
Ori: handled by Ops, e.g. 25% of reader traffic by Nov 3[?]
Tim: I will scale back HHVM work in next quarter
Lila: so it will be complete next q from Ops perspective? yes
reader (IPs)?
why not prioritize this too, what else is in conflict?
Ori: Tim and Aaron are among the most prolific code reviewers and contributors; tying them up in HHVM has negative distance effects throughout the ecosystem
Lila: OK, but in general, it would be great to keep some people working on perf constantly
Erik: we updated recommended MW dev environment (Vagrant) to HHVM framework months ago already
Damon: what about metrics/analytics, how does that change with HHVM?
Ori: it has a lot of stats capabilities
Damon: e.g. garbage collection, ...?
Tim: ...has several profiles
Damon: excellent
Chad: and, not losing instrumentation we already have
Toby: ...
Lila: also, right now on different clusters, so can still compare
Gabriel: are we going to distribute HHVM too?
Ori: HHVM landed in Debian Testing a month ago
apt.wikimedia.org is already included as package source in our Vagrant

slide 8

Dan: SUL finalization
84 million accounts, around 5% of them...
Lila: even with single sign-on, still have different user pages on different projects
Dan: lots of renaming work
1st goal: everyone can have a global account
enable development without such corner cases
2nd goal: make sure accounts stay unified
developed organically, separate table for each wiki

slide 9

request a rename: huge community engagement task.
want to make this easier for community to handle it by themselves
current rename process can solve issues for...
Lila: how much completed this q?
Dan: all of the engineering work this q, but might still [take longer for the other work]
most of it [engineering work] completed
RobLa: several weeks, had needed to pull Bryan off for IEG project committed to previously
Lila: so the October deadline is for all the work listed here? yes
Damon: (question about monolithic code base)
Erik: this is separate, won't provide OpenID, etc
i.e. this is just one slice of the identity pie - but an important one
also, this is just about public wikis which are world-writable
e.g. WMF wiki is still separate
Dan: ChrisS helped solve lots of CentralAuth issues, Keegan helped with CE, ...
tentative date: 15 April
work will still be going on, but not engineering (Rachel, Keegan, myself)
Tomasz: what's your time split between apps and this?
Dan: tricky, had to neglect some things in apps
Erik: possible proxy owner for SUL?
Rachel: let's discuss...

slide 10

Nik: Search
old search didn't really work any more
was a big Java application, depending on lots of outdated libraries
move more of it into MediaWiki, where we have more expertise
also incorporated new features, which was made easier by this
but hit roadblock with backend
contributed upstream to Elasticsearch
Damon: how much do people use our search?
Nik: about 1 million hits/hour
but yes, primary way to stumble on our articles is via Google
one important customer: editors who search for typos, want to keep them happy
Dan: also, e.g. Google cannot search by namespace (e.g. talk pages only)
Nik: ...
RobLa: (explains) Elasticsearch is the underlying search engine we are using, it's an external open source project
CirrusSearch is our own project, built on top of that
Lila: also keep in mind that it's getting [important to get search on mobile right]
Dan: on app, so far only prefix search - one typo, and you find nothing
Lila: even without typo, you find nothing
Tomasz: we should never show no search results
Bernd worked with search team
Nik: yes, he reviewed my code, very useful
team = Nik + 0.5 * Chad
and some hours per week from AndrewO and Filippo from Ops
now deployed on all except enwiki (50% of search traffic), dewiki, frwiki, zhwiki
hardware: will have 4x the I/O

slide 11

RobLa: Bigger projects proposed for next quarter
purpose of library: peel off components from MediaWiki, "componentize" neatly
start at bottom of stack, find pieces of MW that touch everything
Damon: does that mean literal PHP library? or API...
RobLa: this is a bit more basic
Toby: how does this interact with services?
Bryan: a bit of both
if you have 3-4 years, can understand MW ;)
disentangle so that [it's not required to understand the entirety of MW]
it leads to service orientation, by isolating behaviors in code
e.g. once we have all storage for (something) behind an API, we could then rewrite in another language (than PHP)
Damon: ...
Tim: yes, taking components
see plan page on mw.org
candidates: logging, ResourceLoader, ...
Chad: also, for testing
Damon: right, APIs make good testing
RobLa: this will take bulk of our effort in coming q
get a good estimate
get good "inertia" on this (so it's moving on its own)
several extensions pull in their own libraries, in their own way
sometimes different libraries for same task
Lila: what's the final output this q? a design document?
four people on this? yes
Bryan: library decomposition is bigger than an epic, it's a theme, say 2 years
epic for this q: initial foundational work
use Composer (package manager for PHP)
figure out what it means for our deployment, release management, GitHub mirroring
I wrote an RfC
needs wiki work for documentation
how track bugs, promote use of library, ...
deliverables:
complete (removal of) logging into a library
and something else, like ResourceLoader
document how to use Composer
Lila: OK, but it's important to have plan
Erik: have we reached out to possible reusers outside MW Core?
RobLa: e.g. Timo (in Core)
outside: talked to e.g. Wikidata, but nothing concrete yet
Erik: important to socialize horizontally, e.g. ..., jQuery
Bryan: not my primary focus, but yes, should have messaging across extension world
also, move stuff into core that belongs there
Lila: this is probably one of the most complex projects...
looks like more conceptual exercise than actual coding
RobLa: we'll focus on logging as concrete example
still some scoping to do
want to have at least 2-3
Gabriel: backend, storage interface (talked about this at architecture meeting)
eventually, should have MW as API consumer
would be good to work on that too
Erik: would be good to integrate Gabriel into a lot of these conversations
Ori: but there's some more basic work to do
lack of knowledge on how to integrate a library (also external ones)
leads to a lot of (redundant effort)
e.g. releases
some unsexy work like that
RobLa: just separating one library like logging (connected to a lot of things) will keep us busy
Lila: I tell all teams to plan conservatively
this strikes me as one of those unknown projects - how deep is the rabbit hole?
Dan: the more I hear, the more I think it's an amazing idea
Lila: as first step, write down what we want to achieve, how it is improving things, success criteria
not redesign for the sake of it, but because it solves deep problems
can be open process
not frame as RfC
RobLa: RfC can have different meanings [in the Wikimedia world]
Lila: who drives MW RfCs, where do they reside?
RobLa: us / on mw.org
ChrisM (on chat): would the "Editor Performance" work also extend to e.g. performance in Flow? I ask because we've found some performance issues there recently that might take advantage of Core improvements, but I'm not sure
Erik: still a lot of perf issues both in old wikitext editor and in VE
and mobile in each
need the data
with VE, also need to measure internal perf - e.g. after pressing a key
this is in scope (for improving editing perf)
Flow is not in scope
save rates, initial load time, e.g. would [actual VE] section editing help?
Ori: worry about clear cross-team commitments
Dan: on app: conversion rate (tap edit -> save)
Ori: that sounds more like purview of analytics
Toby: some overlap, but yes
Tomasz: ...
Damon: What about regression policy on library infrastructure - are we allowed to regress perf?
Ori, Aaron: shouldn't have much of an effect
RobLa: ...
Erik: good opportunity to model that out (if x, should be reverted)
VE (save) times
Damon: I'm very interested in regression policy
Ori: ...
Gabriel: we did some of that [in VE/Parsoid], was surprisingly hard, a lot of noise, fight with very basic stuff
we did discover some issues that way and it was valuable, but...
Erik: realistically this is a Q2+Q3 thing
Nov + 2 weeks in Dec, then continue after holidays
RobLa: yes, and need to have followup on this review anyway
Lila: your main priority should be to finish what you started (with e.g. Ori's additional work on HHVM)
don't overextend, e.g. you have SecurePoll
Erik: don't want to be too fetishist about q boundaries, it's OK to launch in mid-q
Lila: do SecurePoll, finish perf, some smaller projects
put this [libraries] into stretch goals
Aaron: should come up with KPIs, have dashboards
Erik, Lila, RobLa: agree
Erik: continue using editing perf for that
(slide 12, 13: skipped)