Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, January 2014/Notes
Team: https://www.mediawiki.org/wiki/Wikimedia_MediaWiki_Core_Team/Quarterly_review,_January_2014#Team
Ongoing
- Deployments
- MW Operations
- Code Review
- Security
- Test infrastructure
- Git/Gerrit
- Less this quarter
- Shell bugs
Previous Quarter
CirrusSearch
- Chad/Nik/Dan/Andrew(s)
- Much more of a real project/deployment now.
- Took about a month off to fix up the obvious failures.
- Deployed to ~70% of pages, or 85% of all updates
- mostly kept up in real-time
- Serving 8% of all search traffic
- pretty much all wikis will have CirrusSearch as an opt-in BetaFeature within the next 4 weeks
Deployment Tooling
- multi-site awareness & git-deploy fell down in priority
- scap improvements
- specifically speed of deploy (re localization updates/generation)
- hovering around 10 minutes per scap
- as opposed to ~30 min today
- Logstash in production with basic logging info (via udplog)
- much easier to monitor things and see trends without having to grep log files
- access right now: the wmf LDAP group. Will probably have to stay that way for foreseeable future - there's PII in there
- will probably be restricted still in the future due to IP addresses and other private information contained in the production logs
Scholarship App
- Made the scholarship application process application (eg: the form submission and review of applications) much, much better :)
- ~180 applications submitted as of the morning of the 21st, no indication of users unable to apply
- Review process has not started in earnest yet
Auth Systems
- OAuth was deployed (and is being used)!
- refining password expiration protocol and password hashing
- prompted by the potential data breach in October
- SULv2 performance improvements
- cut down the effect of anonymous users on cluster resources (eg: via hitting the backend apaches)
Security Auditing and Response
- Code review of a bunch of projects
- GLAM
- Flow
- Scholarship App
- (delayed) Limn/Kraken
- (delayed) TimedMediaHandler v2
- Security Releases (1.21.3 and 1.22.1)
- first one with the outside contractors (M&M)
Performance Monitoring
- lots of stuff, see slides ;)
Architecture Formalization
- https://www.mediawiki.org/wiki/Requests_for_comment
- RFC IRC meetings
- RFC process seems more clear/transparent
- How does the room feel?
- (silence == bang up job)
- This feeds directly into the MW Core Team's planning
PDF support
- Brad on loan
Next Quarter
Search
- "neat, not cool"
- ENWIKI being indexed
- goal of being done (rolling out) by end of March
- Working on an interwiki search UI (with Design/Brandon)
- Waiting on Rack D (in eqiad) buildout for more search machines
- Rack D -- Ops is roughly saying "end of February-ish"
- Beta Features is a good feedback channel - may be increasing # of testers, and makes it easy for the people who want to test it anyway to provide feedback.
- 183 beta users on Commons
HHVM
- Goal of production service running on HHVM by end of quarter
- job queue?, l10n updates?, image scalers?
- Need to port LuaSandbox
- packages/puppet/automated testing
- Great working relationship with the upstream team at Facebook
- (discussion of unit tests, some not passing because of intl being disabled)
- FastCGI
- MediaWiki implementation ... works without ....
- goal: get it running correctly without throwing errors, not optimizing specific services.
- find a self-contained service to convert. e.g., jobqueue or l10n updates
- factor of 5 times faster with HHVM vs standard PHP? Anecdotally it's faster but we need real benchmarks.
- The last time we did benchmarks was factor of 5 with HPHP
- Is HPHP faster or slower than HHVM? Very recently FB blogged that HHVM is faster?
- persistent connections to eg Redis are doable right now, and we could do the same thing with HHVM
- What level of involvement and where will Ops be involved with this?
- packages and puppet
- Blockers to HHVM
- packages are crappy (redone by Faidon?)
- monitoring is different for HHVM
- Ops hasn't put in the time to really know what all will need to change
- People aren't really using the mailing list - it's in the active GitHub project & Freenode channel
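Side note on the "we need real benchmarks" point above: a minimal sketch of how a speedup factor like the anecdotal "5 times faster" could be computed from repeated timing runs. This is an illustration only, not the team's benchmark method. The helper names `time_runs` and `speedup` are hypothetical, and the workload passed in is a stand-in for whatever request or job would actually be measured under each runtime.

```python
import statistics
import time
from typing import Callable, List


def time_runs(workload: Callable[[], object], runs: int = 5) -> List[float]:
    """Return wall-clock durations in seconds for `runs` calls of `workload`.

    `workload` is a placeholder for the thing being measured, e.g. a
    function that issues one request against a given runtime.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def speedup(baseline: List[float], candidate: List[float]) -> float:
    """Median baseline time divided by median candidate time.

    A result of 5.0 means the candidate is five times faster than the
    baseline. Medians are used so a single slow outlier run does not
    dominate the comparison.
    """
    if not baseline or not candidate:
        raise ValueError("both sample lists must be non-empty")
    candidate_median = statistics.median(candidate)
    if candidate_median <= 0:
        raise ValueError("candidate timings must be positive")
    return statistics.median(baseline) / candidate_median
```

Usage would look like `speedup(time_runs(run_on_php), time_runs(run_on_hhvm))`, where both callables are hypothetical wrappers around the same workload. Per the Performance notes, timings taken on virtualized hosts may not be reliable for this kind of comparison.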
Deployment Tooling
- Scaling back a bit
- getting some preliminary work in place
- Bryan Davis will be doing a fresh-eyes review of the current system
- which will inform future work, eg: making extension deployment process less brittle
- BTW: Completing the search deployment will remove lsearchd which is a blocker on scap renovation
- Logstash: working with Ops to add more log sources (including Ops specific)
- Ops Request: need a deployment system that is usable beyond MW itself
- Interaction between packaging and deployment (using packages to deploy?)
- Ops offering time to work on Graphite
- To discuss a bunch of this tomorrow in Deployment process meeting
Performance
- front end has been neglected, eg 2 separate requests for geoip, which were not easily caught without this type of review (and similar code base)
- making performance/latency visible so that teams/developers can see impact
- Ori thinks we can probably get our pageload time down another 300ms or so
- Performance Test Environment?
- "it's on the roadmap"
- blocked on unittests not actually making web requests
- performance monitoring will follow the virtualized test environment
- but all test infrastructure currently in place is virtualized and thus not reliable for data comparisons
- Labs does not have reliable performance characteristics
- Timely (eg: weekly or so) performance reports mailed to Ops and Engineering lists
Security
- Password storage update to finally replace our password storage algorithm
- most patches are mostly ready or merged, we're just waiting/reviewing to make sure we do it right the first time
- continuous reviews and training
- Training focus on team leads/project leads
- form tbd
- beginning with WMF individuals
- can potentially leverage those materials, experiences to help ECT goal in April-June to train volunteers in security https://www.mediawiki.org/wiki/Wikimedia_Engineering/2013-14_Goals#Engineering_Community
- Staffing:
- There's an Ops Security opening
- FrontEnd security engineer position, waiting for the internal candidate to become free
Other
- PDF Rendering (Brad)
- Product management scoping (Dan)
- admin tools dev
- Chris is engineering point person (to delegate)
- securepoll cleanup (Brad maybe if he has time)
- next election is in about 2 years, there are some hints of early ones (maybe hrwiki)
- SUL finalization
- main engineering point person is Chris
- central CSS discussion
- admin tools dev
+2 maintainership
- any problems?
- punt to a later/larger conversation
TODOS
- Sumana & Ken: follow up on possibility of "has signed an NDA" LDAP group
- Bryan & Chris: look into 2FA or similar for Logstash Authentication for users
- Chad & Nik: Get Brandon a link to a JSON API
- More benchmarks for HHVM & MediaWiki - characterise & pinpoint & quantify benefits of HHVM so we have a real value proposition for rest of org
- Mark B to look into this: To collect frontend performance data, would be great to have a varnish kafka topic running on bits varnishes acting as an aggregation point, asks Ori. Not urgent
- Separate eventlogger load-balancing IP? suggests Faidon
- Look into provisioning baremetal performance testing infra?
- maybe just an additional job runner for testing HHVM
- Faidon & Gabriel: Look into provisioning hardware for the large users of Labs, eg Parsoid
- Describe what to do in the event of a users/admin settings leak
- script it?
- Chad: figure out why we still have AdminSettings lingering around. I killed that years ago.
- Chris S & Sumana: talk about upcoming training, brainstorm approaches