Wikimedia Engineering/Report/2012/December
Engineering metrics in December:
- 113 unique committers contributed patchsets of code to MediaWiki.
- The total number of unresolved commits went from about 535 to about 648.
- About 39 shell requests were processed.
- As of December 2012, users can self-register on Wikimedia Labs (and get access to git/Gerrit). It is no longer necessary to request an account for developer access.
- Wikimedia Labs now hosts 148 projects, 847 users; to date 1378 instances have been created.
- Detailed community metrics are also available.
Major news in December includes:
- The launch of an alpha, opt-in version of the VisualEditor to the English Wikipedia, a project more complex than it appears;
- A research study on the use of the Article Feedback feature;
- New metrics for the MediaWiki community;
- The start of the Outreach Program for Women;
- Continued work to improve the workflow and interface for translators.
Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Software Engineer - Visual Editor
- Software Engineer - Editor Engagement
- Software Engineer (Partners)
- Software Engineer (Apps)
- Software Developer General (Mobile)
- Git and Gerrit software development (Contract)
- Release Manager
- Software Engineer - Multimedia
- Software Engineer (Search)
- Product Manager (Mobile)
- Director of User Experience
- Visual Designer
- Operations Engineer
- Operations Engineer/Database Administrator
- Site Reliability Engineer
Announcements
- Matthew Flaschen joined the Wikimedia Features engineering team as Features Engineer (announcement).
- Mike Wang joined the Operations team as part-time Labs Ops Engineer (consultant) (announcement).
Technical Operations
- The Technical Operations team continued to work on completing the outstanding migration tasks, and to ready our Ashburn infrastructure for the big switchover day, i.e., the complete transition from the Tampa datacenter to the one in Ashburn, on the week of January 22, 2013.
- In the past few months, we've transitioned services from the Tampa datacenter to the one in Ashburn, which now serves most of our traffic (about 90%). However, application (MediaWiki), memcached and database systems are all still running exclusively out of Tampa. We have been working to upgrade the technologies and set up those systems at Ashburn, and we plan to perform the switchover of those services from Tampa to Ashburn in the coming weeks. This will provide us some assurance of a hot standby datacenter, should we encounter an irrecoverable and lengthy outage in one of the main datacenters.
Site infrastructure
- Because December is when the annual Wikimedia fundraiser happens, the Operations team usually makes fewer site infrastructure changes to mitigate the risk of causing outages. The lower-risk work performed this month included deploying the new Parsoid cluster to support the VisualEditor project, rolling out doc.wikimedia.org (our auto-generated puppet documentation), using a new and unified SSL certificate for *.wikipedia.org and *.m.wikipedia.org sites, and setting up a monitoring server and service in Ashburn.
- Asher Feldman migrated one of the main production slave database servers (db59) for the English Wikipedia (enwiki) to MariaDB 5.5.28. He has been testing 5.5.27 on the primary research slave, and the current build on a slave in Ashburn. Taking the times of 100% of all queries over regular sample windows, the average query time across all enwiki slave queries is about 8% faster with MariaDB compared to our production build of MySQL 5.1-fb. Some query types are 10–15% faster, some are 3% slower, and nothing looks aberrant beyond those bounds. Overall throughput as measured by qps has generally been improved by 2–10%. Asher wouldn't draw any conclusions from this data yet: more testing is needed to filter out noise, but initial results are positive. The main reason for migrating to MariaDB is not performance, but rather the belief that it's in the Wikimedia Foundation's and the open-source communities' interest to coalesce around the MariaDB Foundation as the best route to ensuring a truly open and well-supported future for MySQL-derived database technology.
- Mark Bergsma and Faidon Liambotis have made tremendous progress in testing and deploying Ceph in Ashburn. We are hopeful it will be robust and scalable.
- Ryan Lane has been writing a new deployment system using git and Saltstack. Parsoid is currently being deployed with this system, and MediaWiki is slated to use it for its next major deployment.
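As a rough illustration of how a figure like the "about 8% faster" average query time mentioned above can be computed, here is a minimal sketch. The function name and the sample timings are made up for the example; they are not measurements from db59 and this is not the tooling Asher used.

```python
from statistics import mean


def percent_faster(baseline_ms, candidate_ms):
    """Return how much lower the candidate's mean query time is,
    expressed as a percentage of the baseline mean."""
    base = mean(baseline_ms)
    cand = mean(candidate_ms)
    return (base - cand) / base * 100.0


# Hypothetical per-query timings in milliseconds (illustrative only).
mysql_times = [10.0, 12.0, 8.0, 10.0]
mariadb_times = [9.0, 11.0, 7.5, 9.3]

improvement = percent_faster(mysql_times, mariadb_times)
print(f"Mean query time is {improvement:.1f}% lower on the candidate build")
```

A positive result means the candidate build is faster on average; a negative one means it is slower. As the report notes, a single comparison like this says little until more samples filter out the noise.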
Fundraising
- There were no major changes on the fundraising infrastructure because of the fundraiser itself. We ordered and received bastion hosts that we're in the process of deploying. Monitoring got an overhaul and we're now sending alerts to the fundraising technical staff or the technical operations team depending on what triggered the alert.
- A tool for dump users to set up interwiki links on their local mirrors is available in alpha, as well as documentation of the interwiki cdb file. Also, work with WanSecurity on mirroring is moving forward: they now hold a current copy of all 'other' files, including page views and Picture of the Year bundles, among other things.
- Labs came out of beta this month, following the opening of self-registration. Another major change this month was the migration from the shared NFS instance to per-project glusterfs volumes. A number of smaller changes were made, including: the addition of puppet documentation links from classes and variables on the instance configuration pages; the modification of the project filter to act as a table of contents; a split of LDAP project groups into projects and POSIX groups; and the installation of Saltstack on all instances to act as a guest agent.
Editor retention: Editing tools
As witnessed by the clean edit diffs, Parsoid passed this test with flying colors. This represents very hard work by the team (Gabriel Wicke, Subramanya Sastry and Mark Holmquist) on automated round-trip testing and the completion of a selective serialization strategy just in time for the release.
After catching their breath, the team now has its sights on the next phase in Parsoid development. This includes a longer-term strategy for the integration of Parsoid and HTML DOM into MediaWiki, performance improvements and better support for complex features of wikitext.
Editor engagement features
Editor engagement experiments
To go along with the launch of GettingStarted and other experimentation, EventLogging underwent heavy development, including the launch of a new Schema namespace on Meta for defining the data collected in a public, collaborative manner. We created production schemas for GettingStarted, account creation, mobile, and more. Ori Livneh also reworked the format, transmission, and cleanliness of data delivered to analysts and product managers, automatically generating database tables from these schemas for incoming events.
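To give a sense of what generating a database table from a schema can look like, here is a minimal sketch. It is not EventLogging's actual code: the type mapping, the `create_table_sql` helper, the field names and the table name are all assumptions made for the example, loosely modelled on a JSON-Schema-style definition.

```python
# Hypothetical mapping from schema field types to SQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}


def create_table_sql(table_name, schema):
    """Build a CREATE TABLE statement from a schema's 'properties' mapping.

    Fields are emitted in alphabetical order so the output is deterministic.
    """
    columns = []
    for field, spec in sorted(schema["properties"].items()):
        sql_type = SQL_TYPES[spec["type"]]
        not_null = " NOT NULL" if spec.get("required") else ""
        columns.append(f"{field} {sql_type}{not_null}")
    return f"CREATE TABLE {table_name} ({', '.join(columns)});"


# Illustrative schema; the field names are invented for this example.
example_schema = {
    "properties": {
        "userId": {"type": "integer", "required": True},
        "action": {"type": "string", "required": True},
        "isNew": {"type": "boolean"},
    }
}

statement = create_table_sql("GettingStarted_example", example_schema)
print(statement)
```

Deriving the table from the schema means the public schema page stays the single description of what is collected, and the storage layout follows from it.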
Late in the month, the team collaborated with fundraising to reach out to donors and readers as part of the annual fundraising campaign via email and a "Thank You" banner which ran at the end of the year. In addition to introducing millions of donors and readers to the Wikipedia editor community and inviting them to join, this campaign helped the team establish an experimental baseline for what a campaign to convert readers might look like.
In addition to the above launches, we continued development of the new account creation experience and Guided Tours by Matt Flaschen, which will be launched in January 2013. Active development was also begun by Ryan Faulkner and Dario Taraborelli on a user metrics API. The effort is threefold: to standardize user metrics in data analysis, to build infrastructure to efficiently compute metrics for a large set of users, and finally to expose those results via an API.
Support
- Other news
- Pau Giner and Amir Aharoni participated in the Open Tech Chat this month to talk about best practices in multilingual user testing and internationalization. Amir Aharoni also participated in mentoring OPW candidate Priyanka Nag for the new LevelUp program. Srikanth Lakshmanan and Arun Ganesh’s tenures with the Language Engineering team ended in December.
- The Mobile development and design team worked to finalize contributory and other experimental editor-focused features on the Beta site (uploads, editing, and watchlist functionality) in order to clear the way for a full push on mobile uploads by March 2013. We also worked to improve the reader and potential editor experience by introducing features geared toward educating/engaging our users, such as a human-readable last modified timestamp for articles and watchlist, and thumbnail images to illustrate the watchlist view. Lastly, because of the huge interest we generated in our Beta testing site, we created an Alpha site to house very early work on contributory features, in order not to disrupt the reading experience of our 100,000+ Beta users.
MediaWiki Core
Site performance and architecture
Security auditing and response
Quality assurance
Analytics
Engineering community team
Unrelatedly, Guillaume made a list of 2012 tech blog posts to map tech blog activity by month & subdepartment (with priority activities listed separately). Work on setting up a volunteer product manager program is also underway.
Quim Gil sorted out social media channels, and we now have @MediaWiki handles for identi.ca, Twitter, Facebook and Google+. He published the community metrics November report and a blog post introducing this new activity.
Volunteer coordination and outreach
The Kiwix project is funded and executed by Wikimedia CH.
- A new Kiwix 0.9rc2 was released. This version embeds our ZIM HTTP server kiwix-serve for Windows, OS X and Linux. It is now integrated in the Kiwix UI, allowing everyone to share Wikipedia on a LAN in two clicks. We have revamped our audience measurement tool, a solution that could be interesting for other projects using Mirrorbrain. At the same time, we continue to increase our ZIM production throughput, with 8 new Wikipedia ZIM files in December. December was also a month of new records for Kiwix: for the first time, we have had more than 70,000 downloads a month and a lead position for Education software at SourceForge.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- New code and bugfixes have been deployed (with MediaWiki 1.21wmf5 and 1.21wmf6) and test2 now gets language links from Wikidata. Changes on Wikidata that concern articles on test2 are shown in the recent changes of test2 as well. If there are no problems, deployment on the Hungarian Wikipedia will happen on January 14, 2013. Other Wikipedia sites will follow.
- For the second phase of Wikidata, representation of values is the central focus. We published a draft and discussions have started; we'd appreciate your feedback. Additionally, Denny Vrandečić and Lydia Pintscher held IRC office hours; logs are available in English and German.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.