Wikimedia Engineering/Report/2012/October

Major news in October include:

Note: As of last month, we're proposing a shorter and simpler version of this report for less technically savvy readers.

Personnel

Work with us

Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.

Announcements

  • Željko Filipin joined the Platform engineering team as QA Engineer (announcement).
  • Andre Klapper joined the Platform engineering team as Bug Wrangler (announcement).
  • Michelle Grover joined the Mobile engineering team as a QA contractor (announcement).
  • Luke Welling joined the Features engineering team as Senior Features Engineer (announcement).
  • Brad Jorsch joined the Platform engineering team as Software Engineer, working in the MediaWiki Core group (announcement).
  • Steven Bernardin joined the Operations team as Data Center Technician, working in our Tampa data center.

Technical Operations

Site Infrastructure

Mark Bergsma has successfully implemented the range-seeking feature in Varnish, fixed several video streaming bugs, and finally redeployed Varnish at Eqiad, replacing the upload Squids in Tampa. Mark is now working on replacing the upload Squids at Esams. He has provisioned 8 servers and, based on early testing, fewer may actually be needed. We are currently using 23 older servers for upload Squid at Esams. In addition, Mark deployed 4 new Varnish servers to serve bits from Esams; the existing 2 are being redeployed for other uses. This will provide higher throughput and added redundancy for the coming fundraising season.
Due to Swift cross-datacenter replication issues, we have moved originals to our nas1 server in Tampa as a stopgap measure, and replicated the contents to nas1001 at Eqiad. Currently, only 'originals' are stored on nas1/nas1001. Mark, Aaron Schulz and Faidon Liambotis will work on copying over thumbnails next. Faidon upgraded Swift to 1.7.4 to address the Swift proxy memory leak issue.
Tim Starling has deployed limited php-redis on the Apache servers, and Redis is now capturing session data (mc1). The new Memcached servers at Tampa are ready, and Asher Feldman has started testing them and is working on putting them into production in the coming weeks. He was originally hoping to use Redis to replace memcached/parser cache, but had to stop due to performance/latency issues with its replication method. In the meantime, Asher has put the captured session data to use and identified some areas for performance improvement. The developers have been notified and are now working on resolving the issues.

West Coast caching center

We've started building out a new caching center in the San Francisco area, called ULSFO. In October, Leslie Carr and Daniel Zahn racked and stacked networking equipment. Next, we are going to purchase the caching servers once Mark Bergsma confirms the configuration.

Data Dumps

Compressed multistream format files of current articles are now being produced for all dumps; researchers working with content from the larger wikis may find these helpful. We're working with Amazon on hosting the most recent dumps for EC2 users, thanks to Diederik van Liere. We've encountered some performance issues with media bundle generation off-site and are investigating; we're also working on moving that from experimental to production status.
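For researchers wondering why the multistream format helps with large wikis: such a file is a series of independently compressed bzip2 streams concatenated together, so a reader who knows the byte offset of a stream can decompress that stream alone instead of the whole dump. The following is a minimal sketch of the idea using an in-memory example; the helper names (`build_multistream`, `read_stream_at`) and the sample page data are hypothetical and only illustrate the technique, not the dump tooling itself.

```python
import bz2


def build_multistream(chunks):
    """Compress each chunk as its own bz2 stream and concatenate them.

    Returns the combined bytes and the byte offset where each stream starts.
    (Hypothetical helper: it stands in for a real dump plus its offset index.)
    """
    blob = bytearray()
    offsets = []
    for chunk in chunks:
        offsets.append(len(blob))
        blob += bz2.compress(chunk)
    return bytes(blob), offsets


def read_stream_at(blob, offset):
    """Decompress only the single bz2 stream that starts at `offset`."""
    decompressor = bz2.BZ2Decompressor()
    # The decompressor stops at the end of the first stream it sees; any
    # bytes after that stream end up in decompressor.unused_data.
    return decompressor.decompress(blob[offset:])


# Made-up sample data: three tiny "pages", one per stream.
chunks = [b"<page>Alpha</page>", b"<page>Beta</page>", b"<page>Gamma</page>"]
blob, offsets = build_multistream(chunks)

# Jump straight to the second stream without touching the first.
second = read_stream_at(blob, offsets[1])
```

With a real file on disk, one would `seek()` to the offset and read only the bytes up to the next offset rather than slicing the whole file in memory.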

Wikimedia Labs

Home directories are being migrated to GlusterFS: pam_mkhomedir has been enabled, the home directory creation script has been disabled, and /home is now a direct autofs mount, rather than having indirect mounts for each user under /home. Nova, Glance, and Keystone have been upgraded to the latest Essex stable release. On all instances, Salt has been installed and Puppet has been run. Numerous bugs have been fixed in OpenStackManager (a project creation bug, removing the tiny flavor from the interface) and some features have been added as well (adding a user to bastion when a shell group is added). A patch was submitted to Nova to enforce unique instance names. The Labs team attended the OpenStack summit in San Diego. One takeaway is that we are leading a team to push a DNS service into OpenStack incubation; Andrew Bogott's Nova DNS code will be merged into this project.

Fundraising

Existing and new payments clusters were fully integrated, with common Puppet configuration, code propagation, and logging. The Eqiad payments cluster was successfully tested with several hours of production traffic. SPF was rolled out for the wikimedia.org domain, which improved deliverability of fundraising email. A failing database machine was replaced, and one additional machine was deployed for the duration of the fundraiser. We also did a lot of work with Fundraising Tech on measuring and improving banner and landing page performance.

Editing tools

VisualEditor

In October, the team worked to finish most of the re-engineering of VisualEditor's code design so that it is more modular and easier to extend. This has involved creating and documenting a number of formal APIs at each point in the architecture, so a developer does not have to understand the entire code base to be able to add new features. The early version of the VisualEditor on mediawiki.org was updated three times (wmf1, wmf2 and wmf3), fixing a number of bugs and replacing the entire browser selection and typing models, and much of how the user interface connects with the rest of the code.

Parsoid

The Parsoid team focused on testing the JavaScript prototype parser against a corpus of 100,000 randomly-selected articles from the English Wikipedia. A distributed MapReduce-like system, which uses several virtual machines on Wikimedia Labs, constantly converts articles to HTML DOM and back again to wikitext using the latest version of Parsoid. For a little over 75% of these articles, this results in exactly the same wikitext, as we intend. For another 18% of these articles, there are some differences in the wikitext, but these are so minor that they don't result in any differences in the produced HTML structure when it is re-parsed. In the production version of Parsoid, which will attempt to retain the original wikitext as far as possible, these minor differences will only show up, if at all, around content that the user edited. Finally, just under 7% of articles still contain errors that change the produced HTML structure. These issues are the focus of the current work in preparation for the December release.
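The three outcomes described above (identical wikitext, differences that leave the HTML unchanged, and differences that change the HTML) can be illustrated with a small round-trip classifier. This is only a sketch under stated assumptions: `toy_to_html` and `toy_to_wikitext` are hypothetical stand-ins handling a tiny subset of markup, with a deliberate bug in the serializer so that all three categories occur. They are not Parsoid's API, and the category names are made up for illustration.

```python
import re
from collections import Counter


def toy_to_html(wikitext):
    """Toy parser (hypothetical): trims whitespace, handles bold and italics."""
    text = wikitext.strip()
    text = re.sub(r"'''(.+?)'''", r"<b>\1</b>", text)
    text = re.sub(r"''(.+?)''", r"<i>\1</i>", text)
    return text


def toy_to_wikitext(html):
    """Toy serializer (hypothetical) with a deliberate bug: italics are lost."""
    text = re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)
    text = re.sub(r"</?i>", "", text)  # bug: drops the <i> tags entirely
    return text


def classify_round_trip(wikitext, to_html, to_wikitext):
    """Classify one article by how it survives wikitext -> HTML -> wikitext."""
    html = to_html(wikitext)
    round_tripped = to_wikitext(html)
    if round_tripped == wikitext:
        return "exact"
    # The wikitext changed; check whether re-parsing still yields the same HTML.
    if to_html(round_tripped) == html:
        return "syntactic_only"
    return "semantic"


# Made-up sample corpus, one article per expected category.
corpus = [
    "'''Hi''' there",     # survives unchanged
    "'''Hi''' there   ",  # trailing spaces are lost, HTML is unaffected
    "''Hi'' there",       # italics are lost, HTML changes
]
counts = Counter(
    classify_round_trip(text, toy_to_html, toy_to_wikitext) for text in corpus
)
```

Tallying the categories over a large corpus, as `counts` does here for three samples, gives the kind of percentage breakdown reported above.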

Editor engagement

Article feedback

This month, we developed a few final features for Article Feedback, which is being tested on 10% of the English Wikipedia. Though our lead developer was loaned out to the Wikivoyage project for most of the month, we deployed a couple of improvements to reduce the moderation workload for editors, including: better abuse filters to automatically disallow swear words; new ways for moderators to filter the feedback page; and a checklist to help users request oversight. These and other features can be tested on this sample article feedback page or on the central feedback page (please report any bugs on Bugzilla). We also started to collect new research data to track how moderators use the feedback page, as well as to measure how many readers who post feedback become editors or registered users. Next month, we plan to make a few final improvements to address these findings, as well as complete code refactoring to improve database performance. Once these tasks are done in the coming weeks, we expect to release Article Feedback v5 to 100% of the English Wikipedia by the end of the year, and to other projects in 2013. For more information about this tool, check our project overview.

Page Curation

This month, we made a few more improvements to our first release version of Page Curation, which we deployed on the English Wikipedia in September 2012, with a very positive response from community editors. New features we developed based on their feedback include: showing the number of list items for selected filters; showing a notice when a page is only a few minutes old; and notifying the last reviewer if a page is marked as unreviewed. We also developed detailed metrics dashboards for tracking the impact of this new tool. To learn more, visit our introduction page, watch this video tour or read this tutorial. If you are an experienced editor, try out the final version on the English Wikipedia, and report any bugs on Bugzilla. We have now ended new feature development for this product and plan to upgrade it again in 2013.

Micro Design Improvements

Several bugfixes were deployed to existing improvements, and work began on a new "Agora" extension to make it easier to productise and deploy the team's work, with Trevor Parscal and Rob Moen taking the lead. Vibha Bamba and Oliver Keyes have begun work on improving the templates that are displayed when the "edit" window is opened.

Editor engagement experiments

In October, the E3 team permanently deployed a confirmation message for all editors (Extension:PostEdit) to 16 Wikipedias, including six of the top 10 projects by size, and worked on associated maintenance of the feature. The team also deployed two iterations of tests for a new registration page (read more), including the beginnings of an API for client-side validation of the sign-up form. In support of current and future work, we deployed the beta EventLogging extension, a new architecture to replace the older ClickTracking extension. Last but not least, work started on redesigning the login and new experiments aimed at onboarding new Wikipedians.

MediaWiki infrastructure & Platform support

Echo (Notifications)

This month, we expanded our design and development activities for the Notifications project (code-named 'Echo'), to prepare for a first experimental deployment in early 2013. Fabrice Florin, Howie Fung and Vibha Bamba identified product goals, key features and scope for that first release (see project slides), and discussed them with team members, including our partners at Wikia. We also created new conceptual models and workflows for different use cases, as well as requirements and wireframes for our first features. Ryan Kaldari and Benny Situ started to develop new types of notifications (e.g. edit reversion, new page review), integrating with Andrew Garrett's code (e.g. talkpage message, mention), with support from Alex Monk. Aaron Schulz built a new abstracted version of the JobQueue system to support multiple queuing systems.

Support

2012 Wikimedia fundraiser

Throughout October, the Fundraising Engineering team has been working on the final engineering push before the kickoff of the 2012 fundraiser in November. During testing, performance regressions were noticed across many wikis and geographies. The team, with support from many other groups, has identified and is attempting to resolve these issues to enhance not only fundraising, but the overall user experience on Wikimedia sites.

Contributors

Wiki Loves Monuments mobile application

The mobile team started planning the decommissioning of the WLM Android app. We'll be retiring both the app and the back-end infrastructure. During the decommissioning process, our product team will analyze data from the WLM contest and prepare for Commons upload next.

Readers

Mobile design/Wikipedia navigation

Jon Robson, Brion Vibber, Max Semenik, Arthur Richards, and the product team updated the mobile website with a new navigation bar, easy beta opt-in settings, new typography, and several bug fixes. The new navigation includes the Main page, a link to a random article, and settings functionality. The article page includes a new 'Read in another language' section. This is the biggest and most ambitious visual change that the mobile team has ever attempted. The new design will open the gates to new functionality by building on a cohesive new navigational infrastructure.

Wikipedia Zero

This month, we've launched Wikipedia Zero with dtac Thailand and STC Saudi. We've also enhanced some diagnostics to inform the users of Zero if they are not using a known partner IP address. We continued to talk with potential new partners and made internal improvements to simplify the configuration process.

MobileFrontend/J2ME app

We're finishing up the final testing of the Wikipedia J2ME application. We moved from testing on a variety of phones to reviewing on the reference hardware devices. Final approval is expected in early November.

Wikipedia over SMS & USSD

The team focused on diagnostics and debugging prior to an initial launch with a partner. We're seeking to hire an engineer specifically to work in this area.

GeoData Storage & API

GeoData has been rewritten to support spatial searches via Solr; we are working with the Operations team to prepare for deployment.

Windows 8 app

Brion Vibber released a native Wikipedia application for Windows 8. This app was not part of the regular Wikimedia product roadmap; instead, it was the result of a 1-day iteration that each team member gets for research time.

Offline

Kiwix

Release 0.9rc2 is almost finished. The highlight of this release is kiwix-serve for MS/Windows, directly available from the Kiwix UI. A first version of kiwix-plug was installed on 15 devices with the Afripedia project. With Wikimedia France, USB sticks containing the French-language Wikipedia were made available for purchase for the first time in France; they were sold out after only a week.

MediaWiki Core

MediaWiki 1.20/Roadmap

Mark Hershberger has released a release candidate of the 1.20 tarball.

MediaWiki 1.21/Roadmap

We started the MediaWiki 1.21 series this month. We deployed MediaWiki 1.21wmf1 and 1.21wmf2 to all wikis, and started to deploy 1.21wmf3. 1.21wmf2 contained a number of significant features: ContentHandler code necessary for supporting Wikidata, high-resolution image support, some refactoring of the CologneBlue skin, and a new "Sites" back-end (also used by Wikidata).

Git/Conversion

Many more extensions are now replicated from Gerrit to the Wikimedia account on GitHub. Gerrit 2.5rc1 and rc2 were released over the course of October, and a final 2.5 is expected soon. One particularly exciting feature in Gerrit 2.5 is the new extensibility framework, which will allow us to replace our gitweb-based source browser with GitBlit (the latter being a far more usable code browsing option). Assuming a showstopper bug with LDAP propagation gets fixed, we'll be able to deploy Gerrit 2.5 shortly after its release, and GitBlit shortly after that.

Wikidata deployment

The deployment of MediaWiki 1.21wmf2 was an important prerequisite to the deployment of Wikidata to the Wikimedia cluster. Chad Horohoe, Sam Reed, Daniel Zahn, and Mark Bergsma have been putting together the pieces to deploy a wiki to wikidata.org, which happened on October 30.

Wikivoyage migration

The extension review is almost finished, and we've got an internal test setup on Wikimedia Labs that we've used to test imports and extensions. We've finalized the account migration strategy and started implementing it. We've developed a draft agreement with Wikivoyage e.V. about the transfer of the domain names. The Wikivoyage e.V. members are currently voting to approve the migration. It'll be final on Friday, after which point we can transfer wikivoyage.org to Wikimedia. Our current goal is to do the actual migration next week.

Multimedia

The last of the blocking bugs have been resolved, so we now plan to deploy TimedMediaHandler to English Wikipedia on October 31 at 16:00 UTC. Further deployments will follow, tentatively the week of November 5. At the beginning of October, we had our new Swift distributed file storage cluster in production, with copies of all original media also being made to an NFS server (ms7) that was destined to fill up in the month of October. Our original plan was to shut off copying to ms7 and rely solely on the Swift cluster. Because of hardware issues with the Swift cluster, we decided we couldn't afford to switch off the copying of files to an NFS backup. We migrated the contents of ms7 over to a much larger NFS server (nas1), and configured nas1 to be the new live backup for images. We plan to remain in this configuration for the foreseeable future as we stabilize our distributed file store.

Site performance and architecture

Tim Starling committed changes to our implementation of libxml to use the PHP memory allocator, rather than malloc (the standard C function for allocating memory, which is managed directly by the operating system). This will allow us to have per-page limits on the complexity of pages in a way that more closely mirrors their impact on our cluster.

Admin tools development

The team made some progress on writing an interface for Stewards to mass-lock user accounts, but spent most of their time on Wikidata ahead of its launch, and the Wikivoyage migration.

Security auditing and response

Quality assurance

QA and testing

Newly hired Michelle Grover and Željko Filipin will be testing software and working on browser-level test automation for both mobile and web platforms. Željko has particular expertise in automated testing and will be joining Antoine Musso and Timo Tijhof in the Netherlands for a "Continuous Integration Summit" in conjunction with Wikimedia Nederland Hackathon 2012.

Beta cluster

The MediaWiki configuration on the beta cluster still has a few remaining live hacks that prevent it from being upgraded smoothly. The final bits have been tracked down and will need a final sprint.

Continuous integration

The continuous integration server has been upgraded to Precise, which will let us install more recent versions of various testing software. This upgrade also made it possible to deploy Zuul in production.

Browser testing

QA Engineer Željko Filipin has made great improvements to the existing automated browser tests and has created some new Mobile tests as well. Mobile QA Engineer Michelle Grover is creating an automatable regression test suite for MobileFrontend. These tests are currently running as builds under a hosted instance of Jenkins, with the intention of moving them to the WMF Continuous Integration environment pending upgrades to the machines hosting Gerrit and Jenkins.

Analytics

Analytics/Kraken

Analytics/Limn

Analytics/Infrastructure

The Analytics team has been working on: configuring and puppetizing CDH4 (Hue, Sqoop, Oozie, Zookeeper, Hive, Pig), configuring and puppetizing Kafka, benchmarking performance, drafting metadata schemas, setting up Ganglia monitoring, setting up a prototype pixel service endpoint, and running ad-hoc data queries for fundraising.

Engineering community team

Bug management

New Bug Wrangler Andre Klapper had many discussions with different stakeholders to get a better impression of how work is done, how people interact with the bug tracker, what the expectations are and what policies might be needed. He investigated the product/component organization within Bugzilla, started triaging incoming and older reports, and did maintenance work (creation and partial cleanup of products and components). bugzilla.wikimedia.org was upgraded to 4.0.8 with the help of Daniel Zahn, and investigations started to determine how urgent an upgrade to 4.2 is with regard to functionality improvements. Plans for the next month include improving documentation on bug management and bug triaging, and describing interactions between the bug wrangler and the different teams.

Summer of Code 2012/management

The Wikimedia engineering community continues to help the 2012 GSoC students improve their projects towards the goal of release and deployment. Sumana Harihareswara aims to lead a postmortem discussion in November. Rob Lanphier and Sumana attended a GSoC Mentors' Summit in October, and discussed mentor recruitment, community metrics, how to be more effective mentors, student selection strategies, PHP and code review tools, and other related topics. As a follow-up to Summer of Code, the MediaWiki community is discussing whether to participate in Google Code-In.

Technical communications

This activity was revived as its scope was expanded to include not only on-wiki engineering project documentation, but more generally the improvement of communications between Wikimedia contributors and the technical community (MediaWiki developers, Operations engineers, etc.). Guillaume Paumier prepared and started a wide and open discussion with editors on some local wikis to identify issues and discuss possible solutions. Management is currently reviewing options to determine the direction this activity will follow in future months.

Volunteer coordination and outreach

Sumana Harihareswara continued to follow up on contacts (such as those gained at October's Grace Hopper Celebration of Women in Computing), recruit new contributors to the Wikimedia tech community, and mentor newer contributors. She granted developer access and Gerrit project ownership requests, and worked on getting more volunteer developers +2 status in MediaWiki core: 8 volunteers now have MediaWiki core maintainership. Sumana also published a retrospective of the 2012 Berlin Hackathon and updated the list of MediaWiki extensions used on Wikimedia sites towards a better understanding of which parts of the codebase are maintained, and by whom. Hiring for a Volunteer Engineering Coordinator to work on volunteer coordination and outreach is almost finished.

The Wikidata project is funded and executed by Wikimedia Deutschland.

The Wikidata team has worked on initial parts of Phase 2 of Wikidata (Infoboxes) and worked together with the WMF to get Wikidata deployed on http://www.wikidata.org. A big step towards this deployment was the merge of the content handler branch into MediaWiki core. This allows MediaWiki to handle other content types besides just wikitext. In addition, the team is looking for help with the initial design of the Main Page of wikidata.org. A draft was also published by the team discussing how the propagation of changes from a repository to the clients should work. Feedback and questions are welcome during the IRC office hours, on the mailing list and on meta.

Future

The engineering management team continues to update the Software deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.