Wikimedia Engineering/Report/2013/July
Engineering metrics in July:
- 114 unique committers contributed patchsets of code to MediaWiki.
- The total number of unresolved commits went from around 960 to about 1283.
- About 40 shell requests were processed.
- Wikimedia Labs now hosts 168 projects and 1,623 users; to date 2,167 instances have been created.
- The tools project in Labs now hosts 252 tools and 218 members.
Major news in July includes:
- Giving more editors an easy-to-use editing interface (the VisualEditor) on several Wikipedias
- Improving language support on our sites via summer interns' projects and easier configuration options, and asking for help translating the VisualEditor interface
- Enabling users to edit our sites from mobile devices, like phones and tablets, and announcing a future user experience bootcamp focusing on mobile editing
- Finishing our transition from keeping source code in Subversion to storing it in Git
- Launching a Wikipedia Zero partnership with Aircel, giving mobile subscribers in India the potential to access Wikipedia at no data cost
- Updating the Wikimedia movement on how we intend to protect our users' privacy with HTTPS
- Signing a contract with longtime MediaWiki contributors to manage MediaWiki releases for the open source community
- Explaining how we find and gather software problems and deliver the fixes to users
Note: We're also providing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.
For a more complete and up-to-date list, check out the Project:Calendar.
Date | Event | Contact |
---|---|---|
1 August 2013 | WMF Metrics and Activities meeting | Erik Möller |
7 August 2013–8 August 2013 | Wikimania Hackathon (Hong Kong, China) | Wikimania organizers |
9 August 2013–11 August 2013 | Wikimania Conference (Hong Kong, China) | Wikimania organizers |
28 August 2013 | QA: Bug triage for RTL languages | Runa Bhattacharjee |
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Software Engineer - Fundraising
- Software Engineer - Language Engineering
- Software Engineer - Multimedia Systems
- Senior Software Engineer - Platform
- Product Manager - Platform
- Dev-Ops Engineer - SRE
- Software Engineer - Editor Engagement Experimentation
- Software Engineer - Editor Engagement
- Director of Program - Mobile
- Front-end Developer - Analytics
Announcements
- Bryan Davis joined the Platform Engineering team as a Senior Software Engineer, working generally on backend software issues and starting off supporting multimedia (announcement).
- C. Scott Ananian joined the Parsoid team as a Senior Features Engineer (announcement).
- Kenan Wang joined the Product team as Product Manager for Mobile (announcement).
Technical Operations
Site infrastructure
- Lots of Puppet refactoring work got done this month, including considerable reorganization of the puppet masters. Several manifests have been moved into modules, but completing this project will take many months.
- The English Wikipedia dumps were run from our Ashburn data center this month, as were a number of other big wikis' dumps. There's an issue with the abstract dumps that needs to be sorted out for those, but other than that everything ran smoothly.
- Petr Onderka has been getting a lot of work done on the incremental dumps. A first preview of the code was announced as well as a proposed binary file format which the program currently uses. For a preview of what's coming up, you can check the timeline. Your comments and suggestions are welcome!
- Though some features were introduced this month, the majority of our time was spent on documentation, tracking down bugs and improving usability. We had a documentation sprint this month, targeted at improving documentation for the Tool Labs project. Work continued on stabilizing the NFS server; we believe we've tracked down the stability issues to RAID controller problems. The compute nodes were running increasingly low on disk space; we traced this to a change in the behavior of nova and have deployed a fix. nova-network was starting to experience timeouts due to excessive load, leading to instance creation failures, so we extended DHCP renewal times to reduce load. We upgraded wikitech.wikimedia.org and wikitech-static to the 1.22wmf11 version of MediaWiki. We also: deployed the AJAX-enabled delete instance feature; deployed a change to display more informative instance statuses; fixed issues in LdapAuthentication that broke blocking and renaming users; and deployed a change to allow service groups to be added to service groups, to make sharing code and data between tools easier.
Editor retention: Editing tools
Editor engagement features
Flow Portal/Project information
Editor engagement experiments
For the GettingStarted extension, E3 collaborated with Platform engineering to ensure compatibility with the new "SUL2" cross-wiki authentication architecture. For the GuidedTour extension, the team completed a first release of support for guided tours of the VisualEditor interface, alongside tours of the legacy wikitext editor, and developed a plan to refactor the GuidedTour extension as well as its API. E3 also planned for its sixth A/B test of the GettingStarted workflow (see proposed specification and mockups). As an addition to the team's redesign of account creation and login (launched in May-June), we enhanced the design of the form for users who fulfill account creation requests for others.
E3 team member Matthew Flaschen also worked with two Google Summer of Code students on their projects. Richa Jain is working on the Annotator extension, which allows adding inline comments to a wiki page. Rahul Maliakkal is working on the Pronunciation Recording extension, for adding audio of pronunciations to Wiktionary.
On the experimental tools and data analysis front, E3 completed a significant rewrite of the Puppet configuration for EventLogging, our data collection pipeline, among other changes. For the MediaWiki-Vagrant portable desktop development environment, E3 added support for flexibly provisioning and unit testing extensions such as GettingStarted, GuidedTour, ParserFunctions, EventLogging, and others. Last but not least, the micro-survey of gender of new account registrations was enabled on German, French, Italian, and Polish Wikipedias, while data analysis on the English Wikipedia results began.
Support
MediaWiki Core
Security auditing and response
Quality assurance
Quality Assurance/Browser testing
Analytics
We reviewed our planning document with Sue and Erik and the Engineering Directors. Reception was positive and we will be communicating next steps more widely in August. The Analytics team focused on short-term deliverables, reliability and hiring in July. We identified two potential candidates for front-end/Python work. We have been performing multiple phone screens together with Recruiting, and the hiring pipelines are good.
Kraken:
- We kicked off a reliability project with Ops with the end goal of stabilizing Hadoop and the logging infrastructure. Teams have been in discussions on architecture and planning, and should have a path forward in the next 2 weeks. We identified a consultant who will perform a system audit to aid the project.
- We continue adding new metrics and alerts to monitor all the different parts of the webrequest dataflows into Kraken. We expect to keep making improvements in the coming months until we have a fully reliable data pipeline into Kraken.
- We puppetized Hue, Hive, and Oozie. We also have a working setup of the Hadoop cluster in Labs for testing purposes. All Puppet work is open sourced.
Logging Infrastructure:
- We started this month with designing a canary event monitoring system. A canary event is an artificial event that is injected at the start of the data workflow and which we will monitor to see it reaches its final destination; that way we can ensure that the dataflows are functioning.
- We are investigating what data format to use for sending the webrequest messages from Varnish to the Hadoop cluster. Formats that we are scrutinizing are JSON, Protobuf and Avro, but we are also looking at compression algorithms such as Snappy.
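The canary idea described above can be sketched in a few lines. This is only an illustration, not the monitoring system the team designed: the event fields, the `make_canary` and `check_canaries` helpers, and the five-minute `max_delay` threshold are all assumptions made for the example.

```python
import time
import uuid


def make_canary(now=None):
    """Build an artificial event, tagged so it can be recognised downstream.

    The field names here are hypothetical, chosen for the example only.
    """
    return {
        "canary": True,
        "id": uuid.uuid4().hex,
        "sent_at": time.time() if now is None else now,
    }


def check_canaries(sent, received_ids, now, max_delay=300.0):
    """Return the canaries that are overdue and never reached the destination.

    sent:         canary events injected at the start of the dataflow
    received_ids: set of canary ids observed at the final destination
    max_delay:    assumed grace period in seconds before a canary counts as lost
    """
    missing = []
    for event in sent:
        overdue = now - event["sent_at"] > max_delay
        if overdue and event["id"] not in received_ids:
            missing.append(event)
    return missing
```

A non-empty result from `check_canaries` would suggest that some stage of the dataflow dropped or delayed events, which is the condition an alert would be raised on.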
Analytics/Visualization, Reporting & Applications
Wikimetrics: We successfully launched the initial version of Wikimetrics: see metrics.wmflabs.org. This version has support for cohort upload and two metrics: 1) bytes added and 2) namespace edits. We are working on adding support for time-series and aggregators. In the coming sprints we will focus on adding new metrics.
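As a rough illustration of what the two metrics measure for an uploaded cohort, here is a minimal sketch. It is not Wikimetrics code: the revision record layout (`user`, `namespace`, `delta`), the function names, and the choice to count only positive size changes as "bytes added" are assumptions made for the example.

```python
def bytes_added(revisions, cohort):
    """Sum the positive size changes made by each cohort member.

    Each revision is a dict with hypothetical keys:
    'user', 'namespace' and 'delta' (size change in bytes).
    """
    totals = {user: 0 for user in cohort}
    for rev in revisions:
        if rev["user"] in totals and rev["delta"] > 0:
            totals[rev["user"]] += rev["delta"]
    return totals


def namespace_edits(revisions, cohort, namespaces=(0,)):
    """Count each cohort member's edits in the given namespaces.

    The default of (0,) restricts the count to the main (article) namespace.
    """
    counts = {user: 0 for user in cohort}
    for rev in revisions:
        if rev["user"] in counts and rev["namespace"] in namespaces:
            counts[rev["user"]] += 1
    return counts
```

Both functions return one value per cohort member, including members with no matching revisions, so the results can be aggregated or turned into a time series by calling them on revisions bucketed by date.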
Wikipedia Zero: Dashboards have been moved off of Hadoop for the time being and are now being populated again. We have identified some issues with log rotation that are causing gaps in the graphs, and will look into these problems. Also, we have been working on technical handoff as Evan Rosen leaves the Foundation.
Limn: No development news.
Wikistats: No development news.
- Erik Zachte published data and longitudinal analyses of edit and revert trends for Wikimedia projects (read the announcement). We provided data and ad-hoc analysis for the presentation A State of Decline? The State of Wikimedia Communities as of July 2013 at the July 2013 Monthly Metrics Meeting.
- We published the analysis of a controlled experiment that we ran in June to test the Impact of notifications on new contributors and a pre-release A/B test of VisualEditor on the English Wikipedia. We performed an extensive audit of the quality of the data collected during and after the VE test, taking into account browser limitations and known bugs, and posted an update on the state of the analysis. To ensure the replicability of the analysis, we released via our open data repository the complete dataset of the sample of new registered users who participated in the split test.
- We released real-time dashboards on edit activity, new account registrations and reverts for the 10 Wikipedias on which VE has been rolled out. (en • de • es • fr • he • it • nl • pl • ru • sv)
Engineering community team
Volunteer coordination and outreach
- The language team deployed Universal Language Selector (ULS) to most Wikimedia wikis to provide easier configuration options to readers and contributors. ULS provides a flexible way to configure and deliver language settings like interface language, fonts, and input methods (keyboard mappings). Also, ULS allows users to type text in different languages not directly supported by their keyboard, read content in a script for which fonts are not available locally, or customise the language in which menus are displayed. For more information, please see the FAQ.
- The Language engineering team also mentored summer interns' projects to improve language support on our sites, and asked for volunteer help translating the VisualEditor interface.
The Kiwix project is funded and executed by Wikimedia CH.
- We are preparing the first release of a new Wikipedia ZIM creation solution for August. We also released a new version of Kiwix for Android, which includes a few bug fixes and new features. Besides the release of traditional Wikipedia ZIM files, we have also published two interesting ZIM files: one which includes 2,500 ebooks (EPUB & PDF) of French literature and one with the new Wikipedia for Schools selection. The ZIM incremental update GSoC project is progressing well too: first working versions of the zimpatch & zimdiff console tools are available, and integration with Kiwix has started. Kiwix developers will be available at Wikimania, during the hacking days and at the Wikimedia CH booth (chapter village) during Wikimania itself.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- In July, we deployed Wikidata to all Wikivoyage sites in all languages, to manage their language links. We updated the continued Roadmap for Wikidata Development. Coveralls.io support has been added to most of our components. Since the first deployment of Phase 1 to Wikipedia, about 240 million interwiki links (5 GB of text) have been removed from articles (2012 vs 2013 analysis).
- In other news, IBM Research donated the AAAI Feigenbaum Prize for Watson to the Wikimedia Foundation to support its work, especially on Wikidata.
- Denny Vrandečić explains why Wikidata items are identified with a Q.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts. Annual goals for the 2013–2014 fiscal year are being drafted by some teams and have been finalized by others.