Wikimedia Engineering/Report/2013/February
Engineering metrics in February:
- 110 unique committers contributed patchsets of code to MediaWiki.
- The total number of unresolved commits went from about 650 to about 830.
- About 69 shell requests were processed.
- Wikimedia Labs now hosts 150 projects and 1,002 users; to date 1,561 instances have been created.
Major news in February includes:
- The Wikipedia Zero project got a Knight News Challenge grant.
- Additional input methods were made available for jQuery.IME.
- The Translate extension introduced a new iteration of the Translation Editor.
- The Wikimedia mobile web team launched the ability to view the watchlist and add pages to it, all from mobile devices.
- Echo is a new notification system for Wikipedia.
- The Technical Operations team found ways to stop problems in their tracks.
- Wikipedia Mobile hit 3 billion monthly page views.
Note: We're also proposing a shorter, simpler and translatable version of this report that does not assume specialized technical knowledge.
Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.
For a more complete and up-to-date list, check out the Project:Calendar.
Date | Type | Event | Contact |
---|---|---|---|
7 March 2013 | | QA: General MediaWiki reports Bug Triage | AKlapper, Valeriej |
13 March 2013–15 March 2013 | | QA: Browser automation testing for Wikipedia Search | Zeljko.filipin, Qgil, Cmcmahon |
14 March 2013 | | Lua meets Wikipedia (San Francisco, CA, USA) | Qgil |
18 March 2013–22 March 2013 | | QA: LiquidThreads (LQT) Bug Triage | AKlapper, Valeriej |
19 March 2013 | | Office hour about Wikimedia's issue tracker and Bug management in #wikimedia-office | AKlapper |
20 March 2013–22 March 2013 | | SMWCon Spring 2013 (New York City, USA) | |
22 March 2013–24 March 2013 | | LibrePlanet (Cambridge, MA, USA) | |
31 March 2013–7 April 2013 | | Offline Hackathon 2013 (Paris, France) | |
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Software Engineer - Editor Engagement
- Software Engineer - Parser
- Software Engineer - Apps
- Software Engineer - Mobile
- Software Engineer - Multimedia Systems
- Software Engineer - Multimedia User Interfaces
- Software Engineer - Search
- Product Manager - Mobile
- Director of User Experience
- Visual Designer
- Partner Solutions Engineer
- Dev-Ops Engineer (SRE)
- Operations Engineer - Database Administrator
- Senior Program Manager - Mobile
Announcements
- Ed Sanders joined the Features engineering group as Software Engineer working on Visual Editor (announcement).
- Christian Aistleitner joined as a contractor specializing in work on Gerrit (announcement).
- Marc-Andre Pelletier joined the Technical Operations team as Operations Engineer (contractor), focusing on the Wikimedia Labs infrastructure and migration of tools (announcement).
- Kirsten Menger-Anderson joined the Features group as a part-time contractor Technical Writer focusing on Editor Engagement Experiments.
- Greg Grossmeier joined the Platform engineering group as Release Manager (announcement).
- Site Performance Engineer and Senior Technical Advisor Patrick Reilly's last day with WMF was February 19th (announcement).
Technical Operations
Site infrastructure
- Both Asher Feldman and Peter Youngmeister are proceeding cautiously in broadening MariaDB deployment in our clusters. We have one MariaDB instance for each of the database clusters (s1 to s7). The MariaDB support team has been quick in resolving bugs we encountered along the way. In another database administration task, Asher reviewed and deployed the Wikidata schema changes and migrated Wikidata from the s3 cluster to s5, adding more growth capacity.
- We put sixty new application servers into production in each of the two datacenters. This is in anticipation of expected traffic growth coming from both our regular and mobile sites in the coming year.
- Lately we have been experiencing short time-out failures in the nightly search indices built with search-pool4. Asher is experimenting with a fix. He redistributed the search-pool4 indices in the Tampa data center based on sizes and what seems to be a more acceptable index size-to-RAM ratio. We essentially have a virtual search-pool5 shard, but with the spelling and highlight indices for pool4 and pool5 sharing the same servers. The pool4 wikis are using the new setup in Tampa, with everything else continuing to use our Ashburn cluster. We should know soon if it works.
- The TechOps team had an in-person team meeting the week of 25 February in WMF's San Francisco office.
- The highlights of the meeting were:
- Discuss the upkeep of the "failover" datacenter and capture the lessons learned from the recent datacenter switchover. For example, we think we could reduce the switchover "readonly" time from 32 minutes to 10 minutes by automating more of the database and caching failover procedures.
- Improve and streamline our hiring process and our security access model.
- Organize a coordinated sprint process to fill the gap between smaller tasks (for which we use the RT ticketing system) and larger tasks (which require department coordination). We started by collecting some thoughts, brainstorming and forming teams via the Projects wikitech page.
- Face-to-face meetings with the Engineering teams from Platform, Mobile, Analytics and Wikidata.
- Short TechOps sprints to reduce cronspams and our RT queue.
- Review budget needs for the 2013-2014 fiscal year.
- Numerous bug fixes were made to the mwxml2sql tool, and a set of SQL files based on an English-language Wikipedia XML dump was published for use by testers [1]. A tool to convert SQL dumps to escaped tab-delimited format is now available for use with MySQL's LOAD DATA INFILE command, which is much faster than INSERTs. All SQL files from the same dump were converted to this format and also published.
- A new mirror has come on line, initially mirroring historical archives of XML files as well as MediaWiki releases, page view statistics and other files [2]. Thanks to Robert Smith and Wansecurity.com for providing the resources to make this happen.
- This month was mostly spent stabilizing Labs components. Labs Ganglia was fixed to report instance statistics properly. Adminbot was updated to fix utf8 issues, and to fix package issues when upgrading. A number of changes were made to the glusterfs support to bring more stability. Gluster was upgraded to 3.3.1 to fix a memory leak on both the client and server. Gluster isn't matching our use case of multitenancy, as the glusterd daemon isn't handling the large number of volumes well. To help with this, until we either fix the issue in gluster, or replace it, we've made a change to not create/manage Gluster volumes for projects unless they opt in. We've also disabled and deleted Gluster volumes for projects that are currently unused. Work was done to turn Puppet classes for installing MediaWiki in Labs into modules, so that they can be reused more easily.
- We merged wikitech.wikimedia.org (our operations and infrastructure documentation) and labsconsole.wikimedia.org together into wikitech.wikimedia.org. wikitech-static.wikimedia.org is available as a backup, in case all access to our cluster is unavailable. Work was started on supporting saltstack reactors, to replace the bootstrapping for instance creation. This month we have a new member of the Labs team, Marc-Andre Pelletier, also known in the community as Coren. Coren will be working on the new Tool Labs infrastructure and we're very excited to have him on board. Asher and Peter started work on replicated databases for Tool Labs during the last week of the month.
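The escaped tab-delimited format mentioned above for the converted SQL dumps is the default text format that MySQL's LOAD DATA INFILE reads: fields separated by tabs, rows ended by newlines, backslash as the escape character, and \N for NULL. The following is a minimal sketch of that escaping in Python. It is not the published conversion tool, and the function names are hypothetical.

```python
# Minimal sketch of MySQL's default LOAD DATA text format.
# Assumption: escape_field and row_to_line are illustrative names,
# not taken from the conversion tool described in the report.

# Characters that must be escaped so they are not mistaken for
# field or line delimiters when the file is loaded.
_ESCAPES = {
    "\\": "\\\\",
    "\t": "\\t",
    "\n": "\\n",
    "\r": "\\r",
    "\0": "\\0",
}


def escape_field(value):
    """Render one column value as an escaped text field."""
    if value is None:
        return "\\N"  # LOAD DATA reads \N as SQL NULL
    return "".join(_ESCAPES.get(ch, ch) for ch in str(value))


def row_to_line(row):
    """Join escaped fields with tabs and end the row with a newline."""
    return "\t".join(escape_field(v) for v in row) + "\n"
```

A file built from such lines can be bulk-loaded in a single statement, which avoids parsing one INSERT statement per batch of rows.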
Editor retention: Editing tools
A new contributor, C. Scott Ananian, improved Parsoid's performance by switching the DOM library from JSDom to Domino. He also improved image handling and contributed numerous other patches.
The tokenizer was modified to parse one top-level block at a time, which helps to spread out API requests and minimize the number of tokens in flight. The serializer is in the process of being rewritten to work on DOM input to benefit from the context provided by the DOM. This rewrite is expected to simplify the logic significantly, and help fix some more selective serialization issues that are blocking a deployment to production.
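The idea of tokenizing one top-level block at a time can be illustrated with a toy Python generator. This is only a sketch under strong assumptions: Parsoid itself is written in JavaScript, and here a "block" is simply text separated by blank lines and a "token" is a whitespace-separated word. All function names are hypothetical.

```python
import re


def top_level_blocks(wikitext):
    """Yield non-empty blocks, treating blank lines as block separators."""
    # re.split builds the list of block strings up front; tokens are
    # still only produced when a consumer asks for the next block.
    for block in re.split(r"\n\s*\n", wikitext):
        if block.strip():
            yield block


def tokenize_block(block):
    """Toy tokenizer: split a block into whitespace-separated tokens."""
    return block.split()


def tokenize_incrementally(wikitext):
    """Yield the token list of one block at a time.

    Only the current block's tokens exist at any moment, so later
    stages can process and discard them before the next block is
    tokenized.
    """
    for block in top_level_blocks(wikitext):
        yield tokenize_block(block)
```

Because the consumer pulls one block's tokens at a time, any per-block work (such as the API requests mentioned above) is naturally spread across the document instead of being issued all at once.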
We also used the ops and core hackathon to discuss and refine our storage plans. Finally, we wrote a blog post about Parsoid on the WMF tech blog.
Editor engagement features
Flow Portal/Project information
Editor engagement experiments
After the initial launch of guided tours, Matt Flaschen and other team members worked on A/B testing the effectiveness of guided tours as part of the onboarding new Wikipedians experiences currently enabled on English Wikipedia. Results from these controlled tests are vital to understanding the impact of tours on editor engagement. In the meantime, the GuidedTour extension was enabled on Wikimedia Commons and six Wikipedias (including French, German, and Dutch), so that local administrators and volunteer developers could take advantage of the feature.
In addition to working on polishing and quantifying the effect of guided tours, significant progress was made on a new landing page for the onboarding project, with plans to launch early in March. The new Getting Started page will be expanded to include a wider variety of task types offered to new editors. It will also be generated from a basic recommender system coupled with the GettingStarted extension, rather than relying on a bot.
Kirsten Menger-Anderson joined the team as Technical Writer mid-month. She began work with Ori Livneh, Dario Taraborelli, and others on documenting the EventLogging extension, with the goal of producing a comprehensive guide for end users of EventLogging, especially other Wikimedia Engineering teams in need of data. Future work by Kirsten will include similar documentation of the User Metrics data analysis API, which will be opened up for internal use in March.
Support
- Translate (TUX) enhancements: Development continues full steam ahead on the new translation editor with a proofreading feature by Santhosh and Amir. Niklas continues to enhance and test backend translate infrastructure, including Solr integration and other translation aids.
- Plurals support: Updated by Santhosh and Amir to be more consistent with CLDR standards.
- Technical Font Specification for Indic scripts: We kicked off this collaborative project between Red Hat and Wikimedia at the Language Summit in February. Santhosh and Runa are contributors to this project.
- Language Coverage Matrix: This matrix aims to provide an up-to-date status of language support for all tools that the team is developing and maintaining.
- MediaWiki i18n code review: The team continues to support the MediaWiki release with i18n code reviews across other features and extensions.
- MediaWiki Language Extension Bundle (MLEB): Monthly release of MLEB completed with release notes by Amir.
- jQuery.IME: We continue to merge contributed input methods into jQuery.IME. We now have 155+ input methods for 75+ languages.
- jQuery.ULS: Continue to maintain jQuery.ULS. Awaiting resolution of deployment issues.
MediaWiki Core
Site performance and architecture
Security auditing and response
Quality assurance
Analytics
Analytics/Logging infrastructure
Engineering community team
Valerie published an initial version of a Bug Life Cycle flowchart describing the life of a bug report by its status changes over time, continued investigating feedback channels and workflows of other, bigger free software projects, and also helped test the Commons Upload app for Android and the mobile browser as part of Mobile QA testing. A table on Bugzilla use by development teams was made available.
Furthermore, outreach to several development teams continued in order to better understand their different bug management needs, and discussions took place about a workflow for marking fixed tickets as backport candidates in the issue tracker, potentially resulting in the addition of a dropdown menu ("flag") in Bugzilla.
Volunteer coordination and outreach
- We have consolidated the QA Weekly goals as a way to orchestrate testing and bug management activities with the wider community. We ran two Features testing activities (Article Feedback's new features and Wikipedia + Commons uploads) and two Bug Days (Article feedback and Git/Gerrit). So far it has been useful for better coordinating testing activities across Wikimedia Foundation teams, but we still need better results engaging volunteers.
- The (newly elected) Affiliations Committee is working on finding an agreement with MediaWiki Group Ahmedabad and Wikimedia India regarding whether it should be a chapter Special Interest Group, a user group, or some other structure.
- Quim Gil went to FOSDEM; as a result, we now have a generic "How to contribute" presentation and video.
- We are helping with the organization of the Amsterdam Hackathon 2013 and also helped the Wikimedia Foundation decide which employees would get travel sponsorship to Amsterdam.
The Kiwix project is funded and executed by Wikimedia CH.
- We have migrated our source code repository from Subversion to Git. We also focused in February on revamping the Kiwix Web site. The new Web site is much more user-friendly. The audience continues to grow, with 120,000 downloads of the software in February.
The Wikidata project is funded and executed by Wikimedia Deutschland.
In February the first phase of Wikidata (language links) was deployed on the English-language Wikipedia. Additionally, the first parts of phase 2 (infoboxes) went live on wikidata.org. It is now possible to add statements. For an example see d:159. The first tools have already been written on top of this, for example Geneawiki and Reasonator. In the meantime more work has been put into additional data types, like strings and geocoordinates, as well as the foundations of phase 3 (lists based on queries).
In other good news: Wikimedia Germany has decided to fund Wikidata development after the end of the first year of development at the end of March.
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the engineering roadmap, listing ongoing and future Wikimedia engineering efforts.