Wikimedia Engineering/Report/2014/August
Major news in August includes:
- the Wikimania 2014 conference in London, and the associated hackathon;
- a statement on Wikipedia Zero and net neutrality;
- progress on the new content translation tool and its passing the milestone of 100 translated articles.
Note: We're also providing a shorter version of this report.
Engineering metrics in August:
- 160 unique committers contributed patchsets of code to MediaWiki.
- The total number of unresolved commits went from around 1640 to about 1695.
- About 22 shell requests were processed.
Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.
For a more complete and up-to-date list, check out the Project:Calendar.
Date | Type | Event | Contact |
---|---|---|---|
9 September 2014 | | UploadWizard bug report triage, 1700-1900 UTC in #wikimedia-multimedia | Andre Klapper |
3 September 2014 | | IRC discussion of several RfCs for next actions, 2100-2200 UTC in #wikimedia-office | |
10 September 2014 | | IRC discussion of several RfCs for next actions, 2100-2200 UTC in #wikimedia-office | |
24 September 2014 | | Tech Talk: The Very Basics of Phabricator, 1800-1900 UTC in #wikimedia-office | Andre Klapper |
26 September 2014 | | PHP Con Mexico | Andrew Russell Green |
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Senior Software Engineer - Services
- Software Engineer - Services
- Software Engineer - Maps & Geo - Mobile
- Software Engineer - Mobile - iOS
- Release Engineer
- Technical Writer
- Full Stack Developer - Analytics
- Research Analyst
- Agile Coach/ScrumMaster - Team Practices Group
- Operations Security Engineer
- Technical Project Manager
- UX Senior Designer
- UX Senior Design Researcher
- UX User Research Recruiter
- UX Visual Design Fellowship
- Mobile Partnerships Regional Manager
- Project Coordinator - Engineering
Technical Operations
Dallas data center
- On August 21, our first connectivity to the new Dallas data center (codfw) came online, connecting the new site to the Wikimedia network. The following week, all network equipment was configured to prepare for server installations. The first essential infrastructure services (install server, DNS, monitoring etc.) were brought online in the days following August 25, and we are now working on deploying the first storage and database servers to start replication and backups from our other data centers.
Labs metrics in August:
- Number of projects: 170
- Number of instances: 480
- Amount of RAM in use (in MBs): 2,116,096
- Amount of allocated storage (in GBs): 22,600
- Number of virtual CPUs in use: 1,038
- Number of users: 3,718
Wikimedia Labs
- Andrew fixed a few sudo policy UI bugs (68834, 61129). Marc improved the DNS cache settings and resolved some long-standing DNS instability (70076). He also set up a new storage server for wiki dumps. This should resolve some long-term storage space problems that led to out-of-date dumps.
- Andrew laid the groundwork for wikitech to be updated via the standard WMF deployment system. We're investigating the upstream OpenStack user interface, 'horizon'.
Editor retention: Editing tools
Users of Internet Explorer 11, whom we were previously preventing from using VisualEditor due to some major bugs, will now be able to use VisualEditor. Support for earlier versions of Internet Explorer will be coming shortly. Similarly, tablet users browsing the site's mobile mode now have the option of using a mobile-specific form of VisualEditor. More editing tools, and availability of VisualEditor on phones, are planned for the future.
Improvements and updates were made to a number of interface messages as part of our work with translators to improve the software for all users, and VisualEditor and MediaWiki were improved to support highlighting links to disambiguation pages where a wiki or user wishes to do so. Several performance improvements were made, especially to the system around re-using references and reference lists. We tweaked the link editor's behaviour based on feedback from users and user testing. The deployed version of the code was updated three times in the regular release cycle (1.24-wmf17, 1.24-wmf18 and 1.24-wmf19).
The TemplateData GUI editor was significantly improved: it was updated to use the new types and to import parameters recursively where needed, and it was deployed on Norwegian Bokmål Wikipedia. The volunteers working on the Math extension (for formulæ) moved closer to deploying the "Mathoid" server that will use MathJax to render clearer formulæ than with the current versions.
The Editing team, as usual, did a lot of work on improving libraries and infrastructure. The OOjs UI library was modified to make the isolation of dialogs using <iframe>s optional, and to re-organise the theme system as part of implementing a new look-and-feel for OOUI, making it consistent with the planned changes to the MediaWiki design, in collaboration with the Design team. The OOjs library was updated to fix a minor bug, with two new versions (v1.0.12 and then v1.1.0) released and pushed downstream into MediaWiki, VisualEditor and OOjs UI.
The GSoC 2014 LintTrap project wrapped up; we hope to develop it further over the coming months and go live with it later this year.
With an eye towards supporting Parsoid-driven page views, the Parsoid team worked on a few different tracks. We deployed the visual diff mass testing service; added Tidy support to parser tests and updated the tests, which makes it easy for Parsoid to target the PHP Parser + Tidy combo found in production; and continued to make CSS and other fixes.
Services
Core Features
Growth
In August, the Growth team vetted CirrusSearch as a back-end for personalized suggestions and prepared its first A/B test of the new task recommendations system. This test will deliver recommendations to a random sample of newly-registered users on 12 Wikipedias: English, French, German, Spanish, Italian, Hebrew, Persian, Russian, Ukrainian, Swedish, and Chinese. Several Growth team members also attended Wikimania 2014 in London. At Wikimania, the team shared presentations on its work and conducted usability tests of the recommendations system. Last but not least, design work began on the third major iteration of the team's anonymous editor acquisition project.
Wikipedia Zero & Partnerships
- Wikipedia Zero page views held steady at around 70 million in August. We launched Wikipedia Zero with three operators: Smart and Sun in the Philippines (related companies) and Timor Telecom in East Timor. That brings our total to 37 partners in 31 countries. Smart has been collaborating with Wikimedia Philippines for months, and they previously offered free access to Wikipedia on a trial basis. Just announced, Smart has now officially joined Wikipedia Zero and brought in their sister brand Sun, covering a combined 70 million subscribers in the Philippines. Timor Telecom launched Wikipedia Zero with a press event including the Vice Minister of Education and much promotion. Timor Telecom is keen to support growth in the Tetun Wikipedia by raising awareness in universities, with resources from the Wikipedia Education Program. In Latin America, we made progress toward app preloads by completing testing for the Qualcomm Reference Design (QRD) program. The Wikipedia Android app is now certified for preload on QRD. We made terrific connections with Global South community members at Wikimania, which will lead to more direct local collaboration between partners and Wikimedia communities. Smriti Gupta, partnerships manager for Asia, moved to India where she will work remotely. We're recruiting our third partnerships manager to cover South East Asia and tech partnerships.
Language engineering communications and outreach
MediaWiki Core
Mark submitted a series of patches to create a service IP and Varnish back-end for an HHVM app server pool, with Giuseppe and Brandon providing feedback and support. The patch routes requests tagged with a specific cookie to the HHVM back-ends. Tech-savvy editors were invited to opt-in to help with testing by setting the cookie explicitly. The next step after that will be to divert a fraction of general site traffic to those back-ends. The exact date will depend on how many bugs the next round of testing uncovers.
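The routing rule described above can be illustrated with a minimal Python sketch. It is not the Varnish configuration from the patches: the cookie name `hhvm`, the value `true`, the back-end labels and the hash-based sampling for the later "fraction of general traffic" step are all assumptions made for illustration.

```python
import hashlib
from http.cookies import SimpleCookie

# Hypothetical cookie name and value; the report does not name the real cookie.
HHVM_COOKIE = "hhvm"
HHVM_VALUE = "true"


def choose_backend(cookie_header, request_key="", sample_fraction=0.0):
    """Return "hhvm" or "default" for a request.

    cookie_header:   raw value of the Cookie request header (may be empty).
    request_key:     stable per-client string used for sampling (assumption).
    sample_fraction: share of non-opted-in traffic to divert, from 0.0 to 1.0.
    """
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get(HHVM_COOKIE)

    # Step 1: explicit opt-in through the cookie always wins.
    if morsel is not None and morsel.value == HHVM_VALUE:
        return "hhvm"

    # Step 2 (the planned follow-up): divert a deterministic fraction of
    # the remaining traffic by hashing the request key into a 64-bit bucket.
    if sample_fraction > 0.0 and request_key:
        digest = hashlib.sha256(request_key.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big")
        if bucket < sample_fraction * 2**64:
            return "hhvm"

    return "default"
```

Hashing a stable key rather than drawing a random number keeps a given client on the same back-end across requests, which makes bug reports easier to reproduce.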
Tim is looking at modifying the profiling feature of LuaSandbox to work with HHVM; it is currently disabled.
The ability to globally rename users was deployed a while ago, and is currently working excellently!
The ability to log in with old, pre-finalisation credentials has been developed so that users are not inadvertently locked out of their accounts. From an engineering standpoint, this form is now fully working in our test environment. Right now, the form uses placeholder text; that text needs to be 'prettified' so that the users who have been forcibly renamed get the appropriate information on how to proceed after their rename, and more rigorous testing should be done before deployment.
A form to globally merge users has been developed so that users can consolidate their accounts after the finalisation. From an engineering standpoint, this form is now fully working in our test environment. The form needs design improvements and further testing before it can be deployed.
A form to request a rename has been developed so that users who do not have global accounts can request a rename, and also so that the workload on the renamers is reduced. From an engineering standpoint, the form to request a rename has been implemented, and implementation has begun on the form that allows renamers to rename users. Once the end-to-end experience has been fully implemented and tested, the form will be 'prettified'.
Security auditing and response
Quality Assurance/Browser testing
Multimedia
The team worked to make Media Viewer easier to use by readers and casual editors, our primary target users for this tool. To that end, we created a new 'minimal design' including a number of new improvements such as a more prominent button linking to the File: page, an easier way to enlarge images and more informative captions. These new features were prototyped and carefully tested this month to validate their effectiveness. Testers easily completed most of the tasks we gave them, suggesting that the new features are now usable by target users, and ready for development in September.
This month, we prepared a first plan for the Structured Data project, in collaboration with many community members and the Wikidata team: we propose to gradually implement machine-readable data on Wikimedia Commons, starting with small experiments in the fall, followed by a wider deployment in 2015. We also continued our code refactoring for the UploadWizard, as well as fixed more bugs across our multimedia platform. To keep up with our work, join the multimedia mailing list.
- Tools for mass migration of legacy translated wiki content
- Wikidata annotation tool
- Email bounce handling to MediaWiki with VERP
- Google Books, Internet Archive, Commons upload cycle
- UniversalLanguageSelector fonts for Chinese wikis
- MassMessage page input list improvements
- Book management in Wikibooks/Wikisource
- Parsoid-based online-detection of broken wikitext
- Usability improvements for the Translate extension
- A modern, scalable and attractive skin for MediaWiki
- Automatic cross-language screenshots for user documentation
- Separating skins from core MediaWiki
- Chemical Markup support for Wikimedia Commons
- Improving URL citations on Wikimedia
- Historical OpenStreetMap
- Welcoming new contributors to Wikimedia Labs and Tool Labs
- Evaluating, documenting, and improving MediaWiki web API client libraries
- Feed the Gnomes - Wikidata Outreach
- Template Matching for RDFIO
- Switching Semantic Forms Autocompletion to Select2
- Catalogue for Mediawiki Extensions
- Generic, efficient localisation update service.
Volunteer coordination and outreach
Following the prototype built for Wikimania, the team identified many performance issues in Wikimetrics for backfilling Editor Engagement Vital Signs (EEVS) data. The team spent a sprint implementing some performance enhancements as well as properly managing sessions with the databases. Wikimetrics is better at running recurring reports concurrently and managing replication lag in the slave DBs.
The team continued monitoring analytics systems and responding to issues when (non-critical) alarms went off. Packet losses and Kafka issues were diagnosed and handled.
Hadoop worker nodes now automatically set memory limits according to what is available. Previously all workers had the same fixed limit. This allows for better resource utilization.
Logstash is now available at https://logstash.wikimedia.org (Wikitech account required). Logs from Hadoop are piped there for easier search and diagnosis of Hadoop jobs.
Some uses of udp2log were migrated to kafkatee. The latter is not prone to packet losses. In particular Webstatscollector was switched over and error rates were seen to drop drastically. Eventually, the “collecting” part of Webstatscollector will be implemented in Hadoop, a much more scalable environment to handle such work.
Analytics/Editor Engagement Vital Signs
The team implemented the stack necessary to load EEVS in a browser and has a rough implementation of the UI according to Pau's design. The team also made available to EEVS two metrics already implemented on Wikimetrics: number of pages created, and number of edits.
We gave or participated in 8 presentations during the main conference.
We published a report on mobile trends expanding the data presented at the July 2014 Monthly Metrics meeting. We started work on referral parsing from request log data to study trends in referred traffic over time.
We generated sample data of edit conflicts and worked on scripts for robust revert detection. We published traffic data for the Medicine Translation Taskforce, with a particular focus on traffic to articles related to Ebola.
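As background on revert detection, one widely used approach is the "identity revert": a revision counts as a revert when its content checksum matches that of a recent earlier revision, and every revision in between is treated as reverted. The sketch below illustrates that general technique only; it is not the team's scripts, and the function name, the input shape and the 15-revision window are assumptions.

```python
import hashlib
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Find identity reverts in one page's history.

    revisions: iterable of (rev_id, text) pairs in chronological order.
    radius:    maximum number of reverted revisions to look back over.

    Returns a list of (reverting_id, reverted_ids, reverted_to_id) tuples.
    """
    recent = deque(maxlen=radius + 1)  # (rev_id, checksum) of latest revisions
    reverts = []
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        history = list(recent)
        # Scan from the newest stored revision backwards for identical content.
        for idx in range(len(history) - 1, -1, -1):
            if history[idx][1] == checksum:
                reverted_ids = [rid for rid, _ in history[idx + 1:]]
                # A match with nothing in between is a null edit, not a revert.
                if reverted_ids:
                    reverts.append((rev_id, reverted_ids, history[idx][0]))
                break
        recent.append((rev_id, checksum))
    return reverts
```

For example, a history of `[(1, "a"), (2, "b"), (3, "a")]` yields one revert in which revision 3 restores revision 1 and reverts revision 2. Checksum matching misses partial reverts, which is one reason more robust detection needs additional work.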
We wrote up a research proposal for task recommendations in support of the Growth team's experiments on recommender systems. We analyzed qualitative data to assess the performance of the CirrusSearch "morelike" feature for identifying articles in similar topic areas. We provided support for the experimental design of a first test of task recommendations. We performed an analysis of the result of the second experiment on anonymous editor acquisition run by the Growth team.
We hosted the August 2014 research showcase with a presentation by Oliver Keyes on circadian patterns in mobile readership and a guest talk by Morten Warncke-Wang on quality assessment and task recommendations in Wikipedia.
We also gave presentations on Wikimedia research at the Oxford Internet Institute, INRIA, Wikimedia Deutschland (slides) and at the Public Library of Science (slides). Aaron Halfaker presented at OpenSym 2014 a paper he co-authored on the impact of the Articles for Creation workflow on newbies (slides, fulltext).
The Wikidata project is funded and executed by Wikimedia Deutschland.
- August was a very busy month for Wikidata. The main page was redesigned and is now much more inviting and useful. A lot of new features were finished and deployed. Among them are:
- Redirects: allowing you to turn an item into a redirect.
- Monolingual text datatype: allowing you to enter new kinds of data like the motto of a country.
- Badges: allowing you to store badges for articles on Wikidata. This includes "featured article" and "good article". More will be added soon.
- In other projects sidebar as a beta feature: allowing you to show links to sister projects in the sidebar of any article.
- Special:GoToLinkedPage: allowing you to go to a Wikipedia page based on its Wikidata Q-ID. This will be especially useful if you want to create links to articles that don't change even if the article is moved.
- Wikinews: Wikinews has been added as a supported sister project. Wikinews can now maintain their sitelinks on Wikidata. Access to the other data will follow in due time.
- Wikidata: Sitelinks to pages on Wikidata itself can now also be stored on Wikidata. This is useful to connect for example its help pages with those on the other projects.
- Change of the internal serialization format: The internal serialization format changed to be consistent with the serialization format that is returned by the API.
- In addition, the team worked on a lot of under-the-hood changes towards the new user interface design and started the discussions around structured data support for Commons. The log of the IRC office hour is available.
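To illustrate how Special:GoToLinkedPage gives a link that survives page moves, here is a small Python sketch that builds such a URL from a site identifier and a Q-ID. The path layout (`Special:GoToLinkedPage/<site id>/<item id>`), the `enwiki` site identifier and the helper name are assumptions for illustration and should be checked against the special page's own documentation.

```python
import re
from urllib.parse import quote

WIKIDATA_BASE = "https://www.wikidata.org/wiki/"


def go_to_linked_page_url(site_id, item_id, base=WIKIDATA_BASE):
    """Build a move-proof link to the article linked from a Wikidata item.

    site_id: identifier of the target wiki, e.g. "enwiki" (assumed format).
    item_id: Wikidata item ID such as "Q42".
    """
    if not re.fullmatch(r"Q[1-9]\d*", item_id):
        raise ValueError(f"not a valid item ID: {item_id!r}")
    # Assumed path layout: Special:GoToLinkedPage/<site id>/<item id>
    return f"{base}Special:GoToLinkedPage/{quote(site_id, safe='')}/{item_id}"
```

Because the link is keyed on the item ID rather than the article title, renaming the article only requires the sitelink on Wikidata to stay current.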
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.