Wikimedia Engineering/Report/2014/November
Major news in November includes:
- the release of the second version of the Content Translation tool, which heavily relies on Apertium for machine translation;
- updates to MediaWiki's internationalization based on new CLDR data;
- the move from Bugzilla to Phabricator as the new collaboration platform for the Wikimedia technical community.
Upcoming events
There are many opportunities for you to get involved and contribute to MediaWiki and technical activities to improve Wikimedia sites, both for coders and contributors with other talents.
For a more complete and up-to-date list, check out the Project:Calendar.
Date | Type | Event | Contact |
---|---|---|---|
25 December 2014 | Tech Talk | Phabricator for Wikimedia projects, 1800-1900 UTC in #wikimedia-office | rfarrand |
Personnel
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
- Director of Engineering
- Senior Software Engineer - Services
- Software Engineer - Flow (Front-end)
- Software Engineer - Mobile - Android
- Application Security Engineer
- Full Stack Developer - Analytics
- Agile Coach
- Scrum Master
- Senior Technical Product Manager
- Community Liaison
- Community Liaison (PT Contract)
- UX Senior Designer
- UX Senior Design Researcher
- UX Visual Design Fellowship
Announcements
- Andrew Garrett joined the Wikimedia Foundation as a full-time Software Engineer (announcement).
- Yuvaraj Pandian joined the Wikimedia Technical Operations team (announcement).
- Tracy Beasley joined the Design Research Team as Participant Recruiter (announcement).
- James Douglas joined the Platform engineering team as part of the Services group (announcement).
- Stas Malyshev joined the Platform engineering team as part of the MediaWiki Core group (announcement).
Technical Operations
Labs metrics in November:
- Number of projects: 154
- Number of instances: 440
- Amount of RAM in use (in MBs): 2,131,456
- Amount of allocated storage (in GBs): 21,555
- Number of virtual CPUs in use: 1,047
- Number of users: 4,426
Tool metrics:
- Number of tools: 976
- Number of tool maintainers: 543
Wikimedia Labs
- Yuvi has officially joined the labs team.
- We updated the labs OpenStack install from version 'Havana' to version 'Icehouse'.
- LDAP (used for sign-in on many WMF services) is now decoupled from the Labs hardware. LDAP has a dedicated server in each of eqiad and codfw.
- Hardware to expand Labs VM capacity in eqiad is now racked. Work on the OS and OpenStack install is ongoing.
- Trusty instances can now pull SSH keys directly from LDAP, so logins (on Trusty instances) will still work in case of a shared-storage outage.
- We now have redirects from tool server to toollabs. This is one of the last steps in sunsetting the tool server.
- Marc added a few experimental Trusty nodes to toollabs.
Editor retention: Editing tools
In November, the team working on VisualEditor introduced table structure editing, improved some existing features, and resolved over 100 tasks, bugs and requests.
You can now edit the structure of a table: adding or deleting rows and columns, and performing other common tasks like merging cells and using captions. VisualEditor now supports keyboard shortcuts like entering "* " at the start of a line to make a bullet list; if you didn't mean to use the "smart" sequence, pressing undo will get back to what you typed. Most wikis now have VisualEditor available as an opt-in tool, whereas previously communities had to ask for it to be switched on.
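The "smart" sequence behaviour described above can be pictured with a small model. This is only an illustrative sketch, not VisualEditor's code: the `Line` class, its fields and its methods are hypothetical names, and the only rule modelled is that typing "* " at the start of a line turns it into a bullet item while undo restores the literal characters.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    """Hypothetical model of a single editable line (not VisualEditor's API)."""
    text: str = ""
    is_bullet: bool = False
    undo_stack: list = field(default_factory=list)

    def type_text(self, typed: str) -> None:
        """Append typed characters and apply the bullet sequence if it matches."""
        self.text += typed
        if not self.is_bullet and self.text == "* ":
            # Remember exactly what was typed so that undo can restore it.
            self.undo_stack.append((self.text, self.is_bullet))
            self.text = ""
            self.is_bullet = True

    def undo(self) -> None:
        """Revert the most recent automatic conversion, if there was one."""
        if self.undo_stack:
            self.text, self.is_bullet = self.undo_stack.pop()


line = Line()
line.type_text("*")
line.type_text(" ")   # the sequence is complete: the line becomes a bullet item
line.undo()           # back to the literal "* " that was typed
```

Storing the pre-conversion state on an undo stack is one simple way to make the conversion reversible in a single step; a real editor would record it as part of its normal transaction history.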
The toolbar's menus in VisualEditor now open in a collapsed, short form, with uncommon tools shown only when requested. You can now create and edit simple "blockquoted" paragraphs for indenting. You can now use a basic editor for gallery and hieroglyphic blocks on the page. Category editing was enhanced in a few ways: adding a redirect category now adds its target instead, and categories without a description page are shown in red. We improved compatibility with some variations of how wikis use the Flagged Revisions system. Armenian language users now get customised bold and italic icons in the toolbar; if your language would benefit from customised icons, please contact us.
We also made progress on providing a new auto-filled citations tool, and improvements to the link editing and media searching tools, all of which will be coming in the near future.
The deployed version of the code was updated four times in the regular release cycle (1.25-wmf7, 1.25-wmf8, 1.25-wmf9 and 1.25-wmf10).
Core Features
The team migrated the translation memory service of the Translate extension to ElasticSearch. Thanks to WMF's ElasticSearch cluster, this migration increases the speed and reliability of the service. We have identified one issue with the suggestions, which is being fixed during December. Thanks to Chad and Nik for helping Niklas.
Finally, the team also made RTL fixes in MobileFrontend and VisualEditor.
The Machine Translation service code was refactored to make it more extensible for other languages and translation services. As an experiment, the Yandex machine translation service was tested. Several fixes related to template adaptation were made. The language selector and the top bar in the editing interface have been redesigned.
The fourth release is currently underway, with the specific goal of preparing the tool for deployment as a beta feature in January.
MediaWiki Core
One last bit of engineering is being completed in November and executed in early December: contacting existing accounts with unconfirmed email addresses to request confirmation. This will allow more accounts that are currently not globally attached to be attached without going through any forms or processes before SUL finalization takes place.
Library infrastructure for MediaWiki
wfProfileIn() and wfProfileOut() calls in the MediaWiki PHP code while still getting the benefit of performance measurements via the XHProf profiling library. Profiling via the XHProf functionality built into HHVM is currently in use in both the beta and production clusters and is helping drive some low-hanging-fruit code improvements.
Bryan is continuing to work on structured logging changes and is testing a Monolog-based logging pipeline in Beta to replace the current system. MWFunction::newObj has been deprecated and all usage in the core of MediaWiki has been replaced with the new ObjectFactory class, which was introduced by the PSR-3 logging changes.
The cssjanus library has been removed from MediaWiki's core repository and replaced with a Composer managed import from the official upstream. The lessphp CSS pre-processor which was historically manually copied into MediaWiki's git repository is now imported via Composer.
The CDB library originally written by Tim Starling has been extracted to its own git repository and published on Packagist. Both MediaWiki itself and the "multiversion" scripts that are used to manage the WMF wiki family are now importing CDB via Composer instead of the old practice of keeping two copies of the code updated manually in the respective repositories.
The simplei18n PHP library that was developed for the IEG's Grant review application, based on code from the Wikimania Scholarships application, was transferred from Bryan's personal GitHub account to the official Wikimedia account.
External dependencies for the BounceHandler and Elastica extensions have been removed from the extension git repositories and replaced with Composer-managed imports. For the WMF cluster, these dependent libraries have been added to the mediawiki/vendor.git repository. ExtensionDistributor has been updated to package Composer-managed dependencies in the tarballs it generates for installing extensions. The php-composer-validate test is now applied to all extensions and skins to validate the syntax of composer.json when changes are uploaded to Gerrit.
Several classes have been moved from includes/utils to includes/libs (ArrayUtils, MapCacheLRU, Cookie/CookieJar), which makes them easy candidates for publication as standalone libraries in the future. Aaron is working on a list of possible libraries to create from the MediaWiki codebase that would group several useful classes together. This should produce more sustainable projects than having literally dozens of libraries made up of only a single class.
Security auditing and response
Quality Assurance/Browser testing
Multimedia
In November 2014, after releasing requested improvements, the multimedia team started shifting its focus away from Media Viewer and onto bugs in the upload pipeline and UploadWizard.
The team attended the Amsterdam Hackathon, with a focus on GLAMs and structured data. There, the team worked on a prototype of what entering structured data on Commons might be like; on research and groundwork for tracking per-file views (a long-standing request from GLAMs), in preparation for Erik Zachte and Christian Aistleitner's RfC; and on parsing image annotations so that they may be displayed in Media Viewer in the future.
The team's focus on Media Viewer has been significantly reduced after releasing the last round of improvements that came out of the community consultation. The project has moved into maintenance mode, with the team taking care of major bugs that need immediate attention. The team has provided support for the file metadata cleanup drive and will continue to do so, in order to improve the accuracy of the metadata displayed in Media Viewer.
UploadWizard has seen numerous code cleanup improvements, as well as the trimming down of a few obscure legacy features. This refactoring effort is already making UploadWizard easier to maintain, which supports the team's goal of fixing bugs and improving the efficiency of the upload pipeline.
For more information about our work, join the multimedia mailing list.
- Neta Livneh and Roxana Necula: Wikipedia article translation metrics
- Priyanka Jayaswal: Pywikibot: Compat to core migration
- Anke Nowottne: Wikipedia Education Program need-finding research
- Ankita Shukla: Collaborative spelling dictionary building tool
- Christy Okpo: Improving the Wikimedia Performance Portal
We started applying our new page view definition towards a number of reports and presentations, including an update for the Wikimedia Foundation board on readership trends.
We continued supporting the Mobile team with data on mobile traffic and prepared the launch of two controlled tests of microcontributions on the beta version of the mobile site. We performed preliminary analysis and QA of the data in preparation of a larger test to be launched on the stable site in January.
We concluded data analysis for the test of HHVM and found no conclusive evidence that HHVM substantially affects newcomer engagement in Wikipedia, but hypothesized that HHVM would have effects elsewhere.
We hosted a research showcase with Yan Chen (University of Michigan) as a guest speaker. We finalized a formal collaboration with a team of researchers at Stanford University, to be launched in December. A workshop we submitted to CSCW '15 on creating industry/academic partnerships for open collaboration research was accepted and will be held at the conference in Vancouver on March 14, 2015.
The Kiwix project is funded and executed by Wikimedia CH.
- We have released, for the first time, a complete offline version of the Gutenberg project, a 50,000-book public domain online library. This new software solution can easily create a complete offline snapshot offering all the books in HTML and EPUB format. We make the books accessible via a custom, easy-to-use interface. It is consequently trivial to have this big library available everywhere: on your PC, local network or even smartphone. This was the first step of a broader effort to increase outreach of public domain literature; further development will take place in 2015. If you want to know more, read the release announcement.
- Automation and consolidation of the Wikimedia projects dumping process continues to progress. Besides the continuous improvement of mwoffliner, a new small tool called mwmatrixoffliner was released. It uses MediaWiki's Matrix extension to allow dumping of all language versions of a project. As a result, we have started to produce monthly ZIM snapshots of Wikivoyage, Wikinews, Wikiquote, Wikiversity, Wikibooks and Wikispecies. For all these projects, we make available complete dumps with or without pictures, as a raw ZIM file or pre-packaged with Kiwix in a "portable version" on download.kiwix.org, over BitTorrent and HTTP. We will soon provide this service for bigger projects like Wiktionaries and Wikipedias, and have therefore started to set up new server instances in Wikimedia Labs.
- We have also made a new release of TED talks ZIM files. These updated files benefit from a slightly revised user interface.
The Wikidata project is funded and executed by Wikimedia Deutschland.
- Wikidata won the Open Data Institute's Open Data Award in the Publisher category.
- Development in November focused on:
- performance improvements
- introducing language fallbacks (so you will see labels in other languages you likely speak if they are not available in your language)
- statements on properties (so you can indicate that one property is the inverse of another property or that a given property on Wikidata corresponds to another one on another website)
- redesigning the sitelinks section.
- Wikidata was also a big topic at the GLAM hackathon in Amsterdam and was used for many great applications like the Sum of all Paintings. An office hour about structured data on Commons was well-attended.
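The language fallback behaviour mentioned in the list above (showing a label in another language you likely speak when none exists in yours) can be sketched as a lookup along a fallback chain. This is a minimal illustration under stated assumptions, not Wikidata's implementation: the `FALLBACK_CHAINS` table, its contents and the `label_with_fallback` function are hypothetical.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical fallback chains, for illustration only; the real chains are
# defined by MediaWiki's language configuration, not by this table.
FALLBACK_CHAINS: Mapping[str, Sequence[str]] = {
    "de-at": ["de", "en"],
    "pt-br": ["pt", "en"],
}


def label_with_fallback(
    labels: Mapping[str, str],
    language: str,
    chains: Mapping[str, Sequence[str]] = FALLBACK_CHAINS,
) -> Optional[Tuple[str, str]]:
    """Return (language_code, label) for the first language that has a label.

    The requested language is tried first, then each language in its chain.
    Languages without an explicit chain fall back to English (an assumption).
    """
    for code in [language, *chains.get(language, ["en"])]:
        if code in labels:
            return code, labels[code]
    return None


item_labels = {"de": "Berlin", "en": "Berlin (city)"}
result = label_with_fallback(item_labels, "de-at")  # uses the "de" label
```

Returning the language code together with the label lets an interface indicate that the text shown is a fallback rather than a label in the reader's own language.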
Future
- The engineering management team continues to update the Deployments page weekly, providing up-to-date information on the upcoming deployments to Wikimedia sites, as well as the annual goals, listing ongoing and future Wikimedia engineering efforts.