I am a developer with some SRE sensibilities, working on the Platform Engineering Team. Basically I write code. I work primarily on the XML dumps infrastructure. I'm very interested in anything that touches on translation of content or interfaces, and anything that impacts the multilingual reach of the Wikimedia projects or facilitates communication between the various language communities of the projects, but this is outside the scope of my WMF work.
If you want to reach me quickly, look for me on IRC in #wikitech-l with the user name apergos. Timezone: EET. If you want to reach me the slow way, send an email to user name ariel with domain name wikimedia.org. (Grrr, spammers!)
All things dump-related that I'd love to see move forward:
- Excerpts of the dumps in various formats from specific projects. Wiktionary is a popular request.
- Downloadable SQL files that can be directly imported into fresh empty wikis to create a mirror of any project with current content.
- Image tarballs with 200px thumbnails for all images used on the projects.
- A maintained, cross-platform, easy-to-use tool for converting XML dumps to MySQL for import (e.g. mwimport).
- Object store for revision content grouped by page, so that folks could mix and match their own dumps.
- Dump content or metadata files bundled up for pulling into Hadoop.
Current dumps links:
- Information about publicly available dumps
- Documentation for dumps maintainers
- Bugs and feature requests for dumps
Old decrepit links:
- Research Data Proposals (WikiSym 2010)
- Quality Assessment Tools for WP Readers (WikiSym 2010)