Wikimedia Technology/Goals/2018-19 Q2

Wikimedia Technology Goals, FY2018–19, Q2 (October - December)


Introduction

The Technology Department has a number of annual goals in support of the Wikimedia Foundation's Annual Plan; this work is detailed in the Annual Plan. Our remaining work falls into four broad areas—foundational, sustaining, supporting our technical community, and supporting the overall community health.

All Technology programs fall under the primary goal of Knowledge as a Service/Foundational Strength - evolve our systems and structures, with the exception of TEC5: Scoring Platform and TEC9: Address Knowledge Gaps, which fall under the primary goal of Knowledge Equity - grow new contributors and content.

Purpose of this document

This document lists the goals for the Wikimedia Technology department for the second quarter of fiscal year 2018–19 (October - December 2018). The goal owner in each section is the person responsible for coordinating completion of the section, in partnership with the team(s) and relevant stakeholders.

Goals for the Audiences department are available on their own page.

Legend

ETA (Estimated Time of Arrival) fields may use the acronym EOQ (End of Quarter) or EOY (End of Year).

Status fields can use the following templates: To do, In progress, Blocked, Postponed, Stalled, Partially done, or Done.

Technology Departmental programs

TEC1: Reliability, Performance, and Maintenance

Goal Owners: Mark Bergsma; Ian Marlier; Nuria Ruiz; Bryan Davis

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2 by team:

Analytics

  • Continue upgrading to Debian Stretch   Done
  • Order and configure hardware for dbstore1002's replacement   Done
  • Add prometheus metrics for varnishkafka task T196066   In progress
  • Working on a strategy and scripts for updating superset (http://superset.wikimedia.org) task T211706   In progress

SRE

  • Refresh hardware and perform necessary maintenance - will be   Done by end of December

SRE / Traffic

  • ATS production-ready as a backend cache layer will be   Done by end of December
  • Migrate most standard public TLS certificates to CertCentral issuance   Done
  • Increase Network Capacity   Partially done, will be completed in Q3 due to ongoing fundraising efforts

RelEng

  • Determine the procedure and requirements for an automated MediaWiki branch cut   Done

Performance

  • Train feature developers on the use of performance metrics to detect and address regressions   In progress
  • Deliver high-traffic images as WebP   Done
  • Improve Navigation Timing data, by moving it from Graphite to Prometheus   Partially done
  • Expand mobile testing   In progress
  • Expand outreach and engagement to wider Performance community   In progress
  • Test the effect of MediaWiki commits   In progress
  • Ongoing maintenance of components owned by Performance is always   In progress; for this quarter's work, we're   Done
  • Anonymized data publishing is   Stalled and deferred to Q3
  • Research performance perception in order to identify specific metrics that influence user behavior   Done with follow-up in Q3

WMCS

  • Continue replacing Trusty with Debian Jessie/Stretch in infrastructure layer   In progress and will continue in Q3
  • Communicate Trusty deprecation to Cloud VPS community   Done
  • Develop Trusty deprecation plan for Toolforge   Done; communicating that timeline to the community is   Partially done
  • Track progress towards full removal of Trusty from Cloud VPS   In progress, will continue in Q3
  • Migrate 50% of Cloud VPS projects to the eqiad1 region and its Neutron SDN layer   Done, will continue in Q3


TEC2: Modern Event Platform

Goal Owner: Nuria Ruiz

Q2 Goals are   Done

Detailed status here.

Wrap-up for Q2:

  • Development of intake service for events whose transport is JSONSchema/http   Done


TEC3: Deployment Pipeline

Goal Owner: Greg Grossmeier

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Formalize the collection of CI infrastructure and tooling metrics is   Partially done and will continue in Q3 to expose the interface metrics
  • Develop set of metrics to assess incident reports/post mortems is   Done, will probably do follow-up work in Q3
  • Adopt more services into Deployment pipeline - migrate graphoid to the Deployment pipeline is   Postponed as Graphoid is now recommended for stewardship review; zotero v2 is   Done
  • Deploy blubberoid   Done
  • Reprise the work on the logging infrastructure   In progress and will continue in Q3


TEC4: PHP7 Migration

Goal Owners: Mark Bergsma and Ian Marlier

Q2 Goals are   Done

Detailed status here.

Wrap-up for Q2:

  • Ability to serve a % of production traffic from PHP7 is mostly   Done with final code reviews   In progress
  • Sampling profiler for PHP7 has been identified and is prepared for use in the WMF production environment   Done
  • Identify and address code issues and opportunities under PHP 7.2   Done


TEC5: Scoring Platform

Goal Owner: Aaron Halfaker

Q2 Goals are   Done

Detailed status here.

Wrap-up for Q2:

  • Upgrade Celery   Done
  • Fix logging for logstash   Done
  • Implement edit quality models for translatewiki    Partially done and will wrap up by end of December
  • Document Feature Injection in The ORES Manual   Done
  • Blog announcement of Undisclosed Paid Editors dataset   Done
  • Resubmit ORES paper to the Journal of Computing   Cancelled
  • JADE --> Production   Partially done, to be finished up in Q3


TEC6: Address Infrastructure Gaps

Goal Owner: Mark Bergsma

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Begin the implementation of Q1's Logging Infrastructure design is mostly complete and will be   Done by end of December
  • Expand modern metrics infrastructure coverage is mostly complete and will be   Done by end of December
  • Design and prepare infrastructure for database binary backups is   In progress and will continue in Q3
  • Test Performance implications of MySQL TLS connectivity   Stalled on DBA technology selection/implementation due to other work requirements that have higher priorities
  • Start migrating watchlist last-view updates is   Stalled due to emergent work and other higher priority work; we hope to get it done in early Q3
  • Expand Spicerack library and SRE Cookbooks conversion is   Partially done and will continue into Q3
  • Expand Netbox usage   Done with stretch goals to be done in Q3


TEC7: Environmental Sustainability

Goal Owner: Erika Bjune

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Identify and contract with an organization that can assess WMF's environmental footprint is   Done
  • Work on an actionable plan for reducing WMF's environmental footprint is now   In progress and will be on-going for ~6 months.


TEC8: Search Platform

Goal Owner: Erika Bjune

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Find and hire a contractor to help with NLP work   Done
  • Begin working on one internal NLP project   Partially done
  • Improve autocomplete of Wikidata items   Partially done and will continue in Q3
  • Prototype a feature that is based on collected data   Partially done and will continue in Q3
  • Finish up the Korean morphological library analysis and get ready for deploy into production when ES6 is completed   Done
  • General language support is always   In progress
  • Search for licenses in Commons is   Stalled as we await further instructions from SDoC program
  • Split the search clusters to increase stability   Done
  • Continue replacing ElasticSearch servers (end of life maintenance)   Done
  • Separate the Wikidata ElasticSearch implementation into a separate extension   Partially done and will continue in Q3
  • Migrate ElasticSearch cluster restart scripts as cookbooks using Spicerack   In progress as more testing is needed
  • Performance and bug fixes for WDQS is always   In progress
  • Service Level Objective (SLO) work for WDQS is   In progress and will continue in Q3
  • Investigate Blazegraph support options and alternatives   Done


TEC9: Address Knowledge Gaps

Goal Owner: Leila Zia

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Iterate on and improve the report of the state of the art on bias detection and algorithm auditability is   Postponed
  • Build a section recommender system based on the section mapping algorithm is   Partially done and will be finished by end of December
  • Build a test API for the section recommendation algorithm is   Postponed until Q3
  • Improve article recommendation API to completion (of the second stage improvements) is   Partially done and will be finished by end of December
  • Expand the taxonomy of Wikipedia readers is   Done
  • Preparing the infrastructure for conducting the survey is   Partially done and will be finished by end of December
  • Devise the framework for matching newcomers to improve the first design of the framework   Partially done and will be continued in Q3
  • Develop and test a new experiment plan for testing the quality of the algorithm to elicit user interests   Done
  • Finalize the documentation for the research on characterizing Wikipedia readers   Done
  • A series of presentations about the results on characterizing Wikipedia readers   Done


TEC10: Build Technical Community

Goal Owner: Bryan Davis

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Plan and visibly improve Toolforge technical documentation is   Partially done and will continue in Q3
  • Survey Wikimedia Foundation staff to gauge interest and support for reviving Tech Talks is   Done
  • Develop plan for Tech Talks reboot is   In progress and will continue in Q3
  • Update visual design and content of MediaWiki.org Main Page will be   Partially done by end of January 2019
  • Support Outreachy Round 17   Done
  • Support Google Code-In 2018   Done
  • Review and improve top viewed overview pages of the Action API   Done
  • Submit a proposal for the Wiki Research Workshop   Done


TEC11: Support Fundraising Activities

Goal Owner: Erika Bjune

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Support Advancement in all Q2 activities   Partially done and will continue in Q3


TEC12: Developer Productivity

Goal Owner: Greg Grossmeier

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • The Annual Developer Productivity Survey results are synthesized and shared, creating a first year baseline is   Partially done and will continue in Q3 to get additional feedback


TEC13: Code Health

Goal Owner: Greg Grossmeier

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Update/refresh review queue is   Partially done and will continue in Q3
  • 5 of the 15 prioritized repositories have at least 1 end-to-end test is   In progress as the team talks with stakeholders
  • Assess Platform unit test practices and define improvement plan is   In progress and will continue in Q3
  • Core Platform and Search Platform teams are using TDM PoC   In progress and will continue in Q3
  • Identify key Tech Debt areas and add a process for management   In progress and will continue in Q3
  • Metrics defined and deployed for all 4 Code Health areas   Partially done


TEC14: Smart Tools for Better Data

Goal Owner: Nuria Ruiz

Q2 Goals are   Partially done

Detailed status here.

Wrap-up for Q2:

  • Create report for "articles with most contributors" in Wikistats2   To do
  • Create report for Active editor metrics per project family   To do
  • Provide easier mapping between Wikistats1 metrics and Wikistats2 metrics   To do
  • Provide ability to query metrics per project family in Wikistats2   Done
  • Add per family unique devices to analytics query service   Done
  • Automatic ingestion from eventlogging data into turnilo datasets that are available for easy exploration   Done
  • Automation of data sanitization for eventlogging schemas in the hadoop backend   Done
  • Presto cluster online and infrastructure accessible by Cloud (labs) users   Done
  • Edit Data Lake Quality - resolve known issues (ongoing)   In progress
  • POC More efficient Bot filtering on pageview data   Done
  • Productionize MediaWiki content processing, ingest and process XML dumps   Done

Cross-departmental programs

CDP1: Privacy, Security, and Data Management

Segment 2 - Security

Goal Owner: John Bennett

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Review and mature our security policies and awareness functions is   Done but the phishing campaign is   Stalled, to be completed in Q3
  • Testing campaigns:
    • CSP changes are now   Done
    • 1st round of pen testing (on en wikipedia) is   Done
    • OIT assessment is   Cancelled, might be picked up in 2019.
    • NIST CSF assessment is   Stalled, should be picked up again in early 2019.
    • Initial discussion is   In progress to include Phan into MW core and should be completed by end of December.
  • Finalize and test our Incident Response documentation is   In progress and will continue in Q3


Segment 3 - Analytics

Goal Owner: NRuiz (WMF)

Wrap-up for Q2:

  • More restrictive Firewall rules for Kafka. task T204957   Postponed
  • Review the requirements for a service implementing a stronger user authentication scheme for the Analytics Hadoop cluster and possibly for other related tools (like Zookeeper).   Done
  • STRETCH GOAL: implement a prototype in labs that the Analytics team can test and evaluate. task T198227   Done

CDP2: Platform Evolution

Segment 7 - Core Platform

Goal Owner: Corey Floyd (WMF)

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Define and implement a session management service   Done


Segment 8 - Core Platform (WMDE)

Goal Owner: Corey Floyd (WMF)

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Wikimedia Technical Conference: participate and analyze session output is   Done and will be published soon


CDP3: Knowledge Integrity

Segment 1 - Research

Goal Owner: Dario Taraborelli

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Design a machine learning framework to identify why statements need a citation in English Wikipedia   Done
  • Submit a paper summarizing the modeling work for unsourced statement detection   Done
  • Run the second round of data collection to understand Wikipedia citation usage   Done
  • Prepare the data and analyze the data collected in the second round   Done
  • Perform first round of survey data collection of reader citation usage on English Wikipedia   Done
  • Analyze first round survey data of reader citation usage   In progress and will continue in Q3
  • Host WikiCite 2018 event   Done

CDP4: Structured data

Segment 2 - Search Platform

Goal Owner: Erika Bjune

Q2 Goals are   In progress

Detailed status here.

Wrap-up for Q2:

  • Allow search by type of license   Stalled until Q3