Wikimedia Technology/Annual Plans/FY2019/TEC1: Reliability, Performance, and Maintenance

This program is the cornerstone of every other program at the Foundation: without the work that goes into maintaining our infrastructure, the Foundation cannot deliver on its mission.

Updated Goal Status

Program outline

Teams contributing to the program

Site Reliability Engineering, Analytics, MediaWiki Platform, Wikimedia Cloud Services, Performance

Annual Plan priorities

Primary Goal: 3. Knowledge as a Service - evolve our systems and structures

How does your program affect annual plan priority?

Wikimedia's sites, services, and underlying technical infrastructure are core to furthering its mission. This program is about sustaining and evolving the infrastructure and structures that support previous achievements as well as future work.

Program Goal

We will maintain the availability of Wikimedia’s sites and services for our global audiences and ensure they’re running reliably, securely, and with high performance. We will do this while modernizing our infrastructure and improving current levels of service when it comes to testing, deployments, and maintenance of software and hardware.

Outcome 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure.

Output 1.1
Deploy, update, configure, maintain, and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services)
Output 1.2
Maintain data center infrastructure and equipment lifecycle from procurement through break-fix to decommissioning
Output 1.3
Improve security, stability, performance and scalability of MediaWiki.
Output 1.4
Perform incident response, diagnosis, and follow-up on system outages or alerts across our stack.
Output 1.5
We have scalable, reliable and secure systems for data transport and storage.

Outcome 2: Better designed systems

Output 2.1
Assist in the architectural design of new services and in making them operate at scale

Outcome 3: Users can leverage a reliable and public Infrastructure as a Service (IaaS) product ecosystem for VPS hosting.

Output 3.1
Maintain existing OpenStack infrastructure and services
Output 3.2
Pay down technical debt and allow upgrading of the core OpenStack platform to modern, supported releases by replacing the current network topology layer with OpenStack Neutron, which has become the standard for most OpenStack deployments.
Output 3.3
Increase availability of compute resources for the IaaS product by expanding deployment of physical resources beyond the current single broadcast domain

Outcome 4: Members of the Wikimedia movement are able to develop and deploy technical solutions with a reasonable investment of time and resources on the Wikimedia Cloud Services Platform as a Service (PaaS) product.

Output 4.1
Maintain existing Grid Engine and Kubernetes web services infrastructure and ecosystems.

Outcome 5: Performance and function of Wikimedia properties on mobile devices are tested and monitored

Output 5.1
Performance testing of both the mobile web and native app experiences is conducted on a regular basis, in order to identify regressions in the user experience
Output 5.2
Wikimedia native applications are instrumented for performance monitoring similarly to our web properties

Outcome 6: Improved MediaWiki availability and reduced read-only impact from data center fail-overs

Output 6.1
Deploy to production the routing of MediaWiki GET/HEAD requests to the secondary data center.
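
As a rough illustration of the routing rule this output describes, the sketch below sends read-only methods (GET, HEAD) to a secondary data center and everything else to the primary. The data center labels, the `secondary_available` flag, and the `choose_datacenter` function are hypothetical and are not taken from Wikimedia's actual traffic configuration.

```python
# Hypothetical labels for illustration; not actual Wikimedia data center names.
PRIMARY = "primary"
SECONDARY = "secondary"

# Methods treated as read-only for routing purposes (an assumption of this sketch).
READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


def choose_datacenter(method: str, secondary_available: bool = True) -> str:
    """Return the data center that should serve a request with this HTTP method.

    Read-only requests go to the secondary data center when it is available;
    all other requests (for example POST) always go to the primary.
    """
    if method.upper() in READ_ONLY_METHODS and secondary_available:
        return SECONDARY
    return PRIMARY
```

Under this rule, `choose_datacenter("GET")` returns `"secondary"`, while `choose_datacenter("POST")` returns `"primary"`. Passing `secondary_available=False` sends every request to the primary.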

Resources

People FY2017–18 FY2018–19
Analytics
  FY2017–18:
  • Engineer
  • 0.5 ✕ Engineer
  FY2018–19:
  • Engineer (no change)
  • 0.5 ✕ Engineer (no change)
Release Engineering
  • Engineer
  • Engineer
  • Engineer
  • Engineer
  • Engineer
  • 0.25 ✕ QA Engineer
  • 0.5 ✕ Software Engineer
  • 0.25 ✕ Sr Software Engineer
MediaWiki Platform
  FY2017–18:
  • none
  FY2018–19:
  • 0.33 ✕ Architect (reallocated)
  • 0.5 ✕ Engineer (reallocated)
  • Engineer (new)
WMCS
  FY2017–18:
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • 0.25 ✕ Product Manager
  FY2018–19:
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • 0.33 ✕ Product Manager (reproportioned)
Performance
  FY2017–18:
  • Principal Performance Engineer
  • Senior Software Engineer
  • Senior Software Engineer
  • Performance Engineer
  FY2018–19:
  • Principal Performance Engineer (no change)
  • Senior Software Engineer (no change)
  • Senior Software Engineer (no change)
  • Performance Engineer (no change)
Site Reliability Engineering
  FY2017–18:
  • Director of Site Reliability Engineering
  • Director of Site Reliability Engineering
  • Engineering Manager
  • Senior Operations Engineer
  • 0.5 ✕ Senior Operations Engineer
  • Senior Database Administrator
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Operations Engineer
  • Database Administrator
  • Software Engineer
  • Software Engineer
  • Datacenter Engineer
  • Traffic Security Engineer
  FY2018–19:
  • Director of Site Reliability Engineering (no change)
  • Director of Site Reliability Engineering (no change)
  • Engineering Manager (no change)
  • Senior Operations Engineer (no change)
  • 0.25 ✕ Senior Operations Engineer (reduction)
  • Senior Database Administrator (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Operations Engineer (no change)
  • Database Administrator (no change)
  • Software Engineer (no change)
  • Software Engineer (no change)
  • Datacenter Engineer (no change)
  • Traffic Security Engineer (no change)
Travel & Other
  FY2017–18:
  • (missing)
  FY2018–19:
  • 2 ✕ Wikimedia Hackathon (new)

Targets

Outcome 3

Target
Ubuntu operating systems completely replaced by Debian
Measurement method
  1. 100% of OpenStack infrastructure services served from hosts running Debian Jessie or newer operating systems by end of FY2018/19 Q3.
  2. 100% of Cloud VPS hosted instances running Debian Jessie or newer operating systems by end of FY2018/19 Q3.
Target
Full deployment of OpenStack Neutron as software defined networking (SDN) layer for Cloud Services OpenStack clusters
Measurement method
  1. Nova-network SDN removed from all Cloud Services OpenStack clusters by end of FY2018/19 Q2.
Target
Expand OpenStack hosting to multiple broadcast domains
Measurement method
  1. Virtual machine hosting in a second broadcast domain available for alpha testing by end of FY2018/19 Q4.
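
The "100% ... running Debian Jessie or newer" measurements above amount to a percentage over a host inventory. A minimal sketch of that calculation follows, assuming each host is described by a hypothetical `(distribution, codename)` pair. The inventory format and the function names are illustrative only and do not describe the tooling actually used to track this target.

```python
from typing import Iterable, Tuple

# Debian release codenames mapped to their major version numbers.
DEBIAN_RELEASES = {"wheezy": 7, "jessie": 8, "stretch": 9, "buster": 10}
MINIMUM_VERSION = DEBIAN_RELEASES["jessie"]


def is_compliant(distribution: str, codename: str) -> bool:
    """True if the host runs Debian Jessie or a newer Debian release."""
    if distribution.lower() != "debian":
        return False
    # Unknown codenames count as non-compliant in this sketch.
    return DEBIAN_RELEASES.get(codename.lower(), 0) >= MINIMUM_VERSION


def compliance_percentage(hosts: Iterable[Tuple[str, str]]) -> float:
    """Percentage of hosts in the inventory that meet the target."""
    inventory = list(hosts)
    if not inventory:
        raise ValueError("inventory is empty")
    compliant = sum(1 for distro, name in inventory if is_compliant(distro, name))
    return 100.0 * compliant / len(inventory)


# Hypothetical inventory: three Debian hosts and one remaining Ubuntu host.
example_inventory = [
    ("debian", "stretch"),
    ("debian", "jessie"),
    ("ubuntu", "trusty"),
    ("debian", "jessie"),
]
print(compliance_percentage(example_inventory))  # 75.0
```

The target is met when this value reaches 100.0 for both the OpenStack infrastructure hosts and the Cloud VPS hosted instances.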

Dependencies
