Wikimedia Performance Team/Sprints

2023

Outreach:

Insights:

Improvement:


Other goals that we considered but were postponed, cancelled, or incomplete:

2022

Insights:

Improvement:

  • Research opportunities in static.php traffic to identify simpler and longer-lasting caching policies. Reduce backend traffic to static.php by more than 70% and remove a custom WMF-specific endpoint in the process, in favour of standard MediaWiki routes that require less maintenance going forward. (T285232, T302465)

Other goals that we considered but were postponed, cancelled, or incomplete:

2021

See also internal 2021-2022 roadmap and internal Jan-Mar 2022 achievements.

Outreach:

  • Support product development by Inuka Team (Wikipedia Preview), Reading Web (NearbyPages and RelatedArticles), CPT (WebAuthn), Design Systems Team (WVUI/Vue.js), and WMDE (Kartographer-revid).
  • Participate in SLO working group to help establish an SLO for MediaWiki Save Timing.
  • Participate in W3C WebPerf WG, provide feedback to Chrome team on Google Web Vitals and Chrome bugs.
  • Organise the Web Performance devroom for FOSDEM 2021 (recordings).
  • Speak at the We Love Speed conference (recording).
  • Organise four Web Perf Hero awards.

Insights:

  • Migrate our device lab to BitBar.
  • Evaluate and build proof-of-concept synthetic testing on bare metal instead of at AWS.
  • Write runbooks for investigating RUM alerts, WPT alerts, and WPR alerts.
  • Support SRE Observability in developing a new Prometheus-compatible MW-Stats client library.
  • On-going maintenance of WebPageTest, WebPageReplay, and Fresh-node.

Improvement:

  • Multi-DC: Deploy MainStash DB and migrate away from Redis-based MainStash (T212129).
  • Multi-DC: MariaDB-TLS tested and enabled for all wikis.
  • Multi-DC: CDN routing logic written and deployed to Beta and Prod behind feature flag.
  • ResourceLoader debug mode v2: reduce wait time on complex pages from ~1 minute to ~1 second.
  • Guidance and code review for DBA-led normalization of "templatelinks" MediaWiki database table, to reduce storage pressure and improve query performance. (T299417)
  • Support SRE ServiceOps on the MW-on-K8s project.
  • Develop precache-based GlobalUserEdit API for CentralAuth, following an incident.

2020

See also internal 2020-2021 roadmap.

Outreach:

Insights:

  • Expand navtiming RUM metrics pipeline with new Layout Shift metric.
  • Kobiton setup for our device lab, expand to include iOS in addition to Android.
  • Explore BitBar for our device lab.
  • Explore moving WPT/WPR infra away from AWS.

Improvement:

  • Multi-DC: Implement multi-dc strategy for ChronologyProtector (T254634).
  • Multi-DC: Determine and start implementing strategy for MainStash DB (T212129).

2019

See also 2019-20 Q1#Performance and internal 2019-2020 roadmap.

  • Outreach:
    • Design and implement the AS Report, to expand and formalize collaborations to leverage our influence with browser vendors and ISPs. (Announcement on Techblog).
    • Initiate and work on Wikimedia Foundation becoming an official W3C member organization. This expands the Performance Team's participation in web standards and moves us from an "invited expert" (individual) to a represented membership organisation. (Announcement on wikimediafoundation.org)
    • Support product launches by Parsing Team (Parsoid-PHP launch), Editing Team (DiscussionTools launch), Growth Team (GrowthExperiments launch), and Inuka Team (Wikipedia KaiOS app launch).
    • Support RelEng around establishing production error triage workflows and semi-automation thereof.
    • Organise WMF-wide frontend web performance training.
    • Provide performance expertise to Frontend Architecture Working Group (FAWG).
    • Get published in the Web Performance Calendar (2x: Measuring LT and FID, Big questions on RUM)
  • Insights:
    • Research, develop, and test new RUM metrics that better match user perception (T187299, Meta-Wiki, Rossi 2019 paper).
    • Organise and oversee implementation of First Paint metric in WebKit for Apple Safari (blog post).
    • Introduce automatic developer-facing performance metrics for specific chunks of MediaWiki code in core and extensions, powered by WANObjectCache (T197849).
    • Add more RUM metrics to the navtiming pipeline, including instrumentation for First Input Delay (T332012).
    • Participate in Chrome Origin trial for Element Timing and provide feedback on upcoming W3C standard (blog post).
    • Release WikimediaDebug v2 (blog post).
    • Create our own Mobile Device Lab.
    • On-going first response to synthetic testing alerts, including investigating regressions after Chrome/Firefox releases and communicating with upstream browser vendors.
    • On-going maintenance of WebPageTest and WebPageReplay.
    • On-going maintenance of XHGui, including dealing with MongoDB becoming non-free software by developing and upstreaming MySQL drivers for XHGui, and migrating our install from MongoDB to MySQL.
  • Improvement:
    • PHP7 Transition: Finish the transition from HHVM and support SRE with instrumentation, sampling, and benchmarking.
    • Multi-DC: Start work on MainStash DB.
    • Faster MediaWiki backend startup time to reclaim PHP7 latency increase in certain areas. (T233886, T189966).
    • Faster page load time, by reducing ResourceLoader startup cost (blog post).
    • Guidance, CR and testing for new AbuseFilter parser (development by Daimona) to improve Save Timing (T156095).

2018

See also 2018-19 Q1, 2018-19 Q2, and internal 2018-2019 roadmap.

Insights:

Outreach:

Improvement:

  • Annual Plans/FY2019/TEC1: Improve MediaWiki availability and reduce read-only impact from data center switchovers.
    • Multi-DC: Develop integration and support for Mcrouter service in MediaWiki's WANObjectCache, support SRE's rollout of mcrouter service. (T198239)
  • Annual Plans/FY2019/TEC4: PHP7 Migration: Guide the work and support other teams.
  • Introduce support for packageFiles to ResourceLoader (T133462).
  • Introduce support for WebP compression format to Thumbor.
  • Reduce page load time by refactoring the startup module to need only one roundtrip instead of two, effectively loading jQuery in parallel outside the critical path. (T192623).

2017

See also Annual Plan/2017-2018#Technology, 2017-18 Q3, 2017-18 Q4, and internal 2017-2018 roadmap.

Outreach:

Insights:

  • Program 1. Availability, performance, and maintenance.
    • All production sites and services maintain current levels of availability or better.
    • Maintain a comprehensive toolset to measure the performance of our platforms.
  • Research reverse proxy technologies with the objective of obtaining more stable metrics from synthetic testing infrastructure, increasing confidence and reducing the minimum detectable regression size. Evaluated Mahimahi, WebPageReplay, and mitmproxy; selected WebPageReplay. Deployed WebPageReplay+Browsertime to complement and eventually replace WebPageTest (T153360).
  • Implement a performance alerting system atop Grafana. Establish it as a practice for other teams to follow. Two teams used it in the first year. T153169
  • Develop new "navtiming2" metric definitions, addressing what we learned since 2015, and enable use of stacked graphs (T104902, blog post).

Improvement:

  • Support for HHVM-PHP7 migration and upgrade, including development of php-excimer (T176916, blog post)
  • Support regular data center switchovers, including development of EtcdConfig in MediaWiki core (T156924, T160178)
  • Expand support in Thumbor to private wikis. Thumbor service replaces MediaWiki ImageHandler (3-part blog post series).
  • Program 8. Progress towards multi-datacenter support (wikitech:Performance/Multi-DC MediaWiki).
  • Faster Wikipedia time-to-logo. (blog post, T100999)
  • Faster edit save timing. (blog post)
  • Faster page load time. Reduce load time on 3G-Slow connections by one whole second, from 14s to 13s. T164299#3572231
  • Phase out "mediawiki.legacy.wikibits" module to reduce page view cost. T122755
  • Migrate MediaWiki core and all deployed extensions to jQuery 3, multi-month cross-team effort. T124742

2016

See also Perf Matters at Wikipedia in 2016 (Blog post), and Annual Plan/2016-2017 Program 4: Improve site performance.

Insights:

  • Enhance performance testing infrastructure, including speeding up the infrastructure to achieve hourly testing instead of every 3 hours (T151197), and adding new metrics for DOM size (T159362).

Improvement:

2015

See also Perf Matters at Wikipedia in 2015 (Blog post).

See also
