Reading/Web/Notable incidents

This page is similar to the Release timeline, but it specifically records trends and bugs on the site. For the dates on which different branches went live, see MediaWiki_1.33/Roadmap.

2023

February

6th-8th [FIRING:1] HighTimeToPreview (critical rweb)

Loss of data in analytics pipeline. Fixed by https://gerrit.wikimedia.org/r/c/operations/puppet/+/887762

2022

December

13th

Changes in the page previews extension led to a notable error spike, breaking the feature for older browsers. https://phabricator.wikimedia.org/T325113

April

7th-25th

A bug (phab:T305442#7878587) caused unexpected changes to the sampling rate in the web scroll schema:

  • Logged-in users were sampled at 1% on French Wikipedia and 10% on all other projects, rather than 100%.
  • Anonymous users were sampled at 0%, rather than 1% on French Wikipedia and 10% on all other projects.
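
To make the difference concrete, here is a minimal sketch comparing the intended rates with the rates observed during the incident. It is not the actual instrumentation code: the function names, the `frwiki` identifier and the `is_sampled` helper are all hypothetical, and the rates reflect one reading of the description above.

```python
import random

# Hypothetical wiki identifier, used only for this illustration.
FRENCH_WIKIPEDIA = "frwiki"


def intended_sampling_rate(wiki: str, logged_in: bool) -> float:
    """Rates the schema was meant to use, per the description above."""
    if logged_in:
        return 1.0
    return 0.01 if wiki == FRENCH_WIKIPEDIA else 0.10


def observed_sampling_rate(wiki: str, logged_in: bool) -> float:
    """Rates in effect during the incident, per the description above."""
    if logged_in:
        # Logged-in users received the rates meant for anonymous users.
        return 0.01 if wiki == FRENCH_WIKIPEDIA else 0.10
    # Anonymous users were not sampled at all.
    return 0.0


def is_sampled(rate: float, rng=random.random) -> bool:
    """Return True when a session falls inside the sampled fraction.

    rng() must return a float in [0.0, 1.0), so a rate of 0.0 never
    samples and a rate of 1.0 always samples.
    """
    return rng() < rate


if __name__ == "__main__":
    for wiki in (FRENCH_WIKIPEDIA, "enwiki"):
        for logged_in in (True, False):
            print(
                wiki,
                "logged-in" if logged_in else "anonymous",
                "intended:", intended_sampling_rate(wiki, logged_in),
                "observed:", observed_sampling_rate(wiki, logged_in),
            )
```

A table like the one printed here could be turned into a unit test that asserts the effective rate for each wiki and user type before a configuration change is deployed.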

February

1st

[Vector 2022 skin] Several group 1 wikis were accidentally opted into the new Vector skin, leading to approximately 1,000 users opting out within 24 hours. phab:T299927

2021

August

5th-11th

The instrumentation for VirtualPageViews (which tracks page previews) was pointed to an old schema, resulting in a loss of data. E-mail alerts were set up to protect against this happening again. T288655

19th

A Japanese Wikipedia gadget caused a JavaScript error spike alert and was fixed.

January

21st

We switched from localStorage to sessionStorage for tracking open sections in the mobile site. The code for cleaning up localStorage entries had a bug, so when the deploy was rolled back it left the mobile site in an error state, and an error was thrown for every page view where the new code had been executed. This is recorded in phab:T272638. We backported a fix in case we needed to roll back again, and resumed the deployment. The errors disappeared after the deploy.

We saw over half a million errors in our error logging pipeline during this time (usually we see under 10,000 in a given day). Amazingly nothing collapsed.

2020

October

5th

An edit to the Portuguese Wikipedia MediaWiki:Common.js led to a huge spike in client-side errors (roughly 20,000 or more of 35,000 errors were caused by this edit). Luckily it was caught and fixed within the hour.

April

28th

OOUI triggered a small performance regression (T252844) due to a growth quick survey campaign.

February

11th

The Wiki Loves Folklore banner campaign triggered a noticeable spike in image payload.

January

9th

Performance regression noted, but it was due to a banner campaign. https://phabricator.wikimedia.org/T243105

13th

Site scripts and styles (e.g. MediaWiki:Common.css) were loaded on mobile and swiftly reverted (caught via Grafana). Luckily this never hit production. https://phabricator.wikimedia.org/T237050#5800024

2019

December

2nd

Speedindex 3G took a big hit on December 2. Fundraising campaign?

September

5th

Performance regression noted on site. However, we believe this is likely a problem with the tooling or the Chrome browser, not the site itself. https://phabricator.wikimedia.org/T232174

August

1st

MobileWebMainMenuClickTracking broken in deploy. Disabled shortly after.

July

30th

Spike in EventLogging errors during a deploy of the broken main menu schema

25th

MobileWebMainMenuClickTracking was broken in train deploy (T220016)

April

8th (1.33.0-wmf.25)

Errors spike to 75k! This seems to be related to T219841.

March

29th (1.33.0-wmf.23)

Errors spike to 5-15k.

7th (1.33.0-wmf.20)

Errors spike to 12k-13k. A SWAT fix for T217820 shortly after has little to no impact, so this is likely a new error.

February

21st (1.33.0-wmf.18)

A new deploy causes errors to spike to around 8k an hour (a little less than the spike on the 15th). Some of these appear to relate to skins.minerva.top (I4db0551a7661eb5c41d7b2a27e78afb885bb9ce5), which probably should have been shipped in wmf.19, not 1.33.0-wmf.18.

20th

Around 2-4k errors as bugs related to caching ceased.

15th (1.33.0-wmf.17)

An error spike in MinervaClientErrors (12k an hour), up from the usual 3k. The problem seems to mostly affect US and Japanese users on en.wiki and jp.wiki. 1395 errors occurred on a single page in a one-hour period on Android Chrome Mobile, but I couldn't replicate any issues even with the page and browser version available. Finally, I managed to replicate the problem: caching. Explanation here: https://phabricator.wikimedia.org/T208915#4958060. Shipped in 1.33.0-wmf.17, probably should have shipped in 18.

7th (1.33.0-wmf.16)

The regression https://phabricator.wikimedia.org/T217820 went live. No notable incident was recorded, so its impact was likely low.

January

23rd

A patch ("Explicitly pass in parseHTML") was deployed in the hope of dealing with many of the issues that appeared on the 17th.

15th-17th

Grafana is missing some events (most notably ReadingDepth, VirtualPageViews), although they were recorded correctly in the EL databases.

This was due to a PDU issue that affected prometheus1003.

17th

MinervaClientError jumps again from 30 to 120k on 2019-01-17 at 18:00:00 UTC(?). Not seeing anything obvious in https://wikitech.wikimedia.org/wiki/Server_Admin_Log or the Deployment calendar. Stephen saw a problematic banner, so this might also be related. 35% of errors come from iOS and 74% of traffic is on enwiki. The Steward nominations banner does appear to be throwing an error, and when looking at referrer traffic for client-side errors, the pages impacted do seem to coincide with places the banner is running. This was tracked and fixed but bugs were still at normal levels with the majority coming from iOS. Some of these bugs may be related to the page issues deploy, so we are looking more closely...

9th

MinervaClient errors jump from 4k a minute to 40k a minute with the 1.33.0-wmf.12 deploy. Ouch. It turned out to be due to QuickSurveys being disabled on English Wikipedia but some surveys still being active in cached HTML. We promptly pushed it back to normal levels.

2018

December

20th

Bug fix deployed: T211986

November

5th

[MobileFrontend refactor bug]

Bug T208605 was squashed. Minerva.WebClientError returns to baseline.

October

19th

[MobileFrontend refactor bug]

A suspected iOS Safari bug caused a huge spike in the number of Minerva.WebClientError errors (~30k to ~120k). Error in MediaWiki_1.33/wmf.1 (T208605).