Reading/Web/Release Manager updates/Logstash Instructions

Purpose of this document

To provide clear guidelines for Web team members on how to effectively monitor and report JavaScript errors in Logstash across different groups (Group 0, Group 1, and Group 2) based on deployment dates and affected sites.

Context

Every week, one team member is assigned to check Logstash for JavaScript errors. These errors are categorized into Group 0, Group 1, and Group 2, representing different sets of affected sites based on deployment dates.

Procedure

Understanding Groups and Deployment Dates

Note: The following is a typical timeline using real dates for the 1.42.0-wmf.20 release.

  • Group 0 (Tuesday): Represents the sites affected by the MediaWiki version deployed on 27th February 2024, including mediawiki.org, test.wikipedia.org, and test.wikidata.org.
  • Group 1 (Wednesday): Represents the sites affected by the MediaWiki version deployed on 28th February 2024, including Catalan Wikipedia, Hebrew Wikipedia, Italian Wikipedia, test2.wikipedia.org, and all non-Wikipedia sites (Wiktionary, Wikisource, Wikinews, Wikibooks, Wikiquote, Wikiversity, Wikivoyage, Wikidata, and others).
  • Group 2 (Thursday): Represents the sites affected by the MediaWiki version deployed on 29th February 2024, including all Wikipedias.
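
The timeline above can be sketched as a small date calculation. This is only an illustration, assuming the train always starts on a Tuesday and each later group follows one day after the previous one, as in the 1.42.0-wmf.20 example; the names `GROUP_OFFSETS` and `deployment_dates` are hypothetical and not part of any existing tool.

```python
from datetime import date, timedelta

# Day offsets from the Tuesday that starts the train (assumption based on the list above).
GROUP_OFFSETS = {"Group 0": 0, "Group 1": 1, "Group 2": 2}


def deployment_dates(train_tuesday: date) -> dict[str, date]:
    """Return the deployment date of each group for the week starting on train_tuesday."""
    if train_tuesday.weekday() != 1:  # Monday is 0, so Tuesday is 1
        raise ValueError("the train is expected to start on a Tuesday")
    return {
        group: train_tuesday + timedelta(days=offset)
        for group, offset in GROUP_OFFSETS.items()
    }


# Dates from the 1.42.0-wmf.20 example in this document.
schedule = deployment_dates(date(2024, 2, 27))
for group, day in schedule.items():
    print(group, day.isoformat())
```

For the example week this yields 27, 28, and 29 February 2024 for Group 0, Group 1, and Group 2 respectively.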

Monitoring Approach

  • Team members should pay attention to errors in all groups but particularly focus on any significant changes or spikes in error rates on the specified days of the week.
  • Group 1 should be checked on Tuesday.
  • Any unusually high spike in errors in Group 1 should be immediately investigated and acted upon.
  • A spike in errors in Group 1 usually indicates a potential issue that might impact Group 2. However, spikes in Group 1 can also indicate Commons-specific or Wikidata-specific bugs, so analyze them carefully before drawing conclusions.
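
This document does not define what counts as an "unusually high spike", so the following is only a rough sketch of one possible check. It assumes daily error counts for a group have already been read off the Logstash dashboard by hand; the function `is_spike`, the `factor` of 3, and the `min_errors` floor of 50 are all hypothetical values chosen for illustration, not team policy.

```python
from statistics import median


def is_spike(daily_counts: list[int], today_count: int,
             factor: float = 3.0, min_errors: int = 50) -> bool:
    """Flag today's count when it is far above the median of the preceding days.

    factor and min_errors are illustrative thresholds, not agreed values.
    """
    if not daily_counts:
        return False  # no baseline to compare against
    baseline = median(daily_counts)
    # max(baseline, 1) avoids flagging everything when the baseline is zero.
    return today_count >= min_errors and today_count > factor * max(baseline, 1)


# Made-up daily error counts for Group 1 over the previous week.
previous_week = [120, 135, 110, 128, 140, 117, 125]
print(is_spike(previous_week, 131))  # close to the usual level
print(is_spike(previous_week, 900))  # several times the usual level
```

Comparing against the median rather than the mean keeps one earlier bad day from hiding a new spike. A flagged day still needs the manual analysis described above, since the cause may be specific to Commons or Wikidata.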

Release Day Procedures

  • On Thursdays, code rolls out to Group 2. Before that deployment, assess whether any issue seen in the earlier groups is still unresolved.

Other Chore Duty Guidelines

  • When on chore duty, team members should check errors in Group 1 every day for any red flags.

The goal is to identify and resolve issues before they reach Group 2, allowing for smoother deployments and minimizing disruptions.

Error Investigation Guiding Principles

Step 1

Avoid extensive investigation unless the error is explicitly linked to a repository we maintain, or it shows unique properties or methods that can be found in our code through Codesearch or GlobalSearch.

Step 2

When investigating errors, search for short but unique snippets of the error message across multiple search engines and code repositories. Use the results to validate that the error is unrelated to any line of code we own.

Step 3

Once Steps 1 and 2 are complete and the error is confirmed to be unrelated to our code, filter it out in Logstash and label the filter "Non-InternalClientError: <Insert Error Name>".
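
The three steps can be summarized as a small decision function. This is a sketch for illustration only: `ErrorReport`, `triage`, and the sample error name are hypothetical, and the two boolean fields stand in for the manual checks in Steps 1 and 2. Only the label format comes from Step 3.

```python
from dataclasses import dataclass

LABEL_PREFIX = "Non-InternalClientError"  # label format taken from Step 3


@dataclass
class ErrorReport:
    name: str                   # error name or message as shown in Logstash
    linked_to_our_repo: bool    # Step 1: explicitly linked to a repository we maintain
    found_in_code_search: bool  # Steps 1-2: a unique snippet matched code we own


def triage(report: ErrorReport) -> str:
    """Return "investigate" for our errors, otherwise the Logstash filter label."""
    if report.linked_to_our_repo or report.found_in_code_search:
        return "investigate"
    return f"{LABEL_PREFIX}: {report.name}"


# Hypothetical error that matched nothing we own.
example = ErrorReport(
    name="TypeError: foo.bar is not a function",
    linked_to_our_repo=False,
    found_in_code_search=False,
)
print(triage(example))
```

The function only formats the label; creating the filter itself is still done by hand in Logstash.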