Reading/Web/Release Manager updates/Logstash Instructions

Purpose of this document

To provide clear guidelines for Web team members on how to effectively monitor and report JavaScript errors in Logstash across different groups (Group 0, Group 1, and Group 2) based on deployment dates and affected sites.

Context

Every week, one team member is assigned to check Logstash for JavaScript errors. These errors are categorized into Group 0, Group 1, and Group 2, representing different sets of affected sites based on deployment dates.

Procedure

Understanding Groups and Deployment Dates

Note: The following is a typical timeline using real dates for the 1.42.0-wmf.20 release.

Group 0 (Tuesday): Represents the sites affected by the MediaWiki version deployed on 27th February 2024, including mediawiki.org, test.wikipedia.org, and test.wikidata.org.
Group 1 (Wednesday): Represents the sites affected by the MediaWiki version deployed on 28th February 2024, including Catalan Wikipedia, Hebrew Wikipedia, Italian Wikipedia, test2.wikipedia.org, and all non-Wikipedia sites (Wiktionary, Wikisource, Wikinews, Wikibooks, Wikiquote, Wikiversity, Wikivoyage, Wikidata, and others).
Group 2 (Thursday): Represents the sites affected by the MediaWiki version deployed on 29th February 2024, including all Wikipedias.
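The timeline above can be sketched as a small date calculation. This is only an illustration, assuming the train always starts on a Tuesday and follows the Tuesday/Wednesday/Thursday pattern described above; the names `GROUP_OFFSETS` and `deployment_dates` are hypothetical and not part of any existing tool.

```python
from datetime import date, timedelta
from typing import Dict

# Days after the Tuesday train start on which each group is deployed
# (assumption: the typical Tuesday/Wednesday/Thursday pattern holds).
GROUP_OFFSETS = {"group0": 0, "group1": 1, "group2": 2}


def deployment_dates(train_tuesday: date) -> Dict[str, date]:
    """Return the deployment date of each group for a given train week."""
    if train_tuesday.weekday() != 1:  # Monday is 0, so Tuesday is 1
        raise ValueError("the train is expected to start on a Tuesday")
    return {
        group: train_tuesday + timedelta(days=offset)
        for group, offset in GROUP_OFFSETS.items()
    }


# Dates for the 1.42.0-wmf.20 release used as the example above.
dates = deployment_dates(date(2024, 2, 27))
for group, day in dates.items():
    print(group, day.isoformat())
```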

Monitoring Approach

Team members should pay attention to errors in all groups but particularly focus on any significant changes or spikes in error rates on the specified days of the week.
Group 1 should be checked on Tuesday.
Any unusually high spike in errors in Group 1 should be immediately investigated and acted upon.
A spike in errors in Group 1 usually indicates a potential issue that might impact Group 2. However, spikes in Group 1 can also indicate Commons-specific or Wikidata-specific bugs, so they require careful analysis.
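This page does not define what counts as an "unusually high" spike. As one possible rule of thumb, the sketch below compares today's error count with the median of the previous days. The function name `is_spike`, the factor of 3 and the minimum of 50 errors are all assumptions chosen for illustration. The counts are expected to be read off the Logstash dashboard by hand; the code does not query Logstash.

```python
from statistics import median
from typing import List


def is_spike(history: List[int], today: int,
             factor: float = 3.0, min_count: int = 50) -> bool:
    """Return True if today's count looks like a spike.

    history: daily error counts for the preceding days (hypothetical input).
    factor and min_count are illustrative thresholds, not team policy.
    """
    if today < min_count:
        # Too few errors to be worth flagging, whatever the baseline is.
        return False
    if not history:
        # No baseline available: flag anything at or above min_count.
        return True
    baseline = max(median(history), 1)  # avoid a zero baseline
    return today > factor * baseline


print(is_spike([100, 120, 90, 110], 500))  # well above 3x the median of 105
print(is_spike([100, 120, 90, 110], 130))  # within normal variation
```

A flagged day is only a prompt to look closer; whether the errors come from our own code or from Commons or Wikidata still has to be judged by hand.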

Release Day Procedures

On Thursdays, code rolls out to Group 2. Before that deployment, assess whether any issue found in the earlier groups is still unresolved.

Other Chore Duty Guidelines

When on chore duty, team members should check errors in Group 1 every day for any red flags.

The goal is to identify and resolve issues before they reach Group 2, allowing for smoother deployments and minimizing disruptions.

Error Investigation Guiding Principles

Step 1

Unless the error is explicitly linked to a repository we maintain or shows unique properties/methods identifiable in our Codesearch or GlobalSearch, avoid extensive investigation.

Step 2

When investigating errors, use short yet unique snippets of the error message as search queries across multiple search engines and code repositories. Use the results to validate that the error is unrelated to any line of code we own.

Step 3

Once steps 1 and 2 are completed, filter out the error in Logstash.

Key

For gadgets or user scripts it's acceptable to tag the filter with the name of the wiki page.
<product>:<phabricator ticket>, e.g. VE:T348018 means there is an issue with VisualEditor and it is tracked in Phabricator ticket T348018.
For unknown products, label it "ExternalClientError:T...".
- If we are sure the error is outside our control (e.g. a browser extension), feel free to just tag it ExternalClientError.
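The labelling convention in the key can be sketched as a small helper. This is an illustration only: `filter_label` is a hypothetical name, and the check that a ticket looks like "T" followed by digits is an assumption based on the T348018 example above.

```python
import re
from typing import Optional

# Assumption: Phabricator ticket IDs look like "T" followed by digits.
TICKET_RE = re.compile(r"^T\d+$")


def filter_label(product: Optional[str] = None,
                 ticket: Optional[str] = None) -> str:
    """Build a Logstash filter label following the key above.

    product: product name (e.g. "VE") or, for gadgets and user scripts,
             the name of the wiki page; None if the product is unknown.
    ticket:  Phabricator ticket ID, or None if there is no ticket.
    """
    if ticket is not None and not TICKET_RE.match(ticket):
        raise ValueError("ticket should look like T<number>")
    name = product if product else "ExternalClientError"
    return f"{name}:{ticket}" if ticket else name


print(filter_label("VE", "T348018"))  # VE:T348018
print(filter_label())                 # ExternalClientError
```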