Readers/Web/Instrumentation Overview

Suggested Timeline

2 Months from Launch

  • Gather prerequisites
  • Plan in advance what to do with results and what actions to take.

1.5 Months from Launch

Begin writing instrumentation

1 Month from Launch

Complete A/B Test instrumentation

2 Weeks from Launch

Enable a dummy A/B test on the Beta Cluster and on Test Wiki (production) ahead of time, so that the mechanism can be tested separately from the actual test.

1 Week from Launch

Deploy on smaller language wikis and test

Launch Date

Deploy on English Wikipedia

Two Weeks after Launch

Turn off A/B test

Prerequisites

  • Get research objective and schema from data analyst, both of which can be found in the Phabricator ticket.
  • Name of what is being tested: Identify the components that will be tested. In this example, we are A/B testing the Zebra skin, so the name is skin-vector-zebra-experiment. Keep this for later.
  • Variations: Identify the different versions of the component that you'll be testing. The variations in this example are vector-feature-zebra-design-enabled and vector-feature-zebra-design-disabled.
  • Make sure the answer is "yes" for the following question: "Is the focus or the aim of the test on change management and phasing out features to users?"

Files to modify

Server Side

skin.json

  1. Modify the lines containing the A/B test configuration.
  2. In "VectorWebABTestEnrollment", set "name" to the value decided on earlier (skin-vector-zebra-experiment).
		"VectorWebABTestEnrollment": {
			"value": {
				"name": "skin-vector-zebra-experiment",
				"enabled": false,
				"buckets": {
					"unsampled": {
						"samplingRate": 0
					},
					"control": {
						"samplingRate": 0.5
					},
					"treatment": {
						"samplingRate": 0.5
					}
				}
			},
			"description": "An associative array of A/B test configs keyed by parameters noted in mediawiki.experiments.js. There must be an `unsampled` bucket that represents a population excluded from the experiment. Additionally, the treatment bucket(s) must include a case-insensitive `treatment` substring in their name (e.g. `treatment`, `stickyHeaderTreatment`, `sticky-header-treatment`)"
		},
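
The constraints stated in the "description" field above can be checked mechanically before a patch is submitted. The following is a minimal sketch in JavaScript, assuming the config has already been parsed into an object. validateABTestConfig is a hypothetical helper and is not part of Vector, and the range check on samplingRate is an added assumption rather than a documented rule.

```javascript
// Hypothetical helper, not part of Vector: checks the constraints named in
// the config description, plus an assumed [0, 1] range for samplingRate.
function validateABTestConfig( config ) {
	const errors = [];
	const buckets = config.buckets || {};
	const names = Object.keys( buckets );

	if ( typeof config.name !== 'string' || config.name === '' ) {
		errors.push( 'name must be a non-empty string' );
	}
	// There must be an `unsampled` bucket for the excluded population.
	if ( !names.includes( 'unsampled' ) ) {
		errors.push( 'an `unsampled` bucket is required' );
	}
	// At least one bucket name must contain "treatment" (case-insensitive).
	if ( !names.some( ( n ) => n.toLowerCase().includes( 'treatment' ) ) ) {
		errors.push( 'no bucket name contains `treatment`' );
	}
	for ( const n of names ) {
		const rate = buckets[ n ] && buckets[ n ].samplingRate;
		if ( typeof rate !== 'number' || rate < 0 || rate > 1 ) {
			errors.push( 'samplingRate of `' + n + '` must be a number in [0, 1]' );
		}
	}
	return errors;
}

// Same values as the skin.json fragment above.
const zebraConfig = {
	name: 'skin-vector-zebra-experiment',
	enabled: false,
	buckets: {
		unsampled: { samplingRate: 0 },
		control: { samplingRate: 0.5 },
		treatment: { samplingRate: 0.5 }
	}
};

console.log( validateABTestConfig( zebraConfig ) ); // prints []
```

An empty array means none of the checked constraints were violated. It does not guarantee that the experiment is configured as the analyst intended.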


ABRequirement.php

This file checks if a user is part of a specific A/B test experiment and determines whether they should be in the "control" group or the "test" group based on their user ID. Currently, it is hardcoded to divide users into two groups.

	public function isMet(): bool {
		// Get the experiment configuration from the config object.
		$experiment = $this->config->get( 'VectorWebABTestEnrollment' );

		// Use the local user ID directly
		$id = $this->user->getId();

		// Check if the experiment is not enabled or does not match the specified name.
		if ( !$experiment['enabled'] || $experiment['name'] !== $this->experimentName ) {
			// If the experiment is not enabled or does not match the specified name,
			// return true, indicating that the requirement is "met"
			return true;
		} else {
			// If the experiment is enabled and matches the specified name,
			// calculate the user's variant based on their user ID
			$variant = $id % 2;

			// Cast the variant value to a boolean and return it, indicating whether
			// the user is in the "control" or "test" group.
			return (bool)$variant;
		}
	}
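
To make the branching easier to follow, here is the same decision paraphrased in JavaScript. This is an illustrative sketch only: isABRequirementMet is a hypothetical name, and the real check is the PHP method above.

```javascript
// JavaScript paraphrase of ABRequirement::isMet(), for illustration only.
// `experiment` mirrors the VectorWebABTestEnrollment config value.
function isABRequirementMet( experiment, experimentName, userId ) {
	// Experiment off, or a different experiment is configured:
	// the requirement does not restrict anyone.
	if ( !experiment.enabled || experiment.name !== experimentName ) {
		return true;
	}
	// Experiment on: odd user IDs meet the requirement, even IDs do not.
	return userId % 2 === 1;
}

const running = { name: 'skin-vector-zebra-experiment', enabled: true };
const stopped = { name: 'skin-vector-zebra-experiment', enabled: false };

console.log( isABRequirementMet( stopped, 'skin-vector-zebra-experiment', 4 ) ); // true
console.log( isABRequirementMet( running, 'skin-vector-zebra-experiment', 4 ) ); // false
console.log( isABRequirementMet( running, 'skin-vector-zebra-experiment', 5 ) ); // true
```

Note that an ID of 0, which is what MediaWiki reports for anonymous users, is even, so such users never meet the requirement while the experiment is enabled.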


ServiceWiring.php

This PHP file defines a set of service wirings for the Vector skin used in MediaWiki core. The purpose of these wirings is to manage different features and requirements for the Vector skin. The file includes:

  1. A top-level return statement, which indicates that this file returns an array of service definitions.
  2. An array containing a single key-value pair, where the key is a constant (Constants::SERVICE_FEATURE_MANAGER) representing the service name, and the value is an anonymous function that creates and configures the FeatureManager object.
  3. Inside the anonymous function:
    • A new instance of FeatureManager is created, which will manage the registration and evaluation of different features.
    • Several "requirements" are registered with the FeatureManager. These requirements define the conditions that must be met for a feature to be enabled for a particular user.

Example: Zebra Design Feature

This feature depends on the Zebra AB test and whether the Zebra design configuration is enabled.

		$featureManager->registerRequirement(
			new ABRequirement(
				$services->getMainConfig(),
				$context->getUser(),
				'skin-vector-zebra-experiment',
				Constants::REQUIREMENT_ZEBRA_AB_TEST
			)
		);

⬇️

The following registers a feature named FEATURE_ZEBRA_DESIGN with the FeatureManager. To enable this feature, three requirements must be met: the skin must be fully initialized, the REQUIREMENT_ZEBRA_DESIGN condition must be satisfied, and the REQUIREMENT_ZEBRA_AB_TEST condition must also be fulfilled.

		$featureManager->registerFeature(
			Constants::FEATURE_ZEBRA_DESIGN,
			[
				Constants::REQUIREMENT_FULLY_INITIALISED,
				Constants::REQUIREMENT_ZEBRA_DESIGN,
				Constants::REQUIREMENT_ZEBRA_AB_TEST
			]
		);

The new feature (in this case the Zebra Design Feature) consumes ABRequirement as a requirement.

Zebra is enabled when the Zebra config is enabled and one of the following holds:

  1. The Zebra A/B test config is disabled.
  2. The Zebra A/B test config is enabled and the user's ID places them in the enabled half (50% chance).
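
The relationship between requirements and features can be modelled in a few lines. The sketch below is a toy model in JavaScript, not the real FeatureManager: ToyFeatureManager, isZebraEnabled, and the string names are all hypothetical, and the A/B requirement reuses the odd/even split shown in ABRequirement.php.

```javascript
// Toy model of the requirement/feature relationship; not the real FeatureManager.
class ToyFeatureManager {
	constructor() {
		this.requirements = new Map(); // requirement name -> function returning boolean
		this.features = new Map(); // feature name -> array of requirement names
	}

	registerRequirement( name, isMet ) {
		this.requirements.set( name, isMet );
	}

	registerFeature( name, requirementNames ) {
		for ( const r of requirementNames ) {
			if ( !this.requirements.has( r ) ) {
				throw new Error( 'Unknown requirement: ' + r );
			}
		}
		this.features.set( name, requirementNames );
	}

	// A feature is enabled only when every one of its requirements is met.
	isFeatureEnabled( name ) {
		if ( !this.features.has( name ) ) {
			throw new Error( 'Unknown feature: ' + name );
		}
		return this.features.get( name ).every( ( r ) => this.requirements.get( r )() );
	}
}

// Hypothetical wrapper that wires up the three Zebra requirements.
function isZebraEnabled( zebraConfigEnabled, abTestEnabled, userId ) {
	const fm = new ToyFeatureManager();
	fm.registerRequirement( 'FullyInitialised', () => true );
	fm.registerRequirement( 'ZebraDesign', () => zebraConfigEnabled );
	fm.registerRequirement( 'ZebraABTest', () => !abTestEnabled || userId % 2 === 1 );
	fm.registerFeature( 'ZebraDesign', [ 'FullyInitialised', 'ZebraDesign', 'ZebraABTest' ] );
	return fm.isFeatureEnabled( 'ZebraDesign' );
}

console.log( isZebraEnabled( true, false, 4 ) ); // true: config on, test off
console.log( isZebraEnabled( true, true, 4 ) ); // false: test on, even user ID
console.log( isZebraEnabled( true, true, 5 ) ); // true: test on, odd user ID
console.log( isZebraEnabled( false, false, 5 ) ); // false: config off
```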


FeatureManager.php

The FeatureManager class in this file provides a way to manage features and requirements for the Vector skin. It decouples the logic of different components from their requirements, making the code more flexible and maintainable.


The method below returns a list of CSS classes that should be added to the <body> tag of the skin based on the enabled features. It iterates through the registered features and checks whether each one is enabled or disabled. Based on the result, it generates the CSS classes used for styling, in this case vector-feature-zebra-design-enabled or vector-feature-zebra-design-disabled.

	public function getFeatureBodyClass() {
		$featureManager = $this;
		return array_map( static function ( $featureName ) use ( $featureManager ) {
			// switch to lower case and switch from camel case to hyphens
			$featureClass = ltrim( strtolower( preg_replace( '/[A-Z]([A-Z](?![a-z]))*/', '-$0', $featureName ) ), '-' );
			$prefix = 'vector-feature-' . $featureClass . '-';
			return $featureManager->isFeatureEnabled( $featureName ) ? $prefix . 'enabled' : $prefix . 'disabled';
		}, array_keys( $this->features ) );
	}
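
The name conversion is the least obvious part of this method, so here is a JavaScript sketch of the same regular expression applied to a few names. featureBodyClass is a hypothetical helper, and the feature names 'ZebraDesign' and 'TOCPinned' are assumptions chosen for illustration, since the values of the Constants are not shown on this page.

```javascript
// JavaScript port of the class-name conversion in getFeatureBodyClass(),
// for illustration only. The feature names used below are assumed values.
function featureBodyClass( featureName, enabled ) {
	const featureClass = featureName
		// Put a hyphen before each capitalised word or run of capitals.
		.replace( /[A-Z]([A-Z](?![a-z]))*/g, '-$&' )
		.toLowerCase()
		// Drop the hyphen added before the first word.
		.replace( /^-+/, '' );
	return 'vector-feature-' + featureClass + '-' + ( enabled ? 'enabled' : 'disabled' );
}

console.log( featureBodyClass( 'ZebraDesign', true ) );
// prints "vector-feature-zebra-design-enabled"
console.log( featureBodyClass( 'TOCPinned', false ) );
// prints "vector-feature-toc-pinned-disabled"
```

The lookahead in the pattern keeps a run of capitals such as "TOC" together as one word, while still splitting off the capital that starts the next word.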


Client Side

skin.js

The skin.js file contains JavaScript code to initialize and manage various functionalities of the Vector skin, including language buttons, toggles, menus, search, animations, and A/B testing.

The script calls the init function first to initialize the skin. Then, it checks if A/B tests are enabled and the user is not anonymous. If A/B tests are enabled for the user, it initializes A/B tests using the initExperiment function with the configuration provided in ABTestConfig.

initExperiment = require( './AB.js' ),
ABTestConfig = require( /** @type {string} */ ( './activeABTest.json' ) ),

⬇️

if ( ABTestConfig.enabled && !mw.user.isAnon() ) {
	initExperiment( ABTestConfig, String( mw.user.getId() ) );
}

AB.js

AB.js handles A/B testing functionality for web experiments. It exports a function called webABTest, which is used to initialize and manage A/B tests.

Types and Definitions:

  1. TreatmentBucketFunction: A function that takes an optional string parameter and returns a boolean.
  2. WebABTest: An object representing an A/B test with properties such as name and various functions for testing.
  3. SamplingRate: An object representing the desired sampling rate for a group in the range [0, 1].
  4. WebABTestProps: An object representing the properties needed to define an A/B test, such as experiment name, buckets, and token.


Function: webABTest

This function is the main entry point of the module. It takes the following parameters:

  1. props (WebABTestProps): An object containing the properties of the A/B test, such as experiment name, buckets, and token.
  2. token (string): A unique token that identifies the subject (user) for the duration of the experiment.
  3. forceInit (boolean, optional): A flag to force the initialization of the A/B test event. This is used for testing purposes and bypasses caching.

The function returns a WebABTest object, which encapsulates the A/B test and provides various methods to check the subject's bucket, sample status, and treatment bucket assignment.


Bucketing Mechanism:

The webABTest function uses a bucketing mechanism to assign users to different buckets based on the sampling rates defined in the props.buckets object. The buckets represent different variations or treatments of the experiment.

If the bucketing has already occurred on the server side (e.g., by adding a class to the body tag with the bucket name), the function retrieves the bucket from the DOM. Otherwise, it uses the provided token to bucket the subject on the client side using the mw.experiments.getBucket function (see the next file sample).
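
The "server first, client as fallback" order described above can be sketched as follows. This is a simplified JavaScript illustration, not the code in AB.js: both function names are hypothetical, the class naming scheme (experiment name, hyphen, bucket name) is an assumption, and the class list is passed in as a plain array so the sketch runs without a DOM.

```javascript
// Hypothetical helper: looks for a server-assigned bucket in a list of
// body classes. The "<experiment name>-<bucket name>" scheme is assumed.
function getBucketFromClassList( classList, experimentName, bucketNames ) {
	for ( const bucketName of bucketNames ) {
		if ( classList.includes( experimentName + '-' + bucketName ) ) {
			return bucketName;
		}
	}
	return null;
}

// Prefer the server-side bucket; otherwise bucket on the client.
// `clientSideBucketer` stands in for mw.experiments.getBucket.
function resolveBucket( classList, experiment, token, clientSideBucketer ) {
	const fromServer = getBucketFromClassList(
		classList, experiment.name, Object.keys( experiment.buckets )
	);
	return fromServer !== null ? fromServer : clientSideBucketer( experiment, token );
}

const experiment = {
	name: 'skin-vector-zebra-experiment',
	buckets: { unsampled: 0, control: 0.5, treatment: 0.5 }
};
const alwaysControl = () => 'control'; // stub bucketer for the example

console.log( resolveBucket(
	[ 'skin-vector', 'skin-vector-zebra-experiment-treatment' ],
	experiment, '42', alwaysControl
) ); // prints "treatment" (found in the class list)
console.log( resolveBucket( [ 'skin-vector' ], experiment, '42', alwaysControl ) );
// prints "control" (fell back to the stub bucketer)
```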


Methods of WebABTest:

  1. getBucket(): Returns the name of the bucket the subject is assigned to for the A/B test.
  2. isInBucket(targetBucket): Checks if the subject is in a specific target bucket.
  3. isInSample(): Determines if the subject is included in the A/B test (i.e., not excluded).
  4. isInTreatmentBucket(treatmentBucketName): Checks if the subject is in a treatment bucket based on a case-insensitive substring check in the bucket name.
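
The four methods above can be illustrated with a small stand-in object. This is a simplified JavaScript sketch under stated assumptions, not the implementation in AB.js: makeWebABTest is a hypothetical factory that takes an already-decided bucket, and the exact substring matching rule in isInTreatmentBucket is an approximation of the behaviour described in the list.

```javascript
const EXCLUDED_BUCKET = 'unsampled';
const TREATMENT_SUBSTRING = 'treatment';

// Hypothetical factory: wraps an already-assigned bucket in an object
// exposing the four methods described above.
function makeWebABTest( name, bucket ) {
	function getBucket() {
		return bucket;
	}
	function isInBucket( targetBucket ) {
		return getBucket() === targetBucket;
	}
	// Sampled means "in any bucket other than the excluded one".
	function isInSample() {
		return !isInBucket( EXCLUDED_BUCKET );
	}
	// Case-insensitive check: the bucket name must contain "treatment"
	// and, if given, the more specific treatment name.
	function isInTreatmentBucket( treatmentBucketName = '' ) {
		const lower = getBucket().toLowerCase();
		return lower.includes( TREATMENT_SUBSTRING ) &&
			lower.includes( treatmentBucketName.toLowerCase() );
	}
	return { name, getBucket, isInBucket, isInSample, isInTreatmentBucket };
}

const test = makeWebABTest( 'skin-vector-zebra-experiment', 'stickyHeaderTreatment' );
console.log( test.isInSample() ); // true
console.log( test.isInTreatmentBucket() ); // true
console.log( test.isInTreatmentBucket( 'stickyheader' ) ); // true
console.log( test.isInBucket( 'control' ) ); // false
```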


Initialization and Hook:

The A/B test enrollment is logged using a hook (WEB_AB_TEST_ENROLLMENT_HOOK) and sent to WikimediaEvents if the subject has been sampled into the experiment. Initialization occurs when the webABTest function is called, and it can be forced using the forceInit parameter for testing purposes.

core/mediawiki.experiments.js

  1. The module has a function called getBucket, which assigns a user to a bucket for an A/B test experiment.
  2. The experiment has different "buckets" to assign users. Each bucket has a certain chance of being chosen.
  3. When a user enters the experiment, the system generates a "hash" based on the user's identity and the experiment's name. This hash determines which bucket the user is put into.
  4. The user is then shown the content or feature corresponding to their bucket.
  5. The experiment can be enabled or disabled, and if it's disabled, all users will be put in a default "control" bucket.

getBucket: function ( experiment, token ) {
	var buckets = experiment.buckets,
		key,
		range = 0,
		hash,
		max,
		acc = 0;

	if ( !experiment.enabled || !Object.keys( experiment.buckets ).length ) {
		return CONTROL_BUCKET;
	}

	for ( key in buckets ) {
		range += buckets[ key ];
	}

	hash = hashString( experiment.name + ':' + token );
	max = ( hash / MAX_INT32_UNSIGNED ) * range;

	for ( key in buckets ) {
		acc += buckets[ key ];

		if ( max <= acc ) {
			return key;
		}
	}
}
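
The excerpt above depends on hashString, MAX_INT32_UNSIGNED, and CONTROL_BUCKET, which are defined elsewhere in the module. The following is a self-contained sketch of the same weighted bucketing idea that can be run on its own. The hash function here is a stand-in (FNV-1a with an extra mixing step) and is an assumption, not the hash MediaWiki uses; the constant values are likewise assumed.

```javascript
const MAX_INT32_UNSIGNED = 4294967295; // assumed value
const CONTROL_BUCKET = 'control'; // assumed value

// Stand-in hash (FNV-1a plus a final mixing step). NOT MediaWiki's hashString.
function hashString( str ) {
	let h = 0x811c9dc5;
	for ( let i = 0; i < str.length; i++ ) {
		h ^= str.charCodeAt( i );
		h = Math.imul( h, 0x01000193 );
	}
	h ^= h >>> 16;
	h = Math.imul( h, 0x85ebca6b );
	h ^= h >>> 13;
	h = Math.imul( h, 0xc2b2ae35 );
	h ^= h >>> 16;
	return h >>> 0; // unsigned 32-bit integer
}

function getBucket( experiment, token ) {
	const buckets = experiment.buckets;
	const keys = Object.keys( buckets );
	if ( !experiment.enabled || !keys.length ) {
		return CONTROL_BUCKET;
	}
	// Total weight of all buckets.
	let range = 0;
	for ( const key of keys ) {
		range += buckets[ key ];
	}
	// Map the hash onto [0, range] and walk the cumulative weights.
	const max = ( hashString( experiment.name + ':' + token ) / MAX_INT32_UNSIGNED ) * range;
	let acc = 0;
	for ( const key of keys ) {
		acc += buckets[ key ];
		if ( max <= acc ) {
			return key;
		}
	}
	return keys[ keys.length - 1 ]; // guard against floating-point rounding
}

const zebra = {
	name: 'skin-vector-zebra-experiment',
	enabled: true,
	buckets: { unsampled: 0, control: 0.5, treatment: 0.5 }
};

// Count how 10,000 consecutive user IDs are split.
const counts = { unsampled: 0, control: 0, treatment: 0 };
for ( let id = 1; id <= 10000; id++ ) {
	counts[ getBucket( zebra, String( id ) ) ]++;
}
console.log( counts );
```

Because the bucket depends only on the experiment name and the token, the same user lands in the same bucket on every page view, and changing the experiment name reshuffles everyone.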

⬇️

modules/ext.wikimediaEvents/webABTestEnrollment.js

This file is part of the WikimediaEvents extension. It is used to log the enrollment of users into A/B tests.

logEvent logs the A/B test initialization event with relevant data like the user's group, the experiment name, whether the user is anonymous, etc.

/**
 * Log the A/B test initialization event.
 *
 * @param {Object} data event info for logging
 */
function logEvent( data ) {
	/* eslint-disable camelcase */
	const event = Object.assign( {}, webCommon(), {
		$schema: '/analytics/mediawiki/web_ab_test_enrollment/2.0.0',
		web_session_id: mw.user.sessionId(),
		group: data.group,
		experiment_name: data.experimentName,
		is_anon: mw.user.isAnon()
	} );
	/* eslint-enable camelcase */

	mw.eventLog.submit( 'mediawiki.web_ab_test_enrollment', event );
}

requestIdleCallback (RIC)

  • On page load, it checks whether to log the A/B test initialization by waiting for the browser to be idle using requestIdleCallback.
  • When the A/B test enrollment data is available through a hook, it calls the logEvent function to log the relevant data.
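
The two steps above (wait for idle time, then log when enrollment data arrives) can be sketched without MediaWiki as follows. This is an illustration under stated assumptions: createEnrollmentLogger and scheduleWhenIdle are hypothetical names, the scheduler and submit function are injected so the example runs outside a browser, and the fields contributed by webCommon() are omitted.

```javascript
// Hypothetical factory: returns a handler that builds the enrollment event
// and submits it once the injected scheduler runs the callback.
function createEnrollmentLogger( { schedule, submit, getSessionId, isAnon } ) {
	return function onEnrollment( data ) {
		schedule( () => {
			/* eslint-disable camelcase */
			submit( 'mediawiki.web_ab_test_enrollment', {
				$schema: '/analytics/mediawiki/web_ab_test_enrollment/2.0.0',
				web_session_id: getSessionId(),
				group: data.group,
				experiment_name: data.experimentName,
				is_anon: isAnon()
			} );
			/* eslint-enable camelcase */
		} );
	};
}

// Scheduler that uses requestIdleCallback where available and falls back
// to setTimeout elsewhere (for example in Node.js).
const scheduleWhenIdle = typeof requestIdleCallback === 'function' ?
	( fn ) => requestIdleCallback( fn ) :
	( fn ) => setTimeout( fn, 0 );

// Example wiring with stand-ins for mw.eventLog.submit and mw.user.
const logEnrollment = createEnrollmentLogger( {
	schedule: scheduleWhenIdle,
	submit: ( stream, event ) => console.log( stream, event ),
	getSessionId: () => 'example-session-id',
	isAnon: () => false
} );

logEnrollment( { group: 'treatment', experimentName: 'skin-vector-zebra-experiment' } );
```

Injecting the scheduler keeps the handler easy to test: a test can pass a scheduler that runs the callback immediately and then inspect what was submitted.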


LESS files

TO-DO: Fix the issues with the .vector-body class and use the stable .mw-body-content class instead. Or explore a different approach to using the feature flag that reduces the risk of specificity-related bugs.

Other useful tools

  1. Hue - Use this to query events. (Remember: Data is limited to 90 days)
  2. DataHub - Historically, the team has been using Google Spreadsheets for schema tracking, but we are currently transitioning to referencing and recording schemas here.

Set up

Writing the variations

  1. Configure testing parameters in LocalSettings.php, such as the percentage of traffic that will see each version.
  2. Define an array for A/B testing in the Vector skin of MediaWiki.

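To allocate 50% of users to the "control" bucket and 50% to the "treatment" bucket, use the same format as the "VectorWebABTestEnrollment" example in the skin.json section above, with a samplingRate of 0.5 for each of the two buckets and 0 for "unsampled".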

Launching test

Once you've set up your A/B test and determined your sample size, commit the patch containing the test as seen here.

Coordinate with the PM and CRS and ensure they are aware of the test schedule.

Most of the time, the integrity of the test means there won't be a public announcement ahead of time.

  • Otherwise, the team can coordinate to ensure that the messages announcing the test have been posted.
  • This would happen at least a week before the estimated time of launch, to give a heads-up about the change in user experience and to let the communities look for possible bugs in user-generated code.
  • Note that the quality of the configuration (wmf-config/InitialiseSettings.php) should be confirmed before these steps.

Test best practices

Phase 1

Limit configuration enabling to test.wikipedia.org and test2.wikipedia.org. Avoid launching the test on active content wikis without launching it on test wikis or closed content wikis first.

Analyst handoff

After the test has run for a sufficient amount of time, the analyst will check the results to determine which variation performed better.

Next steps

  • Once we have identified the winning variation and the project manager approves, open a new patch to implement it. (Example forthcoming)
  • Ensure that the changes are properly documented and communicated to relevant stakeholders.