Suggested Timeline
2 Months from Launch
- Gather prerequisites
- Plan in advance what to do with results and what actions to take.
1.5 Months from Launch
Begin writing instrumentation
1 Month from Launch
Complete A/B Test instrumentation
2 Weeks from Launch
Enable a dummy A/B test in Beta Cluster and Test Wiki (production) to verify the mechanism separately from the actual test.
1 Week from Launch
Deploy on smaller language wikis and test
Launch Date
Deploy on English Wikipedia
Two Weeks after Launch
Turn off A/B test
Prerequisites
- Get the research objective and schema from the data analyst, both of which can be found in the Phabricator ticket.
- Name of what is being tested: Identify the components that will be tested. In this example, we are A/B testing the Zebra skin, so the name is skin-vector-zebra-experiment. Keep this for later.
- Variations: Identify the different versions of the component that you'll be testing. The variations in this example are vector-feature-zebra-design-enabled and vector-feature-zebra-design-disabled.
- Make sure the answer is "yes" for the following question: "Is the focus or the aim of the test on change management and phasing out features to users?"
Files to modify
Server Side
- Modify the lines containing the A/B test configuration.
- In "VectorWebABTestEnrollment", assign "name" to the value decided on earlier (skin-vector-zebra-experiment).
"VectorWebABTestEnrollment": {
    "value": {
        "name": "skin-vector-zebra-experiment",
        "enabled": false,
        "buckets": {
            "unsampled": {
                "samplingRate": 0
            },
            "control": {
                "samplingRate": 0.5
            },
            "treatment": {
                "samplingRate": 0.5
            }
        }
    },
    "description": "An associative array of A/B test configs keyed by parameters noted in mediawiki.experiments.js. There must be an `unsampled` bucket that represents a population excluded from the experiment. Additionally, the treatment bucket(s) must include a case-insensitive `treatment` substring in their name (e.g. `treatment`, `stickyHeaderTreatment`, `sticky-header-treatment`)"
},
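The constraints named in the description field can be checked mechanically before a config change is deployed. The following is a minimal sketch and not part of Vector: the function name validateABTestConfig is hypothetical, and the range check assumes each samplingRate is meant to lie in [0, 1].

```javascript
// Hypothetical helper (not part of Vector): checks the constraints stated in
// the "description" field of VectorWebABTestEnrollment.
function validateABTestConfig( config ) {
    const errors = [];
    const buckets = config.buckets || {};
    const names = Object.keys( buckets );

    // There must be an `unsampled` bucket for the excluded population.
    if ( !names.includes( 'unsampled' ) ) {
        errors.push( 'missing "unsampled" bucket' );
    }
    // At least one bucket name must contain "treatment" (case-insensitive).
    if ( !names.some( ( name ) => name.toLowerCase().includes( 'treatment' ) ) ) {
        errors.push( 'no bucket name contains "treatment"' );
    }
    // Assumption: each samplingRate is a number in the range [0, 1].
    names.forEach( ( name ) => {
        const rate = ( buckets[ name ] || {} ).samplingRate;
        if ( typeof rate !== 'number' || rate < 0 || rate > 1 ) {
            errors.push( 'invalid samplingRate for "' + name + '"' );
        }
    } );
    return errors;
}

const exampleConfig = {
    name: 'skin-vector-zebra-experiment',
    enabled: false,
    buckets: {
        unsampled: { samplingRate: 0 },
        control: { samplingRate: 0.5 },
        treatment: { samplingRate: 0.5 }
    }
};
console.log( validateABTestConfig( exampleConfig ) ); // prints []
```

An empty result means the config satisfies all three checks; each violated constraint adds one message to the returned array.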
This file's isMet() method checks whether a user is part of a specific A/B test experiment and, based on their user ID, determines whether they should be in the "control" group or the "test" group. Currently, it is hardcoded to divide users into two groups.
public function isMet(): bool {
    // Get the experiment configuration from the config object.
    $experiment = $this->config->get( 'VectorWebABTestEnrollment' );
    // Use the local user ID directly
    $id = $this->user->getId();
    // Check if the experiment is not enabled or does not match the specified name.
    if ( !$experiment['enabled'] || $experiment['name'] !== $this->experimentName ) {
        // If the experiment is not enabled or does not match the specified name,
        // return true, indicating that the metric is "met"
        return true;
    } else {
        // If the experiment is enabled and matches the specified name,
        // calculate the user's variant based on their user ID
        $variant = $id % 2;
        // Cast the variant value to a boolean and return it, indicating whether
        // the user is in the "control" or "test" group.
        return (bool)$variant;
    }
}
ServiceWiring.php
This PHP file defines a set of service wirings for the Vector skin used in MediaWiki core. The purpose of these wirings is to manage different features and requirements for the Vector skin. The file includes:
- A top-level return statement, which indicates that this file returns an array of service definitions.
- An array containing a single key-value pair, where the key is a constant (Constants::SERVICE_FEATURE_MANAGER) representing the service name, and the value is an anonymous function that creates and configures the FeatureManager object.
- Inside the anonymous function:
  - A new instance of FeatureManager is created, which will manage the registration and evaluation of different features.
  - Several "requirements" are registered with the FeatureManager. These requirements define the conditions that must be met for a feature to be enabled for a particular user.
Example: Zebra Design Feature
This feature depends on the Zebra AB test and whether the Zebra design configuration is enabled.
$featureManager->registerRequirement(
    new ABRequirement(
        $services->getMainConfig(),
        $context->getUser(),
        'skin-vector-zebra-experiment',
        Constants::REQUIREMENT_ZEBRA_AB_TEST
    )
);
⬇️
The following registers a feature named FEATURE_ZEBRA_DESIGN with the FeatureManager. To enable this feature, three requirements must be met: the skin must be fully initialized, the REQUIREMENT_ZEBRA_DESIGN condition must be satisfied, and the REQUIREMENT_ZEBRA_AB_TEST condition must also be fulfilled.
$featureManager->registerFeature(
    Constants::FEATURE_ZEBRA_DESIGN,
    [
        Constants::REQUIREMENT_FULLY_INITIALISED,
        Constants::REQUIREMENT_ZEBRA_DESIGN,
        Constants::REQUIREMENT_ZEBRA_AB_TEST
    ]
);
The new feature (in this case the Zebra Design Feature) consumes ABRequirement as a requirement.
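Zebra is enabled when the Zebra config is enabled and one of the following holds:
- The Zebra AB Test config is disabled.
- The Zebra AB Test config is enabled and the user ID falls into the half of the split that meets the requirement (50% chance).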
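The way the two requirements combine can be modelled outside of PHP. The following JavaScript is a simplified sketch, not the Vector implementation: MiniFeatureManager, abRequirement, and isZebraEnabled are hypothetical names, and the fully-initialised requirement is left out for brevity.

```javascript
// Simplified model of the requirement/feature relationship described above.
// All names here are hypothetical; the real logic lives in Vector's PHP code.
class MiniFeatureManager {
    constructor() {
        this.requirements = new Map();
        this.features = new Map();
    }
    registerRequirement( name, isMet ) {
        this.requirements.set( name, isMet );
    }
    registerFeature( name, requirementNames ) {
        this.features.set( name, requirementNames );
    }
    // A feature is enabled only if every one of its requirements is met.
    isFeatureEnabled( name ) {
        return this.features.get( name ).every(
            ( requirement ) => this.requirements.get( requirement )()
        );
    }
}

// Mirrors isMet(): met when the experiment is off or names another
// experiment, otherwise decided by the parity of the user ID.
function abRequirement( experiment, experimentName, userId ) {
    return () => {
        if ( !experiment.enabled || experiment.name !== experimentName ) {
            return true;
        }
        return userId % 2 === 1;
    };
}

function isZebraEnabled( zebraConfigEnabled, experiment, userId ) {
    const manager = new MiniFeatureManager();
    manager.registerRequirement( 'ZebraDesign', () => zebraConfigEnabled );
    manager.registerRequirement(
        'ZebraABTest',
        abRequirement( experiment, 'skin-vector-zebra-experiment', userId )
    );
    manager.registerFeature( 'ZebraDesign', [ 'ZebraDesign', 'ZebraABTest' ] );
    return manager.isFeatureEnabled( 'ZebraDesign' );
}

const running = { name: 'skin-vector-zebra-experiment', enabled: true };
const stopped = { name: 'skin-vector-zebra-experiment', enabled: false };
console.log( isZebraEnabled( true, stopped, 4 ) ); // true: test off, config on
console.log( isZebraEnabled( true, running, 4 ) ); // false: even user ID
console.log( isZebraEnabled( true, running, 5 ) ); // true: odd user ID
console.log( isZebraEnabled( false, running, 5 ) ); // false: config off
```

Keeping each condition behind its own requirement means the feature only lists requirement names and never needs to know how the A/B split is computed.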
The FeatureManager class in this file provides a way to manage features and requirements for the Vector skin. It allows for decoupling the logic of different components from their requirements, making the code more flexible and maintainable.
The method below returns a list of CSS classes that should be added to the <body> tag of the skin based on the enabled features. It iterates through the registered features and checks if each one is enabled or disabled. Based on the result, it generates CSS classes to be added to the body tag for styling purposes. In this case, vector-feature-zebra-design-enabled or vector-feature-zebra-design-disabled.
public function getFeatureBodyClass() {
    $featureManager = $this;
    return array_map( static function ( $featureName ) use ( $featureManager ) {
        // switch to lower case and switch from camel case to hyphens
        $featureClass = ltrim( strtolower( preg_replace( '/[A-Z]([A-Z](?![a-z]))*/', '-$0', $featureName ) ), '-' );
        $prefix = 'vector-feature-' . $featureClass . '-';
        return $featureManager->isFeatureEnabled( $featureName ) ? $prefix . 'enabled' : $prefix . 'disabled';
    }, array_keys( $this->features ) );
}
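To see what the regular expression does to a feature name, the conversion can be reproduced in JavaScript. This is a sketch for illustration only: featureBodyClass is a hypothetical name, and the feature name 'ZebraDesign' is an assumption chosen to match the class names mentioned above.

```javascript
// JavaScript rendering of the name-to-class conversion in getFeatureBodyClass().
// The feature names passed in below are illustrative assumptions.
function featureBodyClass( featureName, isEnabled ) {
    const featureClass = featureName
        // Prefix each capitalised word or acronym with a hyphen
        // ('$&' is the JavaScript counterpart of '$0' in preg_replace).
        .replace( /[A-Z]([A-Z](?![a-z]))*/g, '-$&' )
        .toLowerCase()
        // Equivalent of PHP's ltrim( ..., '-' ).
        .replace( /^-+/, '' );
    const prefix = 'vector-feature-' + featureClass + '-';
    return prefix + ( isEnabled ? 'enabled' : 'disabled' );
}

console.log( featureBodyClass( 'ZebraDesign', true ) );
// prints vector-feature-zebra-design-enabled
console.log( featureBodyClass( 'ZebraDesign', false ) );
// prints vector-feature-zebra-design-disabled
```

The negative lookahead keeps a run of capitals together, so a name that starts with an acronym produces one hyphenated segment for the acronym rather than one per letter.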
Client Side
skin.js
The skin.js file contains JavaScript code to initialize and manage various functionalities of the Vector skin, including language buttons, toggles, menus, search, animations, and A/B testing.
The script calls the init function first to initialize the skin. Then, it checks if A/B tests are enabled and the user is not anonymous. If A/B tests are enabled for the user, it initializes A/B tests using the initExperiment function with the configuration provided in ABTestConfig.
initExperiment = require( './AB.js' ),
ABTestConfig = require( /** @type {string} */ ( './activeABTest.json' ) ),
⬇️
if ( ABTestConfig.enabled && !mw.user.isAnon() ) {
    initExperiment( ABTestConfig, String( mw.user.getId() ) );
}
The AB.js file handles A/B testing functionality for web experiments. It exports a function called webABTest, which is used to initialize and manage A/B tests.
Types and Definitions:
- TreatmentBucketFunction: A function that takes an optional string parameter and returns a boolean.
- WebABTest: An object representing an A/B test with properties such as name and various functions for testing.
- SamplingRate: An object representing the desired sampling rate for a group in the range [0, 1].
- WebABTestProps: An object representing the properties needed to define an A/B test, such as experiment name, buckets, and token.
Function: webABTest
This function is the main entry point of the module. It takes the following parameters:
- props (WebABTestProps): An object containing the properties of the A/B test, such as experiment name, buckets, and token.
- token (string): A unique token that identifies the subject (user) for the duration of the experiment.
- forceInit (boolean, optional): A flag to force the initialization of the A/B test event. This is used for testing purposes and bypasses caching.
The function returns a WebABTest object, which encapsulates the A/B test and provides various methods to check the subject's bucket, sample status, and treatment bucket assignment.
Bucketing Mechanism:
The webABTest function uses a bucketing mechanism to assign users to different buckets based on the sampling rates defined in the props.buckets object. The buckets represent different variations or treatments of the experiment.
If the bucketing has already occurred on the server-side (e.g., by adding a class to the body tag with the bucket name), the function retrieves the bucket from the DOM. Otherwise, it uses the provided token to bucket the subject on the client-side using the mw.experiments.getBucket function (see next file sample).
Methods of WebABTest:
- getBucket(): Returns the name of the bucket the subject is assigned to for the A/B test.
- isInBucket(targetBucket): Checks if the subject is in a specific target bucket.
- isInSample(): Determines if the subject is included in the A/B test (i.e., not excluded).
- isInTreatmentBucket(treatmentBucketName): Checks if the subject is in a treatment bucket based on a case-insensitive substring check in the bucket name.
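The four methods can be sketched for a subject whose bucket is already known. This is an illustrative model rather than the module's code: createABTestView is a hypothetical name, "unsampled" is taken from the VectorWebABTestEnrollment description, and the way the optional argument narrows the treatment check is an assumption.

```javascript
// Sketch of the WebABTest methods listed above, for an already-known bucket.
// createABTestView is a hypothetical name, not part of the module.
function createABTestView( experimentName, bucket ) {
    const EXCLUDED_BUCKET = 'unsampled';
    const TREATMENT_SUBSTRING = 'treatment';

    function getBucket() {
        return bucket;
    }
    function isInBucket( targetBucket ) {
        return getBucket() === targetBucket;
    }
    function isInSample() {
        return !isInBucket( EXCLUDED_BUCKET );
    }
    // Case-insensitive substring check. Assumption: the optional argument
    // must also appear in the bucket name for the check to pass.
    function isInTreatmentBucket( treatmentBucketName = '' ) {
        const current = getBucket().toLowerCase();
        return current.includes( TREATMENT_SUBSTRING ) &&
            current.includes( treatmentBucketName.toLowerCase() );
    }
    return {
        name: experimentName,
        getBucket,
        isInBucket,
        isInSample,
        isInTreatmentBucket
    };
}

const view = createABTestView( 'skin-vector-zebra-experiment', 'stickyHeaderTreatment' );
console.log( view.isInSample() ); // true
console.log( view.isInTreatmentBucket() ); // true
console.log( view.isInTreatmentBucket( 'stickyHeader' ) ); // true
console.log( view.isInBucket( 'control' ) ); // false
```

The substring rule is why treatment bucket names must contain "treatment": the check works for any number of treatment buckets without listing them.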
Initialization and Hook:
The A/B test enrollment is logged using a hook (WEB_AB_TEST_ENROLLMENT_HOOK) and sent to WikimediaEvents if the subject has been sampled into the experiment. Initialization occurs when the webABTest function is called, and it can be forced using the forceInit parameter for testing purposes.
- The module has a function called webABTest, which sets up an A/B test experiment.
- The experiment has different "buckets" to assign users. Each bucket has a certain chance of being chosen.
- When a user enters the experiment, the system generates a "hash" based on the user's identity and the experiment's name. This hash determines which bucket the user is put into.
- The user is then shown the content or feature corresponding to their bucket.
- The experiment can be enabled or disabled, and if it's disabled, all users will be put in a default "control" bucket.
getBucket: function ( experiment, token ) {
    var buckets = experiment.buckets,
        key,
        range = 0,
        hash,
        max,
        acc = 0;
    if ( !experiment.enabled || !Object.keys( experiment.buckets ).length ) {
        return CONTROL_BUCKET;
    }
    for ( key in buckets ) {
        range += buckets[ key ];
    }
    hash = hashString( experiment.name + ':' + token );
    max = ( hash / MAX_INT32_UNSIGNED ) * range;
    for ( key in buckets ) {
        acc += buckets[ key ];
        if ( max <= acc ) {
            return key;
        }
    }
}
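The listing above depends on hashString, MAX_INT32_UNSIGNED, and CONTROL_BUCKET, which are defined elsewhere. A self-contained sketch of the same bucketing logic follows; the hash here is a stand-in 32-bit FNV-1a function chosen for illustration and is assumed to differ from the real hashString, and the constant values are assumptions. As in the listing, each bucket maps directly to a numeric weight.

```javascript
// Assumed values for constants that the listing above references.
const MAX_INT32_UNSIGNED = 4294967295;
const CONTROL_BUCKET = 'control';

// Stand-in hash (32-bit FNV-1a). Assumption: the real hashString differs.
function hashString( str ) {
    let hash = 0x811c9dc5;
    for ( let i = 0; i < str.length; i++ ) {
        hash ^= str.charCodeAt( i );
        hash = Math.imul( hash, 0x01000193 ) >>> 0;
    }
    return hash >>> 0;
}

function getBucket( experiment, token ) {
    const buckets = experiment.buckets;
    const keys = Object.keys( buckets );
    if ( !experiment.enabled || !keys.length ) {
        return CONTROL_BUCKET;
    }
    // Total weight of all buckets.
    const range = keys.reduce( ( sum, key ) => sum + buckets[ key ], 0 );
    // Map the hash of "name:token" onto the interval [0, range].
    const hash = hashString( experiment.name + ':' + token );
    const max = ( hash / MAX_INT32_UNSIGNED ) * range;
    // Walk the buckets until the accumulated weight covers that point.
    let acc = 0;
    for ( const key of keys ) {
        acc += buckets[ key ];
        if ( max <= acc ) {
            return key;
        }
    }
}

const experiment = {
    name: 'skin-vector-zebra-experiment',
    enabled: true,
    buckets: { unsampled: 0, control: 0.5, treatment: 0.5 }
};
// The same token always lands in the same bucket.
console.log( getBucket( experiment, '12345' ) === getBucket( experiment, '12345' ) ); // true
```

Because the experiment name is part of the hashed string, the same user can land in different buckets across experiments while staying in one bucket for the duration of each.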
⬇️
modules/ext.wikimediaEvents/webABTestEnrollment.js
This file is part of the WikimediaEvents extension. It is used to log the enrollment of users into A/B tests.
logEvent logs the A/B test initialization event with relevant data like the user's group, the experiment name, whether the user is anonymous, etc.
/**
 * Log the A/B test initialization event.
 *
 * @param {Object} data event info for logging
 */
function logEvent( data ) {
    /* eslint-disable camelcase */
    const event = Object.assign( {}, webCommon(), {
        $schema: '/analytics/mediawiki/web_ab_test_enrollment/2.0.0',
        web_session_id: mw.user.sessionId(),
        group: data.group,
        experiment_name: data.experimentName,
        is_anon: mw.user.isAnon()
    } );
    /* eslint-enable camelcase */
    mw.eventLog.submit( 'mediawiki.web_ab_test_enrollment', event );
}
RIC
- On page load, it checks whether to log the A/B test initialization by waiting for the browser to be idle using requestIdleCallback.
- When the A/B test enrollment data is available through a hook, it calls the logEvent function to log the relevant data.
LESS files
TO-DO: Fix the issues with the .vector-body class and use the stable .mw-body-content class instead. Or explore a different approach to using the feature flag that reduces the risk of specificity-related bugs.
Other useful tools
Set up
Writing the variations
- Configure testing parameters in LocalSettings.php, such as the percentage of traffic that will see each version.
- Define an array for A/B testing in the Vector skin of MediaWiki.
To allocate 50% of users to the "control" bucket and 50% to the "treatment" bucket, use the following format.
$wgVectorWebABTestEnrollment = [
    "name" => "your-experiment-name-here",
    "enabled" => false,
    "buckets" => [
        "unsampled" => [ "samplingRate" => 0 ],
        "control" => [ "samplingRate" => 0.5 ],
        "treatment" => [ "samplingRate" => 0.5 ]
    ]
];
Launching test
Once you've set up your A/B test and determined your sample size, commit the patch containing the test as seen here.
Coordinate with the PM and CRS and ensure they are aware of the test schedule.
Most of the time, the integrity of the test means there won't be a public announcement ahead of time.
- Otherwise, the team can coordinate to ensure that the messages announcing the test have been posted.
- This would happen at least a week before the estimated time of launch – to give heads-up about a change of user experience, and make it possible for the communities to look for possible bugs in the user-generated code.
- Note that the quality of the configuration (wmf-config/InitialiseSettings.php) should be confirmed before these steps.
Test best practices
Phase 1
Limit configuration enabling to test.wikipedia.org and test2.wikipedia.org. Avoid launching the test on active content wikis without launching it on test wikis or closed content wikis first.
Analyst handoff
After the test has run for a sufficient amount of time, the analyst will check the results to determine which variation performed better.
Next steps
- Once we have identified the winning variation and the project manager approves, open a new patch to implement it. (Example forthcoming)
- Ensure that the changes are properly documented and communicated to relevant stakeholders.