Extension:WikibaseMediaInfo/Development

This extension is still under active development. This document contains some additional information that may be of interest to developers contributing to this project.

Setting up a Federated Development Environment in Vagrant

In production, WikibaseMediaInfo is used to enrich media files on Commons with data that lives elsewhere (Properties and Items from Wikidata). The process by which these two separate wikis communicate is known as Federation.

Setting up a similar relationship between two local wikis is not strictly required for WikibaseMediaInfo (WBMI) development, but it is useful in many situations; such a system is also a closer approximation of the production environment.

Configure Vagrant

MediaWiki-Vagrant can be configured to set up a local "Wikidata" wiki and a local "Commons" wiki on separate URLs. Here's how to do it:

Follow the quick-start instructions for MediaWiki-Vagrant (install VirtualBox, install Vagrant, pull down the latest MediaWiki code with git clone --recursive https://gerrit.wikimedia.org/r/mediawiki/vagrant, and run the setup.sh script).

Run the following commands to update Vagrant's configuration:

# Allocate additional CPU and RAM resources
vagrant config vagrant_cores 4
vagrant config vagrant_ram 8192

# Optional: for potentially better macOS performance
vagrant config nfs_shares yes

# Tell Vagrant to use a more recent version of Node
vagrant hiera npm::node_version 10 && vagrant provision

Enable the appropriate Vagrant roles. In this case, you'll want the following:
```
vagrant roles enable mediainfo cirrussearch commons commonsmetadata kartographer mediasearch mobilefrontend multimediaviewer parserfunctions scribunto uls uploadwizard wikibase_repo wikibasecirrussearch wikidata
```
On Windows, provision first and run vagrant up afterwards:
vagrant roles enable mediainfo cirrussearch commons commonsmetadata kartographer mobilefrontend multimediaviewer parserfunctions scribunto uls uploadwizard wikibase_repo wikibasecirrussearch wikidata --provision
(Warning: this can take a long time.)

Spin up Vagrant and provision all roles: vagrant up --provision (warning: this can take a long time).
On Windows, provision first, then run:
vagrant up
Run `vagrant git-update` once that process completes.

Setup Wikidata

Run vagrant ssh, then:

$ sudo apt-get update && sudo apt-get upgrade
$ composer selfupdate --update-keys
$ composer config --global process-timeout 9600

Log out of Vagrant (exit), then run:

$ vagrant roles enable wikidata
$ vagrant provision

Import Wikidata (optional)

If you'd prefer not to manually populate the local Wikidata wiki with properties and items for WBMI to use, you can use the WikibaseImport extension to import data from the command line.

Warning: typically when importing a given entity, many related entities will get imported along with it; if you don't set a very limited range you could end up with a process that runs for hours and ends up importing hundreds or thousands of items.

Clone the WikibaseImport repo into your extensions folder
Add wfLoadExtension( 'WikibaseImport' ); to your config file (in settings.d or LocalSettings).

Shell into Vagrant (vagrant ssh) and run the following commands:

# Run these inside of Vagrant ssh prompt
cd /vagrant/mediawiki/extensions/WikibaseImport
composer update

cd /vagrant/mediawiki
mwscript maintenance/update.php --wiki=wikidatawiki
mwscript maintenance/update.php --wiki=commonswiki

# Import "Depicts" first so that it is Property:P1 for convenience. Takes 1 hour +
mwscript extensions/WikibaseImport/maintenance/importEntities.php --wiki=wikidatawiki --entity P180

# Import some items if desired. Takes ~45 mins
mwscript extensions/WikibaseImport/maintenance/importEntities.php --wiki=wikidatawiki --range Q1:Q20

Post-installation maintenance scripts

Set Admin passwords

You may need to set a default admin password for some relevant wikis; I use Admin / vagrant for all local wikis here.

# to be run inside the vagrant ssh prompt
mwscript maintenance/changePassword.php --wiki=wikidatawiki --user=Admin --password=vagrant
mwscript maintenance/changePassword.php --wiki=commonswiki --user=Admin --password=vagrant

Update search index(es)

After setup and import of bulk Wikidata items, you may need to manually update various search indexes in both Commonswiki and Wikidatawiki.

vagrant ssh

# First update the wikidata search indexes;
# this enables property/item autocomplete in the structured data UI on the file pages
mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=wikidatawiki --startOver --indexType=general
mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=wikidatawiki --startOver --indexType=content
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki=wikidatawiki

# Update the commonswiki search indexes to enable use of structured data in Special:MediaSearch
mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=commonswiki --reindexAndRemoveOk --indexIdentifier=now
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki=commonswiki

Feature checklist

At this point, the following features should (hopefully) be working:

Adding structured data to file pages in commonswiki (including autocomplete suggestions that are populated by entities in the local wikidata instance)
Searching local files in commonswiki using Special:MediaSearch (searches should find matches based on both full text of file pages as well as any structured data that has been added)
Searches using the traditional search box on commonswiki should support haswbstatement:P1=Q1 style queries
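
The last item can also be checked from a script instead of the search box. The sketch below builds a request URL for MediaWiki's search API (action=query with list=search) that carries the same haswbstatement:P1=Q1 query. The api.php address is an assumption about how the local Commons wiki is exposed and will probably need adjusting; P1 and Q1 are simply the IDs used in the example above.

```javascript
'use strict';

// Assumption: address of the local Commons wiki's api.php; adjust to your setup.
const LOCAL_COMMONS_API = 'http://commons.wiki.local.wmftest.net:8080/w/api.php';

/**
 * Build a search API URL for a haswbstatement query.
 *
 * @param {string} apiUrl Full URL of api.php
 * @param {string} propertyId Property ID, e.g. 'P1'
 * @param {string} itemId Item ID, e.g. 'Q1'
 * @return {URL}
 */
function buildStatementSearchUrl( apiUrl, propertyId, itemId ) {
	const url = new URL( apiUrl );
	url.searchParams.set( 'action', 'query' );
	url.searchParams.set( 'list', 'search' );
	url.searchParams.set( 'srsearch', 'haswbstatement:' + propertyId + '=' + itemId );
	// Namespace 6 is the File namespace, where structured data is attached.
	url.searchParams.set( 'srnamespace', '6' );
	url.searchParams.set( 'format', 'json' );
	return url;
}

const searchUrl = buildStatementSearchUrl( LOCAL_COMMONS_API, 'P1', 'Q1' );
// Open the printed URL in a browser or with curl to see the matching file pages.
console.log( searchUrl.toString() );
```

If the response lists the file pages you tagged with that statement, both the Wikibase data and the CirrusSearch index on commonswiki are in a usable state.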

Help, I just need to tweak one small UI thing!

If you don't need a fully federated setup, consider using MediaWiki-Docker and adding $wgMediaInfoLocalDev = true; – this will allow Special:MediaSearch to just query the production search API on Commons so you don't need to run queries locally.

Working with Wikibase

WikibaseMediaInfo sits on top of Wikibase. If you're working with WikibaseMediaInfo you'll need to keep an eye on what's happening in Wikibase, because changes in Wikibase can affect you. For example, Wikibase JS config vars can come and go, and if they disappear they might catch you out.

Also watch out for conceptual differences between the two. For example, in MediaInfo every File page has a corresponding MediaInfo item. Sometimes that item doesn't exist in the database, in which case it is a virtual item consisting only of an ID. As far as Wikibase is concerned, that item doesn't exist. This has tripped us up with Wikibase's entityLoaded hook: it doesn't fire if there is no Wikibase entity in the database, and we need it to fire for virtual items as well as concrete ones, so we can't use it.

Wikibase code is heavily abstracted and it can take some work to understand it and how to use it from WikibaseMediaInfo. Instantiating objects in particular can be a bit tricky - factories are wrapped inside callbacks that are in turn wrapped inside dispatching factories (factories that delegate object instantiation to other factories depending on their inputs). There's a WikibaseRepo service locator which you can access statically using WikibaseRepo::getDefaultInstance(), and to get utility classes like serializers or lookups you can usually find some kind of get*() method on the service locator that will give you what you need.

Here's an example: on the File page we want to make the MediaInfo item associated with the page available to JavaScript. We do that by writing a serialized version of the item to a JS config var inside the onBeforePageDisplay() hook in WikibaseMediaInfoHooks.php. Here's a simplified version of the code:

use Wikibase\MediaInfo\Services\MediaInfoServices;
use Wikibase\Repo\WikibaseRepo;

// Note: the OutputPage object $out is passed into the hook by default
// 
// Step 1: getting the entity from storage
$entityId = MediaInfoServices::getMediaInfoIdLookup()->getEntityIdForTitle( $out->getTitle() );
$entityLookup = WikibaseRepo::getDefaultInstance()->getEntityLookup(); // service locator
$entity = $entityLookup->getEntity( $entityId );

// Step 2: serializing the entity
$serializer = WikibaseRepo::getDefaultInstance()->getAllTypesEntitySerializer( $entityId ); // service locator
$serializedEntity = ( $entity ? $serializer->serialize( $entity ) : [] );

// Step 3: writing to js config var
$out->addJsConfigVars( [ 'wbEntity' => $serializedEntity ] );

The factories that the service locator ultimately uses are defined in WikibaseMediaInfo.entitytypes.php - for example the serializer used above is defined like this:

return [
	MediaInfo::ENTITY_TYPE => [
		...,
		'serializer-factory-callback' => function( SerializerFactory $serializerFactory ) {
			return new MediaInfoSerializer(
				$serializerFactory->newTermListSerializer(),
				$serializerFactory->newStatementListSerializer()
			);
		},
		...,
	]
];

Wikibase code munges all the serializer factories together into a dispatching factory, and then when you call getAllTypesEntitySerializer() from the service locator with a MediaInfo entity id it uses the callback defined above to return a MediaInfoSerializer.
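
The dispatching idea is easier to see in miniature. The following is not Wikibase's code: it is a small JavaScript sketch of the pattern, in which each entity type registers a callback that builds its serializer, and a dispatching function delegates to the right one based on the entity's type. Every name in it is a hypothetical stand-in, and the real dispatching logic in Wikibase may select serializers differently.

```javascript
'use strict';

// Stand-in for SerializerFactory: hands out the shared building blocks.
const serializerFactory = {
	newTermListSerializer: () => ( terms ) => Object.assign( {}, terms ),
	newStatementListSerializer: () => ( statements ) => statements.slice()
};

// One callback per entity type, similar in spirit to an entitytypes definition file.
const definitions = {
	mediainfo: {
		serializerFactoryCallback: ( factory ) => {
			const serializeTerms = factory.newTermListSerializer();
			const serializeStatements = factory.newStatementListSerializer();
			return ( entity ) => ( {
				type: 'mediainfo',
				id: entity.id,
				labels: serializeTerms( entity.labels ),
				statements: serializeStatements( entity.statements )
			} );
		}
	},
	item: {
		serializerFactoryCallback: () => ( entity ) => ( { type: 'item', id: entity.id } )
	}
};

// Dispatching serializer: builds each type's serializer once, then delegates by type.
function newDispatchingSerializer( defs, factory ) {
	const byType = {};
	Object.keys( defs ).forEach( ( type ) => {
		byType[ type ] = defs[ type ].serializerFactoryCallback( factory );
	} );
	return ( entity ) => {
		if ( !Object.prototype.hasOwnProperty.call( byType, entity.type ) ) {
			throw new Error( 'No serializer registered for type ' + entity.type );
		}
		return byType[ entity.type ]( entity );
	};
}

const serialize = newDispatchingSerializer( definitions, serializerFactory );
const serialized = serialize( {
	type: 'mediainfo',
	id: 'M1',
	labels: { en: 'A cat' },
	statements: []
} );
console.log( JSON.stringify( serialized ) );
```

The caller only ever sees one serialize function; which concrete serializer runs is decided by the registered definitions, which is why an extension can add a new entity type without the calling code changing.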

Testing Strategy

WikibaseMediaInfo is a complicated extension, with complicated dependencies (i.e. Wikibase). Automated testing can play an important role in helping to manage this complexity.

To do this, we are using three different types of tests, which can be likened to levels in a "testing pyramid" [1]. The three levels are: JS and PHP unit tests (the "base" of the pyramid), PHPUnit API/integration tests (the middle layer), and end-to-end tests in Selenium (the top of the pyramid).

JavaScript unit tests (headless Node/QUnit)

WikibaseMediaInfo introduces lots of new JS code, much of which is concerned with introducing new UI elements that enable users to view and edit structured data in various places (File pages, UploadWizard, Search, etc.). Wherever possible, we want to try and test these new JS components in isolation, using a headless Node.js testing framework instead of the traditional Special:JavaScriptTest approach. There is a good discussion around the advantages and reasoning behind this approach at this RFC on Phabricator.

Requirements

Node.js v10 is required to run these tests. QUnit is used as the testing framework. The JSDOM and Sinon libraries are also used extensively.

Writing Tests

For JS code in a MediaWiki extension to be testable this way, we need to be able to load it in an isolated context using Node's require statement. This means that the relevant part of the codebase needs to be rewritten using ResourceLoader's new PackageFiles feature. The individual JS files used in such a module must then define a module.exports property (these files no longer need to be wrapped in self-executing functions). In addition to making code more testable, refactoring in this way lets us write JavaScript in a way that is more in line with the current practices of the wider JS community. This refactoring is currently in progress (some modules in our extension.json use PackageFiles, while others still define an array of scripts).

Tests live in tests/node-qunit and are organized into subfolders. The tests for the LicenseDialogWidget, a basic UI component, are a simple example.

Having good coverage at the JS component level will help to catch regressions and make it easier to refactor code. Things to test for at this level include basic interactions (toggling a component in or out of edit state, for example), ensuring that appropriate API requests are sent when an action is taken, etc.
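
The two kinds of checks mentioned here (edit-state toggling and outgoing API requests) might look roughly like the sketch below. To keep it runnable without any installed packages, it swaps QUnit for Node's built-in assert module and Sinon for a hand-written spy; the component, its methods, and the 'example-save' action name are all hypothetical.

```javascript
'use strict';

const { strictEqual, deepStrictEqual } = require( 'assert' );

// Hypothetical component: saves a caption through an injected API object.
function CaptionEditor( api ) {
	this.api = api;
	this.editing = false;
}

CaptionEditor.prototype.setEditing = function ( editing ) {
	this.editing = editing;
};

CaptionEditor.prototype.save = function ( text ) {
	if ( !this.editing ) {
		// Nothing to save outside edit mode.
		return false;
	}
	this.api.post( { action: 'example-save', value: text } );
	this.editing = false;
	return true;
};

// Hand-written spy standing in for a Sinon stub: records every post() call.
function createSpyApi() {
	const calls = [];
	return {
		calls: calls,
		post: function ( params ) {
			calls.push( params );
		}
	};
}

const spyApi = createSpyApi();
const editor = new CaptionEditor( spyApi );

// Saving outside edit mode must not send a request.
strictEqual( editor.save( 'ignored' ), false );
strictEqual( spyApi.calls.length, 0 );

// Saving in edit mode sends exactly one request and leaves edit mode.
editor.setEditing( true );
strictEqual( editor.save( 'A cat' ), true );
deepStrictEqual( spyApi.calls, [ { action: 'example-save', value: 'A cat' } ] );
strictEqual( editor.editing, false );
```

Injecting the API object through the constructor is what makes the request observable in the test; a component that creates its own API client internally is much harder to check this way.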

Running Tests

To run Node QUnit tests, open a terminal and run npm run test:unit. They are also included in the larger npm test script (which means they will run in CI).

PHP tests

PHPUnit tests are located in tests/phpunit. They must be run using MediaWiki core’s phpunit.php, like this: sudo -u www-data php phpunit.php --wiki wiki (in the Vagrant dev environment, phpunit.php is in /vagrant/mediawiki/tests/phpunit/).

Normal unit tests are in tests/phpunit/mediawiki/. Integration tests are in tests/phpunit/integration/.

End-to-end tests (Selenium)

End-to-end tests represent the highest level of the "testing pyramid". Tests at this level should focus on the "happy path" for a user. They can also be used to ensure that basic functionality (like logging in and editing a page) is never hampered by a regression.

Currently it is not feasible to run extension-specific Selenium tests for WikibaseMediaInfo in the regular CI process. Instead, tests can be run against Beta Commons on a regular schedule. These tests need to live in their own location ("specs_betacommons" instead of "specs") so that they are not picked up by the Selenium script run by Core (which does happen in the CI pipeline).

There is currently an in-progress patch that adds Selenium tests to this extension. This document will be updated with more information about how to write and run these tests once that patch is merged.

[1] https://martinfowler.com/bliki/TestPyramid.html