Extension:WikibaseMediaInfo/Development

This extension is still under active development. This document contains some additional information that may be of interest to developers contributing to this project.

Setting up a Federated Development Environment in Vagrant

In production, WikibaseMediaInfo is used to enrich media files on Commons with data that lives elsewhere (Properties and Items from Wikidata). The process by which these two separate wikis communicate is known as Federation.

Setting up a similar relationship between two local wikis is not strictly required for WBMI development but in many situations it will be useful; such a system is also a closer approximation to the production environment.

Configure Vagrant

MediaWiki-Vagrant can be configured to set up a local "Wikidata" wiki and a local "Commons" wiki on separate URLs. Here's how to do it:

  1. Follow the quick-start instructions for MediaWiki-Vagrant: install VirtualBox, install Vagrant, pull down the latest MediaWiki code with git clone --recursive https://gerrit.wikimedia.org/r/mediawiki/vagrant, and run the setup.sh script.
  2. Run the following commands to update Vagrant's configuration:
    # Allocate additional CPU and RAM resources
    vagrant config vagrant_cores 4
    vagrant config vagrant_ram 8192
    
    # Optional: for potentially better macOS performance
    vagrant config nfs_shares yes
    
    # Tell Vagrant to use a more recent version of Node
    vagrant hiera npm::node_version 10 && vagrant provision
    
  3. Enable the appropriate Vagrant roles. In this case, you'll want the following:
    vagrant roles enable cirrussearch commons commonsmetadata kartographer mediainfo mobilefrontend multimediaviewer parserfunctions scribunto uls uploadwizard wikibase_repo wikibasecirrussearch wikidata
    
    On Windows, provision as part of this step instead, and run vagrant up afterwards (warning: this can take a long time):
    vagrant roles enable cirrussearch commons commonsmetadata kartographer mediainfo mobilefrontend multimediaviewer parserfunctions scribunto uls uploadwizard wikibase_repo wikibasecirrussearch wikidata --provision
    
  4. Spin up Vagrant and provision all roles: vagrant up --provision (warning: this can take a long time).
    On Windows, where provisioning already happened in the previous step, a plain vagrant up is enough:
    vagrant up
    
  5. Run `vagrant git-update` once that process completes.

Configure Wikis

Once the Vagrant environment is ready, you'll need to add wiki-specific configuration (certain settings need to be enabled only for certain wikis). We are primarily concerned with Wikidatawiki (http://wikidata.wiki.local.wmftest.net:8080) and Commonswiki (http://commons.wiki.local.wmftest.net:8080/wiki).

Per the MediaWiki-Vagrant recommendations, config should be placed in a dedicated file inside the settings.d directory; see the README file in that directory for more details. Note: files created directly inside settings.d are safe, but anything placed inside its sub-directories is expected to be managed by Puppet and will be erased if you run vagrant destroy later.

Configuration files inside settings.d should be prefixed with a 2-digit numerical code that determines the order they are to be run in. In this case, we want a collection of WBMI-specific settings that will be applied after everything else, so a name like 99-wbmi.php would make sense.
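The effect of the numeric prefix can be tried out in isolation. The sketch below assumes the files are picked up in alphabetical order (as PHP's glob() returns them by default); how MediaWiki-Vagrant actually loads settings.d is not shown here, and the file names other than 99-wbmi.php are made up for illustration.

```php
<?php
// Sketch only: mimics alphabetical loading of settings.d files in a temp directory.
// '10-example.php' and '50-other.php' are hypothetical file names.
$dir = sys_get_temp_dir() . '/settings-d-demo-' . uniqid();
mkdir( $dir );
foreach ( [ '99-wbmi.php', '10-example.php', '50-other.php' ] as $name ) {
	file_put_contents( "$dir/$name", "<?php\n" );
}

// glob() returns matches sorted alphabetically unless GLOB_NOSORT is passed,
// so the 2-digit prefix decides the order and 99-wbmi.php comes last.
$loadOrder = array_map( 'basename', glob( "$dir/*.php" ) );
print_r( $loadOrder );

// Clean up the temporary files again.
array_map( 'unlink', glob( "$dir/*.php" ) );
rmdir( $dir );
```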

The file below assumes that the ID of your local "depicts" property is P1; if not, replace it with the appropriate property ID.

<?php
// 99-wbmi.php
//
// Setup for the WikibaseMediaInfo extension and related structured data features;
// "commonswiki" is used for file hosting while "wikidatawiki" is used for WB entities

// Global settings ============================================================
$wgWBCSUseCirrus = true;
$wgCrossSiteAJAXdomains = [ '*' ];
$wgSearchMatchRedirectPreference = true;
$wgDefaultUserOptions['search-match-redirect'] = false;

// Commonswiki ================================================================
if ( $wgDBname === 'commonswiki' ) {

	// Wikibase
	$wgEnableWikibaseRepo = true;
	$wgEnableWikibaseClient = true;

	if ( !\ExtensionRegistry::getInstance()->isLoaded( 'Wikibase' ) ) {
		require_once "$IP/extensions/Wikibase/repo/Wikibase.php";
	}

	// Cirrus
	$wgNamespacesToBeSearchedDefault = [
		6 => 1,
		12 => 1,
		14 => 1,
		100 => 1,
		106 => 1,
		0 => 1,
		1 => 0,
		2 => 0,
		3 => 0,
		4 => 0,
		5 => 0,
		107 => 0,
		7 => 0,
		8 => 0,
		9 => 0,
		10 => 0,
		11 => 0,
		108 => 0,
		13 => 0
	];
	$wgCirrusSearchNamespaceWeightOverrides = [ 6 => 1.0 ];
	$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';

	// Federation
	$wgWBRepoSettings['useEntitySourceBasedFederation'] = true;
	$entitySources = [
		'wikidata' => [
			'entityNamespaces' => [
				'item' => 0,
				'property' => 122,
				'lexeme' => 146,
			],
			'repoDatabase' => 'wikidatawiki',
			'baseUri' => 'http://wikidata.wiki.local.wmftest.net:8080/wiki/Special:EntityData/',
			'rdfNodeNamespacePrefix' => 'wd',
			'rdfPredicateNamespacePrefix' => '',
			'interwikiPrefix' => 'd',
		],
		'commons' => [
			'entityNamespaces' => [
				'mediainfo' => '6/mediainfo',
			],
			'repoDatabase' => 'commonswiki',
			'baseUri' => 'http://commons.wiki.local.wmftest.net:8080/entity/',
			'rdfNodeNamespacePrefix' => 'sdc',
			'rdfPredicateNamespacePrefix' => 'sdc',
			'interwikiPrefix' => 'c',
		],
		'local' => [
			'entityNamespaces' => [],
			'repoDatabase' => 'commonswiki',
			'baseUri' => 'http://wikidata.wiki.local.wmftest.net:8080/wiki/Special:EntityData/',
			'rdfNodeNamespacePrefix' => 'sdc',
			'rdfPredicateNamespacePrefix' => 'sdc',
			'interwikiPrefix' => 'd',
		],
	];
	$wgWBRepoSettings['entitySources'] = $entitySources;
	$wgWBRepoSettings['localEntitySourceName'] = 'commons';
	$wgWBClientSettings['entitySources'] = $entitySources;
	$wgWBClientSettings['localEntitySourceName'] = 'commons';

	// Other Wikibase settings
	$wgWBRepoSettings['conceptBaseUri'] = 'http://commons.wiki.local.wmftest.net:8080/wiki/Special:EntityData/';
	$wgWBRepoSettings['useTermsTableSearchFields'] = false;
	$wgWBRepoSettings['searchIndexTypes'] = [
		'string',
		'external-id',
		'wikibase-item',
		'wikibase-property',
		'globe-coordinate',
		'wikibase-lexeme',
		'wikibase-form',
		'wikibase-sense'
	];
	$wgWBRepoSettings['sparqlEndpoint'] = 'https://query.wikidata.org/sparql';

	// MediaInfo
	wfLoadExtension( 'WikibaseMediaInfo' );
	$wgMediaInfoProperties = [ 'depicts' => 'P1' ];
	$wgMediaInfoEnableSearch = true;
	$wgMediaInfoSearchProperties = [ 'P1' => 1 ];
	$wgMediaInfoExternalEntitySearchBaseUri = 'http://wikidata.wiki.local.wmftest.net:8080/w/api.php';
	$wgMediaInfoMediaSearchHasLtrPlugin = true;
}

Setup Wikidata

Shell into Vagrant (vagrant ssh) and run:

$ sudo apt-get update && sudo apt-get upgrade
$ composer selfupdate --update-keys
$ composer config --global process-timeout 9600

Log out from Vagrant, then run:

$ vagrant roles enable wikidata
$ vagrant provision

Import Wikidata (optional)

If you'd prefer not to manually populate the local Wikidata wiki with properties and items for WBMI to use, you can use the WikibaseImport extension to import data from the command line.

Warning: typically when importing a given entity, many related entities will get imported along with it; if you don't set a very limited range you could end up with a process that runs for hours and ends up importing hundreds or thousands of items.

  1. Clone the WikibaseImport repo into your extensions folder
  2. Add wfLoadExtension( 'WikibaseImport' ); to your config file (in settings.d or LocalSettings).
  3. Shell into Vagrant (vagrant ssh) and run the following commands:
    # Run these inside of Vagrant ssh prompt
    cd /vagrant/mediawiki/extensions/WikibaseImport
    composer update
    
    cd /vagrant/mediawiki
    mwscript maintenance/update.php --wiki=wikidatawiki
    mwscript maintenance/update.php --wiki=commonswiki
    
    # Import "Depicts" first so that it is Property:P1 for convenience. Takes 1 hour +
    mwscript extensions/WikibaseImport/maintenance/importEntities.php --wiki=wikidatawiki --entity P180
    
    # Import some items if desired. Takes ~45 mins
    mwscript extensions/WikibaseImport/maintenance/importEntities.php --wiki=wikidatawiki --range Q1:Q20
    

Post-installation maintenance scripts

Set Admin passwords

You may need to set a default admin password for some relevant wikis; I use Admin / vagrant for all local wikis here.

# to be run inside the vagrant ssh prompt
mwscript maintenance/changePassword.php --wiki=wikidatawiki --user=Admin --password=vagrant
mwscript maintenance/changePassword.php --wiki=commonswiki --user=Admin --password=vagrant

Update search index(es)

After setup and import of bulk Wikidata items, you may need to manually update various search indexes in both Commonswiki and Wikidatawiki.

vagrant ssh

# First update the wikidata search indexes;
# this enables property/item autocomplete in the structured data UI on the file pages
mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=wikidatawiki --startOver --indexType=general
mwscript extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --wiki=wikidatawiki --startOver --indexType=content
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki=wikidatawiki

# Update the commonswiki search indexes to enable use of structured data in Special:MediaSearch
mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=commonswiki --reindexAndRemoveOk --indexIdentifier=now
mwscript extensions/CirrusSearch/maintenance/ForceSearchIndex.php --wiki=commonswiki

Feature checklist

At this point, the following features should (hopefully) be working:

  • Adding structured data to file pages in commonswiki (including autocomplete suggestions that are populated by entities in the local wikidata instance)
  • Searching local files in commonswiki using Special:MediaSearch (searches should find matches based on both full text of file pages as well as any structured data that has been added)
  • Searches using the traditional search box on commonswiki should support haswbstatement:P1=Q1 style queries
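The last item can also be checked without the UI by sending the same query to the search API. The sketch below only builds the request URL; it assumes that the local Commons wiki exposes its API at /w/api.php on the same host (mirroring the Wikidata API URL used in the configuration above) and that P1 and Q1 exist locally.

```php
<?php
// Assumption: the local Commons API lives at this URL (not stated in this document).
$apiBase = 'http://commons.wiki.local.wmftest.net:8080/w/api.php';

$params = [
	'action' => 'query',
	'list' => 'search',
	// Same keyword syntax as in the traditional search box
	'srsearch' => 'haswbstatement:P1=Q1',
	// Namespace 6 is the File namespace
	'srnamespace' => 6,
	'format' => 'json',
];

// http_build_query() takes care of URL-encoding the ":" and "=" in the search term.
$url = $apiBase . '?' . http_build_query( $params );
echo $url, "\n";
```

Opening the printed URL in a browser (or fetching it with curl) should return a JSON list of matching file pages if the search indexes are up to date.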

Concept Chips

Additional configuration will be needed in order to use "concept chips" (suggested related search terms) in local development. First, ensure you have the following settings in your LocalSettings or settings.d file in Vagrant:

$wgMediaInfoLocalDev = true;
$wgMediaInfoProperties = [ 'depicts' => 'P1' ];
$wgMediaInfoEnableSearch = true;
$wgMediaInfoSearchProperties = [ 'P1' => 1 ];

$wgMediaInfoExternalEntitySearchBaseUri = 'http://wikidata.wiki.local.wmftest.net:8080/w/api.php';
// $wgMediaInfoExternalEntitySearchBaseUri = 'https://www.wikidata.org/w/api.php'; // production Wikidata for real results (but they won't be formatted properly)

Next, add the following heuristics rules (or a local adaptation thereof, with your own properties & items):

$wgMediaSearchConceptChipsSimpleHeuristics = [ [
	'must not' => [ [
		// instance of Wikipedia disambiguation pages
		'property' => 'P31',
		'item' => 'Q4167410',
	] ],
	'conditions' => [
		[
			'should' => [
				[
					// instance of tourist attraction
					'property' => 'P31',
					'item' => 'Q570116',
				], [
					// instance of landmark
					'property' => 'P31',
					'item' => 'Q2319498',
				], [
					// instance of mountain range
					'property' => 'P31',
					'item' => 'Q46831',
				]
			],
			'result' => [ 'P276', 'P131', 'P17' ],
		],
		[
			'must' => [ [
				// owned by
				'property' => 'P127',
			] ],
			'result' => [ 'P127' ],
		],
		[
			'must' => [ [
				// notable work
				'property' => 'P800',
			] ],
			'result' => [ 'P800' ],
		],
		[
			'must' => [ [
				// creator
				'property' => 'P170',
			] ],
			'result' => [ 'P170' ],
		],
		[
			'must' => [ [
				// has effect
				'property' => 'P1542',
			] ],
			'result' => [ 'P1542' ],
		],
		[
			'should' => [
				[
					// instance of fictional character
					'property' => 'P31',
					'item' => 'Q95074',
				], [
					// instance of literary character
					'property' => 'P31',
					'item' => 'Q3658341',
				], [
					// instance of film character
					'property' => 'P31',
					'item' => 'Q15773347',
				], [
					// instance of television character
					'property' => 'P31',
					'item' => 'Q15773317',
				], [
					// instance of animated character
					'property' => 'P31',
					'item' => 'Q15711870',
				], [
					// instance of theatrical character
					'property' => 'P31',
					'item' => 'Q3375722',
				]
			],
			'result' => [
				// performer
				'P175',
				// voice actor
				'P725',
				// present in work
				'P1441',
			],
		],
		[
			'must' => [ [
				// parent astronomical body
				'property' => 'P397',
			] ],
			'result' => [ 'P397' ],
		],
		[
			'must' => [ [
				// said to be the same as
				'property' => 'P460',
			] ],
			'result' => [ 'P460' ],
		],
		[
			'must' => [ [
				// member of sports team
				'property' => 'P54',
			] ],
			'result' => [ 'P54' ],
		],
		[
			'must' => [ [
				// member of
				'property' => 'P463',
			] ],
			'result' => [ 'P463' ],
		],
		[
			'must' => [ [
				// part of
				'property' => 'P361',
			] ],
			'result' => [ 'P361' ],
		],
		[
			'must' => [ [
				// has part
				'property' => 'P527',
			] ],
			'must not' => [
				[
					// not instance of rapid transit ('has part' has line letters & numbers...)
					'property' => 'P31',
					'item' => 'Q5503',
				], [
					// not instance of urban rail transit
					'property' => 'P31',
					'item' => 'Q3491904',
				], [
					// not instance of railway network
					'property' => 'P31',
					'item' => 'Q2678338',
				], [
					// not instance of public transport network
					'property' => 'P31',
					'item' => 'Q18325841',
				],
			],
			'result' => [ 'P527' ],
		],
		[
			'result' => [
				// subclass of entities
				'P279',
			]
		],
	] ],
];

Finally, make sure you set ?conceptchips=true in your URL parameters!
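To reason about which conditions a given entity would satisfy, it can help to evaluate a rule by hand. The helper below is hypothetical and is not the extension's implementation. It assumes bool-query-like semantics ('must' means every clause matches, 'should' means at least one clause matches, 'must not' means no clause matches, and a clause without 'item' matches any value of the property), and it represents an entity's statements as a plain array.

```php
<?php
// Hypothetical helpers, not part of WikibaseMediaInfo; the semantics are assumed.

/**
 * @param array $clause e.g. [ 'property' => 'P31', 'item' => 'Q5' ] ('item' is optional)
 * @param array $statements property ID => list of item IDs
 */
function clauseMatches( array $clause, array $statements ): bool {
	$values = $statements[ $clause['property'] ] ?? [];
	if ( !isset( $clause['item'] ) ) {
		// No specific item requested: any value for the property will do.
		return $values !== [];
	}
	return in_array( $clause['item'], $values, true );
}

function conditionMatches( array $condition, array $statements ): bool {
	foreach ( $condition['must'] ?? [] as $clause ) {
		if ( !clauseMatches( $clause, $statements ) ) {
			return false;
		}
	}
	foreach ( $condition['must not'] ?? [] as $clause ) {
		if ( clauseMatches( $clause, $statements ) ) {
			return false;
		}
	}
	$should = $condition['should'] ?? [];
	if ( $should === [] ) {
		return true;
	}
	foreach ( $should as $clause ) {
		if ( clauseMatches( $clause, $statements ) ) {
			return true;
		}
	}
	return false;
}

// Made-up entity: an instance of "mountain range" (Q46831) with a country value.
$statements = [
	'P31' => [ 'Q46831' ],
	'P17' => [ 'Q39' ],
];
$landmarkCondition = [
	'should' => [
		[ 'property' => 'P31', 'item' => 'Q570116' ],
		[ 'property' => 'P31', 'item' => 'Q46831' ],
	],
	'result' => [ 'P276', 'P131', 'P17' ],
];
$creatorCondition = [
	'must' => [ [ 'property' => 'P170' ] ],
	'result' => [ 'P170' ],
];

var_dump( conditionMatches( $landmarkCondition, $statements ) ); // bool(true)
var_dump( conditionMatches( $creatorCondition, $statements ) ); // bool(false)
```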

Help, I just need to tweak one small UI thing!

If you don't need a fully federated setup, consider using MediaWiki-Docker and adding $wgMediaInfoLocalDev = true; – this will allow Special:MediaSearch to just query the production search API on Commons so you don't need to run queries locally.

Working with Wikibase

WikibaseMediaInfo sits on top of Wikibase. If you're working with WikibaseMediaInfo, you'll need to keep an eye on what's happening in Wikibase, because changes there can affect you. For example, Wikibase JS config vars can come and go, and code that relies on one will break when it disappears.

Also watch out for conceptual differences between the two. For example, in MediaInfo every File page has a corresponding MediaInfo item. Sometimes that item doesn't exist in the database, in which case it is a virtual item consisting only of an ID. As far as Wikibase is concerned, that item doesn't exist. This has tripped us up with Wikibase's entityLoaded hook: it doesn't fire if there is no Wikibase entity in the database, and we need it to fire for virtual items as well as concrete ones, so we can't use it.
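In practice this means that code reading MediaInfo data has to handle the "nothing stored" case itself. The following is a minimal sketch of that fallback, using a hypothetical lookup callable in place of the Wikibase entity lookup and plain arrays in place of entity objects; none of these names come from the extension.

```php
<?php
// Hypothetical stand-ins: $lookup replaces the Wikibase entity lookup,
// and arrays replace MediaInfo entity objects.

function getEntityOrVirtual( callable $lookup, string $entityId ): array {
	$stored = $lookup( $entityId );
	if ( $stored !== null ) {
		return $stored + [ 'virtual' => false ];
	}
	// Nothing in the database: fall back to an entity that consists only of its ID.
	return [ 'id' => $entityId, 'statements' => [], 'virtual' => true ];
}

// Made-up storage containing a single concrete entity.
$database = [
	'M1' => [ 'id' => 'M1', 'statements' => [ 'P1' => [ 'Q1' ] ] ],
];
$lookup = static function ( string $id ) use ( $database ): ?array {
	return $database[$id] ?? null;
};

$concrete = getEntityOrVirtual( $lookup, 'M1' );
$virtual = getEntityOrVirtual( $lookup, 'M2' );
```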

Wikibase code is heavily abstracted and it can take some work to understand it and how to use it from WikibaseMediaInfo. Instantiating objects in particular can be a bit tricky - factories are wrapped inside callbacks that are in turn wrapped inside dispatching factories (factories that delegate object instantiation to other factories depending on their inputs). There's a WikibaseRepo service locator which you can access statically using WikibaseRepo::getDefaultInstance(), and to get utility classes like serializers or lookups you can usually find some kind of get*() method on the service locator that will give you what you need.

Here's an example - on the File page we want to be able to make the MediaInfo item associated with the page available to javascript. We do that by writing a serialized version of the item to a js config var inside the onBeforePageDisplay() hook in WikibaseMediaInfoHooks.php. Here's a simplified version of the code:

use Wikibase\MediaInfo\Services\MediaInfoServices;
use Wikibase\Repo\WikibaseRepo;

// Note: the OutputPage object $out is passed into the hook by default
// 
// Step 1: getting the entity from storage
$entityId = MediaInfoServices::getMediaInfoIdLookup()->getEntityIdForTitle( $out->getTitle() );
$entityLookup = WikibaseRepo::getDefaultInstance()->getEntityLookup(); // service locator
$entity = $entityLookup->getEntity( $entityId );

// Step 2: serializing the entity
$serializer = WikibaseRepo::getDefaultInstance()->getAllTypesEntitySerializer( $entityId ); // service locator
$serializedEntity = ( $entity ? $serializer->serialize( $entity ) : [] );

// Step 3: writing to js config var
$out->addJsConfigVars( [ 'wbEntity' => $serializedEntity ] );

The factories that the service locator ultimately uses are defined in WikibaseMediaInfo.entitytypes.php - for example the serializer used above is defined like this:

return [
	MediaInfo::ENTITY_TYPE => [
		...,
		'serializer-factory-callback' => function( SerializerFactory $serializerFactory ) {
			return new MediaInfoSerializer(
				$serializerFactory->newTermListSerializer(),
				$serializerFactory->newStatementListSerializer()
			);
		},
		...,
	]
];

Wikibase code munges all the serializer factories together into a dispatching factory, and then when you call getAllTypesEntitySerializer() from the service locator with a MediaInfo entity id it uses the callback defined above to return a MediaInfoSerializer.
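The dispatching idea can be illustrated without any Wikibase classes. The sketch below is a simplified stand-in rather than Wikibase's actual implementation: the class name DispatchingSerializerFactory, its getSerializerFor() method, and the closures standing in for serializer objects are all hypothetical.

```php
<?php
// Hypothetical, simplified illustration of a dispatching factory.
// None of these names exist in Wikibase; real serializers are objects, not closures.

final class DispatchingSerializerFactory {
	/** @var callable[] entity type => factory callback */
	private array $callbacks;

	/** @var callable[] entity type => serializer that was already built */
	private array $serializers = [];

	public function __construct( array $callbacks ) {
		$this->callbacks = $callbacks;
	}

	public function getSerializerFor( string $entityType ): callable {
		if ( !isset( $this->callbacks[$entityType] ) ) {
			throw new InvalidArgumentException( "No serializer registered for '$entityType'" );
		}
		// Build the serializer lazily, the first time its entity type is requested.
		$this->serializers[$entityType] ??= ( $this->callbacks[$entityType] )();
		return $this->serializers[$entityType];
	}
}

// Each entity type contributes one callback, much like the
// 'serializer-factory-callback' entry in an entity type definition.
$factory = new DispatchingSerializerFactory( [
	'mediainfo' => static function (): callable {
		return static function ( array $entity ): array {
			return [ 'type' => 'mediainfo', 'id' => $entity['id'] ];
		};
	},
	'item' => static function (): callable {
		return static function ( array $entity ): array {
			return [ 'type' => 'item', 'id' => $entity['id'] ];
		};
	},
] );

$serialize = $factory->getSerializerFor( 'mediainfo' );
$result = $serialize( [ 'id' => 'M123' ] );
print_r( $result );
```

The caller only ever talks to the dispatching factory; which concrete serializer gets built is decided by the entity type it passes in.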

Testing Strategy

WikibaseMediaInfo is a complicated extension, with complicated dependencies (i.e. Wikibase). Automated testing can play an important role in helping to manage this complexity.

To do this, we are using three different types of tests, which can be likened to levels in a "testing pyramid"[1]. The three levels are: JS and PHP unit tests (the "base" of the pyramid), PHPUnit API/integration tests (the middle layer), and end-to-end tests in Selenium (the top of the pyramid).

JavaScript unit tests (headless Node/QUnit)

WikibaseMediaInfo introduces lots of new JS code, much of which is concerned with introducing new UI elements that enable users to view and edit structured data in various places (File pages, UploadWizard, Search, etc.). Wherever possible, we want to try and test these new JS components in isolation, using a headless Node.js testing framework instead of the traditional Special:JavascriptTest approach. There is a good discussion around the advantages and reasoning behind this approach at this RFC on Phabricator.

Requirements

Node.js v10 is required to run these tests. QUnit is used as the testing framework. The JSDOM and Sinon libraries are also used extensively.

Writing Tests

For JS code in a MediaWiki extension to be testable this way, we need to be able to load it in an isolated context using Node's require statement. This means that the relevant part of the codebase needs to be rewritten using ResourceLoader's new PackageFiles feature, and the individual JS files used in the module must define a module.exports property (these files no longer need to be wrapped in self-executing functions). In addition to making code more testable, refactoring in this way lets us write JavaScript that is more in line with the current practices of the wider JS community. This refactoring is currently in progress (some modules in our extension.json use PackageFiles, while others still define an array of scripts).

Tests live in tests/node-qunit and are organized into subfolders. Here is an example with a few simple tests for the LicenseDialogWidget, a basic UI component.

Having good coverage at the JS component level will help to catch regressions and make it easier to refactor code. Things to test for at this level include basic interactions (toggling a component in or out of edit state, for example), ensuring that appropriate API requests are sent when an action is taken, etc.

Running Tests

To run Node QUnit tests, open a terminal and run npm run test:unit. They are also included in the larger npm test script (which means they will run in CI).

PHP tests

PHPUnit tests are located in tests/phpunit. They must be run using MediaWiki core's phpunit.php, like this: sudo -u www-data php phpunit.php --wiki wiki (in the Vagrant dev environment, phpunit.php is in /vagrant/mediawiki/tests/phpunit/).

Normal unit tests are in tests/phpunit/mediawiki/. Integration tests are in tests/phpunit/integration/.

End-to-end tests (Selenium)

End-to-end tests represent the highest level of the "testing pyramid". Tests at this level should focus on the "happy path" for a user. They can also be used to ensure that basic functionality (like logging in and editing a page) is never hampered by a regression.

Currently it is not feasible to run extension-specific Selenium tests for WikibaseMediaInfo in the regular CI process. Instead, tests can be run against Beta Commons on a regular schedule. These tests need to live in their own location ("specs_betacommons" instead of "specs") so that they are not picked up by the Selenium script run by Core (which does happen in the CI pipeline).

There is currently an in-progress patch that adds Selenium tests to this extension. This document will be updated with more information about how to write and run these tests once that patch is merged.