Manual:Database access

This article provides an overview of database access and general database topics in MediaWiki.

When coding in MediaWiki, you normally access the database only through the MediaWiki functions provided for that purpose.

Database schema

Information about the MediaWiki database layout, such as a description of the tables and their contents, can be found at Manual:Database layout and $tables. Historically, this was also documented in maintenance/tables.sql, but starting with MediaWiki 1.35 it is gradually being moved to sql/tables.json as part of the Abstract Schema initiative. This means that sql/tables.json is converted into sql/mysql/tables-generated.sql by a maintenance script, which makes it easier to generate schema files that support different database engines.

Logging in to MySQL

Using sql.php

MediaWiki provides a maintenance script for accessing the database. From the maintenance directory, run:

php run.php sql

You can then enter database queries. Alternatively, you can supply a filename, which MediaWiki will then execute, substituting any MediaWiki special variables as appropriate. For more information, see Manual:sql.php.

This works for all database backends. However, the prompt is not as full-featured as the command line clients that are installed with the database.

Using the mysql command line client

The MySQL username and password can be found in the wiki's LocalSettings.php, for example:

## Database settings
$wgDBtype           = "mysql";
$wgDBserver         = "localhost";
$wgDBname           = "your-database-name";
$wgDBuser           = "your-database-username";  // Default: root
$wgDBpassword       = "your-password";

Over SSH, log in by entering the following:

mysql -u $wgDBuser -p --database=$wgDBname

Replace $wgDBuser and $wgDBname with their values from LocalSettings.php. You will then be prompted for the password ($wgDBpassword), after which the mysql> prompt appears.

Database abstraction layer

MediaWiki uses the Rdbms library as its database abstraction layer. Developers must not directly call low-level database functions, such as mysql_query.

Each connection is represented by Wikimedia\Rdbms\IDatabase, from which queries can be performed. Connections can be acquired by calling getPrimaryDatabase() or getReplicaDatabase() (depending on the use case) on an IConnectionProvider instance, preferably dependency-injected, or obtained from MediaWikiServices via the DBLoadBalancerFactory service. The function wfGetDB() is being phased out and should not be used in new code.

For getting database connections you can call either getReplicaDatabase() for read queries or getPrimaryDatabase() for write queries and write-informing read queries. The distinction between primary and replica is important in a multi-database environment, such as Wikimedia. See the Wrapper functions section below for how to interact with IDatabase objects.

Example of a read query:

MediaWiki Version:

≥ 1.42

use MediaWiki\MediaWikiServices;

$dbProvider = MediaWikiServices::getInstance()->getConnectionProvider();
$dbr = $dbProvider->getReplicaDatabase();

$res = $dbr->newSelectQueryBuilder()
  ->select( /* ... */ ) // see docs
  ->fetchResultSet();

foreach ( $res as $row ) {
	print $row->foo;
}

Example of a write query:

MediaWiki Version:

≥ 1.41

$dbw = $dbProvider->getPrimaryDatabase();
$dbw->newInsertQueryBuilder()
    ->insertInto( /* ... */ ) // see docs
    ->caller( __METHOD__ )->execute();

We use the convention $dbr for readable connections (replica) and $dbw for writable connections (primary). Likewise, $dbProvider is used for the IConnectionProvider instance.

SelectQueryBuilder

MediaWiki Version:

≥ 1.35

The SelectQueryBuilder class is the preferred way to formulate read queries in new code. In older code, you might find select() and related methods of the Database class used directly. The query builder provides a modern "fluent" interface, where methods are chained until the fetch method is invoked, without intermediary variable assignments needed. For example:

$dbr = $dbProvider->getReplicaDatabase();
$res = $dbr->newSelectQueryBuilder()
	->select( [ 'cat_title', 'cat_pages' ] )
	->from( 'category' )
	->where( 'cat_pages > 0' )
	->orderBy( 'cat_title', SelectQueryBuilder::SORT_ASC )
	->caller( __METHOD__ )->fetchResultSet();

As described below, MW 1.42 introduces a helper method, expr(), which lets you wrap the field, operator and value as an expression. Using this, the where clause in the above example can be rewritten as:

->where( $dbr->expr( 'cat_pages', '>', 0 ) )

This example corresponds to the following SQL:

SELECT cat_title, cat_pages FROM category WHERE cat_pages > 0 ORDER BY cat_title ASC
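
To make the idea of a fluent interface concrete, here is a minimal, self-contained sketch. ToySelectBuilder is a hypothetical class invented for this illustration; it is not MediaWiki's SelectQueryBuilder, and it does no escaping or table-name handling. It only shows how each method returns $this so that calls can be chained until a final method produces the result.

```php
<?php
// Hypothetical toy class for illustration only; not part of MediaWiki.
// It performs no escaping, so it must never be used for real queries.
final class ToySelectBuilder {
	/** @var string[] */
	private array $fields = [];
	private string $table = '';
	/** @var string[] */
	private array $conds = [];
	private string $order = '';

	public function select( array $fields ): self {
		$this->fields = $fields;
		return $this; // returning $this is what makes chaining possible
	}

	public function from( string $table ): self {
		$this->table = $table;
		return $this;
	}

	public function where( string $cond ): self {
		$this->conds[] = $cond;
		return $this;
	}

	public function orderBy( string $field, string $direction ): self {
		$this->order = "$field $direction";
		return $this;
	}

	// Terminal method: ends the chain and returns the SQL text.
	public function getSQL(): string {
		$sql = 'SELECT ' . implode( ', ', $this->fields ) . ' FROM ' . $this->table;
		if ( $this->conds ) {
			$sql .= ' WHERE ' . implode( ' AND ', $this->conds );
		}
		if ( $this->order !== '' ) {
			$sql .= ' ORDER BY ' . $this->order;
		}
		return $sql;
	}
}

$sql = ( new ToySelectBuilder() )
	->select( [ 'cat_title', 'cat_pages' ] )
	->from( 'category' )
	->where( 'cat_pages > 0' )
	->orderBy( 'cat_title', 'ASC' )
	->getSQL();
echo $sql, "\n";
```

The real query builder follows the same chaining pattern but additionally handles quoting, table prefixes and database-specific syntax, which is why it should be preferred over assembling SQL strings by hand.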

JOINs are also possible; for example:

$dbr = $dbProvider->getReplicaDatabase();
$res = $dbr->newSelectQueryBuilder()
	->select( 'wl_user' )
	->from( 'watchlist' )
	->join( 'user_properties', /* alias: */ null, 'wl_user=up_user' )
	->where( [
		'wl_user != 1',
		'wl_namespace' => '0',
		'wl_title' => 'Main_page',
		'up_property' => 'enotifwatchlistpages',
	] )
	->caller( __METHOD__ )->fetchResultSet();

This example corresponds to the query:

SELECT wl_user
FROM `watchlist`
INNER JOIN `user_properties` ON ((wl_user=up_user))
WHERE (wl_user != 1)
AND wl_namespace = '0'
AND wl_title = 'Main_page'
AND up_property = 'enotifwatchlistpages'

You can access individual rows of the result using a foreach loop. Each row is represented as an object. For example:

$dbr = $dbProvider->getReplicaDatabase();
$res = $dbr->newSelectQueryBuilder()
	->select( [ 'cat_title', 'cat_pages' ] )
	->from( 'category' )
	->where( 'cat_pages > 0' )
	->orderBy( 'cat_title', SelectQueryBuilder::SORT_ASC )
	->caller( __METHOD__ )->fetchResultSet();      

foreach ( $res as $row ) {
	print 'Category ' . $row->cat_title . ' contains ' . $row->cat_pages . " entries.\n";
}

There are also convenience functions to fetch a single row, a single field from several rows, or a single field from a single row:

// Equivalent of:
//     $rows = fetchResultSet();
//     $row = $rows[0];
$pageRow = $dbr->newSelectQueryBuilder()
	->select( [ 'page_id', 'page_namespace', 'page_title' ] )
	->from( 'page' )
	->orderBy( 'page_touched', SelectQueryBuilder::SORT_DESC )
	->caller( __METHOD__ )->fetchRow();

// Equivalent of:
//     $rows = fetchResultSet();
//     $ids = array_map( fn( $row ) => $row->page_id, $rows );
$pageIds = $dbr->newSelectQueryBuilder()
	->select( 'page_id' )
	->from( 'page' )
	->where( [
		'page_namespace' => 1,
	] )
	->caller( __METHOD__ )->fetchFieldValues();

// Equivalent of:
//     $rows = fetchResultSet();
//     $id = $rows[0]->page_id;
$pageId = $dbr->newSelectQueryBuilder()
	->select( 'page_id' )
	->from( 'page' )
	->where( [
		'page_namespace' => 1,
		'page_title' => 'Main_page',
	] )
	->caller( __METHOD__ )->fetchField();

In these examples, $pageRow is a row object as in the foreach example above, $pageIds is an array of page IDs, and $pageId is a single page ID.

While you can use tables() to add multiple tables, it is highly recommended to use join() or leftJoin() instead. Any aliases for additional tables must be added to join() or leftJoin(), not in tables().

UpdateQueryBuilder

MediaWiki Version:

≥ 1.41

SQL UPDATE statements should be done with the UpdateQueryBuilder.

$dbw = $this->dbProvider->getPrimaryDatabase();
$dbw->newUpdateQueryBuilder()
	->update( 'user' )
	->set( [ 'user_password' => $newHash->toString() ] )
	->where( [
		'user_id' => $oldRow->user_id,
		'user_password' => $oldRow->user_password,
	] )
	->caller( $fname )->execute();

InsertQueryBuilder

MediaWiki Version:

≥ 1.41

SQL INSERT statements should be done with the InsertQueryBuilder.

$dbw = $this->dbProvider->getPrimaryDatabase();
$targetRow = [
	'bt_address' => $targetAddress,
	'bt_user' => $targetUserId,
	/* etc */
];
$dbw->newInsertQueryBuilder()
	->insertInto( 'block_target' )
	->row( $targetRow )
	->caller( __METHOD__ )->execute();
$id = $dbw->insertId();

DeleteQueryBuilder

MediaWiki Version:

≥ 1.41

SQL DELETE statements should be done with the DeleteQueryBuilder.

$dbw = $this->dbProvider->getPrimaryDatabase();
$dbw->newDeleteQueryBuilder()
	->deleteFrom( 'block' )
	->where( [ 'bl_id' => $ids ] )
	->caller( __METHOD__ )->execute();
$numDeleted = $dbw->affectedRows();

ReplaceQueryBuilder

MediaWiki Version:

≥ 1.41

SQL REPLACE statements should be done with the ReplaceQueryBuilder.

$dbw = $this->dbProvider->getPrimaryDatabase();
$dbw->newReplaceQueryBuilder()
	->replaceInto( 'querycache_info' )
	->row( [
		'qci_type' => 'activeusers',
		'qci_timestamp' => $dbw->timestamp( $asOfTimestamp ),
	] )
	->uniqueIndexFields( [ 'qci_type' ] )
	->caller( __METHOD__ )->execute();

UnionQueryBuilder

MediaWiki Version:

≥ 1.41

SQL UNION statements should be done with the UnionQueryBuilder.

$dbr = $this->dbProvider->getReplicaDatabase();
$ids = $dbr->newUnionQueryBuilder()
	->add( $dbr->newSelectQueryBuilder()
		->select( 'bt_id' )
		->from( 'block_target' )
		->where( [ 'bt_address' => $addresses ] )
	)
	->add( $dbr->newSelectQueryBuilder()
		->select( 'bt_id' )
		->from( 'block_target' )
		->join( 'user', null, 'user_id=bt_user' )
		->where( [ 'user_name' => $userNames ] )
	)
	->caller( __METHOD__ )
	->fetchFieldValues();

Batch queries

If you need to insert or update multiple rows, try to group them together into a batch query for increased efficiency. It's important to keep the table declaration (e.g. update(), insertInto(), etc.), caller(), and execute() outside the loop. Anything related to creating or updating rows can go inside the loop (e.g. row()).

$queryBuilder = $this->getDb()->newInsertQueryBuilder()
	->insertInto( 'ores_classification' )
	->caller( __METHOD__ );
foreach ( [ 0, 1, 2, 3 ] as $id ) {
	$predicted = $classId === $id;
	$queryBuilder->row( [
		'oresc_model' => $this->ensureOresModel( 'draftquality' ),
		'oresc_class' => $id,
		'oresc_probability' => $predicted ? 0.7 : 0.1,
		'oresc_is_predicted' => $predicted ? 1 : 0,
		'oresc_rev' => $revId,
	] );
}
$queryBuilder->execute();

Helpers

The following helper methods should be used when appropriate, because they build SQL queries that are compatible with all supported database types, and they assist with auto escaping.

`$dbr->expr()`

MediaWiki Version:

≥ 1.42

Should be used in WHERE statements whenever anything is being compared that isn't a simple equals statement. For example, $dbr->expr( 'ptrp_page_id', '>', $start ).

This method can be chained with ->and() and ->or(). For example, $dbr->expr( 'ptrp_page_id', '=', null )->or( 'ptrpt_page_id', '=', null ).

`$dbr->timestamp()`

Different database engines format MediaWiki timestamps differently. Use this to ensure compatibility. Example: $dbr->expr( 'ptrp_reviewed_updated', '>', $dbr->timestamp( $time ) )

RawSQLExpression

MediaWiki Version:

≥ 1.42

Should be used in WHERE statements when you do not want to SQL escape anything. If comparing a field to a user value (much more common), use $dbr->expr() instead. RawSQLExpression does not escape, so it should never be used with user input. Use sparingly! Example: $dbr->expr( new RawSQLExpression( 'rc_timestamp < fp_pending_since' ) )

RawSQLValue

MediaWiki Version:

≥ 1.43

Should be used in WHERE statements when you do not want to SQL escape anything. If comparing a field to a user value (much more common), use $dbr->expr() instead. RawSQLValue does not escape, so it should never be used with user input. Use sparingly! Example: $dbr->expr( 'fp_pending_since', '>', new RawSQLValue( $fieldName ) )

Wrapper functions and raw queries

Older MediaWiki code may use wrapper functions like $dbr->select() and $dbw->insert(). Very old MediaWiki code may use $dbw->query(). None of these are considered good practice now, and should be upgraded to the query builders mentioned above.

Wrapper functions are superior to $dbw->query(), because they can take care of things like table prefixes and escaping for you under some circumstances. If you really need to make your own SQL, please read the documentation for tableName() and addQuotes(). You will need both of them. Please keep in mind that failing to use addQuotes() properly can introduce severe security holes into your wiki.

Another important reason to use the high level methods rather than constructing your own queries is to ensure that your code will run properly regardless of the database type. Currently the best support is for MySQL/MariaDB. There is also good support for SQLite, however it is much slower than MySQL or MariaDB. There is support for PostgreSQL, but it is not as stable as MySQL.

In the following, the available wrapper functions are listed. For a detailed description of the parameters of the wrapper functions, please refer to class Database's docs. Particularly see Database::select for an explanation of the $table, $vars, $conds, $fname, $options, $join_conds parameters that are used by many of the other wrapper functions.

The parameters $table, $vars, $conds, $fname, $options, and $join_conds must not be null or false (this worked until MediaWiki 1.35); pass an empty string '' or an empty array [] instead.

function select( $table, $vars, $conds, .. );
function selectField( $table, $var, $cond, .. );
function selectRow( $table, $vars, $conds, .. );
function insert( $table, $a, .. );
function insertSelect( $destTable, $srcTable, $varMap, $conds, .. );
function update( $table, $values, $conds, .. );
function delete( $table, $conds, .. );
function deleteJoin( $delTable, $joinTable, $delVar, $joinVar, $conds, .. );

Convenience functions

MediaWiki Version:

≤ 1.30

For compatibility with PostgreSQL, insert IDs are obtained using nextSequenceValue() and insertId(). The parameter for nextSequenceValue() can be obtained from the CREATE SEQUENCE statement in maintenance/postgres/tables.sql and always follows the format x_y_seq, with x being the table name (e.g. page) and y being the primary key (e.g. page_id), giving e.g. page_page_id_seq. For example:

$id = $dbw->nextSequenceValue( 'page_page_id_seq' );
$dbw->insert( 'page', [ 'page_id' => $id ] );
$id = $dbw->insertId();

For some other useful functions, e.g. affectedRows(), numRows(), etc., see Manual:Database.php#Functions.

Basic query optimization

MediaWiki developers who need to write DB queries should have some understanding of databases and the performance issues associated with them. Patches containing unacceptably slow features will not be accepted. Unindexed queries are generally not welcome in MediaWiki, except in special pages derived from QueryPage. It's a common pitfall for new developers to submit code containing SQL queries which examine huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in a table is like counting beans in a bucket.

Backward compatibility

Design changes to the database often make it necessary to use different database accesses in order to ensure backward compatibility. This can be handled, for example, with the global constant MW_VERSION (or the global variable $wgVersion before MediaWiki 1.39) as follows:

/**
* backward compatibility
* @since 1.31.15
* @since 1.35.3
* define( 'DB_PRIMARY', ILoadBalancer::DB_PRIMARY )
* DB_PRIMARY remains undefined in MediaWiki before v1.31.15/v1.35.3
* @since 1.28.0
* define( 'DB_REPLICA', ILoadBalancer::DB_REPLICA )
* DB_REPLICA remains undefined in MediaWiki before v1.28
*/
defined('DB_PRIMARY') or define('DB_PRIMARY', DB_MASTER);
defined('DB_REPLICA') or define('DB_REPLICA', DB_SLAVE);

$res = WrapperClass::getQueryFoo();

class WrapperClass {

	public static function getReadingConnect() {
		return wfGetDB( DB_REPLICA );
	}

	public static function getWritingConnect() {
		return wfGetDB( DB_PRIMARY );
	}

	public static function getQueryFoo() {
		global $wgVersion;

		$param = '';
		if ( version_compare( $wgVersion, '1.33', '<' ) ) {
			$param = self::getQueryInfoFooBefore_v1_33();
		} else {
			$param = self::getQueryInfoFoo();
		}

		return self::getReadingConnect()->select(
			$param['tables'],
			$param['fields'],
			$param['conds'],
			__METHOD__,
			$param['options'],
			$param['join_conds'] );
	}

	private static function getQueryInfoFoo() {
		return [
			'tables' => [
				't1' => 'table1',
				't2' => 'table2',
				't3' => 'table3'
			],
			'fields' => [
				'field_name1' => 't1.field1',
				'field_name2' => 't2.field2',
				…
			],
			'conds' => [ …
			],
			'join_conds' => [
				't2' => [
					'INNER JOIN',
					'field_name1 = field_name2'
				],
				't3' => [
					'LEFT JOIN',
					…
				]
			],
			'options' => [ …
			]
		];
	}

	private static function getQueryInfoFooBefore_v1_33() {
		return [
			'tables' => [
				't1' => 'table1',
				't2' => 'table2',
				't3' => 'table3_before'
			],
			'fields' => [
				'field_name1' => 't1.field1',
				'field_name2' => 't2.field2_before',
				…
			],
			'conds' => [ …
			],
			'join_conds' => [
				't2' => [
					'INNER JOIN',
					…
				],
				't3' => [
					'LEFT JOIN',
					…
				]
			],
			'options' => [ …
			]
		];
	}
}

MediaWiki Version:

≥ 1.35

	public static function getQueryFoo() {

		$param = '';
		if ( version_compare( MW_VERSION, '1.39', '<' ) ) {
			$param = self::getQueryInfoFooBefore_v1_39();
		} else {
			$param = self::getQueryInfoFoo();
		}

		return self::getReadingConnect()->select(
			$param['tables'],
			$param['fields'],
			$param['conds'],
			__METHOD__,
			$param['options'],
			$param['join_conds'] );
	}
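
The version switch above relies on PHP's built-in version_compare(), which compares version strings segment by segment rather than character by character. The following self-contained sketch shows that behaviour in isolation. The function name pickQueryInfoVariant() and the labels 'legacy' and 'current' are hypothetical and chosen only for illustration; in real code you would pass MW_VERSION and return the appropriate query definition.

```php
<?php
// Hypothetical helper for illustration: decides which query definition to use.
// 'legacy' and 'current' are placeholder labels, not MediaWiki names.
function pickQueryInfoVariant( string $mwVersion, string $cutoff = '1.39' ): string {
	// version_compare() compares numeric segments as numbers, so 1.9 sorts before 1.39.
	return version_compare( $mwVersion, $cutoff, '<' ) ? 'legacy' : 'current';
}

echo pickQueryInfoVariant( '1.35.3' ), "\n"; // legacy
echo pickQueryInfoVariant( '1.9' ), "\n";    // legacy
echo pickQueryInfoVariant( '1.39.0' ), "\n"; // current
echo pickQueryInfoVariant( '1.43.0' ), "\n"; // current
```

A plain string comparison would get the '1.9' case wrong, because the character '9' sorts after '3'. That is why version checks should go through version_compare().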

Replication

Large installations of MediaWiki such as Wikipedia use a large set of replica MySQL servers replicating writes made to a primary MySQL server. It is important to understand the complexities associated with large distributed systems if you want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task depends on whether or not replication is in use. Due to our unabashed Wikipedia-centrism, we often just use the replication-friendly version, but if you like, you can use wfGetLB()->getServerCount() > 1 to check to see if replication is in use.

Lag

Lag primarily occurs when large write queries are sent to the primary server. Writes on the primary server are executed in parallel, but they are executed serially when they are replicated to the replicas. The primary server writes the query to the binlog when the transaction is committed. The replicas poll the binlog and start executing the query as soon as it appears. They can service reads while they are performing a write query, but will not read anything more from the binlog and thus will perform no more writes. This means that if the write query runs for a long time, the replicas will lag behind the primary server for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will stop sending reads to a replica when it is lagged by more than 5 seconds. If the load ratios are set incorrectly, or if there is too much load generally, this may lead to a replica permanently hovering around 5 seconds lag.

In Wikimedia production, databases have semi-sync enabled, meaning a change won't be committed on the primary unless it's committed on at least half of the replicas. This means that heavy load could lead to all edits and other write operations being refused, with an error returned to the user. This gives the replicas a chance to catch up.

Before we had these mechanisms, the replicas would regularly lag by several minutes, making review of recent edits difficult.

In addition to this, MediaWiki attempts to ensure that the user sees events occurring on the wiki in chronological order. A few seconds of lag can be tolerated, as long as the user sees a consistent picture from subsequent requests. This is done by saving the primary binlog position in the session, and then at the start of each request, waiting for the replica to catch up to that position before doing any reads from it. If this wait times out, reads are allowed anyway, but the request is considered to be in "lagged replica mode". Lagged replica mode can be checked by calling LoadBalancer::getLaggedReplicaMode(). The only practical consequence at present is a warning displayed in the page footer.
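
The wait-then-fall-back behaviour described above can be sketched as a small simulation. This is not MediaWiki's implementation: waitForReplica() is a hypothetical function, positions are plain integers standing in for binlog positions, and the polling loop is an assumption made to keep the example self-contained. The sketch only shows the decision logic: wait until the replica reaches the saved position, and if the wait times out, proceed anyway in lagged replica mode.

```php
<?php
// Hypothetical simulation; integers stand in for binlog positions.
// Returns true if the replica reached $targetPos before the timeout.
function waitForReplica( callable $getReplicaPos, int $targetPos, float $timeoutSec ): bool {
	$deadline = microtime( true ) + $timeoutSec;
	do {
		if ( $getReplicaPos() >= $targetPos ) {
			return true; // replica has caught up; reads will be consistent
		}
		usleep( 1000 ); // poll again after 1 ms
	} while ( microtime( true ) < $deadline );
	return false; // timed out: the caller proceeds in "lagged replica mode"
}

// A replica that advances by one position on every poll.
$pos = 0;
$advancing = static function () use ( &$pos ) {
	return ++$pos;
};
$caughtUp = waitForReplica( $advancing, 3, 1.0 );

// A replica that never advances, with a very short timeout.
$stuck = static fn () => 0;
$laggedMode = !waitForReplica( $stuck, 3, 0.01 );

var_dump( $caughtUp, $laggedMode );
```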

Shell users can check replication lag with getLagTimes.php; other users can check using the siteinfo API.

Databases often have their own monitoring systems in place as well; see for instance wikitech:MariaDB#Replication lag (Wikimedia) and wikitech:Help:Toolforge/Database#Identifying lag (Wikimedia Cloud VPS).

Lag avoidance

To avoid excessive lag, queries that write large numbers of rows should be split up, generally to write one row at a time. Multi-row INSERT ... SELECT queries are the worst offenders and should be avoided altogether. Instead do the select first and then the insert.

Even small writes can cause lag if they are done at a very high speed and replication is unable to keep up. This most commonly happens in maintenance scripts. To prevent it, you should call Maintenance::waitForReplication() after every few hundred writes. Most scripts make the exact number configurable:

class MyMaintenanceScript extends Maintenance {
    public function __construct() {
        // ...
        $this->setBatchSize( 100 );
    }

    public function execute() {
        $limit = $this->getBatchSize();
        while ( true ) {
             // ...select up to $limit rows to write, break the loop if there are no more rows...
             // ...do the writes...
             $this->waitForReplication();
        }
    }
}

Working with lag

Despite our best efforts, it's not practical to guarantee a low-lag environment. Replication lag will usually be less than one second, but may occasionally be up to 5 seconds. For scalability, it's very important to keep load on the primary server low, so simply sending all your queries to the primary server is not the answer. So when you have a genuine need for up-to-date data, the following approach is advised:

Do a quick query to the primary server for a sequence number or timestamp
Run the full query on the replica and check if it matches the data you got from the primary server
If it doesn't, run the full query on the primary server

To avoid swamping the primary server every time the replicas lag, use of this approach should be kept to a minimum. In most cases you should just read from the replica and let the user deal with the delay.
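
The three steps above can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: readFresh() and the three callables are hypothetical stand-ins for real queries, and each read is assumed to return an array with a 'version' key (the sequence number or timestamp) and a 'data' key.

```php
<?php
// Hypothetical sketch: the callables stand in for real database queries.
// Each full read is assumed to return [ 'version' => int, 'data' => mixed ].
function readFresh( callable $primaryVersion, callable $readReplica, callable $readPrimary ): array {
	$expected = $primaryVersion(); // 1. cheap query to the primary
	$result = $readReplica();      // 2. full query on the replica
	if ( $result['version'] === $expected ) {
		return $result;            // replica is up to date
	}
	return $readPrimary();         // 3. replica is stale: fall back to the primary
}

// Simulated servers: the primary is at version 5, the replica is still at version 4.
$primary = [ 'version' => 5, 'data' => 'new text' ];
$replica = [ 'version' => 4, 'data' => 'old text' ];

$result = readFresh(
	static fn () => $primary['version'],
	static fn () => $replica,
	static fn () => $primary
);
echo $result['data'], "\n"; // new text
```

When the replica is current, the primary only serves the cheap version query, which is the point of the approach.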

Lock contention

Due to the high write rate on Wikipedia (and some other wikis), MediaWiki developers need to be very careful to structure their writes to avoid long-lasting locks. By default, MediaWiki opens a transaction at the first query, and commits it before the output is sent. Locks will be held from the time when the query is done until the commit. So you can reduce lock time by doing as much processing as possible before you do your write queries. Update operations which do not require database access can be delayed until after the commit by adding an object to $wgPostCommitUpdateList or to Database::onTransactionPreCommitOrIdle.

Often this approach is not good enough, and it becomes necessary to enclose small groups of queries in their own transaction. Use the following syntax:

$factory = \MediaWiki\MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
// Named beginMasterChanges()/commitMasterChanges() before MediaWiki 1.37
$factory->beginPrimaryChanges( __METHOD__ );
/* Do queries */
$factory->commitPrimaryChanges( __METHOD__ );

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They are poorly implemented in InnoDB and will cause regular deadlock errors. It's also surprisingly easy to cripple the wiki with lock contention.

Instead of locking reads, combine your existence checks into your write queries, by using an appropriate condition in the WHERE clause of an UPDATE, or by using unique indexes in combination with INSERT IGNORE. Then use the affected row count to see if the query succeeded.
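
The idea of folding the existence check into the write can be illustrated with an in-memory stand-in for a table. This is a hypothetical simulation, not database code: conditionalUpdate() mimics the semantics of UPDATE ... SET state = new WHERE id = ? AND state = expected and returns the number of affected rows. In a real database the check and the write happen atomically inside the single statement, which is what makes a locking read unnecessary.

```php
<?php
// Hypothetical in-memory stand-in for a table keyed by primary key.
// Mimics: UPDATE t SET state = $new WHERE id = $id AND state = $expected
function conditionalUpdate( array &$table, int $id, string $expected, string $new ): int {
	if ( isset( $table[$id] ) && $table[$id] === $expected ) {
		$table[$id] = $new;
		return 1; // one affected row: the condition matched and the write happened
	}
	return 0; // no affected rows: the row is missing or was already changed
}

$table = [ 42 => 'pending' ];

// The first caller wins; a second, identical attempt affects no rows.
$first = conditionalUpdate( $table, 42, 'pending', 'done' );
$second = conditionalUpdate( $table, 42, 'pending', 'done' );

echo "first: $first, second: $second\n"; // first: 1, second: 0
```

The caller inspects the affected row count, as the UpdateQueryBuilder and DeleteQueryBuilder examples earlier do with affectedRows(), instead of reading the row under a lock beforehand.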

Database schema

Don't forget about indexes when designing databases; things may work smoothly on your test wiki with a dozen pages but bring a real wiki to a halt. See above for details.

For naming conventions, see Manual:Coding conventions/Database.

See also

Manual:Hooks/LoadExtensionSchemaUpdates: if an extension requires changes to the database when MediaWiki is updated, this can be done with this hook. Users can then update their wiki by running update.php.
Database transactions