Database transactions


MediaWiki uses database transactions not only to guarantee database consistency, but also to improve performance.

General information about database transactions can be found at:

Transaction scope

First, a distinction needs to be made between two types of methods:

  • Those with outer transaction scope: methods that structurally are clearly guaranteed to have no callers up the chain that perform transaction operations (other than wrapping the method in a transaction round on behalf of said method). Such methods are said to own the transaction round. Places that have this scope are the execute() method of Maintenance scripts, the run() method of Job classes, and the doUpdate() method of DeferrableUpdate classes. When these methods run, no callers further up the call stack will have any transaction activity other than possibly defining the start/end boundary. An outer scope caller that is also structurally guaranteed to commence with no transaction round declared at all is said to have defining transaction scope. This means that methods with outer transaction scope are free to start and end transactions (given some caveats described below). Callers down the stack do not have outer scope and are expected to respect this fact.
  • Those with unclear/inner transaction scope: these are methods that are not clearly guaranteed to have outer transaction scope. Such methods do not own the transaction round, if one exists. This is most of the methods in MediaWiki core and extensions. Various methods fall under this category, such as those of model/abstraction classes, utility classes, DAO objects, hook handlers, business/control logic classes, and so on. These methods are not free to start/end transactions and must only use transaction semantics that support nesting. If they need to do some updates after commit, then they must register a post-commit callback method.
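As a sketch of the inner-scope rule, a model class that needs to act only after its changes are durable can register a post-commit callback instead of committing itself (the class, table, and helper names here are illustrative, not from MediaWiki core):

```php
class MyModel {
    /** @var int */
    private $id;

    /**
     * Inner transaction scope: no begin()/commit() calls here.
     * Hypothetical sketch; 'mymodels' and purgeCaches() are illustrative.
     */
    public function save( IDatabase $dbw ) {
        $dbw->update(
            'mymodels',
            [ 'mym_touched' => $dbw->timestamp() ],
            [ 'mym_id' => $this->id ],
            __METHOD__
        );
        // Runs only once the surrounding transaction round actually commits
        $dbw->onTransactionIdle( function () {
            $this->purgeCaches();
        } );
    }
}
```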

If a transaction round is started via LBFactory::beginPrimaryChanges() then it is called an explicit transaction round. Otherwise, if DBO_TRX wraps any query activity in a transaction round, as typically is the case during web requests, then it is called an implicit transaction round. Such rounds are ownerless and are committed by MediaWiki on shutdown via LBFactory::commitPrimaryChanges. Callers can start explicit rounds midway through implicit rounds, in which case any pending database writes will be committed when the explicit round commits.

Basic use of transactions

MediaWiki uses transactions in a few ways:

  1. Using "traditional" begin()/commit() pairs to protect critical sections and be certain they are committed. Nested transactions are not supported. This should only ever be used from callers that have outer transaction scope and are only affecting one database (accounting for any possible hook handlers too). Valid methods include callbacks to onTransactionIdle() or AutoCommitUpdate where only one DB is updated and no hooks are fired. Always match each begin() with a commit().
  2. Using startAtomic()/endAtomic() pairs to protect critical sections without knowing when they will commit. Nested sections are fully supported. These can be used anywhere, but must be properly nested (e.g. do not open a section and then not close it before a "return" statement). In maintenance scripts, when no atomic sections are open, a commit happens. If the DBO_TRX flag is set, however, the atomic sections join the main DBO_TRX transaction set. Inside AutoCommitUpdate or onTransactionIdle() callbacks, DBO_TRX is turned off for the specified database, meaning the endAtomic() will commit once there are no sections in those callbacks.
  3. Using implicit pivoted transaction rounds if DBO_TRX is enabled (this is the case by default on web requests, but not in maintenance mode or unit tests). The first write on each database connection without a transaction triggers BEGIN. A COMMIT happens at the end of the request for all database connections with writes pending. If multiple databases that have DBO_TRX set were written to, then they all do their commit step in rapid succession at the end of the request. This maximizes cross-DB transaction atomicity. Note that optimistic concurrency control (REPEATABLE-READ or SERIALIZABLE in PostgreSQL) might undermine this somewhat, since a SERIALIZATION FAILURE can occur on a proper subset of the commits, even if all the writes appeared to succeed. In any case, DBO_TRX reduces the number of commits, which can help site performance (by reducing fsync() calls), and means that all writes in the request are typically either committed or rolled back together.
  4. Using explicit pivoted transaction rounds via LBFactory::beginPrimaryChanges and LBFactory::commitPrimaryChanges. These rounds are effective in both web and CLI modes and have the same semantics as their implicit counterparts, except for the following aspects:
    • Calling commitPrimaryChanges() from a method that did not start the round will throw an error.
    • They commit any empty transactions on primary databases when completed, clearing any REPEATABLE-READ snapshots. This assures that callers relying on LBFactory::flushReplicaSnapshots() in single-DB setups will still have fresh snapshots for DB_REPLICA connections. It also assures that the transaction listener set in Maintenance::setTriggers sees all databases in an idle transaction state, allowing it to run deferred updates.
  5. If at any point, an exception is thrown and not caught by anything else, MWExceptionHandler will catch it and rollback all database connections with transactions. This is very useful when combined with DBO_TRX.
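As a sketch of case 1, a maintenance script (which has outer transaction scope and runs with DBO_TRX off) can wrap a critical section in a traditional begin()/commit() pair (the table names are illustrative):

```php
public function execute() {
    $dbw = wfGetDB( DB_PRIMARY );
    $dbw->begin( __METHOD__ );
    // Critical section: both writes commit together or not at all
    $dbw->delete( 'mysummary', [ 'mys_stale' => 1 ], __METHOD__ );
    $dbw->insert( 'mysummary', $rows, __METHOD__ );
    $dbw->commit( __METHOD__ ); // always paired with the begin() above
}
```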

Errors from transaction misuse

Various misuses of transactions will cause exceptions or warnings, for example:

  • Nesting begin() calls will throw an exception.
  • Calling commit() on another method's transaction, started with begin(), will throw an exception.
  • Calling begin() or commit() when an atomic section is active will throw an exception.
  • The use of LBFactory::beginPrimaryChanges and LBFactory::commitPrimaryChanges has analogous limitations to the above.
  • Calling commit() when no transaction is open will raise a warning.
  • startAtomic() and endAtomic() take __METHOD__ as an argument, and its value must match at each level of atomic section nesting. If it does not match, an exception is thrown.
  • Calling begin() or commit() when DBO_TRX is set may log a warning and no-op.
  • Calling rollback() when DBO_TRX is set will throw an error, triggering rollback of all DBs.
  • Calling getScopedLockAndFlush() while writes are still pending in a transaction will result in an exception.
  • Catching DBError, DBExpectedError, or DBQueryError exceptions without calling rollbackPrimaryChanges() can result in an exception.
  • Trying to use begin() or commit() in a SqlDataUpdate that is set to use the transaction support that class provides may cause exceptions. By doing so, outer scope is forfeited so that multiple such updates can be part of a single transaction round.
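Two of the misuses above, sketched (assuming $dbw is a primary DB handle without DBO_TRX set):

```php
// Nesting begin() calls throws
$dbw->begin( __METHOD__ );
$dbw->begin( __METHOD__ ); // exception: transaction already active

// Atomic section names must match at each nesting level
$dbw->startAtomic( __METHOD__ );
$dbw->endAtomic( 'Some::otherMethod' ); // exception: fname mismatch
```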

Proper contexts for write queries

Aside from legacy code, database write transactions (including auto-commit mode queries) in MediaWiki should only happen during execution of:

  • HTTP POST requests to SpecialPages where, in the PHP class, doesWrites() returns true
  • HTTP POST requests to Action pages where, in the PHP class, doesWrites() returns true
  • HTTP POST requests to API modules where, in the PHP class, isWriteMode() returns true
  • Jobs in JobRunner (which uses site-internal HTTP POST requests)
  • Maintenance scripts run from the command line
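For example, an API module declares that it writes to the database by overriding isWriteMode() (a minimal sketch; the class name is illustrative):

```php
class ApiMyAction extends ApiBase {
    public function mustBePosted() {
        return true; // only accept HTTP POST
    }

    public function isWriteMode() {
        return true; // this module performs database writes
    }
}
```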

For writes in the context of HTTP GET requests, use the job queue.

Writes that do not have to happen before the HTTP response is sent to the client can be deferred via DeferredUpdates::addUpdate() with an appropriate DeferrableUpdate subclass (usually AtomicSectionUpdate or AutoCommitUpdate) or via DeferredUpdates::addCallableUpdate() with a callback. The job queue should be used for such updates when they are too slow or resource-intensive to run in ordinary request threads on non-dedicated servers.

Specifying atomic groups of writes

When a set of queries is intimately related, forming one unit of database writes, use an atomic section. For example:

$dbw->startAtomic( __METHOD__ );
$res = $dbw->select( 'mytable', '*', ..., __METHOD__, [ 'FOR UPDATE' ] );
// determine $rows based on $res
$dbw->insert( 'mysubtable', $rows, __METHOD__ );
$dbw->update( 'mymetatable', ..., ..., __METHOD__ );
$dbw->endAtomic( __METHOD__ );

Another style of doing this is to use doAtomicSection(), which is useful if there are many return statements.

$dbw->doAtomicSection(
    __METHOD__,
    function ( IDatabase $dbw ) {
        $res = $dbw->select( 'mytable', '*', ..., __METHOD__, [ 'FOR UPDATE' ] );
        // determine $rows based on $res
        $dbw->insert( 'mysubtable', $rows, __METHOD__ );
        $dbw->update( 'mymetatable', ..., ..., __METHOD__ );
    }
);

Splitting writes over multiple transactions

Improving performance

Situations

Suppose you have some code that applies some database updates. After the method finishes you may want to:

  • a) Apply some highly contentious database updates near the end of the transaction so they don't hold locks too long
  • b) Apply further database updates that happen to be slow, non-timely, and don't need 100% atomicity (e.g. they can be refreshed)

Methods

In some cases, code may want to know that data is committed before continuing to the next steps. One way to do this is to put the next steps in a callback passed to onTransactionIdle(), AtomicSectionUpdate, or AutoCommitUpdate. The latter two are DeferredUpdates, which behave somewhat differently in Maintenance mode versus web/job request mode:

  • In web requests and jobs (including jobs in CLI mode), deferred updates run after the main transaction round commits. Each update is wrapped in its own transaction round, though AutoCommitUpdate disables DBO_TRX on the specified database handle, committing each query on the fly. If deferred updates enqueue other deferred updates, the extra transaction rounds are simply added.
  • In Maintenance scripts, deferred updates run after any transaction on the local (e.g. "current wiki") database commits (or immediately if there is no open transaction). Deferred updates cannot simply be automatically deferred until no transactions are active as that might lead to out of memory errors for long running scripts where some (possibly "foreign wiki") database always has an active transaction (this would otherwise be ideal). This is why deferred updates are oddly tied only to the local primary database. Regardless, since Maintenance::execute() has outer transaction scope and DBO_TRX is off for them, it doesn't usually make sense to directly call DeferredUpdates::addUpdate() from the execute() method, since the code could just run immediately.

Any method with outer transaction scope has the option of calling commitPrimaryChanges( __METHOD__ ) on the LBFactory singleton to flush all active transactions on all databases. This assures that all pending updates are committed before the next lines of code are executed. Also, if such a method wants to start a transaction round, it can use beginPrimaryChanges( __METHOD__ ) on the singleton, do the updates, and then call commitPrimaryChanges( __METHOD__ ) on the singleton. This will set DBO_TRX on current and new DB handles during the round, causing implicit transactions to be started, even in CLI mode. The DBO_TRX flag is reverted to its prior state after the round ends. Note that, similar to begin() and commit(), transaction rounds cannot be nested.
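A sketch of an outer-scope method using an explicit transaction round:

```php
$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
// Flush anything already pending before starting our own round
$lbFactory->commitPrimaryChanges( __METHOD__ );
// Start the round; DBO_TRX is set on DB handles for its duration
$lbFactory->beginPrimaryChanges( __METHOD__ );
// ...do updates on one or more primary databases...
$lbFactory->commitPrimaryChanges( __METHOD__ ); // commits them all together
```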

Note that some databases, like those handling ExternalStoreDB, usually have DBO_DEFAULT disabled. This means that they remain in auto-commit mode even during transaction rounds.

For the cases above, here are some techniques for handling them:

Case A:

// Update is still atomic by being in the main transaction round, but runs near the end
$dbw->onTransactionPreCommitOrIdle( function () use ( $dbw ) {
    $dbw->upsert(
        'dailyedits',
        [ 'de_day' => substr( wfTimestamp( TS_MW ), 0, 8 ), 'de_count' => 1 ],
        [ 'de_day' ],
        [ 'de_count = de_count + 1' ],
        __METHOD__
    );
} );

Case B:

DeferredUpdates::addUpdate(
    new AtomicSectionUpdate(
        $dbw,
        __METHOD__,
        function ( $dbw, $fname ) {
            ...set of atomic statements...
        }
    )
);
DeferredUpdates::addUpdate(
    new AutoCommitUpdate(
        $dbw,
        __METHOD__,
        function ( $dbw, $fname ) {
            ...set of autocommit statements...
        }
    )
);

Dealing with replication lag

Situations

Write queries (e.g. create, update, delete operations) that affect many rows or make poor use of indexes take a long time to complete. Worse, replica databases often use serial replication, applying primary transactions one at a time. This means that a 10 second UPDATE query will block for that long on each replica database (sometimes longer, since replica DBs also have to handle read traffic while replicating the primary's writes). This creates lag, where other updates on the primary do not become visible to other users for a while. It also slows down users making edits, due to ChronologyProtector trying to wait for replicas to catch up.

The main cases where this can happen are:

  • a) Job classes that do expensive updates
  • b) Maintenance scripts that do mass updates to large portions of tables

Another situation that sometimes comes up is when an update triggers a job, and that job must do some heavy queries in order to recompute something. It may not be a good idea to do the queries on the primary DB, but the replica DBs might be lagged and not reflect the change that triggered the job. This leads to the following case:

  • c) Jobs needing to wait for a single replica DB to catch up so they can query it to determine the updates

A similar scenario can happen with external services. Suppose a service needs to do expensive API queries in order to reflect changes in the wiki's database. So this leaves another case:

  • d) Callers that do updates before notifying an external service that it needs to query the DBs to update itself

Methods

Expensive updates that create lag need to be moved to a Job class, and the job's run() method should batch the updates, waiting for replicas to catch up between each batch.

Case A / B:

$dbw = wfGetDB( DB_PRIMARY );
$factory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
$ticket = $factory->getEmptyTransactionTicket( __METHOD__ );
$rowBatches = array_chunk( $rows, $wgUpdateRowsPerQuery );
foreach ( $rowBatches as $rowBatch ) {
    $dbw->insert( 'mydatatable', $rowBatch, __METHOD__ );
    ...run any other hooks or methods...
    $factory->commitAndWaitForReplication( __METHOD__, $ticket );
}

Case C:

$lb = MediaWikiServices::getInstance()->getDBLoadBalancer();
$factory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
$dbr = $lb->getConnection( DB_REPLICA );
// Wait for $dbr to reach the current primary position
$lb->safeWaitForPrimaryPos( $dbr );
// Clear any stale REPEATABLE-READ snapshots
$dbr->flushSnapshot( __METHOD__ );

$factory->beginPrimaryChanges( __METHOD__ );
...query $dbr and do updates...
$factory->commitPrimaryChanges( __METHOD__ );

Case D:

$dbw = wfGetDB( DB_PRIMARY );
...do updates to items in $dbw...
// Use a POSTSEND deferred update to avoid blocking the client
DeferredUpdates::addCallableUpdate( 
    function () {
        $factory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
        $ticket = $factory->getEmptyTransactionTicket( __METHOD__ );
        $factory->commitAndWaitForReplication( __METHOD__, $ticket );
        // Send idempotent HTTP request to regenerate the changed items
        $http = new MultiHttpClient( [] );
        $http->run( ... );
        // Service will do expensive API queries hitting DB_REPLICA
    },
    DeferredUpdates::POSTSEND,
    $dbw // abort callback on rollback
);

Updating non-RDBMS secondary stores

Situations

Sometimes changes to the primary data-set demand updates to secondary data stores (that lack BEGIN...COMMIT), for example:

  • a) Enqueue a job that will query some of the affected rows, making the end-user wait on its insertion
  • b) Enqueue a job that will query some of the affected rows, inserting it after the MediaWiki response to the end-user is flushed
  • c) Send a request to a service that will query some of the affected rows, making the end-user wait on the service request
  • d) Send a request to a service that will query some of the affected rows, doing it after the MediaWiki response to the end-user is flushed
  • e) Purge CDN proxy cache for URLs that have content based on the affected rows
  • f) Purge the WANObjectCache entry for a changed row
  • g) Store a non-derivable text/semi-structured blob in another store
  • h) Store a non-derivable file in another store
  • i) Account creation hook handler creating an LDAP entry that must accompany the new user
  • j) Updating the database and sending an e-mail to a user's inbox as part of a request by the user

Methods

In general, derivable (i.e. regenerable) updates to external stores should use some sort of DeferrableUpdate class or onTransactionIdle() so that they are applied post-commit. In cases where the external data is immutable, it can be referenced by autoincrement ID, UUID, or a hash of the externally stored contents; storing the data pre-commit is best in such cases. Updates that do not fall into either category should use onTransactionPreCommitOrIdle(), batch all the updates to the external store into one transaction if possible, and throw an error if the update fails (which will trigger RDBMS rollback); this reduces the window in which things could go wrong and result in inconsistent data.

Case A:

$job = new MyJobClass( $title, [ ... ] );
// Job insertion will abort if $dbw rolls back for any reason
$dbw->onTransactionIdle( function () use ( $job ) {
    JobQueueGroup::singleton()->push( $job );
} );

Case B:

$job = new MyJobClass( $title, [ ... ] );
// Job insertion will abort if $dbw rolls back for any reason
$dbw->onTransactionIdle( function () use ( $job ) {
    // End-user is not blocked on the job being pushed
    JobQueueGroup::singleton()->lazyPush( $job );
} );

Case C:

DeferredUpdates::addCallableUpdate(
    function () use ( $data ) {
        $http = new MultiHttpClient( [] );
        $http->run( ... );
    },
    DeferredUpdates::PRESEND, // block the end-user
    $dbw // abort update on rollback of this DB
);

Case D:

DeferredUpdates::addCallableUpdate(
    function () use ( $data ) {
        $http = new MultiHttpClient( [] );
        $http->run( ... );
    },
    DeferredUpdates::POSTSEND, // don't block end-user
    $dbw // abort update on rollback of this DB
);

Case E:

DeferredUpdates::addUpdate(
    new CdnCacheUpdate( $urls ),
    DeferredUpdates::PRESEND // block end-user so they don't see stale pages on refresh
);

Case F:

// Update a row
$dbw->update( 'mytable', ..., [ 'myt_id' => $id ], __METHOD__ ); 
// Invalidate the corresponding cache key
$cache = MediaWikiServices::getInstance()->getMainWANObjectCache();
$key = $cache->makeKey( 'my-little-key', $id );
$dbw->onTransactionIdle( function () use ( $cache, $key ) {
    $cache->delete( $key ); // purge/tombstone key right after commit
} );

Case G:

$dbw = wfGetDB( DB_PRIMARY );
$vrs = MediaWikiServices::getInstance()->getVirtualRESTServiceClient();
// Define the row data
$uuid = UIDGenerator::newUUIDv1();
$row = [ 'myr_text_uuid' => $uuid, ... ];
// First insert blob into the key/value store keyed under $uuid
$status = $vrs->run( [ 'method' => 'PUT', 'url' => "/mystore/map-json/{$uuid}", 'body' => $blob, ... ] );
if ( !$status->isGood() ) {
   throw new RuntimeException( "Failed to update key/value store." );
}
// Insert record pointing to blob.
// If we fail to commit, then store will just have a dangling blob.
// However, the user will not see records with broken blobs.
$dbw->insert( 'myrecords', $row, __METHOD__ );

Case H:

$dbw = wfGetDB( DB_PRIMARY );
$be = FileBackendGroup::singleton()->get( 'global-data' );
// Define the row data
$sha1 = $tempFsFile->getSha1Base36(); // SHA-1 of /tmp file uploaded from user
$row = [ 'maf_text_sha1' => $sha1, ... ];
// Make the container/directory if needed
$status = $be->prepare( [ 'dir' => "mwstore://global-data/mytextcontainer" ] );
// Copy the file into the store
$status->merge( $be->store( [ 'src' => $tempFsFile->getPath(), 'dst' => "mwstore://global-data/mytextcontainer/{$sha1}.png" ] ) );
if ( !$status->isGood() ) {
   throw new RuntimeException( "Failed to store file in the backend." );
}
// Insert record pointing to file.
// If we fail to commit, then store will just have a dangling file.
// However, the user will not see records with broken files.
$dbw->insert( 'myavatarfiles', $row, __METHOD__ );

Case I:

// LDAP will not be updated if $dbw rolls back for any reason
$dbw->onTransactionPreCommitOrIdle( function() use ( $ldap ) {
    $status = $ldap->createUser( $user );
    if ( !$status->isGood() ) {
        // If the user already exists or LDAP is down, 
        // throw a GUI error and rollback all databases.
        $lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
        $lbFactory->rollbackPrimaryChanges( __METHOD__ );
        throw new ErrorPageError( ... );
    }
    // If COMMIT fails, then we have an LDAP user with no local
    // user row. The code should be able to recover from this.
} );

Case J:

// Email will not be sent if $dbw rolls back for any reason
$dbw->onTransactionIdle( function() use ( $subject, $body, $user ) {
    $status = $user->sendEmail( $subject, $body );
    if ( !$status->isGood() ) {
        // Assuming this mail is critical, throw an error if it fails to send/enqueue.
        // Many setups will use local queuing via exim and bounces are usually not
        // synchronous, so there is no way to know for sure if the email "made it".
        throw new ErrorPageError( ... );
    }
} );

Use of transaction rollback

The use of rollback() should be strongly avoided, since it discards everything that previously executed code did before the rollback. Outer callers might still consider the operation successful and attempt additional updates and/or show a bogus success page. Any REPEATABLE-READ snapshot is renewed, causing lazy-loading objects to possibly not find what they were looking for, or to get unexpectedly newer data than the rest of what was loaded. Calling rollback() directly is particularly bad since other databases might have related changes, and it is easy to forget to roll those back too.

Instead, simply throwing an exception is enough to trigger rollback of all databases due to MWExceptionHandler::handleException(). There are two special cases to this rule:

  • Exceptions of the type ErrorPageError (used for human-friendly GUI errors) do not trigger rollback on web requests inside of MediaWiki::run() if thrown before MediaWiki::doPostOutputShutdown(). This lets actions, special pages, and PRESEND deferred updates show proper error messages, while auditing and anti-abuse tools can still log updates to the database.
  • Callers might catch DBError exceptions without either re-throwing them or throwing their own version of the error. Doing so is extremely bad practice and can cause all sorts of problems from partial commits to simply spewing up DBTransactionError errors. Only catch DB errors in order to do some cleanup before re-throwing an error or if the database in question is used exclusively by the code catching errors.

This is how rollback is normally used, as a fail-safe that aborts everything, returns to the initial state, and errors out. However, if directly calling rollback is truly needed, always use rollbackPrimaryChanges() on the LBFactory singleton to make sure all databases are reverted to the initial state of any transaction round.
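A sketch contrasting the two approaches (assuming $status came from some failed sub-operation):

```php
// Preferred: throw and let MWExceptionHandler roll back every database
if ( !$status->isGood() ) {
    throw new RuntimeException( 'Critical update failed; aborting request' );
}

// Only if a direct rollback is truly needed: revert all primaries at once,
// never just one handle via $dbw->rollback()
$lbFactory = MediaWikiServices::getInstance()->getDBLoadBalancerFactory();
$lbFactory->rollbackPrimaryChanges( __METHOD__ );
```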

Debug logging

Several channels (log groups) are used to log DB related errors and warnings:

  • DBQuery
  • DBConnection
  • DBPerformance
  • DBReplication
  • exception

At Wikimedia, these logs can be found by querying logstash.wikimedia.org using +channel:<CHANNEL NAME>.

Historical discussions

This page is the result of several conversations on the wikitech-l mailing list and subsequent discussions on Bugzilla. Some relevant discussions:

In one email, Tim Starling explained the rationale for the DBO_TRX system. The following is a re-edited version of his explanation:

DBO_TRX provides the following benefits:

  • It improves the consistency of write operations for code that is not transaction-aware (e.g. rollback-on-error).
  • It provides a snapshot for consistent reads, which improves application correctness when concurrent writes are occurring.

DBO_TRX was introduced along with the introduction of Database::begin() and Database::commit(), at the time of the switch to InnoDB.

[...]

Initially, I set up a scheme whereby transactions were "nested", in the sense that begin() incremented the transaction level and commit() decremented it. When the level was decremented to zero, an actual COMMIT was issued. So a call sequence would look like:

  • begin() -- sends BEGIN
  • begin() -- does nothing
  • commit() -- does nothing
  • commit() -- sends COMMIT

This scheme soon proved to be inappropriate: it turned out that the most important callers, for performance and correctness, were applications that needed to commit the current transaction once a particular query had completed. Database::immediateCommit() was introduced to support this use case; its function was to immediately reduce the transaction level to zero and commit the underlying transaction.

When it became apparent that calls to Database::commit() were really meant as Database::immediateCommit(), I changed the semantics, effectively renaming Database::immediateCommit() to Database::commit(). I got rid of the concept of transaction nesting in favour of a model of cooperative transaction length management:

  • Database::begin() became effectively meaningless on web requests, and was sometimes omitted for brevity.
  • Database::commit() should be called after a set of write operations where atomicity is desired, or at the earliest opportunity when contended locks are held.

[...]

When transactions are too long, you run into performance problems due to lock contention. When transactions are too short, you run into consistency problems when requests fail. The scheme I introduced favours performance over consistency: it resolves conflicts between callers and callees by using the shortest possible transaction time. I think it was an appropriate choice for Wikipedia then and now, and is probably appropriate for many other medium to high traffic wikis as well.

Savepoints were not available at the time this scheme was introduced. But they are a refinement of the abandoned transaction nesting scheme, not of the current scheme, which is optimised to reduce lock contention.

In terms of performance, it would probably be feasible to use short transactions with an explicit begin() and with savepoints for nesting. But then you would lose the consistency benefits of DBO_TRX that I mentioned at the outset.

-- Tim Starling