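Manual:Writing maintenance scripts
This is a step-by-step tutorial on writing a command-line maintenance script for MediaWiki based on the Maintenance class (see Maintenance.php), which was introduced in MediaWiki 1.16 to make such scripts easier to write.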
Boilerplate
We'll walk through a helloWorld.php maintenance script that simply prints "Hello, World!". The program contains the minimum amount of code needed to run (see also copyright headers):
MediaWiki core
- Command
$ ./maintenance/run HelloWorld
Hello, World!
- Filename
maintenance/HelloWorld.php
- Code
<?php

require_once __DIR__ . '/Maintenance.php';

/**
 * Brief one-line description of Hello world.
 *
 * @since 1.17
 * @ingroup Maintenance
 */
class HelloWorld extends Maintenance {
	public function execute() {
		$this->output( "Hello, World!\n" );
	}
}

$maintClass = HelloWorld::class;
require_once RUN_MAINTENANCE_IF_MAIN;
MediaWiki extension
- Command
$ ./maintenance/run MyExtension:HelloWorld
Hello, World!
- Filename
extensions/MyExtension/maintenance/HelloWorld.php
- Code
<?php

namespace MediaWiki\Extension\MyExtension\Maintenance;

use Maintenance;

$IP = getenv( 'MW_INSTALL_PATH' );
if ( $IP === false ) {
	$IP = __DIR__ . '/../../..';
}
require_once "$IP/maintenance/Maintenance.php";

/**
 * Brief one-line description of Hello world.
 */
class HelloWorld extends Maintenance {
	public function __construct() {
		parent::__construct();
		// Use the name under which the extension is registered.
		$this->requireExtension( 'MyExtension' );
	}

	public function execute() {
		$this->output( "Hello, World!\n" );
	}
}

$maintClass = HelloWorld::class;
require_once RUN_MAINTENANCE_IF_MAIN;
Boilerplate explained
require_once __DIR__ . "/Maintenance.php";
We include Maintenance.php. This defines the Maintenance class, which provides the basis for all maintenance scripts, including facilities to parse command-line arguments, read console input, connect to a database, etc.
class HelloWorld extends Maintenance {
}
We declare our Maintenance subclass.
$maintClass = HelloWorld::class;
require_once RUN_MAINTENANCE_IF_MAIN;
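We tell the Maintenance class to run the script using our HelloWorld class, but only if it is executed from the command line.
Internally, RUN_MAINTENANCE_IF_MAIN loads another file, doMaintenance.php, which autoloads the MediaWiki classes and configuration and then runs the execute() method.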
public function execute() {
}
The execute()
method is the entrypoint for maintenance scripts, and is where the main logic of your script will be. Avoid running any code from the constructor.
When our program is run from the command-line, the core maintenance framework will take care of initialising MediaWiki core and configuration etc, and then it will invoke this method.
Help command
One of the built-in features that all maintenance scripts enjoy is a --help
option. The above example boilerplate would produce the following help page:
$ php helloWorld.php --help
Usage: php helloWorld.php […]
Generic maintenance parameters:
    --help (-h): Display this help message
    --quiet (-q): Whether to suppress non-error output
    --conf: Location of LocalSettings.php, if not default
    --wiki: For specifying the wiki ID
    --server: The protocol and server name to use in URL
    --profiler: Profiler output format (usually "text")
…
Adding a description
"But what is this maintenance script for?" I can hear you asking.
We can put a description at the top of the --help output by calling the addDescription method in our constructor:
public function __construct() {
parent::__construct();
$this->addDescription( 'Say hello.' );
}
The help output now shows the description:
$ php helloWorld.php --help
Say hello.
Usage: php helloWorld.php [--help] …
Option and argument parsing
Greeting the world is all well and good, but we want to be able to greet individuals, too.
To add a command-line option, add a constructor to class HelloWorld that calls Maintenance's addOption(), and update the execute() method to use the new option.
The parameters of addOption() are $name, $description, $required = false, $withArg = false, $shortName = false, so:
public function __construct() {
parent::__construct();
$this->addDescription( 'Say hello.' );
$this->addOption( 'name', 'Who to say Hello to', false, true );
}
public function execute() {
$name = $this->getOption( 'name', 'World' );
$this->output( "Hello, $name!\n" );
}
This time, when executed, the output of the helloWorld.php script changes depending on the argument provided:
$ php helloWorld.php
Hello, World!
$ php helloWorld.php --name=Mark
Hello, Mark!
$ php helloWorld.php --help
Say hello.
Usage: php helloWorld.php […]
…
Script specific parameters:
    --name: Who to say Hello to
Extensions
MediaWiki version: ≥ 1.28 (Gerrit change 301709)
If your maintenance script is for an extension, you should add a requirement that the extension is installed:
public function __construct() {
parent::__construct();
$this->addOption( 'name', 'Who to say Hello to' );
$this->requireExtension( 'FooBar' );
}
This provides a helpful error message when the extension is not enabled. For example, during local development a particular extension might not yet be enabled in LocalSettings.php, or when operating a wiki farm an extension might be enabled on only a subset of wikis.
Be aware that no code may be executed other than through the execute() method.
Calling MediaWiki core services, classes, or functions, or your own extension code, before that point (e.g. outside the class declaration, or in the constructor) is unreliable, unsupported, and may cause errors.
Profiling
Maintenance scripts support a --profiler
option, which can be used to track code execution during a page action and report back the percentage of total code execution that was spent in any specific function.
See Manual:Profiling.
Writing tests
It is recommended to write tests for your maintenance scripts, as with any other class. See the Maintenance scripts guide for help and examples.
Long-Running Scripts
If your script is designed to operate on a large number of things (e.g. all or potentially many pages or revisions), it is recommended to apply the following best practices. Keep in mind that "all revisions" can mean billions of entries and months of runtime on large sites like English Wikipedia.
Batching
When processing a large number of items, it is best to do so in batches of relatively small size - typically between 100 and 1000, depending on the time needed to process each entry.
Batching must be based on a database field (or combination of fields) covered by a unique database index, typically a primary key.
page_id and rev_id are typical examples.
Batching is achieved by structuring your script into an inner loop and an outer loop: the inner loop processes a batch of IDs, and the outer loop queries the database to get the next batch of IDs. The outer loop needs to keep track of where the last batch ended, so that it knows where the next batch should start.
For a script that operates on pages, it would look something like this:
$batchStart = 0;
// We assume that updatePage() will write to the database, so we use the primary DB.
$dbw = $this->getPrimaryDB();
while ( true ) {
$pageIds = $dbw->newSelectQueryBuilder()
->select( [ 'page_id' ] )
->from( 'page' )
->where( ... ) // the relevant condition for your use case
->where( $dbw->expr( 'page_id', '>=', $batchStart ) ) // batch condition
->orderBy( 'page_id' ) // go over pages in ascending order of page IDs
->limit( $this->getBatchSize() ) // don't forget setBatchSize() in the constructor
->caller( __METHOD__ )
->fetchFieldValues();
if ( !$pageIds ) {
// no more pages found, we are done
break;
}
// Do something for each page
foreach ( $pageIds as $id ) {
$this->updatePage( $dbw, $id );
}
// Now commit any changes to the database.
// This will automatically call waitForReplication(), to avoid replication lag.
$this->commitTransaction( $dbw, __METHOD__ );
// The next batch should start at the ID following the last ID in the batch
$batchStart = end( $pageIds ) + 1;
}
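The control flow of the loop above can be tried out without a wiki. The following is a minimal sketch, assuming an in-memory array in place of the page table; fetchBatch() and runInBatches() are hypothetical helpers invented for this illustration and are not part of MediaWiki.

```php
<?php
// Stand-in for the page table: page IDs with gaps, as in a real wiki.
$allPageIds = [ 1, 2, 5, 8, 9, 13, 21, 34, 55, 89 ];

/**
 * Hypothetical stand-in for the SELECT query: returns up to $limit IDs
 * that are >= $start, in ascending order.
 */
function fetchBatch( array $ids, int $start, int $limit ): array {
	sort( $ids );
	$matching = array_filter( $ids, static fn ( int $id ): bool => $id >= $start );
	return array_slice( array_values( $matching ), 0, $limit );
}

/**
 * Runs the outer and inner loop and returns the start position of every batch.
 * $process is called once per ID (the body of the inner loop).
 */
function runInBatches( array $ids, int $batchSize, callable $process ): array {
	$batchStarts = [];
	$batchStart = 0;
	while ( true ) {
		$batch = fetchBatch( $ids, $batchStart, $batchSize );
		if ( !$batch ) {
			// No more IDs found, we are done.
			break;
		}
		$batchStarts[] = $batchStart;
		foreach ( $batch as $id ) {
			$process( $id );
		}
		// The next batch starts at the ID following the last ID in this batch.
		$batchStart = end( $batch ) + 1;
	}
	return $batchStarts;
}

$processed = [];
$batchStarts = runInBatches( $allPageIds, 4, static function ( int $id ) use ( &$processed ) {
	$processed[] = $id;
} );
```

With a batch size of 4, the batches start at 0, 9 and 35, and every ID is processed exactly once even though the IDs have gaps.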
Call setBatchSize() in the constructor of your maintenance script class to set the default batch size. This will automatically add a --batch-size command line option, and you can use getBatchSize() to get the batch size to use in your queries.
Recoverability
Long-running scripts may be interrupted for a number of reasons - a database server being shut down, the server running the script getting rebooted, exceptions caused by data corruption, programming errors, etc. Because of this, it is important to provide a way to restart the script's operation somewhere close to where it was interrupted.
Two things are needed for this: outputting the start of each batch, and providing a command line option for starting at a specific position.
Assuming we have defined a command line option called --start-from, we can adjust the code above as follows:
$batchStart = $this->getOption( 'start-from', 0 );
//...
while ( true ) {
//...
// Do something for each page
$this->output( "Processing batch starting at $batchStart...\n" );
foreach ( $pageIds as $id ) {
//...
}
//...
}
$this->output( "Done.\n" );
This way, if the script gets interrupted, we can easily re-start it:
$ maintenance/run myscript
Processing batch starting at 0...
Processing batch starting at 1022...
Processing batch starting at 2706...
Processing batch starting at 3830...
^C
$ maintenance/run myscript --start-from 3830
Processing batch starting at 3830...
Processing batch starting at 5089...
Processing batch starting at 6263...
Done.
Note that this assumes that the script's operation is idempotent - that is, it doesn't matter if a few pages get processed multiple times.
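To see why idempotency matters when resuming, here is a minimal sketch that simulates an interruption and a restart from the last logged batch start. It assumes an in-memory array instead of the database; runFrom() and its parameters are hypothetical names used only for this illustration.

```php
<?php
/**
 * Processes IDs in batches, starting at $startFrom, and records each batch start in $log.
 * Throws when it reaches $failAtId, to simulate an interruption.
 * $ids must be sorted in ascending order; the array stands in for the database.
 */
function runFrom( array $ids, int $startFrom, int $batchSize, array &$done, array &$log, ?int $failAtId ): void {
	$batchStart = $startFrom;
	while ( true ) {
		$batch = array_slice(
			array_values( array_filter( $ids, static fn ( int $id ): bool => $id >= $batchStart ) ),
			0,
			$batchSize
		);
		if ( !$batch ) {
			return;
		}
		// Corresponds to the "Processing batch starting at ..." output line.
		$log[] = $batchStart;
		foreach ( $batch as $id ) {
			if ( $id === $failAtId ) {
				throw new RuntimeException( "Interrupted at ID $id" );
			}
			// Idempotent operation: doing this twice leaves the same result.
			$done[$id] = true;
		}
		$batchStart = end( $batch ) + 1;
	}
}

$ids = [ 3, 4, 7, 10, 11, 15, 18 ];
$done = [];
$log = [];
try {
	// The first run is interrupted while processing ID 11.
	runFrom( $ids, 0, 3, $done, $log, 11 );
} catch ( RuntimeException $e ) {
	// Restart from the last batch start that was logged.
	$resumeAt = end( $log );
	runFrom( $ids, $resumeAt, 3, $done, $log, null );
}
```

In this run the logged batch starts are 0, 8, 8 and 16: the batch starting at 8 is processed twice, so ID 10 is handled a second time, which is harmless only because the operation is idempotent.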
It is a good idea to save the script's output to a log file, for example with the tee command. Also, to avoid interruption and loss of information when your SSH connection to the server fails, remember to run the script through screen or tmux.
Sharding
If a script performs slow operations for each entry, it can be useful to run multiple instances of the script in parallel, using sharding.
The simplest way to implement sharding is based on the modulo of the ID used for batching:
We define a sharding factor (N) and a shard number (S) on the command line. The shard condition is then ID mod N = S, with 0 <= S < N.
All instances of the script that are to run in parallel use the same sharding factor N and a different shard number S.
Each script instance will only process IDs that match its shard condition.
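The shard condition can be checked in isolation. The following minimal sketch, with idsForShard() as a hypothetical helper that is not part of MediaWiki, shows that three shards split a set of IDs so that every ID belongs to exactly one shard.

```php
<?php
/**
 * Returns the IDs a given shard would process: those with ID mod N = S,
 * where $shardingFactor is N and $shardNumber is S.
 */
function idsForShard( array $ids, int $shardingFactor, int $shardNumber ): array {
	return array_values( array_filter(
		$ids,
		static fn ( int $id ): bool => $id % $shardingFactor === $shardNumber
	) );
}

$ids = range( 1, 10 );
$shards = [];
for ( $s = 0; $s < 3; $s++ ) {
	$shards[$s] = idsForShard( $ids, 3, $s );
}
// $shards[0] is [ 3, 6, 9 ], $shards[1] is [ 1, 4, 7, 10 ], $shards[2] is [ 2, 5, 8 ]
```

Note that the shards are not guaranteed to be the same size; with uneven ID distributions some instances may finish earlier than others.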
The shard condition could be integrated into the database query, but that may interfere with the efficient use of indexes. Instead, we will implement sharding in code, and just multiply the batch size accordingly. We can adjust the above code as follows:
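// Option values given on the command line are strings, so cast them to integers;
// otherwise the strict comparison in the shard condition below would never match.
$batchStart = (int)$this->getOption( 'start-from', 0 );
$shardingFactor = (int)$this->getOption( 'sharding-factor', 1 );
$shardNumber = (int)$this->getOption( 'shard-number', 0 );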
// ...
if ( $shardNumber >= $shardingFactor ) {
$this->fatalError( "Shard number ($shardNumber) must be less than the sharding factor ($shardingFactor)!\n" );
}
if ( $shardingFactor > 1 ) {
$this->output( "Starting run for shard $shardNumber/$shardingFactor\n" );
}
while ( true ) {
$pageIds = $dbw->newSelectQueryBuilder()
//...
// multiply the batch size by the sharding factor
->limit( $this->getBatchSize() * $shardingFactor )
->caller( __METHOD__ )
->fetchFieldValues();
// ...
// Do something for each page
foreach ( $pageIds as $id ) {
// process only the IDs matching the shard condition!
if ( $id % $shardingFactor !== $shardNumber ) {
continue;
}
$this->updatePage( $dbw, $id );
}
// ...
}
We can then start multiple instances of the script, operating on different shards:
$ maintenance/run myscript --sharding-factor 3 --shard-number 0
Starting run for shard 0/3
Processing batch starting at 0...
^A1
$ maintenance/run myscript --sharding-factor 3 --shard-number 1
Starting run for shard 1/3
Processing batch starting at 0...
^A2
$ maintenance/run myscript --sharding-factor 3 --shard-number 2
Starting run for shard 2/3
Processing batch starting at 0...
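The sharding options are worth validating a little more strictly than shown above, since a sharding factor of 0 would lead to a modulo by zero and a negative shard number would never match. A minimal sketch, assuming a hypothetical helper validateShardOptions() whose message could be passed to fatalError() in a real script:

```php
<?php
/**
 * Returns an error message for invalid sharding options, or null if they are valid.
 * Hypothetical helper for illustration; it is not part of MediaWiki.
 */
function validateShardOptions( int $shardingFactor, int $shardNumber ): ?string {
	if ( $shardingFactor < 1 ) {
		// A factor of 0 would cause a modulo by zero in the shard condition.
		return "Sharding factor ($shardingFactor) must be at least 1!";
	}
	if ( $shardNumber < 0 || $shardNumber >= $shardingFactor ) {
		return "Shard number ($shardNumber) must be between 0 and " . ( $shardingFactor - 1 ) . "!";
	}
	return null;
}

// Values from the command line are strings, so cast them before validating.
$error = validateShardOptions( (int)'3', (int)'3' );
```

Here $error holds a message because shard number 3 is out of range for a sharding factor of 3; valid shard numbers would be 0, 1 and 2.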