Manual:Job queue/For developers

Jobs are non-urgent tasks. For a general introduction to job queues and how to manage them, see Manual:Job queue.

Deferred updates

Deferred updates (or deferrable updates) are a useful way to postpone time-consuming tasks in order to speed up the main MediaWiki response. Refer to the DeferredUpdates class API and Database transactions for how to use these.

Deferred updates are represented as callable functions that are queued in an array and then called at the end of the MediaWiki PHP process. Typically the call takes place after the response to a web request has been finished (i.e. everything has been echoed and flushed to the browser), but before the process exits or returns to the web server. This is internally powered by fastcgi_finish_request() in MediaWiki::doPostOutputShutdown().

Deferrable updates are executed at the end of the current process. They are only kept in memory within that same web request (or other process, such as a CLI maintenance script).
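The idea described above can be sketched in plain PHP. This is only an illustration of "queue callables now, run them after the response"; SketchDeferredQueue and its methods are hypothetical names made up for this example and are not MediaWiki's implementation.

```php
<?php
// Illustrative sketch only: a tiny stand-in for the idea behind deferred updates.
// SketchDeferredQueue is a hypothetical class, not MediaWiki's DeferredUpdates.
final class SketchDeferredQueue {
	/** @var callable[] Updates queued during the current process */
	private static array $updates = [];

	/** Queue a callable to run later in this same process. */
	public static function add( callable $update ): void {
		self::$updates[] = $update;
	}

	/** Run the queued callables in order and return how many ran. */
	public static function runAll(): int {
		$count = 0;
		// array_shift() lets updates that were queued by other updates run as well.
		while ( self::$updates ) {
			$update = array_shift( self::$updates );
			$update();
			$count++;
		}
		return $count;
	}
}

$log = [];

// 1. While handling the request, queue the slow work instead of doing it.
SketchDeferredQueue::add( static function () use ( &$log ) {
	$log[] = 'update hit counter';
} );

// 2. Send the response first.
$log[] = 'response sent';

// 3. Only then run what was queued. Nothing survives past this process.
$ran = SketchDeferredQueue::runAll();
```

In MediaWiki itself the queueing is done through the DeferredUpdates class (for example DeferredUpdates::addCallableUpdate()), and the run step is triggered by MediaWiki rather than by the calling code.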

This is unlike jobs, which are scheduled via a persistent storage backend and then run some minutes or hours in the future, independent of and after the original request that queued the job. The job queue in MediaWiki is a pluggable service. The default backend adds jobs to the job table in the wiki's main database. The default job runner executes up to one job at the end of random page views.

Which one to use?

Deferrable updates should be used for tasks that generally take only a few milliseconds to complete, as a way to speed up the web response. Because they are deferred, any failure is hidden from the client, since the response has already been sent.

For critical tasks, failure must be known to users. More generally, people should know how and when their action was completed, so that they can act further knowing that the change is in place, e.g. make further edits that depend on previous ones, possibly scripted or batched through some automation. Examples of critical tasks that we don't run via deferred updates:

  • Database write that creates a page or saves an edit.
  • Create account, change password.
  • Explicit "send email" feature.

"Urgent" tasks are run via post-response deferred updates after saving an edit. These small transactions are expected to be reflected if the client looks for them afterward, but their result is not needed to render the response to the edit itself.

Examples of "non-urgent" tasks that we run via the job queue:

  • After saving an edit to a template, iterate through potentially millions of affected pages to re-parse and purge (known as "Refresh links" or LinksUpdate).
  • Periodically prune old rows from the recent changes table.
  • After uploading a photo, pre-render common thumbnail sizes.
  • After saving an edit to an article, send emails to the accounts that watch this page with email notifications enabled.

Fallback

Deferrable updates can choose to implement the EnqueueableDataUpdate interface. Such updates can be automatically converted to a job as needed. For example, if the update fails, MediaWiki will convert it to a job and queue it to try again later. There are also other situations in which we improve reliability or optimise throughput by proactively converting updates to jobs where possible.

Since any MediaWiki code can queue deferred updates, it is also possible for a CLI maintenance script or job to implicitly build up a list of deferred updates. If these batch operations end up queuing a lot of updates, MediaWiki will proactively convert tasks to jobs where possible (handled by the DeferredUpdates class internally).

Use jobs if you need to save data in the context of a GET request

For scalability and performance reasons, MediaWiki developers should generally not perform database writes during page views or other GET requests. If this becomes difficult to avoid, check the Backend performance guidelines first and consider seeking advice from other developers or the Performance Team for how to approach the problem in a different way.

Note that large wiki farms (such as Wikipedia) may operate from multiple data centers and thus run GET requests (which don't expect database writes) from a secondary data center, which should respond to such requests without relying on communication with the primary DC.

If you're reasonably certain that your feature will only rarely need a database write during a GET request, and the write is not urgent, then one option is to queue a job during the GET request. Job queues can be buffered and synced across data centers asynchronously and thus do not require immediate cross-DC communication. You can then rely on the job eventually being transmitted to the primary DC, where it will execute at some point in the future.

Deferred updates should not be used to perform database writes after a GET request. Attempting this will log a DBPerformance warning message.

Registering a job

To use the job queue to do your non-urgent jobs, you need to do these things:

Create a Job subclass

You need to create a class that, given parameters and a Title, will perform your updates.

<?php
class SynchroniseThreadArticleDataJob extends Job {
	public function __construct( $title, $params ) {
		// Replace synchroniseThreadArticleData with an identifier for your job.
		parent::__construct( 'synchroniseThreadArticleData', $title, $params );
	}

	/**
	 * Execute the job
	 *
	 * @return bool
	 */
	public function run() {
		// Load data from $this->params and $this->title
		$article = new Article( $this->title, 0 );
		$limit = $this->params['limit'];
		$cascade = $this->params['cascade'];

		// Perform your updates
		if ( $article ) {
			Threads::synchroniseArticleData( $article, $limit, $cascade );
		}

		return true;
	}
}

Add your Job class to the global list

Add the Job class to the global $wgJobClasses array. In extensions, this is done in the extension.json file, and in core it's done in DefaultSettings.php. The key must be unique and match the value in the job's constructor, and the value is the class name.
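As a minimal sketch, the mapping has the following shape when written directly in PHP, using the key and class name of the example job above. Setting the global from a PHP setup file is an assumption here; the text above only names extension.json and DefaultSettings.php.

```php
<?php
// Key: the job type passed to parent::__construct() in the Job subclass.
// Value: the name of the class that implements the job.
$wgJobClasses['synchroniseThreadArticleData'] = SynchroniseThreadArticleDataJob::class;
```

In extension.json the same mapping goes under the JobClasses key. In either case the class itself must also be loadable, for example through the extension's autoloader.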

How to queue a job

/**
 * 1. Set any job parameters you want to have available when your job runs
 *
 *    this can also be an empty array
 *    these values will be available to your job via $this->params['param_name']
 */
$jobParams = [ 'limit' => $limit, 'cascade' => true ];


/**
 * 2. Get the article title that the job will use when running
 *
 *    if you will not use the title to create/modify a new/existing page, you can use :
 *    
 *    a vague, dummy title
 *    Title::newMainPage();
 *
 *    a more specific title
 *    Title::newFromText( 'User:UserName/SynchroniseThreadArticleData' )
 *
 *    a very specific title that includes a unique identifier. this can be useful
 *    when you create several batch jobs with the same base title
 *    Title::newFromText(
 *        $user->getName() . '/' .
 *        'MyExtension/' .
 *        'My Batch Job/' .
 *        uniqid(),
 *        NS_USER
 *    ),
 *    
 *    the idea is for the db to have a title reference that will be used by your
 *    job to create/update a title or for troubleshooting by having a title
 *    reference that is not vague
 */
$title = $article->getTitle();


/**
 * 3. Instantiate a Job object
 */
$job = new SynchroniseThreadArticleDataJob( $title, $jobParams );


/**
 * 4. Insert the job into the database
 *    note the differences in the mediawiki versions
 *
 *    for performance reasons, if you plan on inserting several jobs into the queue,
 *    it’s best to add them to a single array and then push them all at once into the queue
 *
 *    for example, earlier in your code you have built up an array of $jobs with different 
 *    titles and jobParams
 *
 *    $jobs[] = new SynchroniseThreadArticleDataJob( $title, $jobParams );
 *    JobQueueGroup::singleton()->push( $jobs );
 */
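// Use only the call that matches your MediaWiki version.
// $job->insert();                        // MediaWiki < 1.21
JobQueueGroup::singleton()->push( $job ); // MediaWiki >= 1.21
// From MediaWiki 1.37, JobQueueGroup::singleton() is deprecated in favour of:
// MediaWikiServices::getInstance()->getJobQueueGroup()->push( $job );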

There is another method for pushing jobs, JobQueueGroup::lazyPush(). It buffers the jobs and pushes them at the very end of the current execution, hence after jobs pushed with JobQueueGroup::push().

Other

Job queue type

A job queue type is the command name you give to the parent::__construct() method of your job class; e.g., using the example above, that would be synchroniseThreadArticleData.

getQueueSizes()

JobQueueGroup::singleton()->getQueueSizes() will return an array of all job queue types and their sizes.

Array
(
    [refreshLinks] => 1
    [refreshLinks2] => 3
    [synchroniseThreadArticleData] => 10
)

getSize()

While getQueueSizes() is handy for analysing the entire job queue, for performance reasons, it’s best to use JobQueueGroup::singleton()->get( <job type> )->getSize() when analysing a specific job type, which will only return the job queue size of that specific job type.

Array
(
    [synchroniseThreadArticleData] => 100
)

Internals

Pushing jobs

The primary function is JobQueueGroup::push(). It selects the job queue corresponding to the job type. Depending on the job queue implementation (database or Redis), the job is then pushed either through a Redis connection (Redis case) or as a deferrable update (database case).

The lazy push function (JobQueueGroup::lazyPush()) keeps the jobs in memory. At the end of the current execution (end of the MediaWiki request or end of the current job execution), the jobs kept in memory are pushed as the last deferrable update (of type AutoCommitUpdate). Being a deferrable update, the push happens at the end of the current execution; being an AutoCommitUpdate, the jobs are pushed in a single database transaction. See JobQueueGroup::lazyPush() and JobQueueGroup::pushLazyJobs() for details.

In CLI, note that deferrable updates (whether from JobQueueGroup::push() with the JobQueueDB implementation or from JobQueueGroup::lazyPush()) are executed directly if the database transaction flag (LBFactory::hasTransactionRound()) is free. See DeferredUpdates::addUpdates() and DeferredUpdates::tryOpportunisticExecute() for details.

When some jobs are pushed through JobQueueGroup::lazyPush() but never really pushed (and hence lost), usually because an unhandled exception is thrown, the destructor of JobQueueGroup shows a warning in the debug log:

PHP Notice: JobQueueGroup::__destruct: 1 buffered job(s) never inserted

See task T100085 for an example of such a warning. Before the MediaWiki 1.29 release this affected jobs executed from the Web: when a job internally lazy-pushed another job and the former was executed in the shutdown part of a MediaWiki request, the latter was not pushed (because JobQueueGroup::pushLazyJobs() had already been called). The fix for this specific bug was to call JobQueueGroup::lazyPush() in JobRunner::executeJob() so that lazily-pushed jobs are always pushed after the execution of each job.
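The buffering behaviour of the lazy push, and how a job can end up lost, can be sketched in plain PHP. SketchJobBuffer and all of its methods are hypothetical names for this illustration only; jobs are plain strings, and an explicit reportLostJobs() call stands in for the check that the text above places in the JobQueueGroup destructor.

```php
<?php
// Illustrative sketch only. SketchJobBuffer is a hypothetical class that mimics
// the buffering idea; it is not MediaWiki's JobQueueGroup.
final class SketchJobBuffer {
	/** @var string[] Jobs buffered in memory by lazyPush() */
	private array $buffered = [];
	/** @var string[] Jobs that reached the (pretend) persistent queue */
	public array $stored = [];
	/** @var string[] Warnings that would go to the debug log */
	public array $warnings = [];

	/** Store a job immediately. */
	public function push( string $job ): void {
		$this->stored[] = $job;
	}

	/** Keep a job in memory until flush() is called. */
	public function lazyPush( string $job ): void {
		$this->buffered[] = $job;
	}

	/** Push all buffered jobs in one go, as happens at the end of execution. */
	public function flush(): void {
		foreach ( $this->buffered as $job ) {
			$this->push( $job );
		}
		$this->buffered = [];
	}

	/** Report jobs that were buffered but never flushed. */
	public function reportLostJobs(): void {
		$lost = count( $this->buffered );
		if ( $lost > 0 ) {
			$this->warnings[] = "$lost buffered job(s) never inserted";
		}
	}
}

$queue = new SketchJobBuffer();
$queue->lazyPush( 'jobA' );  // buffered only
$queue->push( 'jobB' );      // stored right away, ahead of jobA
$queue->flush();             // jobA is stored now
$queue->lazyPush( 'jobC' );  // buffered after the last flush, so it is never stored
$queue->reportLostJobs();    // records the warning for jobC
```

The last two calls mirror the situation behind the warning: a job that is lazily pushed after the final flush has nowhere to go unless something flushes again.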

Execution of jobs

Jobs are ordinarily executed at the end of a web request, at the rate of $wgJobRunRate per request. If $wgJobRunRate == 0, no jobs are run at the end of a web request. The default value of $wgJobRunRate is 1.

All enqueued jobs can be executed at any time by running maintenance/runJobs.php. This is particularly important when $wgJobRunRate == 0.
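As a small configuration sketch, turning off job execution on web requests looks like this. Placing the setting in LocalSettings.php is an assumption here, since the text above only names the variable.

```php
<?php
// LocalSettings.php: do not run jobs at the end of web requests.
// With this setting, jobs only run when maintenance/runJobs.php is executed,
// for example from a scheduled task (how that is scheduled is up to the site).
$wgJobRunRate = 0;
```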

The jobs are run by the JobRunner class. Each job is given its own database transaction.

At the end of the job execution, deferrable updates are executed. Since MediaWiki 1.28.3/1.29, lazily-pushed jobs are pushed through a deferrable update in order to use a dedicated database transaction (with AutoCommitUpdate).