Manual:External storage
External storage is an abstraction for storing the wiki's content (i.e. what would normally go into the text table) outside the normal database, possibly with some kind of compression applied. Some extensions (such as StructuredDiscussions) can use external storage directly for storing other kinds of data.
The contents of external storage are addressed with a URL of the form <protocol>://<location>/<object name>, with the protocol determining what type of storage should be used.
Before 1.32, these URLs were stored in the old_text field of the text table, with old_flags set to external.
Since 1.32, they are stored in the content_address field of the content table.
Advantages
The text table is typically the largest of all tables.
On wikis with millions of edits, the text table can be several gigabytes in size.
Since the contents of the text table are immutable (edits to pages create new revisions and new entries in the text table, but old entries can't be modified), storing the contents in a different database provides the following benefits:
- Split storage requirements - Instead of one big monolithic database, external storage can span several servers, which allows for easier migration and disk allocation.
- Database performance - The database server for external storage has very low memory and CPU requirements: it is just a store, so it needs little caching and does not have to perform complex queries. This lets the main database server use all of its available memory for caching other tables that benefit from it.
- Backups - Backups of big databases take time. Backups of the external storage database can be done incrementally, since old entries are immutable. When an external storage database has grown sufficiently, a new database can be created for new external storage, and the old database can be made read-only and removed from routine backups (it still needs to be accessible to MediaWiki, though).
Code
The main class for interacting with external storage is ExternalStore.
You can use insert or (more typically) insertToDefault to store a piece of data and receive the URL at which it was stored; that URL can be used with fetchFromURL to retrieve the data.
Internally, ExternalStore interacts with the ExternalStoreMedium subclass corresponding to the protocol.
ExternalStoreDB, which is the commonly used one, differs from the others in that it provides special handling when the stored data is a serialized HistoryBlob subclass; such objects can be retrieved with <protocol>://<location>/<object name>/<item id>, in which case the store will unserialize the object and get the appropriate item (by calling getItem on it).
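To make the two address forms above concrete, here is a minimal sketch of how such a URL breaks down into its parts. The function parseExternalStoreUrl is a hypothetical helper written for illustration; it is not part of MediaWiki and does not claim to mirror how ExternalStoreDB parses addresses internally. The object name and item id used in the calls are made-up values.

```php
<?php
// Hypothetical helper (not a MediaWiki API): splits an external storage
// address of the form <protocol>://<location>/<object name>[/<item id>].
function parseExternalStoreUrl( string $url ): array {
	// Separate the protocol from the rest of the address.
	$parts = explode( '://', $url, 2 );
	if ( count( $parts ) !== 2 || $parts[0] === '' ) {
		throw new InvalidArgumentException( "Not an external storage URL: $url" );
	}
	[ $protocol, $rest ] = $parts;

	// The remainder is <location>/<object name>, optionally followed by /<item id>.
	$path = explode( '/', $rest );
	if ( count( $path ) < 2 || count( $path ) > 3 || in_array( '', $path, true ) ) {
		throw new InvalidArgumentException( "Malformed external storage URL: $url" );
	}

	return [
		'protocol' => $protocol,
		'location' => $path[0],
		'object' => $path[1],
		// null when the address refers to a whole object rather than one item
		'item' => $path[2] ?? null,
	];
}

// Example values are assumptions; 'demoCluster' matches the configuration example on this page.
print_r( parseExternalStoreUrl( 'DB://demoCluster/12345' ) );
print_r( parseExternalStoreUrl( 'DB://demoCluster/12345/7' ) );
```

With the DB protocol, the location part is the cluster name as defined in $wgExternalServers.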
In practice, you should avoid using ExternalStore directly most of the time, and use SqlBlobStore (or an even higher-level abstraction such as RevisionStore) instead.
Configuration
An example LocalSettings.php setup:
$wgExternalStores = [ 'DB' ];
$wgExternalServers = [ 'demoCluster' => [
[ 'host' => 'primary.example.org', 'user' => 'userM', 'password' =>'pwdM', 'dbname' => 'dbM', 'type' => "mysql", 'load' => 1 ],
[ 'host' => 'replica1.example.org', 'user' => 'userS1', 'password' =>'pwdS1', 'dbname' => 'dbS1', 'type' => "mysql", 'load' => 1 ],
[ 'host' => 'replica2.example.org', 'user' => 'userS2', 'password' =>'pwdS2', 'dbname' => 'dbS2', 'type' => "mysql", 'load' => 1 ],
] ];
$wgDefaultExternalStore = [ 'DB://demoCluster' ];
- The $wgExternalStores line states that a DB external store can be used. (DB is not an arbitrary name that can be adjusted; it has to be DB.) This corresponds to the ExternalStoreMedium subclass used, and to the protocol of the blob address.
- The $wgExternalServers line lists all usable clusters with all usable nodes of each cluster. The keys of the top-level array are the cluster names (the example above defines a single cluster named demoCluster). The value for each key is again an array, holding the specifications of the individual nodes. The first node is considered the primary, and all writes to the database are performed through it. Zero or more replica nodes may follow (the example above has two). Each node may have its own host, user, password, dbname, and type, as shown in the example. The load parameter specifies how much of the load should pass through this node.
- The $wgDefaultExternalStore line lists the external stores that may be used for storing new text. If you omit this line, the external store will be read-only and new text will go into the default database (i.e. the same database that holds page, revision, and image data; not the cluster).
For a multi-primary (formerly called multi-master) wiki farm setup (like Wikimedia), consider using LBFactory_Multi instead.
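The Backups advantage described earlier (retiring a cluster once it has grown large and writing new text to a fresh one) can be sketched with the same three settings. This is only an illustration under stated assumptions: the cluster name demoCluster2, its hosts, and its credentials are hypothetical, and making the old database read-only at the database level is a separate step not shown here.

```php
// LocalSettings.php (sketch; 'demoCluster2' and its hosts and credentials are hypothetical)
$wgExternalStores = [ 'DB' ];
$wgExternalServers = [
	// Old cluster: stays listed so that existing addresses such as
	// DB://demoCluster/<object name> can still be read.
	'demoCluster' => [
		[ 'host' => 'primary.example.org', 'user' => 'userM', 'password' => 'pwdM', 'dbname' => 'dbM', 'type' => 'mysql', 'load' => 1 ],
	],
	// New cluster: the first node is the primary, followed by one replica.
	'demoCluster2' => [
		[ 'host' => 'primary2.example.org', 'user' => 'userM2', 'password' => 'pwdM2', 'dbname' => 'dbM2', 'type' => 'mysql', 'load' => 1 ],
		[ 'host' => 'replica3.example.org', 'user' => 'userS3', 'password' => 'pwdS3', 'dbname' => 'dbS3', 'type' => 'mysql', 'load' => 1 ],
	],
];
// Only the new cluster is offered for storing new text.
$wgDefaultExternalStore = [ 'DB://demoCluster2' ];
```

Because demoCluster is no longer listed in $wgDefaultExternalStore, nothing new should be written to it, while text already stored there remains readable.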
Database setup
For the above configuration example, you would have to:
- Create the database dbM on the host primary.example.org.
- Run the maintenance/storage/blobs.sql SQL script on the database dbM on the host primary.example.org. Do not use maintenance/sql.php for this task, as it will add the required tables to your default database (i.e. the database holding page, revision, and image data) and not to dbM. If you are not sure how to run the SQL script on the database dbM on the host primary.example.org, please consult your database documentation.
- Set up replication towards dbS1 on the host replica1.example.org (consult your database's documentation on how to set up replication).
- Set up replication towards dbS2 on the host replica2.example.org.
Maintenance scripts
There are several maintenance scripts for moving content to the external store:
- moveToExternal.php - move old revisions to external storage
- compressOld.php - compress old revisions and potentially move them to external storage
- recompressTracked.php - move revisions (or other data) from one external storage to another and recompress them in the process
- refreshImageMetadata.php - applies when run with --force and $wgLocalFileRepo has been configured with 'useSplitMetadata' => true