Extension:External Data

MediaWiki extensions manual
External Data
Release status: stable
Implementation: Parser function, Special page
Description: Allows for using and displaying values retrieved from various sources: external URLs, local wiki pages and local files (in CSV, GFF, JSON, XML, HTML or other formats), database tables, and LDAP servers.
Author(s): Yaron Koren, Alex Mashin and others
Latest version: 2.4.1 (June 2021)
Compatibility policy: Master maintains backward compatibility.
MediaWiki: 1.28+
Database changes: Yes
License: GNU General Public License 2.0 or later
Download: See here
Example: A page containing information retrieved from an external CSV file
Parameters:
  • $edgDBTablePrefix
  • $edgExeCommand
  • $edgCacheExpireTime
  • $edgSecrets
  • $edgExeEnvironment
  • $edgExeUrl
  • $edgHTTPOptions
  • $edgAllowSSL
  • $edgExeLimits
  • $edgDBPass
  • $edgDBServer
  • $edgDBServerType
  • $edgDirectoryPath
  • $edgLDAPUser
  • $edgStringReplacements
  • $edgTryEncodings
  • $edgExeUseStaleCache
  • $edgExternalValueVerbose
  • $edgParsers
  • $edgConnectors
  • $edgExeName
  • $edgExeTags
  • $edgExeVersionCommand
  • $edgExeTempFile
  • $edgExeVersion
  • $edgDBFlags
  • $edgExeCacheSeconds
  • $edgLDAPPass
  • $edgDBUser
  • $edgDBPrepared
  • $edgDBDirectory
  • $edgExePreprocess
  • $edgExeIgnoreWarnings
  • $edgDBName
  • $edgAlwaysAllowStaleCache
  • $edgExeInput
  • $edgExeParamFilters
  • $edgExeParams
  • $edgExePostprocess
  • $edgLDAPServer
  • $edgFilePath
  • $edgDBTypes
  • $edgLDAPBaseDN

The External Data extension allows MediaWiki pages to retrieve, filter, and format structured data from one or more sources. These sources can include external URLs, regular wiki pages, uploaded files, files on the local server, databases and LDAP directories.

The extension defines the following parser functions:

  • #get_web_data - retrieves CSV, GFF, JSON, XML, HTML or free-form data from a URL and assigns it to variables that can be accessed on the page.
  • #get_soap_data - retrieves data from a URL via the SOAP protocol.
  • #get_file_data - retrieves data from a file on the local server, in the same formats as #get_web_data.
  • #get_db_data - retrieves data from a database.
  • #get_ldap_data - retrieves data from an LDAP server.
  • #get_program_data - retrieves data returned by a program run server-side.
  • #external_value - displays the value of any such variable.
  • #for_external_table - cycles through all the values retrieved for a set of variables, displaying the same "container" text for each one.
  • #store_external_table - cycles through a table of values, storing them as semantic data via the Semantic MediaWiki extension, by mimicking a call to SMW's #subobject function for each row.
  • #display_external_table - cycles through all the values retrieved for a set of variables, displaying each "row" using a template.
  • #clear_external_data - erases the current set of retrieved data.

one tag:

  • <externalvalue data="external variable">Fallback placeholder</externalvalue>, which shows raw external data without any wiki postprocessing.

and six Lua functions:

  • mw.ext.externalData.getWebData
  • mw.ext.externalData.getFileData
  • mw.ext.externalData.getDbData
  • mw.ext.externalData.getSoapData
  • mw.ext.externalData.getLdapData
  • mw.ext.externalData.getProgramData


You can download the External Data code, in .zip format, here.

You can also download the code directly via Git from the MediaWiki source code repository. From a command line, you can call the following:

git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/ExternalData

You can also view the code online here.


To install this extension, create an 'ExternalData' directory (either by extracting a compressed file or downloading via Git), and place this directory within the main MediaWiki 'extensions' directory. Run composer update in that directory. Then, in the file 'LocalSettings.php', add the following line:

wfLoadExtension( 'ExternalData' );


External Data was created, and is maintained, by Yaron Koren (reachable at yaron57@gmail.com). The overall code base, though, is the work of many people. Alexander Mashin has contributed significantly to the code. Important code contributions have also been made by Michael Dale, David Macdonald, Siebrand Mazeland, Ryan Lane, Chris Wolcott, Jelle Scholtalbers, Kostis Anagnostopoulos, Nick Lindridge, Dan Bolser, Joel Natividad, Scott Linder, Cindy Cicalese, Umherirrender, Anysite, Sahaj Khandelwal and others.

Development of some features was funded by KeyGene and KDZ – Zentrum für Verwaltungsforschung.

Retrieving data

Data can be retrieved from several kinds of sources: a web page containing structured data (including a page on the wiki itself), a SOAP service, a file on the local server, a database, an LDAP server, or a program run on the server.

#get_web_data - CSV, GFF, JSON, XML, HTML

To get data from a web page that holds structured data, call the parser function #get_web_data. It can take the following syntax:

{{#get_web_data:
 url=data source URL
 |format={CSV|CSV with header|GFF|JSON|XML|HTML|text}
 |regex=regular expression
 |data=local_variable_name1=external_variable_name1, etc.
 |filters=external_variable_name1=filter_value1, etc.
 |use xpath
 |use jsonpath
 |json offset=number of characters
 |post data=additional data
 |cache seconds=number of seconds
 |use stale cache
 |suppress error
}}

An explanation of the parameters:

  • |url= - sets the full URL of the file being retrieved.
  • |format= - specifies the format of the data being retrieved: it should be one of either 'CSV', 'CSV with header', 'GFF', 'JSON', 'XML', 'HTML' or 'text'. CSV, JSON and XML are standard data formats; GFF, or the Generic Feature Format, is a format for genomic data. The difference between 'CSV' and 'CSV with header' is that 'CSV' is simply a set of lines with values; while in 'CSV with header', the first line is a "header", holding a comma-separated list of the name of each column. 'text' indicates that the contents of the file should be retrieved as-is.
  • |delimiter= - specifies the delimiter between values in the data set; it is used only for the CSV formats. The default value is ",". To specify a tab delimiter, use "\t".
  • |regex= - specifies a PHP regular expression that should be used to get specific strings; used with the "text" format. Example: For sample text <h1>Heading</h1>, the regex |regex=/<h1>(?'matched'.*)<\/h1>/ returns "Heading" to the external variable matched.
  • |data= - holds the "mappings" that connect local variable names to external variable names. Each mapping (of the form local_variable_name=external_variable_name) is separated by a comma. External variable names are the names of the values in the file (in the case of a header-less CSV file, the names are simply the indexes of the values: 1, 2, 3, etc.), and local variable names are the names that are later passed in to #external_value.
  • |filters= - sets filtering on the set of rows being returned. You can set any number of filters, separated by commas; each filter sets a specific value for a specific external variable. It is not necessary to use any filters; most APIs, it is expected, will provide their own filtering ability through the URL's query string.
  • |use xpath= - an optional parameter that can be used with the "XML" or "HTML" formats, to indicate that "data" mappings should be done using XPath notation; see Using XPath, below.
  • |default xmlns prefix= - an optional parameter that can be used with "use xpath", which sets the default namespace prefix to be used.
  • |use jsonpath= - an optional parameter that can be used with the "JSON" format, to indicate that "data" mappings should be done using JSONPath notation; see Using JSONPath, below.
  • |json offset= - an optional parameter that represents the number of characters to ignore at the beginning of the data set being parsed. It is used with JSON values, in case the JSON being accessed has some kind of security string at the beginning.
  • |post data= - an optional parameter that lets you send some set of data to the URL via POST, instead of via the query string.
  • |cache seconds= - an optional parameter that sets the number of seconds for which the values from this call should be cached. If $edgCacheExpireTime is set and is larger than this value, $edgCacheExpireTime applies instead; if the effective cache expiration time is zero, no caching is done.
  • |use stale cache= - an optional parameter that allows this function to use an expired cache entry if it cannot retrieve the real data.
  • |suppress error= - an optional parameter that prevents any error message from getting displayed if there is a problem retrieving the data.

More than one #get_web_data call can be used in a page. If this happens, though, make sure that every local variable name is unique.

For data from XML sources, the variable names are determined by both tag and attribute names. For example, given the following XML text:

<fruit type="Apple"><color>red</color></fruit>

the variable type would have the value Apple, and the variable color would have the value red.

Similarly, the following XML text would be interpreted as a table of values defining two variables named type and color:

  <fruit type="Apple"><color>red</color></fruit>
  <fruit type="Kiwi"><color>brown</color></fruit>
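The way tag and attribute names become variables can be tried outside the wiki. The following is a minimal Python sketch for illustration only; it is not the extension's PHP code. The `<fruits>` root element and the `collect_variables` helper are assumptions added so that the sample is a well-formed, parseable document.

```python
import xml.etree.ElementTree as ET

# The two-row XML sample from the text, wrapped in a hypothetical <fruits>
# root element so that it forms a single well-formed document.
xml_text = """
<fruits>
  <fruit type="Apple"><color>red</color></fruit>
  <fruit type="Kiwi"><color>brown</color></fruit>
</fruits>
"""

def collect_variables(xml_source):
    """Gather attribute values and leaf-element text into lists keyed by name."""
    variables = {}
    for element in ET.fromstring(xml_source).iter():
        # Every attribute name becomes a variable name.
        for name, value in element.attrib.items():
            variables.setdefault(name, []).append(value)
        # Every childless element with text contributes under its tag name.
        text = (element.text or "").strip()
        if text and len(element) == 0:
            variables.setdefault(element.tag, []).append(text)
    return variables

table = collect_variables(xml_text)
print(table)  # {'type': ['Apple', 'Kiwi'], 'color': ['red', 'brown']}
```

Each list holds one value per row, which is the shape that #for_external_table and #display_external_table later cycle through.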

A CSV file must be literally a CSV file, i.e., delimited by commas, unless a different delimiter is set with |delimiter=. A call for a headerless CSV file might look like this:

{{#get_web_data:url=http://mydata.com|format=CSV|data=first name=1}}

while a call to CSV with a header row might look like:

{{#get_web_data:url=http://mydata.com|format=CSV with header|data=first name=FirstName}}

where the header contains FirstName, which is retrieved as first name in the wiki.
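The difference between the two CSV formats lies only in how external variable names are derived. The following Python sketch models that difference under stated assumptions: the helper names `read_columns` and `apply_mappings` and the sample data are hypothetical, and the code is an illustration rather than the extension's implementation.

```python
import csv
import io

def read_columns(text, has_header, delimiter=","):
    """Return {external_variable_name: [values...]} for CSV text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        # 'CSV with header': the first line names the columns.
        names, rows = rows[0], rows[1:]
    else:
        # 'CSV': columns are named by their 1-based position.
        names = [str(i) for i in range(1, len(rows[0]) + 1)]
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def apply_mappings(columns, mappings):
    """mappings is {local_name: external_name}, as in the |data= parameter."""
    return {local: columns[external] for local, external in mappings.items()}

headerless = "Ada,Lovelace\nAlan,Turing\n"
with_header = "FirstName,LastName\n" + headerless

a = apply_mappings(read_columns(headerless, has_header=False), {"first name": "1"})
b = apply_mappings(read_columns(with_header, has_header=True), {"first name": "FirstName"})
print(a == b)  # True
```

Both calls yield the same local variable, "first name"; only the external name used to reach it differs.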

You can also set caching to be done on the data retrieved, and string replacement to hide API keys; see "Data caching" and "String replacement in URLs", below, for how to do both of those.

Getting data from a non-API text file

If the data you wish to access is on a MediaWiki page or in an uploaded file, you can use the above methods to retrieve the data assuming the page or file only contains data in one of the supported formats:

  • for data on a wiki page, use "&action=raw" as part of the URL;
  • for data in an uploaded file, use the full path.

If the MediaWiki page with the data is on the same wiki, it is best to use the fullurl: parser function, e.g.

  • {{fullurl:Test/test.csv|action=raw}}

Similarly, for uploaded files, you can use the filepath: function, e.g.

  • {{filepath:xyzzy.csv}}

For wiki pages that have additional information, the External Data extension provides a way to create an API of your own, at least for CSV data. To get this working, first place the data you want accessed in its own wiki page, in CSV format, with the headers as the top row of data (see here for an example). Then, the special page 'GetData' will provide an "instant API" for accessing either certain rows of that data, or the entire table. By adding "field-name=value" to the URL, you can limit the set of rows returned.

A URL for the 'GetData' page can then be used in a call to #get_web_data, just as any other data URL would be; the data will be returned as a CSV file with a header row, so the 'format' parameter of #get_web_data should be set to 'CSV with header'. See here for an example of such data being retrieved and displayed using #get_web_data and #for_external_table. In this way, you can use any table-based data within your wiki without the need for custom programming.

Data caching

You can configure External Data to cache the data contained in the URLs that it accesses, both to speed up retrieval of values and to reduce the load on the system whose data is being accessed. To do this, you can run the SQL contained in the extension file 'ExternalData.sql' in your database, which will create the table 'ed_url_cache', then add the following to your LocalSettings.php file, after the inclusion of External Data:

$edgCacheTable = 'ed_url_cache';

Note that the required table is not created automatically when running MediaWiki's "update.php" script, so you will have to create it manually. If you are using a table prefix, remember to add it to the table name in "ExternalData.sql", e.g. change the first line to

CREATE TABLE IF NOT EXISTS `YourPrefixed_url_cache`

where "YourPrefix" is replaced by the prefix you specified with the $wgDBprefix setting.

You should also add a line like the following, to set the expiration time of the cache, in seconds; this example line will cache the data for a week:

$edgCacheExpireTime = 7 * 24 * 60 * 60;

By default, if data cannot be retrieved, and a cache table exists, #get_web_data will use the cached value for this data even if the cache has already expired. To disallow this, add the following to LocalSettings.php:

$edgAlwaysAllowStaleCache = false;
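The caching rules described in this section and under |cache seconds= and |use stale cache= can be summarized in a small model. The Python sketch below is an interpretation, not the extension's code: the function names are hypothetical, and the handling of boundary cases (for example, an entry whose age exactly equals the limit) is an assumption.

```python
def effective_cache_seconds(call_seconds, global_seconds):
    """The larger of the per-call and wiki-wide values applies; 0 means no caching."""
    return max(call_seconds or 0, global_seconds or 0)

def choose_source(entry_age, cache_seconds, fetch_ok, allow_stale):
    """Return 'cache', 'fetch', 'stale cache' or 'error' for one retrieval.

    entry_age is the age of the cached entry in seconds, or None if absent.
    """
    if cache_seconds > 0 and entry_age is not None and entry_age <= cache_seconds:
        return "cache"        # a fresh entry exists, so no request is made
    if fetch_ok:
        return "fetch"        # retrieve the data and refresh the cache
    if allow_stale and entry_age is not None:
        return "stale cache"  # retrieval failed; fall back to the expired entry
    return "error"

one_week = 7 * 24 * 60 * 60
limit = effective_cache_seconds(3600, one_week)
print(limit)  # 604800
print(choose_source(700000, limit, fetch_ok=False, allow_stale=True))  # stale cache
```

In this model, `allow_stale` stands for either |use stale cache= on the call or $edgAlwaysAllowStaleCache in LocalSettings.php.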

String replacement in URLs

One or more of the URLs you use may contain a string that you would prefer to keep secret, like an API key. If that's the case, you can use the array $edgStringReplacements to specify a dummy string you can use in its place. For instance, let's say you want to access the URL "http://worlddata.com/api?country=Guatemala&key=123abcd", but you don't want anyone to know your API key. You can add the following to your LocalSettings.php file, after the inclusion of External Data:

$edgStringReplacements['WORLDDATA_KEY'] = '123abcd';

Then, in your call to #get_web_data, you can replace the real URL with: "http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY".
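Conceptually, the replacement is a plain text substitution applied to the URL before the request is made. A minimal Python sketch of that idea follows; the dictionary `string_replacements` stands in for $edgStringReplacements and `expand_url` is a hypothetical helper, so treat it as an illustration rather than the extension's code.

```python
# Hypothetical stand-in for the $edgStringReplacements array.
string_replacements = {"WORLDDATA_KEY": "123abcd"}

def expand_url(url, replacements):
    """Substitute each dummy string in the URL with its secret value."""
    for dummy, secret in replacements.items():
        url = url.replace(dummy, secret)
    return url

public_url = "http://worlddata.com/api?country=Guatemala&key=WORLDDATA_KEY"
real_url = expand_url(public_url, string_replacements)
print(real_url)  # http://worlddata.com/api?country=Guatemala&key=123abcd
```

Only the dummy string ever appears in the wikitext; the real key exists only in LocalSettings.php and in the outgoing request.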

Whitelist for URLs

You can create a "whitelist" for URLs accessed by #get_web_data: in other words, a list of domains such that only URLs from those domains can be accessed. If you are using string replacements in order to hide secret keys, it is highly recommended that you create such a whitelist, in order to prevent users from finding out those keys by including them in a URL within a domain that they control.

To create a whitelist with one domain, add the following to LocalSettings.php:

$edgAllowExternalDataFrom = 'http://example.org';

To create a whitelist with multiple domains, add something like the following instead:

$edgAllowExternalDataFrom = array('http://example.org', 'http://example2.com');
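The effect of the whitelist can be pictured as a check made before any URL is fetched. The Python sketch below assumes that the check is a simple prefix comparison against each configured entry; that matching rule is an assumption made for illustration, and `is_url_allowed` is a hypothetical name, not a documented function of the extension.

```python
def is_url_allowed(url, allowed_from):
    """Return True if url may be fetched under the given whitelist setting.

    allowed_from mirrors $edgAllowExternalDataFrom: unset, one string, or a list.
    ASSUMPTION: an entry matches when the URL starts with it.
    """
    if not allowed_from:
        return True  # no whitelist configured: every URL is allowed
    if isinstance(allowed_from, str):
        allowed_from = [allowed_from]  # single-domain form of the setting
    return any(url.startswith(prefix) for prefix in allowed_from)

whitelist = ["http://example.org", "http://example2.com"]
allowed = is_url_allowed("http://example.org/api?key=WORLDDATA_KEY", whitelist)
blocked = is_url_allowed("http://attacker.example/log?key=WORLDDATA_KEY", whitelist)
print(allowed, blocked)  # True False
```

The second call shows the case the whitelist is meant to stop: a user pointing a hidden key at a server they control.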

HTTP options

By default, #get_web_data allows for HTTPS-based wikis to access plain HTTP URLs, and vice versa, without the need for certificates (see Transport Layer Security on Wikipedia for a full explanation). If you want to require the presence of a certificate, add the following to LocalSettings.php:

$edgAllowSSL = false;

Additionally, the global variable $edgHTTPOptions lets you set a number of other HTTP-related settings. It is an array that can take in any of the following keys:

  • timeout - how many seconds to wait for a response from the server (default is 'default', which corresponds to the value of $wgHTTPTimeout, which by default is 25)
  • sslVerifyCert - whether to verify the SSL certificate, if retrieving an HTTPS URL (default is false)
  • followRedirects - whether to retrieve another URL if the specified URL redirects to it (default is false)

So, for instance, if you want to verify the SSL certificate of any URL being accessed by #get_web_data, you would add the following to LocalSettings.php:

$edgHTTPOptions['sslVerifyCert'] = true;

Using XPath

In some cases, the same tag or attribute name can be used more than once in an XML or HTML file, and you only want to get a specific instance of it. You can do that using the XPath notation. To do it, you just need to add the parameter "use xpath", and then have each "external variable name" in the "data=" parameter be in XPath notation, instead of just a simple name.

We won't get into the details of XPath notation here, but you can see a demonstration of "use xpath" here.

Note that, when using XPath, any specific call to #external_value seems to only work once on a page. If you need to display a specific value more than once, you can use the Variables extension to store that value and then retrieve it again later in the page.

Using JSONPath

Just as with XML (see the section above), in JSON, specifying which data you want can require more than simply specifying an attribute or tag name. Thankfully, just as XML has XPath, JSON has JSONPath: JSONPath is less well-known but just as useful. See here for one guide to JSONPath syntax, and here for an online evaluator of JSONPath syntax.

To use JSONPath, just add the parameter "use jsonpath" to the parser function call, and then have each "external variable name" in the "data=" parameter be in JSONPath notation.

Using CSS-style selectors

With the "HTML" format, you can either use XPath (see above) or CSS-style selectors. For CSS-style selection, you do not need to specify a special parameter: it is the default approach used when "use xpath" is not specified. CSS selectors are a notation that uses tag names, classes and IDs to locate one or more elements in an HTML page; it is also the syntax used in jQuery. See here for one reference for CSS-style selectors.

#get_soap_data - web data via SOAP

The parser function #get_soap_data, similarly to #get_web_data, lets you get data from a URL, but here using the SOAP protocol. It is called in the following way:

{{#get_soap_data:
 url=data source URL
 |request=the function used to request data
 |requestData=parameter1=value1, etc.
 |response=the function used to retrieve data
 |data=local_variable_name1=external_variable_name1, etc.
 |suppress error
}}

All of the LocalSettings.php settings that can be applied for #get_web_data can also be applied for #get_soap_data: $edgCacheTable, $edgCacheExpireTime, $edgStringReplacements, $edgAllowExternalDataFrom and $edgAllowSSL.

#get_file_data - retrieve files on the local server

You can get data from a file on the server on which the wiki resides, using #get_file_data. This parser function is called in a similar manner to #get_web_data - the set of allowed formats is the same, as are most of the other parameters. Unlike with #get_web_data, however, you cannot retrieve the data from any file; rather, the set of allowed files, and/or directories, must be set beforehand in LocalSettings.php, with an alias for each one, so that the actual file paths remain private. It is called in the following way:

{{#get_file_data:
 file=file ID
 |directory=directory ID
 |file name=file name
 |format={CSV|CSV with header|GFF|JSON|XML|HTML|text}
 |data=local_variable_name1=external_variable_name1, etc.
 |filters=external_variable_name1=filter_value1, etc.
 |use xpath
 |regex=regular expression
 |cache seconds=number of seconds
 |suppress error
}}

Either "file=", or the combination of "directory=" and "file name=", should be set, but not both. If you want to give the wiki access to one or a small number of files, you could add one or more lines like the following to LocalSettings.php:

$edgFilePath['ID'] = "local file path";

You would then set "file=" to the ID for that file.

And if there are any directories that you want the wiki to be able to access all files from, you could add one or more lines like the following to LocalSettings.php:

$edgDirectoryPath['ID'] = "local directory path";

You would then set "directory=" to the ID of that directory, and "file name=" to the name of the file you want to access in this #get_file_data call. Note that the External Data code ensures that users cannot do tricks like adding "../.." and so on to the file name to access directories outside of the specified one.
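The kind of check that keeps a file name inside its configured directory can be sketched as follows. This Python example is only an illustration of the idea: the directory path `/srv/wiki-data` and the helper `resolve_in_directory` are hypothetical, the directory is assumed to be an absolute path, symbolic links are not considered, and the extension's own PHP check may work differently.

```python
import posixpath

def resolve_in_directory(directory, file_name):
    """Return the normalized path, or None if it would leave the directory."""
    base = posixpath.normpath(directory)
    # normpath collapses "a/../b" style segments without touching the disk.
    candidate = posixpath.normpath(posixpath.join(base, file_name))
    if posixpath.commonpath([base, candidate]) != base:
        return None  # the name escaped the allowed directory
    return candidate

inside = resolve_in_directory("/srv/wiki-data/", "JanuaryData.csv")
outside = resolve_in_directory("/srv/wiki-data/", "../../etc/passwd")
print(inside)   # /srv/wiki-data/JanuaryData.csv
print(outside)  # None
```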

To give an example, let's say that a lab wants to publish test results on their wiki. The results are all in CSV files in one directory on a server. So, they might add the following to LocalSettings.php:

$edgDirectoryPath['Our test results'] = '/home/genomelab/data/TestResults/';

Then, a #get_file_data call on the wiki might look like this:

{{#get_file_data:
 directory=Our test results
 |file name=JanuaryData.csv
 |data=Test date=Date,Study size=Study size,Researchers=Researchers,Result details=Notes
}}

Below that, there would presumably be a call to #for_external_table or #display_external_table to display the resulting data.

It is also possible to process all files in a directory, optionally only those whose names match a mask. Example:

 {{#get_file_data: directory = classes
  | file name = *.php
  | format = text
  | regex = /^\s*(?<abstract>abstract)?\s*class\s+(?<class>\w+)(\s+extends\s*(?<extends>\w+))?/m
  | data = file=__file,abstract=abstract,class=class,base=extends
 }}

This call will produce a table of the PHP classes in this extension, with their parent classes, provided that LocalSettings.php contains $edgDirectoryPath['classes'] = "$wgExtensionDirectory/ExternalData/includes";. The file name, relative to $edgDirectoryPath['classes'], will be saved to the external variable __file.
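To see what rows such a call yields, the same mask and regular expression can be applied to a few in-memory files. The Python sketch below is illustrative only: the file names and class names are hypothetical sample data, and Python spells named groups `(?P<name>...)` where the PHP pattern above uses `(?<name>...)`.

```python
import fnmatch
import re

# Python spelling of the regex from the call above, with the /m flag
# expressed as re.MULTILINE.
CLASS_RE = re.compile(
    r"^\s*(?P<abstract>abstract)?\s*class\s+(?P<class>\w+)"
    r"(\s+extends\s*(?P<extends>\w+))?",
    re.MULTILINE,
)

# Hypothetical directory contents: file name -> file text.
files = {
    "Shape.php": "<?php\nabstract class Shape {\n}\n",
    "Circle.php": "<?php\nclass Circle extends Shape {\n}\n",
    "README.md": "class notes\n",
}

rows = []
for name in sorted(fnmatch.filter(list(files), "*.php")):  # the "file name" mask
    for m in CLASS_RE.finditer(files[name]):
        rows.append({
            "file": name,                      # plays the role of __file
            "abstract": m.group("abstract"),   # None when the keyword is absent
            "class": m.group("class"),
            "base": m.group("extends"),        # None when there is no parent
        })

for row in rows:
    print(row)
```

Each dictionary corresponds to one row of the resulting table; README.md is skipped because it does not match the mask.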

#get_db_data - retrieve data from a database

The parser function #get_db_data allows retrieval of data from external databases. This function executes a simple SELECT statement and assigns the results to local variables that can then be used with the #external_value or #for_external_table functions.

A note about security: if you are going to use #get_db_data, you should think about the security implications. Configuring a database in LocalSettings.php will allow anyone with edit access to your wiki to run arbitrary SQL statements against that database. You should use a database user that has the minimum permissions for what you are trying to achieve. It is possible that complex SQL constructions could be passed to this function to cause it to do things vastly different from what it was designed for.


Each database being accessed needs to be configured separately in LocalSettings.php. For normal databases (i.e., everything except for SQLite), add the following stanza for each database:

$edgDBServer['ID'] = "server URL";
$edgDBServerType['ID'] = "DB type";
$edgDBName['ID'] = "DB name";
$edgDBUser['ID'] = "username";
$edgDBPass['ID'] = "password";


  • ID is a label for this database which is used when calling #get_db_data
  • server URL is the hostname on which the database lives
  • DB type is the type of database: one of mysql, postgres, mssql, oracle, sqlite, db2 or mongodb
  • DB name, username and password are details for accessing the database.

An example of a set of values would be:

$edgDBServer['employee-db'] = "";
$edgDBServerType['employee-db'] = "mysql";
$edgDBName['employee-db'] = "employeesDatabase";
$edgDBUser['employee-db'] = "guest";
$edgDBPass['employee-db'] = "p@ssw0rd";

The following optional settings can also be added:

$edgDBFlags['ID'] = "MediaWiki DB flags";
$edgDBTablePrefix['ID'] = "table prefix";

Example values for these variables are:

$edgDBFlags['employee-db'] = DBO_NOBUFFER | DBO_TRX; // flags are combined with bitwise OR
$edgDBTablePrefix['employee-db'] = "emp_";

Support for database systems

MySQL, Postgres (i.e. PostgreSQL), DB2 and MongoDB should work fully by default (though there are syntax limitations, and differences, for MongoDB - see below). For MS SQL/SQLServer, SQLite and Oracle, you may need to perform some special handling.


If you cannot connect to a PostgreSQL database, it may be because your PHP installation is lacking the PostgreSQL database module, php-pgsql. On many Linux systems, you can install it by calling the following, then restarting the web server:

yum install php55-php-pgsql

Amend the above configuration in LocalSettings.php to change the server type to "postgres":

$edgDBServerType['ID'] = "postgres";


To connect to SQLite, you need something like the following in LocalSettings.php:

$edgDBServerType['ID'] = "sqlite";
$edgDBDirectory['ID'] = "/directory/to/DB/file";
$edgDBName['ID'] = "Name of file, without .sqlite";

Connecting to Oracle may work by default. If it doesn't work, the following may help:

  • Make sure that the Oracle client, and the PHP version being used, are using the same architecture: they have to either both be 32-bit, or both be 64-bit.
  • Make sure that the value of $edgDBServer for the installation matches something in the corresponding Oracle client .ora files. The value may need to look like "serverName/dbName", as opposed to "serverName".
  • If none of the above are the issue, you could try using the OdbcDatabase extension, which should work as well.

For MongoDB, there are no special connection parameters, although the username and password may be optional. There are two optional query parameters: aggregate and find query. Under PHP 7.*, both the mongodb PHP extension and the mongodb library (composer require mongodb/mongodb) are required. Unfortunately, due to the way that MediaWiki continuous integration is built, this library cannot simply be added to composer.json for this extension (see T259743).

MongoDB is a non-SQL (or "NoSQL", if you prefer) database system, with its own querying language. When accessing MongoDB, you can either pass in a standard MongoDB query, or use the standard SQL-like syntax of #get_db_data. To use standard MongoDB querying, pass the query to the parameter |find query= or |aggregate=.

You can also use the standard querying functionality. There are some restrictions and differences, however, for the "where" clause:

  • only "AND"s can be used, not "OR"s
  • for the "LIKE" comparison, no text should be placed around the comparator - it should look like "Username LIKE Jo", not "Username LIKE '%Jo%'".

Because MongoDB returns values in JSON that may be complex, and contain compound values, you can get data that is stored in such a way by separating field names with dots. For instance, if the return data contains a value for a field called "Measurements" that is an array, holding values for fields called "Height" and "Width", then the "data=" parameter to #get_db_data could have a value like "height=Measurements.Height,width=Measurements.Width".
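The dotted notation amounts to walking a nested structure one field at a time. A minimal Python sketch of that lookup, using a hypothetical returned document and a hypothetical helper `get_by_path` (an illustration, not the extension's code):

```python
def get_by_path(document, dotted_path):
    """Follow 'A.B.C' through nested dictionaries; None if any step is missing."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical document, shaped like the "Measurements" example in the text.
document = {"Name": "Box A", "Measurements": {"Height": 12, "Width": 30}}

# The |data= value "height=Measurements.Height,width=Measurements.Width"
# expressed as {local variable: dotted external name}.
mappings = {"height": "Measurements.Height", "width": "Measurements.Width"}
values = {local: get_by_path(document, path) for local, path in mappings.items()}
print(values)  # {'height': 12, 'width': 30}
```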

You can do Memcached-based caching of values retrieved from MongoDB; to do that, you need the following two lines in LocalSettings.php:

$wgMainCacheType = CACHE_MEMCACHED;
$edgMemCachedMongoDBSeconds = number of seconds;

To enable MongoDB under PHP 7.4, the mongodb PHP extension should be enabled (sudo apt install php-mongodb && sudo phpenmod mongodb), and the mongodb library should be installed with Composer: composer require mongodb/mongodb "^1.6.0" (this will be necessary until bug T259743 is resolved).


To get data from an external database, call the following:

{{#get_db_data:
 db=database ID
 |from=from clause
 |where=where clause
 |order by=order by clause
 |group by=group by clause
 |data=data mappings
 |suppress error
}}

An explanation of the fields:

  • db= - the identifying label configured in LocalSettings.php
  • from= - an SQL "FROM" clause, i.e. one or more tables - can be as simple as tableName or as complex as tableName1 = alias1, tableName2 = alias2, etc.
  • join on= - corresponds to an SQL "JOIN ... ON" clause; used if there is more than one table being queried. An example value would be tableName1.ID = tableName2.id_field, etc.
  • where= - an SQL "WHERE" clause (optional)
  • limit= - an SQL "LIMIT" clause, i.e. a number, limiting the number of results (optional)
  • order by= - an SQL "ORDER BY" clause (optional)
  • data= - mapping of database column names to local variables (syntax: localVariable=databaseColumn - i.e. "employeeName" is the name of the database column in the example below).
  • suppress error - prevents any error message from getting displayed if there is a problem retrieving the data (optional)

An example call, using the "employee database" example from above:

 |order by=employeeName ASC

Prepared statements

A safer approach is to define one or more prepared statements for a database connection in LocalSettings.php, using the configuration variable $edgDBPrepared['(db id)']. Its value can be either a string containing a single SQL query with parameters, or an associative array ([ 'query id' => 'SQL' ]) holding several queries.

Parameters to the prepared statement are passed as a comma-separated list in the parser function argument |parameters=. If several prepared statements are defined for the same connection, the ID of the required statement is passed in the |query= argument. If prepared statements are defined for a connection, arbitrary queries cannot be run against that connection.


  • Only one statement allowed for the connection:

$edgDBServer     ['rfam2'] = 'mysql-rfam-public.ebi.ac.uk:4497';
$edgDBServerType ['rfam2'] = 'mysql';
$edgDBName       ['rfam2'] = 'Rfam';
$edgDBUser       ['rfam2'] = 'rfamro';
$edgDBPass       ['rfam2'] = '';
$edgDBPrepared   ['rfam2'] = <<<'SQL'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end
FROM full_region fr, rfamseq rf, taxonomy tx
WHERE rf.ncbi_id = tx.ncbi_id
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.ncbi_id = ? -- the only parameter to the query.
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SQL;

{{#get_db_data:db = rfam2
 | parameters=10116 <!-- this parameter is used to substitute question marks in the prepared statement -->
 | data=account=rfam_acc,sec=rfamseq_acc,start=seq_start,end=seq_end
}}

  • Several statements per connection:

$edgDBServer     ['rfam3'] = 'mysql-rfam-public.ebi.ac.uk:4497';
$edgDBServerType ['rfam3'] = 'mysql';
$edgDBName       ['rfam3'] = 'Rfam';
$edgDBUser       ['rfam3'] = 'rfamro';
$edgDBPass       ['rfam3'] = '';
$edgDBPrepared   ['rfam3'] = [
	'sequences' => <<<'SEQ'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end
FROM full_region fr, rfamseq rf, taxonomy tx
WHERE rf.ncbi_id = tx.ncbi_id
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.ncbi_id = ? -- the only parameter to the query.
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SEQ
	,
	'sno' => <<<'SNO'
SELECT fr.rfam_acc, fr.rfamseq_acc, fr.seq_start, fr.seq_end, f.type
FROM full_region fr, rfamseq rf, taxonomy tx, family f
WHERE rf.ncbi_id = tx.ncbi_id
AND f.rfam_acc = fr.rfam_acc
AND fr.rfamseq_acc = rf.rfamseq_acc
AND tx.tax_string LIKE ? -- the only parameter to the query.
AND f.type LIKE '%snoRNA%'
AND is_significant = 1 -- exclude low-scoring matches from the same clan
SNO
];

{{#get_db_data: db = rfam3
 | query=sequences <!-- this parameter is used to choose one of the prepared statements -->
 | parameters=10116 <!-- this parameter is used to substitute question marks in prepared statement -->
 | data=account=rfam_acc,sec=rfamseq_acc,start=seq_start,end=seq_end
}}
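Prepared statements are safer because a bound parameter is always treated as a value and never as SQL text. The Python sketch below demonstrates this with an in-memory SQLite database; the toy `taxonomy` table, the `PREPARED` query and the `run_prepared` helper are hypothetical and only mimic the shape of the examples above, not the extension's own database layer.

```python
import sqlite3

# Toy stand-in for a real database; the table and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxonomy (ncbi_id TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO taxonomy VALUES (?, ?)",
    [("10116", "Rattus norvegicus"), ("9606", "Homo sapiens")],
)

# A fixed query with one placeholder, analogous to an $edgDBPrepared entry.
PREPARED = "SELECT species FROM taxonomy WHERE ncbi_id = ?"

def run_prepared(parameters):
    """Bind a comma-separated parameter list to the placeholders, in order."""
    values = [p.strip() for p in parameters.split(",")]
    return conn.execute(PREPARED, values).fetchall()

normal = run_prepared("10116")
injected = run_prepared("10116 OR 1=1")
print(normal)    # [('Rattus norvegicus',)]
print(injected)  # []
```

The second call returns nothing because the whole string "10116 OR 1=1" is compared with the column as a single value; the injected condition is never interpreted as SQL.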

#get_ldap_data - retrieve data from LDAP directory

The parser function #get_ldap_data allows retrieval of data from external LDAP directories. This function executes LDAP queries and assigns the results to local variables that can then be used with the #external_value function.

A note about security: if you are going to use #get_ldap_data, you should think hard about the security implications. Configuring an LDAP server in LocalSettings.php will allow anyone with edit access to your wiki to run queries against that server. You should use a domain user that has the minimum permissions for what you are trying to achieve. Wiki users could run queries to extract all sorts of information about your domain. You should know what you are doing before enabling this function.


The PHP extension ldap must be enabled. You need to configure each LDAP server in LocalSettings.php. Add the following stanza for each server:

$edgLDAPServer['domain'] = "myldapserver.com";
$edgLDAPBaseDN['domain'] = "[basedn]";
$edgLDAPUser['domain'] = "myDomainUser";
$edgLDAPPass['domain'] = "myDomainPassword";


  • domain is a label to be used when calling #get_ldap_data
  • myDomainUser and myDomainPassword are credentials used to bind to the LDAP server
  • [basedn] is the base DN used for the search.


$edgLDAPServer['foobar'] = "foobar.com";
$edgLDAPBaseDN['foobar'] = "OU=Users,dc=foobar,dc=com";
$edgLDAPUser['foobar'] = "u12345";
$edgLDAPPass['foobar'] = "mypassword";


To query the LDAP server, add this call to a wiki page:

{{#get_ldap_data:
 domain=domain
 |filter=LDAP filter
 |data=data mappings
 |all
}}


  • domain is the label used in LocalSettings.php
  • filter is the LDAP filter used for the search
  • data is the mappings of LDAP attributes to local variables
  • all is optional; if it is omitted, the query retrieves only one result.

An example that retrieves a user from Win2003/AD, using a userid passed to a template:
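A minimal sketch of such a call, assuming the foobar domain configured above and the standard Active Directory attributes sAMAccountName, displayName and mail (written in lowercase here as a precaution). The template parameter userid and the local variable names name and email are illustrative choices, not part of the extension:

```wikitext
{{#get_ldap_data:
 domain=foobar
 |filter=(samaccountname={{{userid}}})
 |data=name=displayname,email=mail
}}
* Name: {{#external_value:name}}
* Email: {{#external_value:email}}
```

Because all is omitted, only one matching entry is retrieved, which suits a lookup by a unique user ID.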


#get_program_data - retrieve data returned by a program run server-side

The parser function #get_program_data allows retrieval of data returned by a program run server-side. Every such program has to be configured in LocalSettings.php, as in the example below:

// 'lilypond' will be the name under which this program can be invoked by {{#get_program_data:}}.
$edgExeName             ['lilypond']    = 'LilyPond'; // (optional) the name of the program for Special:Version.
$edgExeUrl              ['lilypond']    = 'http://lilypond.org/'; // (optional) program home page for Special:Version.
$edgExeVersionCommand   ['lilypond']    = 'lilypond -v'; // (optional) Shell command that will return program version for Special:Version. The results are cached.
$edgExeVersion          ['lilypond']    = 'GNU LilyPond 2.20.0'; // (optional) Explicitly set program version for Special:Version. Use only if $edgExeVersionCommand is not an option.
$edgExeLimits           ['lilypond']    = [ 'memory' => 0, 'time' => 0, 'walltime' => 0, 'filesize' => 0 ]; // (optional) Limits override for the program. Use with caution.
$edgExeEnvironment      ['lilypond']    = ['KEY' => 'value']; // (optional) Environment variables for the program. Parameters will be substituted in the values as well as in the shell command itself.
$edgExeCommand          ['lilypond']    = 'lilypond -dbackend=svg -dpaper-size="$size$" -dsafe -dcrop -o $tmp$ -'; // The shell command that receives user input as stdin and outputs its result as stdout (or into a temporary file). $size$ will be replaced with the value of the size argument to the parser function.
$edgExeParams           ['lilypond']    = ['size' => 'a4']; // Parameters to the parser function with their default values. If there is a numeric key, then the value is the name of a required parameter.
$edgExeParamFilters     ['lilypond']    = ['size' => '/^\w+$/']; // Callables and regular expressions that are used to validate parameter values. Should be as restrictive as possible.
$edgExeInput            ['lilypond']    = 'notes'; // Name of the parser function parameter that will be sent to program's standard input.
$edgExePreprocess       ['lilypond']    = null; // (optional) A callable used to preprocess the standard input for the program. There is one pre-defined function: EDConnectorExe::wikilinks4dot().
$edgExeTempFile         ['lilypond']    = '$tmp$.cropped.svg'; // (optional) Name of the temporary file used instead of standard output (not recommended).
$edgExeIgnoreWarnings   ['lilypond']    = true; // (optional) Ignore warnings that a program may send to stderr.
$edgExePostprocess      ['lilypond']    = null; // (optional) A callable used to postprocess the program's standard output. There is one pre-defined function: EDConnectorExe::innerXML().
$edgExeTags             ['lilypond']    = 'score'; // (optional) Bind the program to this tag to emulate the behaviour of some MediaWiki extensions.

Once the program is configured this way, it can be invoked as follows:

{{#get_program_data: program = lilypond
 | size = a8landscape
 | notes = \paper {
  indent = 0\mm
  line-width = 110\mm
  oddHeaderMarkup = ""
  evenHeaderMarkup = ""
  oddFooterMarkup = ""
  evenFooterMarkup = ""
 }
 \relative c' { f d f a d f e d cis a cis e a g f e }
 | format = text
 | data = xml=__text
}}

The retrieved data (SVG in this case) can then be shown with <externalvalue data="external variable">Fallback placeholder</externalvalue>, which prevents any wiki postprocessing.
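With the call above, which maps the program's output to the local variable xml, the tag might be used as follows. The fallback text is an arbitrary choice:

```wikitext
<externalvalue data="xml">The score could not be rendered.</externalvalue>
```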

A simplified syntax is available in tag emulation mode:

<score size="a5">\paper {
indent = 0\mm
line-width = 110\mm
oddHeaderMarkup = ""
evenHeaderMarkup = ""
oddFooterMarkup = ""
evenFooterMarkup = ""
}
\relative c' { f d f a d f e d cis a cis e a g f e e d c}</score>

A simpler example, involving only text processing, is below:

// apt-cache show
$edgExeCommand      ['apt-cache show']  = 'apt-cache show $package$';
$edgExeParams       ['apt-cache show']  = ['package'];
$edgExeParamFilters ['apt-cache show']  = ['package' => '/^[\w-]+$/'];


{{#get_program_data:
    program = apt-cache show
  | package = graphviz-doc
  | data = key=1,value=2
  | format = CSV
  | delimiter = :
}}
 {| class="wikitable"
 ! Key !! Value {{#for_external_table:<nowiki/>
 {{!}}-
 {{!}} {{{key}}} {{!}}{{!}} {{{value}}}
 }}
 |}

Although programs are run in a restricted environment by Shell::command(), wiki administrators should exercise great caution when configuring programs to be callable with #get_program_data.

A program's output is cached in the table ed_url_cache, as configured by the following parser function parameters:

  • |cache seconds=number of seconds,
  • |use stale cache,

and configuration settings:

$edgCacheTable = ...;
$edgCacheExpireTime = ...;
$edgAlwaysAllowStaleCache = ...;
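As an illustration, the apt-cache call shown earlier could be cached for an hour and allowed to use a stale cached copy. The value 3600 is an arbitrary example, and the exact fallback behaviour of use stale cache is assumed from its name:

```wikitext
{{#get_program_data:
    program = apt-cache show
  | package = graphviz-doc
  | data = key=1,value=2
  | format = CSV
  | delimiter = :
  | cache seconds = 3600
  | use stale cache
}}
```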

A set of tested examples can be found here and (with working output) here.

Displaying data

Once you have retrieved the data onto the page, from any source, there are two ways to display it: #external_value and #for_external_table.

Displaying individual values

If a call retrieved a single value for each variable specified, you can display each value with the following:

{{#external_value:local variable name}}

As an example, this page contains the following text:

    |format=csv with header|data=bordered countries=Borders,population=Population,area=Area,capital=Capital}}

* Germany borders the following countries:
{{#arraymap:{{#external_value:bordered countries}}|,|x|[[x]]}}.
* Germany has population {{#external_value:population}}.
* Germany has area {{#external_value:area}}.
* Its capital is {{#external_value:capital}}.

The page gets data from this URL, which contains the following text:

Area,Borders,Capital,Population
"357,050 km²","Austria,Belgium,Czech Republic,Denmark,France,Luxembourg,Netherlands,Poland,Switzerland",Berlin,"82,411,001"

The page then uses #external_value to display the retrieved values; for 'bordered countries', it also uses the #arraymap function, defined by the Page Forms extension, to turn each country name into a link (you can ignore this detail if you want).

By default, #external_value displays an error message if it is called for a variable that has not been set, or if the specified data source is inaccessible, or the data source does not contain any data. You can disable the error message by adding the following to LocalSettings.php:

$edgExternalValueVerbose = false;

To prevent any further wiki processing of external data, for example when it is SVG produced by #get_program_data, you can use <externalvalue data="external variable">Fallback placeholder</externalvalue>.

Displaying a table of values

The data returned by #get_web_data or #get_db_data (#get_ldap_data without the all parameter doesn't support this feature) can also be a "table" of data (many values per field), instead of just a single "row" (one value per field). In this case, you can display it using either of the functions #for_external_table or #display_external_table.


This URL contains information similar to that above, but for a few countries instead of just one. Calling #get_web_data with this URL, with the same format as above, will set the local variables to contain arrays of data, rather than single values. You can then call #for_external_table, which has the following format:

{{#for_external_table:expression}}

...where "expression" is a string that contains one or more variable names, surrounded by triple brackets. This string is then displayed for each retrieved "row" of data.

For an example, this page contains a call to #get_web_data for the URL mentioned above, followed by this call:

{| class="wikitable"
! Name
! Borders
! Population
! Area {{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{name}}}
{{!}} {{{borders}}}
{{!}} {{{population}}}
{{!}} {{{area}}}
}}
|}

The call to #for_external_table holds a single row of a table, in wikitext; it's surrounded by wikitext to create the top and bottom of the table. The use of "{{!}}" is a standard MediaWiki trick to display pipes from within parser functions. Much simpler calls to #for_external_table are possible if you just want to display a line of text per data "row",[1] but an HTML table is the standard approach.

There's one other interesting feature of #for_external_table, which is that it lets you modify specific values. You can URL-encode values by calling them with {{{field-name.urlencode}}} instead of just {{{field-name}}}, and similarly you can HTML-encode values by calling them with {{{field-name.htmlencode}}}.

As an example of the former, if you wanted to show links to Google searches on a set of terms retrieved, you could call:

{{#for_external_table: http://www.google.com/search?q={{{term.urlencode}}} }}

This is required because standard parser functions can't be used within #for_external_table - so the following, for example, will not work:

{{#for_external_table: http://www.google.com/search?q={{urlencode:{{{term}}}}} }}
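A slightly fuller sketch of the working form, assuming a variable named term has already been retrieved, renders one bulleted external link per row, with the raw term as the link text:

```wikitext
{{#for_external_table:<nowiki/>
* [http://www.google.com/search?q={{{term.urlencode}}} {{{term}}}]
}}
```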


#display_external_table is similar in concept to #for_external_table, but it passes the values in each row to a template, which handles the display. This function is called as:

{{#display_external_table: template=template name |data=set of parameters, separated by commas |delimiter=delimiter |intro template=template name (optional) |outro template=template name (optional) }}

An explanation of the parameters:

  • template - the name of the template into which each "row" of data will be passed
  • data - the data mappings between external variable and local template parameter; much like the data parameters for the other functions
  • delimiter - the separator used between one template call and the next; default is a newline. (To include newlines in the delimiter value, use "\n".)
  • intro template - a template displayed before the results set, only if there are any results
  • outro template - a template displayed after the results set, only if there are any results

For example, to display the data from the previous example in a table as before, you could create a template called "Country info row", that had the parameters "Country name", "Countries bordered", "Population" and "Area", and then call the following:

  |format=CSV with header
  |data=Country name=Country,Countries bordered=borders,Population=population,Area=area
{| class="wikitable"
! Name
! Borders
! Population
! Area
{{#display_external_table:template=Country info row}}
|}

The template "Country info row" should then contain wikitext like the following:

|-
|{{{Country name}}}
|{{{Countries bordered}}}
|{{{Population}}}
|{{{Area}}}
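
As a variation on the example above, the table's opening and closing markup could be moved into intro and outro templates, so that no empty table appears when nothing is retrieved. The template names "Country table intro" and "Country table outro" are hypothetical:

```wikitext
{{#display_external_table:template=Country info row
 |intro template=Country table intro
 |outro template=Country table outro
}}
```

In this sketch, "Country table intro" would contain the {| class="wikitable" line and the four header lines, and "Country table outro" would contain only the closing |}.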

Clearing data

You can also clear all external data that has already been retrieved, so that it doesn't conflict with calls to retrieve external data further down the page. The most likely case in which this is useful is when data is retrieved and displayed in a template that is called more than once on a page. To clear the data, just call "{{#clear_external_data:}}". Note that the ":" has to be there at the end of the call, or else MediaWiki will ignore the parser function.

There is no way to clear the values for only one field; #clear_external_data erases the entire set of data.
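
A minimal sketch of the template pattern described above follows. The URL and the variable names are placeholders, not a real data source:

```wikitext
{{#get_web_data:url=https://example.org/data/{{{1}}}.csv
 |format=CSV with header
 |data=population=Population,capital=Capital
}}
'''{{{1}}}''': population {{#external_value:population}}, capital {{#external_value:capital}}.
{{#clear_external_data:}}
```

Ending the template with #clear_external_data means that a second use of the template on the same page starts with no leftover values.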

Storing data

You can also use External Data to store a table of data that has been retrieved; you can do this using the storage capabilities of either the Semantic MediaWiki or Cargo extensions. Once the data has been stored, it can then be queried, aggregated, displayed etc. on the wiki by that extension.

Semantic MediaWiki

If you store data with Semantic MediaWiki, you should note a common problem, which is that the data stored by SMW does not get automatically updated when the data coming from the external source changes. The best solution for this, assuming you expect the data to change over time, is to create a cron job to call the SMW maintenance script "rebuildData.php" at regular intervals, such as once a day; that way, the data is never more than a day old.

To store a table of data using SMW, you can use the #store_external_table function. This function works as a hybrid of the #for_external_table function and the #subobject function, defined in the Semantic MediaWiki extension.

#store_external_table loops over each row, and uses variables, in the same way as #for_external_table.

Unlike with #subobject, the first parameter is the name of a property that will link from the subobject to the page it's on. You can see a demonstration of this function on the page Fruits semantic data; the call to #store_external_table on that page looks like:

{{#store_external_table:Is fruit in
|Has name={{{name}}}
|Has color={{{color}}}
|Has shape={{{shape}}}
}}


Cargo

There is no special parser function for storing data via Cargo; instead you should simply use #display_external_table, and include Cargo storage code within the template called by that function. You can see an example of Cargo-based storage using #display_external_table here; it uses this template, and you can see the resulting data here.
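
A rough sketch of such a template, assuming a Cargo table named Fruits with the fields Name, Color and Shape has already been declared with #cargo_declare. The template name "Fruit row", the table name and the field names are all illustrative:

```wikitext
<!-- Contents of Template:Fruit row -->
{{#cargo_store:_table=Fruits
 |Name={{{name|}}}
 |Color={{{color|}}}
 |Shape={{{shape|}}}
}}

<!-- On the page that retrieves the data, after the retrieval call -->
{{#display_external_table:template=Fruit row}}
```

This assumes that the retrieved local variables are named name, color and shape, so that they reach the template under those parameter names.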


Since version 2.2, External Data defines Lua functions that match the functionality of its six "accessor" parser functions, so that wikis that have the Scribunto extension installed can call these functions directly in order to access and display outside data.

The following functions are defined:

Lua function Corresponding parser function

The Lua functions accept the same parameters as parser functions, but please note the following:

  • Technically, there is only one parameter; it is known in Lua as a table, and its keys correspond to the parser function parameters.
  • Comma-separated lists like data can be replaced with Lua tables; so that both data = {internal1 = 'external1', internal2 = 'external2'} and data = 'internal1=external1,internal2=external2' will work.
  • "Valueless" parameters like use xpath can be supplied both as numbered and named: 'use xpath' and ['use xpath'] = true are both valid.
  • Parameters whose name contains a space, like json offset, need to be surrounded with quotes and brackets, like ['json offset'] =, unless they are valueless, in which case quotes are enough.

Each Lua function returns two values:

  1. A table of external data. Unlike with the parser functions, it will be "row-based", i.e. a numbered array of records with named fields corresponding to external variables. If external data is not fetched, nil will be returned.
    • If there is only one value for some external variable (it will be in the first record), it will be duplicated as a named field of the returned table, as it is highly likely that it belongs to the rowset as a whole rather than its first row; so that it can be accessed both as tbl[1].external1 and tbl.external1.
  2. A numbered table of error messages. If there were no errors, nil will be returned.

Unlike with the parser functions, external data is only returned to the calling Lua module and is not stored on the page to be retrieved later by {{#external_value:}}, etc.


Lua code:

mw.ext.externaldata.getWebData {
    url = "https://discoursedb.org/wiki/Special:GetData/Fruits_data"
  , data = "name=Name,color=Color,shape=Shape"
  , format = "CSV with header"
}

or, equivalently:

mw.ext.externaldata.getWebData {
    url = 'https://discoursedb.org/wiki/Special:GetData/Fruits_data'
  , data = {name = 'Name', color = 'Color', shape = 'Shape'}
  , format = 'CSV with header'
}

Return:

table#1 {
  table#2 {
    ["color"] = "Red",
    ["name"] = "Apple",
    ["shape"] = "Round",
  },
  table#3 {
    ["color"] = "Yellow",
    ["name"] = "Banana",
    ["shape"] = "Oblong",
  },
  table#4 {
    ["color"] = "Orange",
    ["name"] = "Orange",
    ["shape"] = "Round",
  },
  table#5 {
    ["color"] = "Yellow",
    ["name"] = "Pear",
    ["shape"] = "Pear-shaped",
  },
}, nil

Common problems

  • If the call to #get_web_data or #for_external_table isn't returning any data, and the page being accessed is large, the retrieval may be timing out. Set $wgHTTPTimeout in your LocalSettings.php file (which represents a number of seconds) to some number greater than 25, its default value. For instance:
$wgHTTPTimeout = 60;
  • If the data being accessed has changed, but the wiki page accessing it still shows the old data, it is because that page is being cached by MediaWiki. There are several solutions to this: if you are an administrator, you can hit the "refresh" tab above the page, which will purge the cache. You can also easily disable caching for the entire wiki; see here for how. Finally, if you wait long enough (typically no more than 24 hours), the page will get refreshed on its own and display the new data.
  • If you host a private wiki locally but use a dynamic IP service to access it, your wiki will connect to itself through your public IP and not through localhost or 127.0.0.1 (or an IPv6 equivalent). In such a case, your wiki is not allowed to query itself, so the examples given here will work when the data is hosted on a different server but not when it is hosted on your wiki. A workaround is to use the extension Extension:NetworkAuth, which allows you to automatically authenticate your router/box/modem to access your wiki. Note: the security of this approach is not guaranteed.
  • If the extension is not correctly handling non-ASCII characters, the problem might be that your PHP instance lacks the mbstring extension - make sure that it is installed.
  • To query data from another wiki that uses Semantic MediaWiki, it is recommended to use the Special:Ask page, rather than one of SMW's API actions, to construct the URL that will be passed in to #get_web_data, since the API will not output data in a syntax that External Data can use. To construct the URL, go to Special:Ask, create the desired query, then copy the URL from the "Download queried results in CSV format" link.

Version history

External Data is currently at version 2.4.1. See the entire version history.


Bugs and feature requests

The best place to report bugs is on Phabricator - see How to report a bug. The project that should be specified is MediaWiki-extensions-ExternalData.

You can also put any questions, suggestions or bug reports about External Data at the talk page for this extension. Or you can write to the MediaWiki mailing list, mediawiki-l. (If you write to the mailing list, please include "External Data" somewhere in the subject line.)

You can also send specific code patches to Yaron Koren, at yaron57@gmail.com.


Translation of External Data is done through translatewiki.net. The translation for this extension can be found here. To add language values or change existing ones, you should create an account on translatewiki.net, then request permission from the administrators to translate a certain language or languages on this page (this is a very simple process). Once you have permission for a given language, you can log in and add or edit whatever messages you want to in that language.


  1. For example: {{#for_external_table:*{{{name}}} has the following borders: {{{borders}}}.}}