Help:Tabular Data
Note: When you edit this page, you agree to release your contribution under CC0. See Public Domain Help Pages for more info.
Tabular data allows users to create CSV-like tables of data, and use them from other wikis to create automatic tables, lists, and graphs.
To create a new table, go to Wikimedia Commons and create a new page in the Data namespace with a .tab suffix, such as Data:Sandbox/Name/Example.tab.
Feel free to experiment by creating pages with the Sandbox/<username>/ prefix.
For now, page content can only be edited in the raw JSON format and with a basic table editor, but a gadget allows import/export from/to CSV and Excel files.
Eventually (written 21 November 2016), we hope there will be a spreadsheet-like editor to simplify data editing.
The underlying data format largely matches the Frictionless Data standard for a tabular data resource.
Data licensing
All data in the Data: namespace must explicitly indicate the license of the data.
The recommended choice is a public domain dedication under the Creative Commons Zero (CC0) license.
To indicate this, the data page must contain "license": "CC0-1.0", which means the data can be used under CC0 version 1.0 or, at your option, any later version.
By editing the data, you agree to the Terms of Use, and you irrevocably agree to release your contribution to the public domain under CC0.
If it isn't possible to release the data under CC0, the following licenses are also supported:
- CC-BY:
  - CC-BY-1.0: Creative Commons Attribution 1.0
  - CC-BY-2.0: Creative Commons Attribution 2.0
  - CC-BY-2.5: Creative Commons Attribution 2.5
  - CC-BY-3.0: Creative Commons Attribution 3.0
  - CC-BY-4.0: Creative Commons Attribution 4.0
  - CC-BY-4.0+: Creative Commons Attribution 4.0 or later version
- CC-BY-SA:
  - CC-BY-SA-1.0: Creative Commons Attribution-Share Alike 1.0
  - CC-BY-SA-2.0: Creative Commons Attribution-Share Alike 2.0
  - CC-BY-SA-2.5: Creative Commons Attribution-Share Alike 2.5
  - CC-BY-SA-3.0: Creative Commons Attribution-Share Alike 3.0
  - CC-BY-SA-4.0: Creative Commons Attribution-Share Alike 4.0
  - CC-BY-SA-4.0+: Creative Commons Attribution-Share Alike 4.0 or later version
- ODbL-1.0:
  - ODbL-1.0: ODC Open Database License v1.0
Any template that pulls data from a dataset not licensed under CC0 must comply with the relevant attribution terms, so CC0 is strongly preferred whenever possible.
At some future time, the list of licenses supported by the Data namespace may be expanded.
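As a rough illustration of the attribution rule above, a tool that reuses Commons data could inspect the "license" value to decide whether attribution has to be shown. The Python sketch below is not part of any MediaWiki API: the names SUPPORTED_LICENSES and needs_attribution are made up for this example, and the set simply mirrors the identifiers listed on this page.

```python
# Identifiers copied from the list on this page; the set itself is only an illustration.
SUPPORTED_LICENSES = {
    "CC0-1.0",
    "CC-BY-1.0", "CC-BY-2.0", "CC-BY-2.5", "CC-BY-3.0", "CC-BY-4.0", "CC-BY-4.0+",
    "CC-BY-SA-1.0", "CC-BY-SA-2.0", "CC-BY-SA-2.5", "CC-BY-SA-3.0",
    "CC-BY-SA-4.0", "CC-BY-SA-4.0+",
    "ODbL-1.0",
}


def needs_attribution(license_id):
    """Return True when reuse comes with attribution terms (anything but CC0)."""
    if license_id not in SUPPORTED_LICENSES:
        raise ValueError(f"unsupported license identifier: {license_id!r}")
    return license_id != "CC0-1.0"


print(needs_attribution("CC0-1.0"))       # False
print(needs_attribution("CC-BY-SA-4.0"))  # True
```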
Data types
Tabular JSON data supports several basic value types.
You may also use null in place of a value to mark it as missing.
- number: a numeric value with an optional fractional part. It may use exponential E notation, but cannot be a non-number such as NaN.
- boolean: only the values true and false are allowed.
- string: a text string at most 400 characters long. Special characters like new lines (\n) and tabs (\t) are not allowed.
- localized: a multilingual string, represented as an object whose keys are language codes (e.g. "en") and whose values are strings subject to the same 400-character limit. For example, {"en":"string in English", "fr":"chaîne de texte en français", ...} has string values for both English and French.
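The type rules above can be sketched as a small Python checker. This is only an illustration of the rules as described here, not the validation code the wiki runs; the function names are hypothetical, the 400-character limit is counted in Python characters (an assumption, since the page does not say how characters are counted), and infinities are rejected along with NaN because JSON cannot express them.

```python
import math

MAX_STRING_LENGTH = 400  # limit stated above; counted in Python characters (assumption)


def is_valid_string(value):
    """A plain string: at most 400 characters, no new lines or tabs."""
    return (
        isinstance(value, str)
        and len(value) <= MAX_STRING_LENGTH
        and "\n" not in value
        and "\t" not in value
    )


def is_valid_value(value, type_name):
    """Check one cell value against one of the four column types."""
    if value is None:  # null marks a missing value for any type
        return True
    if type_name == "number":
        if isinstance(value, bool):  # bool is a subclass of int in Python
            return False
        if isinstance(value, int):
            return True
        # Rejects NaN; also rejects infinities (assumption, see lead-in).
        return isinstance(value, float) and math.isfinite(value)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "string":
        return is_valid_string(value)
    if type_name == "localized":
        return isinstance(value, dict) and all(
            isinstance(code, str) and is_valid_string(text)
            for code, text in value.items()
        )
    raise ValueError(f"unknown type: {type_name!r}")
```

For instance, is_valid_value(1.5e3, "number") is accepted, while is_valid_value(float("nan"), "number") and is_valid_value("a\tb", "string") are rejected.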
Top-level fields
Tabular data has several required and optional top-level elements:
- license: required field, must be set to a license identifier string. "CC0-1.0" (Public Domain dedication, version 1.0 or later) is the recommended value; the other supported identifiers are listed under Data licensing above. More license support may be available in the future.
- schema: required field, must be set to an object of the form {"fields": [{...}, {...}, ...]}, that is, an object that contains a list of fields. Each field describes a column of the tabular data. Each field must be an object with mandatory "name" and "type" values.
  - name: required field, is the name of the column. The value must begin with a letter or an underscore "_", and must only contain letters, underscores, or digits. This is done so that each header can be easily used from a coding environment like Lua or Vega graphs.
  - type: required field, must be set to one of these values: "number", "boolean", "string", or "localized".
  - title: optional field, is a translation for the column's header. If set, it must be a localized string object.
- data: required field, must always be set to a list of lists. Each sub-list must have one element per column listed in the schema, and each element must match the type of its column.
- description: optional field, must be set to a localized string value: an object with at least one key-value pair, where the key is a language code (e.g. "en") and the value is a description string.
- sources: optional field, must be a wiki markup string that describes the source of the data.
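The structural rules for the top-level fields can be checked along the following lines. This Python sketch is hypothetical and is not how the wiki itself validates pages; in particular it assumes that "letters" in column names means ASCII letters, which this page does not state, and it assumes that "schema" is an object whenever it is present.

```python
import re

# Assumption: "letters" means ASCII letters; non-ASCII letters may or may not be accepted.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
FIELD_TYPES = {"number", "boolean", "string", "localized"}


def structural_problems(page):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [
        f"missing required field: {key}"
        for key in ("license", "schema", "data")
        if key not in page
    ]
    fields = page.get("schema", {}).get("fields", [])
    for index, field in enumerate(fields):
        name = field.get("name")
        if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
            problems.append(f"field {index}: invalid name {name!r}")
        if field.get("type") not in FIELD_TYPES:
            problems.append(f"field {index}: invalid type {field.get('type')!r}")
    # Every row needs exactly one value per column.
    for row_index, row in enumerate(page.get("data", [])):
        if len(row) != len(fields):
            problems.append(
                f"row {row_index}: expected {len(fields)} values, got {len(row)}"
            )
    return problems
```

Returning a list of problems rather than raising on the first one lets an editor see everything that needs fixing in a single pass.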
Usage
There are two ways to use this data:
- A Lua script on any wiki can get this data by calling mw.ext.data.get("Example.tab"). The function returns tabular data in almost the same format as the original JSON, except that all localized strings are converted to regular strings and the license field also includes a localized license name. To get the data in another language, pass a language code as the second parameter. To get the data in its original, unmodified form, use "_" as the language code.
- A Vega graph can get tabular data by using "tabular:///Example.tab" as the data source URL.
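To make the "localized strings are converted to regular strings" step more concrete, here is a Python sketch of that kind of flattening. It only imitates the idea and is not the implementation behind mw.ext.data.get: the function names are hypothetical, and the fallback order (requested language, then "en", then any available language) is an assumption, since this page does not describe the real fallback chain.

```python
def localize(value, lang, fallback="en"):
    """Collapse a localized object into one plain string; pass other values through."""
    if not isinstance(value, dict):
        return value  # numbers, booleans, plain strings and None are unchanged
    if lang in value:
        return value[lang]
    if fallback in value:  # assumption: fall back to English
        return value[fallback]
    # Last resort (assumption): take whichever language comes first.
    return next(iter(value.values()), None)


def localize_row(row, lang):
    """Apply localize() to every cell of one data row."""
    return [localize(cell, lang) for cell in row]


row = ["peaches", 100, True, {"en": "Sweet peaches", "fr": "Pêches sucrées"}]
print(localize_row(row, "fr"))  # ['peaches', 100, True, 'Pêches sucrées']
```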
On Commons, transcluding a page from the Data namespace, for example {{Data:Example.tab}}, will render it as an HTML table.
To access data directly on a wiki page, you can import (if you don't already have them) the tabular data module (requires the navbar module) and optionally the tabular query template (requires the aforementioned tabular data module).
With these tools you can easily get the value of a single cell.
Example
{
    "license": "CC0-1.0",
    "description": {
        "en": "Some good fruits for you",
        "es": "Algunas buenas frutas para ti"
    },
    "sources": "http://example.com and [[Data]] page",
    "schema": {
        "fields": [
            { "name": "id", "type": "string", "title": { "en": "Fruit ID", "fr": "ID de fruit" }},
            { "name": "count", "type": "number", "title": { "en": "Count", "fr": "Décompte" }},
            { "name": "liked", "type": "boolean", "title": { "en": "Do I like it?", "fr": "L’aimes-tu ?" }},
            { "name": "description", "type": "localized", "title": { "en": "Fruit name", "fr": "Nom du fruit" }}
        ]
    },
    "data": [
        [
            "peaches",
            100,
            true,
            {
                "en": "Magnificent but a bit artificial sweet peaches",
                "es": "esto puede estar en español",
                "fr": "Magnifiques mais ce sont des pêches un tantinet sucrées"
            }
        ],
        ⋮
    ]
}
See c:Data:COVID-19 cases in Santa Clara County, California.tab for an example of how the JSON data renders on Commons.
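Once a page like the example above has been parsed, it is often convenient to address cells by column name instead of by position. The Python sketch below shows one way to do that outside the wiki; rows_as_dicts is a hypothetical helper, and the page literal is a trimmed copy of the example with only its first row and without titles.

```python
def rows_as_dicts(page):
    """Pair every row with the column names from schema.fields."""
    names = [field["name"] for field in page["schema"]["fields"]]
    return [dict(zip(names, row)) for row in page["data"]]


# Trimmed copy of the example above (first row only, titles omitted).
page = {
    "license": "CC0-1.0",
    "schema": {
        "fields": [
            {"name": "id", "type": "string"},
            {"name": "count", "type": "number"},
            {"name": "liked", "type": "boolean"},
            {"name": "description", "type": "localized"},
        ]
    },
    "data": [
        ["peaches", 100, True,
         {"en": "Magnificent but a bit artificial sweet peaches"}],
    ],
}

rows = rows_as_dicts(page)
print(rows[0]["count"])  # 100
```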
Restrictions and gotchas
- Each string value except "sources" must be no more than 400 characters long. Special characters like new lines (\n) and tabs (\t) are not allowed.
- The overall size of the page may not exceed 2MB.
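Both limits can be checked before saving a page. The Python sketch below is an approximation under stated assumptions: it reads "2MB" as 2 * 1024 * 1024 bytes of UTF-8 encoded JSON, measures the size of Python's own serialization (which may differ slightly from what the wiki stores), and counts string length in Python characters. All names are hypothetical.

```python
import json

MAX_STRING_LENGTH = 400
# Assumption: "2MB" is read here as 2 * 1024 * 1024 bytes of UTF-8 encoded JSON.
MAX_PAGE_BYTES = 2 * 1024 * 1024


def page_size_bytes(page):
    """Size of the page when serialized as UTF-8 JSON by Python."""
    return len(json.dumps(page, ensure_ascii=False).encode("utf-8"))


def iter_strings(node):
    """Yield every string value nested anywhere inside dicts and lists."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def overlong_strings(page):
    """Return string values longer than the limit, skipping the top-level "sources"."""
    checked = {key: value for key, value in page.items() if key != "sources"}
    return [text for text in iter_strings(checked) if len(text) > MAX_STRING_LENGTH]
```

A page passes this check when overlong_strings(page) is empty and page_size_bytes(page) does not exceed MAX_PAGE_BYTES.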
Admin notes
- Site administrators may customize error messages, such as jsonconfig-err-license, to make sure editors know about the CC0-only limitation.
Converting data to JSON
There are several tools available for converting other formats to JSON:
CSV and TSV
- convertcsv.com: select "CSV to JSON Array" to generate the output as a JSON array, and under 'output options' select the option 'if to JSON Array, create array for column names with name'. You will still need to manually add "name" and "type" to each entry in fields (see the example above for formatting).
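For small files, the same conversion can also be done locally. The Python sketch below uses only the standard library and is an alternative to the online tool, not a description of how that tool works. The function name csv_to_tabular is hypothetical, the caller supplies the type of each column, empty cells are turned into null (an assumption), and localized columns are not handled.

```python
import csv
import io
import json


def convert_cell(cell, type_name):
    """Turn one CSV cell (always text) into a JSON-ready value."""
    if cell == "":
        return None  # assumption: an empty cell means a missing value
    if type_name == "number":
        try:
            return int(cell)
        except ValueError:
            return float(cell)
    if type_name == "boolean":
        lowered = cell.strip().lower()
        if lowered not in ("true", "false"):
            raise ValueError(f"not a boolean: {cell!r}")
        return lowered == "true"
    return cell  # "string" columns are kept as-is


def csv_to_tabular(csv_text, column_types, license_id="CC0-1.0"):
    """Build a tabular-data dict from CSV text whose first row holds the column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    fields = [{"name": name, "type": column_types[name]} for name in header]
    data = [
        [convert_cell(cell, field["type"]) for cell, field in zip(row, fields)]
        for row in reader
    ]
    return {"license": license_id, "schema": {"fields": fields}, "data": data}


text = "id,count,liked\npeaches,100,true\napples,,false\n"
page = csv_to_tabular(text, {"id": "string", "count": "number", "liked": "boolean"})
print(json.dumps(page, ensure_ascii=False, indent=4))
```

The printed JSON can then be pasted into the raw editor of a Data: page, after adding optional fields such as "description" and "sources" by hand.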
XLS
- xls-to-json — Convert xls files to JSON in NodeJS
See also
- Help:Map Data — similarly structured map data in the Data: namespace on commons.
- Lua data modules
- Tabular data — some implementation details
- Category:Graph Template Collection — templates to do graphs with the Commons Datasets
- DataNamespace — previous proposal
- TheDJ/tabularImportExport.js — A script to import from and export to CSV and Excel files, by User:TheDJ.
- phab:T154071 — Allow non-CC0 licensed data for datasets