Extension:Cargo/Storing data
In Cargo, data structures are created and data is stored exclusively through templates.
Every template that uses Cargo must contain calls to the parser functions #cargo_declare and #cargo_store; or, less commonly, calls to #cargo_attach and #cargo_store.
#cargo_declare defines the fields for a data table, #cargo_store stores data within that table, and #cargo_attach specifies that a template stores its data in a table defined elsewhere.
Declaring a table
A template that stores data in a table must also either declare that table, or "attach" itself to a table that is declared elsewhere.
Since there is usually one table per template, and vice versa, most templates that use Cargo declare their own table.
The declaration is done with the parser function #cargo_declare.
This function is called with the following syntax:
{{#cargo_declare: _table = table_name |field_1 = field description 1 |field_2 = field description 2 ...etc. }}
Neither the table name nor the field names may contain spaces or hyphens; you can use underscores, CamelCase, etc. instead. Underscores cannot be used at the beginning or end of table and field names.
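The naming rules above can be sketched as a small check. This is only an illustration of the rules stated here, not Cargo's own validation code; the function name `is_valid_cargo_name` is hypothetical, and Cargo may enforce further restrictions that this page does not list.

```python
def is_valid_cargo_name(name: str) -> bool:
    """Check a table or field name against the rules stated in the text.

    Hypothetical helper: it covers only the documented rules
    (no spaces, no hyphens, no leading or trailing underscore).
    """
    if not name:
        return False
    if " " in name or "-" in name:
        return False
    if name.startswith("_") or name.endswith("_"):
        return False
    return True


for candidate in ["Main_ingredient", "MainIngredient", "Main ingredient", "_hidden"]:
    print(candidate, is_valid_cargo_name(candidate))
```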
The field description must start with the type of the field, and in many cases it will simply be the type. The following types are predefined in Cargo:
Data type | Description | Unindexed? |
---|---|---|
Page | Holds the name of a page in the wiki (default max size: 300 characters) | |
String | Holds standard, non-wikitext text (default max size: 300 characters) | |
Text | Holds standard, non-wikitext text; meant for longer values | ✔ |
Integer | Holds an integer | |
Float | Holds a real, i.e. non-integer, number | |
Date | Holds a date without time | |
Start date, End date | Similar to Date, but meant to hold the beginning and end of some duration. A table can hold either no Start date and no End date field, or exactly one of each. | |
Datetime | Holds a date and time | |
Start datetime, End datetime | Work like Start date and End date, but include a time. | |
Boolean | Holds a Boolean value, which should be 1 or 0, or 'yes' or 'no' (see this section for Cargo-specific information on querying Boolean values) | |
Coordinates | Holds geographical coordinates | |
Wikitext string | Holds a short text that is meant to be parsed by the MediaWiki parser (default max size: 300 characters) | |
Wikitext | Holds longer text that is meant to be parsed by the MediaWiki parser | ✔ |
Searchtext | Holds text that can be searched on, using the MATCHES command (requires MySQL 5.6+ or MariaDB 5.6+) | |
File | Holds the name of an uploaded file or image in the wiki (similar to Page, but does not require specifying the "File:" namespace) (default max size: 300 characters) | |
URL | Holds a URL (default max size: 300 characters) | |
Email | Holds an email address (default max size: 300 characters) | |
Rating | Holds a "rating" value, i.e., usually an integer from 1 to 5 | |
Any other type specified will be treated as type "String". Fields of types that are unindexed are significantly slower to query on or join on.
A field can also hold a list of any such type. To define such a list, the type value needs to look like "List (delimiter) of type". For example, to have a field called "Authors" that holds a list of string values separated by commas, you would have the following parameter in the #cargo_declare call:
|Authors=List (,) of String
Caveat: fields set to List of type must have a different name than the table. If they're identical, cargo queries that include the "list" field return errors.
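To make the "List (delimiter) of type" form concrete, here is a minimal Python sketch that splits such a type value into its delimiter and element type. The function `parse_type` and the regular expression are illustrative assumptions, not Cargo's parser, and the sketch handles only the type part of a field description, not any parameters that follow it.

```python
import re

# Matches the documented form "List (delimiter) of type".
LIST_TYPE = re.compile(r"^List \((.*?)\) of (.+)$")


def parse_type(type_value):
    """Return (delimiter, element_type); delimiter is None for non-list types."""
    text = type_value.strip()
    match = LIST_TYPE.match(text)
    if match:
        return match.group(1), match.group(2).strip()
    return None, text


print(parse_type("List (,) of String"))
print(parse_type("Integer"))
```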
Field parameters
The description string can also have additional parameters; these are all enclosed within parentheses after the type identifier, and separated by semicolons. The currently allowed parameters are:
Parameter | Description |
---|---|
size= | For fields of type "Page", "String", "Wikitext string", "File", "URL" and "Email", sets the size of this field, i.e., the number of characters; the default is set by the global variable $wgCargoDefaultStringBytes, which in turn has a default value of 300 (although it can be modified in LocalSettings.php). |
hierarchy | Specifies that the field holds a hierarchy of values, as defined in the "allowed values" parameter (see next item). |
allowed values= | A set of allowed values that a field can have. (This is usually only done for fields of type "String" or "Page".) If "hierarchy" is not specified, this should simply be a set of comma-separated values. If "hierarchy" is specified, the values should be defined using the syntax of a bulleted list. In brief: every value should be on its own line, each line should start with at least one "*", the first line should start with exactly one "*", and the number of "*" should increase by no more than one at a time. |
- For example, to define a field called "Color" that has three allowed values, you could have the following declaration:
|Color=String (size=10;allowed values=Red,Blue,Yellow)
- Meanwhile, to define a field called "Main ingredient" that is a hierarchy, you could have the following declaration:
|Main_ingredient = String (hierarchy;allowed values=*Fruits
**Mangoes
**Apples
*Vegetables
**Root vegetables
***Carrots
***Turnips
**Peppers)
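The bulleted-list rules for a hierarchy (every line starts with at least one "*", the first line with exactly one, and the depth never increases by more than one) can be checked with a short sketch like the following. The function name is hypothetical, and the code only mirrors the rules as worded on this page; it is not taken from Cargo.

```python
def hierarchy_is_well_formed(lines):
    """Return True if the lines follow the documented bulleted-list rules."""
    previous_depth = 0
    for line in lines:
        depth = len(line) - len(line.lstrip("*"))
        if depth == 0:
            return False  # every line needs at least one "*"
        if depth > previous_depth + 1:
            return False  # depth may grow by at most one at a time
        previous_depth = depth
    # An empty list is not a hierarchy; the first-line rule is covered
    # because previous_depth starts at 0.
    return previous_depth > 0


example = [
    "*Fruits", "**Mangoes", "**Apples",
    "*Vegetables", "**Root vegetables", "***Carrots", "***Turnips", "**Peppers",
]
print(hierarchy_is_well_formed(example))
```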
Parameter | Description |
---|---|
link text= | For fields of type "URL", sets text that would be displayed as a link to that URL. By default, the entire URL is shown. |
hidden | Takes no value. If set, the field is not listed in Special:Drilldown, although it is still queryable. |
mandatory | Takes no value. If set, the field is declared as mandatory, i.e., blank values are not allowed. |
unique | Takes no value. If set, all values for the field must be unique; a value that already exists for that field in the table will not be saved. |
regex= | Sets a regular expression for this field, which all values must match. For example, if "regex=T\d+" is set, values for that field must consist of the letter "T" followed by one or more numerals. |
dependent on= | Takes in the name of another field in this table, to specify that this field should only be displayed in Special:Drilldown once the user has selected a value for that field. |
Other #cargo_declare parameters
Other than the table's name and fields, the following parameters can also be added to #cargo_declare:
_parentTables
- for setting one or more other Cargo tables as the "parent tables" of this table. This is used within Special:Drilldown, to let the user filter on fields from additional Cargo tables that are tied in some way to this one. It takes the following syntax:
|_parentTables= tableName1(_localField=localFieldName, _remoteField=remoteFieldName, _alias=tableAlias); tableName2(...); ...
- Here, 'tableName1' is the name of the table you want to declare as the parent table. '_localField' and '_remoteField' specify the fields in the two tables that need to be joined on (the default values for both are "_pageName"). If '_alias' is defined, then that will be displayed in the drilldown instead of the parent table's name.
- Example: This drilldown display shows additional drilldown fields from a parent table, "Items" (listed as "Item") (template here)
_drilldownTabs
- for setting custom drilldown tabs in Special:Drilldown page. It can be declared like this:
|_drilldownTabs= Tab1(format=list;delimiter=\;;fields=A,B,C), Tab2(format=table; fields=A,C,D)
- where 'Tab1' is the display name of the tab, the 'format' parameter takes the desired format name, any further parameters needed by that format follow it, and 'fields' holds the set of fields to be displayed.
- Example: This drilldown display also shows custom tabs (template here)
#cargo_declare also displays a link to the Special:CargoTables page for viewing the contents of this database table.
Attaching to a table
In some cases, you may want more than one template to store data in the same Cargo table.
In that case, only one of the templates should declare the table, while the others should simply "attach" themselves to that table, using the parser function #cargo_attach.
This function is called with the following syntax:
{{#cargo_attach: _table = table_name }}
A template should have no more than one call to #cargo_attach (and no more than one call to #cargo_declare, for that matter). Any template that contains a call to #cargo_store should also call either #cargo_declare or #cargo_attach.
Storing data in a table
A template that declares a table or attaches itself to one should also store data in that table.
This is done with the parser function #cargo_store.
Unlike #cargo_declare and #cargo_attach, which apply to the template page itself and thus should go into the template's <noinclude> section, #cargo_store applies to each page that calls that template, and thus should go into the template's <includeonly> section.
This function is called with the following syntax:
{{#cargo_store: _table = table_name |field_1 = value 1 |field_2 = value 2 ...etc. }}
The field names must match those in the #cargo_declare call elsewhere in the template.
The values will usually, but not always, be template parameters; but in theory they could hold anything.
If $wgCargoStoreUseTemplateArgsFallback is enabled (as it is by default), for fields whose value is one of your template's parameters (i.e. you are writing |paramName={{{paramName}}}), and where the name of the template parameter is the same as the name of the Cargo field (apart from the presence of underscores instead of spaces), the field can be left out of the #cargo_store call; so, in many cases, the call could instead simply look like:
{{#cargo_store: _table = table_name }}
In fact, not even the table name really needs to be specified; so in many cases the call could even look like:
{{#cargo_store:}}
However, this is slightly less efficient (and maybe more confusing to readers) than specifying the table name.
Please note, $wgCargoStoreUseTemplateArgsFallback is disabled on some wiki farms, including wiki.gg.
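The fallback described above amounts to a name lookup: a field that is not passed explicitly takes its value from a template parameter with the same name, treating underscores and spaces as equivalent. The following Python sketch illustrates that idea only; `resolve_store_values` and its arguments are hypothetical names, and the real lookup inside Cargo may differ in detail.

```python
def resolve_store_values(declared_fields, explicit_values, template_args):
    """Fill in omitted fields from same-named template parameters.

    declared_fields: field names from the table declaration
    explicit_values: values passed directly to the store call
    template_args:   parameters the page passed to the template
    """
    values = dict(explicit_values)
    for field in declared_fields:
        if field in values:
            continue  # an explicit value always wins
        # Try the field name as-is, then with spaces instead of underscores.
        for candidate in (field, field.replace("_", " ")):
            if candidate in template_args:
                values[field] = template_args[candidate]
                break
    return values


stored = resolve_store_values(
    ["Author", "Main_ingredient", "Genre"],
    {"Genre": "Fiction"},
    {"Author": "A. Writer", "Main ingredient": "Carrots", "Genre": "Other"},
)
print(stored)
```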
Storing a recurring event
Special handling exists for storing recurring events, which are events that happen regularly, like birthdays or weekly meetings.
For these, the parser function #recurring_event exists.
It takes in a set of parameters for a recurring event (representing the start date, frequency etc.), and simply prints out a string holding a list of the dates for that event.
It is meant to be called within #cargo_store (for a field defined as holding a list of dates), and #cargo_store will then store the data appropriately.
#recurring_event is called with the following syntax:
{{#recurring_event:
start=start date
|end=end date
|unit=day, week, month, or year
|period=some number, representing the number of "units" between event instances (default is 1)
|include=list of dates, to be included in the list
|exclude=list of dates to exclude
|delimiter=delimiter for dates (default is ';')
}}
Of these parameters, only "start=" and "unit=" are required.
By default, if no end date is set, or if the end date is too far in the future, #recurring_event stores 50 instances of the event.
To change this, you can add a setting for $wgCargoRecurringEventMaxInstances in LocalSettings.php, under the inclusion of Cargo.
For instance, to set the number to 100, you would add the following:
$wgCargoRecurringEventMaxInstances = 100;
If working with recurring events, declare the type of the field to be List (;) of Date.
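As a rough illustration of what #recurring_event produces, the Python sketch below builds a delimited list of dates from a start date, a unit and a period, capped at a maximum number of instances. It is not Cargo's implementation. Several points are assumptions: only the "day" and "week" units are handled (this page does not say how month ends are treated), the end date is taken as inclusive, dates are printed in ISO format, and the cap is applied before the "include" dates are added.

```python
from datetime import date, timedelta

UNIT_DAYS = {"day": 1, "week": 7}  # month and year are deliberately left out


def recurring_dates(start, unit, period=1, end=None, include=(), exclude=(),
                    delimiter=";", max_instances=50):
    """Return a delimited string of event dates (illustrative sketch)."""
    step = timedelta(days=UNIT_DAYS[unit] * period)
    excluded = set(exclude)
    dates = []
    current = start
    # Stop at the end date, or at the instance cap if there is no end date.
    while len(dates) < max_instances and (end is None or current <= end):
        if current not in excluded:
            dates.append(current)
        current += step
    all_dates = sorted(set(dates) | set(include))
    return delimiter.join(d.isoformat() for d in all_dates)


print(recurring_dates(date(2024, 1, 1), "week", period=2, end=date(2024, 2, 1)))
```

A string like this is what a field declared as List (;) of Date would then receive as its value.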
Example
Creating or recreating a table
No data is actually generated or modified when a template page containing a #cargo_declare call is saved. Instead, the data must be created or recreated in a separate process. There are two ways to do this:
Web-based tab
From the template page, select the tab action called either "Create data" or "Recreate data". This will bring up an interface that may contain a checkbox reading "Recreate data into a replacement table, keeping the old one for querying". (That checkbox will only appear if the Cargo table in question already exists.)
Once you hit "OK", one of the following will happen:
- If the checkbox was selected, a "replacement table" will be created, while the current table remains unaffected. This replacement table can be viewed by anyone, but its data will not be used in queries. (In the database, the actual table will have a name like "cargo__tableName__NEXT".) If/when you think this replacement table is ready to be used, you can click on the "Switch in table" link at Special:CargoTables. This link will delete the current Cargo table and rename the replacement table so that it becomes the official table. Conversely, if you would rather not use the replacement table, you can click on the "Delete" link for it.
- If the checkbox was not selected, the current table will be deleted immediately, and a new version will get created.
- If the checkbox was not there, it means that this is a new table. In that case, the table will be created.
In all three cases, MediaWiki jobs are used to cycle through all the relevant pages and recreate the data - a separate job is created for each page. This can be a lengthy process for large tables, which is why using the "replacement table" approach is recommended for large tables - it avoids a "downtime" period when the table is mostly empty.
Depending on the MediaWiki configuration, a call to MediaWiki's runJobs.php script may be useful or even necessary for these jobs to actually start.
If any templates contain #cargo_attach, they too will get a "Create data" or "Recreate data" tab. If this tab is selected and activated, it will not drop and recreate the database table itself; instead, it will only recreate those rows in the table that came from pages that call that template.
Permissions
The ability to create/recreate data is available to users with the 'recreatecargodata' permission, which by default is given to sysops. You can give this permission to other users; for instance, to have a new user group, 'cargoadmin', with this ability, you would just need to add the following to LocalSettings.php:
$wgGroupPermissions['cargoadmin']['recreatecargodata'] = true;
Once a table exists for a template, any page that contains one or more calls to that template will have its data in that table refreshed whenever it is resaved, and new pages that contain call(s) to that template will get their data added in when the pages are created.
Command line
If you have access to the command line, you can also recreate the data by calling the script cargoRecreateData.php, located in Cargo's /maintenance directory.
It can be called in one of two ways:
Command | Description |
---|---|
php cargoRecreateData.php | Recreates the data for all Cargo tables in the system. |
php cargoRecreateData.php --table tableName | Recreates the data for the one specified Cargo table. |
In addition, the script can be called with the --quiet flag, which turns off all printouts.
For full usage information, call it with --help.
Storing page data
You can create an additional Cargo table that holds "page data": data specific to each page in the wiki, not related to infobox data. This data can then be queried either on its own or joined with one or more "regular" Cargo tables. The table is named "_pageData", and it holds one row for every page in the wiki. You must specify the set of fields you want the table to store; by default it will only hold the five standard Cargo fields (_pageName, _pageTitle, _pageNamespace, _pageID and _ID: see Database storage details). To include additional fields, add to the array $wgCargoPageDataColumns in LocalSettings.php, below the line that installs Cargo.
Below are the fields that can be added to the _pageData table, along with the call to add each one:
Field | Description | LocalSettings.php call |
---|---|---|
_creationDate | The date/time at which the page was created | $wgCargoPageDataColumns[] = 'creationDate'; |
_modificationDate | The date/time at which the page was last modified | $wgCargoPageDataColumns[] = 'modificationDate'; |
_creator | The username of the user who created the page | $wgCargoPageDataColumns[] = 'creator'; |
_lastEditor | The username of the user who last modified the page | $wgCargoPageDataColumns[] = 'lastEditor'; |
_fullText | The (searchable) full text of the page | $wgCargoPageDataColumns[] = 'fullText'; |
_categories | The categories of the page (a list, queryable using "HOLDS"). Note that spaces are stored as underscores. | $wgCargoPageDataColumns[] = 'categories'; |
_numRevisions | The number of edits the page has had | $wgCargoPageDataColumns[] = 'numRevisions'; |
_isRedirect | Whether this page is a redirect | $wgCargoPageDataColumns[] = 'isRedirect'; |
_pageNameOrRedirect | The target of the page if it's a redirect, otherwise the page name | $wgCargoPageDataColumns[] = 'pageNameOrRedirect'; |
_pageIDOrRedirect | The ID of the target of the page if it's a redirect, otherwise the page ID | $wgCargoPageDataColumns[] = 'pageIDOrRedirect'; |
Once you have specified which fields you want the table to hold, go to the Cargo /maintenance directory, and make the following call to create, or recreate, the _pageData table:
php setCargoPageData.php
To recreate with replacement, add a --replacement flag:
php setCargoPageData.php --replacement
The replacement table can then be switched in normally using the Special:CargoTables interface.
If you want to get rid of this table, call the following instead:
php setCargoPageData.php --delete
You do not need to call the "--delete" option if you are planning to recreate the table; simply calling setCargoPageData.php will delete the previous version.
Storing file data
Similarly to page data, you can also automatically store data for each uploaded file. This data gets put in a table called "_fileData", which holds one row for each file. This table again has its own settings array, to specify which columns should be stored, called $wgCargoFileDataColumns.
There are currently five columns that can be set:
Field | Description | LocalSettings.php call |
---|---|---|
_mediaType | The media type, or MIME type, of each file, like "image/png" | $wgCargoFileDataColumns[] = 'mediaType'; |
_path | The directory path of the file on the wiki's server | $wgCargoFileDataColumns[] = 'path'; |
_lastUploadDate | The date/time at which the file was last uploaded | $wgCargoFileDataColumns[] = 'lastUploadDate'; |
_fullText | The full text of the file; this is only stored for PDF files | $wgCargoFileDataColumns[] = 'fullText'; |
_numPages | The number of pages in the file; this is only stored for PDF files | $wgCargoFileDataColumns[] = 'numPages'; |
To store the full text of PDF files, you need to have the pdftotext utility installed on the server, and then add the following to LocalSettings.php:
$wgCargoPDFToText = '...path to file.../pdftotext';
pdftotext is available as part of several packages. If you have the PdfHandler extension installed (and working), you may have pdftotext installed already.
To store the number of pages, you need to have the pdfinfo utility installed on the server, and then add the following to LocalSettings.php:
$wgCargoPDFInfo = '...path to file.../pdfinfo';
Once you have specified which fields you want the table to hold, go to the Cargo /maintenance directory, and make the following call to create, or recreate, the _fileData table:
php setCargoFileData.php
Database storage details
When the data for a template is created or recreated, a database table is created in the Cargo database that (usually) has one column for each specified field. This table will additionally hold the following columns:
Field | Description |
---|---|
_pageName | Holds the name of the page from which this row of values was stored. |
_pageTitle | Similar to _pageName, but omits the namespace, if there is one. |
_pageNamespace | Holds the numerical ID of the namespace of the page from which this row of values was stored. |
_pageID | Holds the internal MediaWiki ID for that page. |
_ID | Holds a unique ID for this row. |
Storing lists
For fields that have lists of values, the handling is more complex: a whole separate database table is created to hold all the individual values for this field. This table will get the name "MainTableName__FieldName" (e.g., "Books__Authors"), and it will have the following fields:
Field | Description |
---|---|
_rowID | Holds the ID of the row (i.e., _ID) in the main table that this value corresponds to. |
_value | Holds the actual, individual value. |
_position | Holds the position of this value in the list (can be 1, 2, etc.) |
So if an "Authors" field contained three values, the "Books__Authors" table would have three rows corresponding to that one page.
There's one more complication for list fields: the corresponding field for a list field in the database table will not actually be given that name, but will rather be called "FieldName__full", e.g., "Authors__full". This is to enable the "true" field name to serve as a "virtual" field within the #cargo_query call, to make querying on the field values table easier (see 'The "HOLDS" command').
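The layout described above can be pictured with a small Python sketch that turns one list value into the "__full" column of the main table plus one helper-table row per item. The function name and the dictionary representation are illustrative only, and trimming whitespace around each item is an assumption of this sketch rather than documented behavior.

```python
def split_list_field(row_id, field_name, raw_value, delimiter):
    """Return (main-table columns, helper-table rows) for one list value."""
    main_columns = {f"{field_name}__full": raw_value}
    # Assumption: surrounding whitespace is trimmed and empty items are dropped.
    items = [item.strip() for item in raw_value.split(delimiter) if item.strip()]
    value_rows = [
        {"_rowID": row_id, "_value": item, "_position": position}
        for position, item in enumerate(items, start=1)
    ]
    return main_columns, value_rows


main, rows = split_list_field(7, "Authors", "Ann Lee, Bo Chen, Cy Diaz", ",")
print(main)
for row in rows:
    print(row)
```

Here the three printed rows correspond to the three rows that a "Books__Authors" table would hold for that one page.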
Storing hierarchies
For fields that have a set of allowed values that is defined as being a hierarchy, a separate database table is created to store the whole set of allowed values. This table will get the name "MainTableName__FieldName__hierarchy" (e.g. "Books__Genre__hierarchy"), and it will have the following fields:
Field | Description |
---|---|
_value | The allowed value. |
_left | The number of the leftmost node represented by this value. |
_right | The number of the rightmost node represented by this value. |
For an explanation of this method of storage, see the Wikipedia article "Nested set model".
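For readers unfamiliar with the nested set model, the following Python sketch assigns left and right numbers to a bulleted hierarchy using the textbook depth-first numbering. It is meant only to show the idea; the exact numbers Cargo stores may differ (for example, if it numbers an implicit root node), and the function name is hypothetical.

```python
def nested_set(lines):
    """Return (value, left, right) tuples for a '*'-bulleted hierarchy."""
    counter = 0
    stack = []   # (depth, index into result) for nodes that are still open
    result = []  # [value, left, right] entries, in input order
    for line in lines:
        stripped = line.lstrip("*")
        depth = len(line) - len(stripped)
        # Close every open node that is not an ancestor of this line.
        while stack and stack[-1][0] >= depth:
            _, index = stack.pop()
            counter += 1
            result[index][2] = counter
        counter += 1
        result.append([stripped.strip(), counter, None])
        stack.append((depth, len(result) - 1))
    # Close whatever is still open at the end of the input.
    while stack:
        _, index = stack.pop()
        counter += 1
        result[index][2] = counter
    return [tuple(entry) for entry in result]


tree = nested_set(["*Fruits", "**Mangoes", "**Apples", "*Vegetables"])
for value, left, right in tree:
    print(value, left, right)
```

With numbers like these, all descendants of a value are exactly the entries whose left and right numbers fall between that value's own left and right, which is what makes hierarchical filtering a simple range comparison.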
Storing file names
If a table has one or more fields of type "File", an additional table is created - for use in searching on files within Special:Drilldown - with the name "MainTableName__files" (e.g., "Books__files"), with the following fields:
Field | Description |
---|---|
_pageName | The name of the page from which this row of values was stored. |
_pageID | The internal MediaWiki ID for that page. |
_fieldName | The name of the relevant field of type "File". |
_fileName | The value of the field, i.e., the name of an uploaded file. |
Storing coordinates
For fields of type 'Coordinates', like for fields that hold a list of values, no database field is created with the actual specified field name. Instead, the following three fields are created:
Field | Description |
---|---|
fieldName__full | Holds the coordinates as written in the page |
fieldName__lat | Holds the latitude of the coordinates, as a floating-point number |
fieldName__lon | Holds the longitude of the coordinates, as a floating-point number |
If the coordinates cannot be parsed, the "__full" field still gets the value, but the "__lat" and "__lon" fields are set to null.
Storing dates
For fields of type 'Date' or 'Datetime', an extra field is created that is named "fieldName__precision". It holds an integer value representing the "precision" of each date value, i.e., whether it holds a full date, only a year, etc. The possible values are:
Value | Description |
---|---|
0 | Date and time (can only occur for 'Datetime' fields) |
1 | Date only |
2 | Year and month only |
3 | Year only |
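The precision values in the table can be illustrated with a short Python sketch that classifies a date string. It assumes ISO-like input ("YYYY", "YYYY-MM", "YYYY-MM-DD", optionally followed by a time); Cargo itself accepts more date formats than this, and `date_precision` is a hypothetical helper, not part of Cargo.

```python
import re

# Ordered from most to least precise; values follow the table above.
PATTERNS = [
    (re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}(:\d{2})?$"), 0),  # date and time
    (re.compile(r"^\d{4}-\d{2}-\d{2}$"), 1),                          # date only
    (re.compile(r"^\d{4}-\d{2}$"), 2),                                # year and month
    (re.compile(r"^\d{4}$"), 3),                                      # year only
]


def date_precision(value):
    """Return the precision code for an ISO-like date string, or None."""
    text = value.strip()
    for pattern, precision in PATTERNS:
        if pattern.match(text):
            return precision
    return None


for sample in ["2024-05-17 10:30", "2024-05-17", "2024-05", "2024"]:
    print(sample, date_precision(sample))
```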
Storing Flex Diagrams data
The Flex Diagrams extension lets users define (and display) diagrams within wiki pages. If both Cargo and Flex Diagrams are installed on a wiki, you can store some data from those diagrams within special Cargo tables so that the data can be browsed and queried. Two diagram types can have their data stored: BPMN diagrams and Gantt charts. Unlike with Cargo's standard special tables, _pageData and _fileData, you cannot dictate which columns get set - they all are.
BPMN diagrams
Data about BPMN diagrams is stored in the table _bpmnData. This table can be generated by calling the following in the Cargo /maintenance directory:
php setCargoBPMNData.php
This table holds the following columns:
Field | Description |
---|---|
_BPMNID | The internal ID of a component |
_name | The external name assigned to the component |
_type | The type of the component; one of 'task', 'exclusiveGateway', 'sequenceFlow', or 'startEvent' |
_connectsTo | The IDs of the components this component connects to |
_annotation | The annotation of this component. |
Gantt charts
Data about Gantt charts is stored in the table _ganttData. This table can be generated by calling the following in the Cargo /maintenance directory:
php setCargoGanttData.php
This table holds the following columns:
Field | Description |
---|---|
_localID | The internal ID of a task |
_name | The name of the task |
_startDate | The start date of the task |
_endDate | The end date of the task |
_progress | A decimal value (between 0 and 1) representing the progress of the task |
_parent | The ID of the parent task, if any |
_linksToBB | A list of IDs of tasks whose beginnings are connected to this task's beginning |
_linksToBF | A list of IDs of tasks whose ends are connected to this task's beginning |
_linksToFB | A list of IDs of tasks whose beginnings are connected to this task's end |
_linksToFF | A list of IDs of tasks whose ends are connected to this task's end |