Wikidata Query Service/Categories

Wikidata Query Service also provides access to the category graph of all public wikis (except labswiki and labtestwiki).

Currently, the data is updated from the latest weekly dump. Updates happen each Monday.

Accessing the data

The data is stored in the Blazegraph database, in the categories namespace. Currently, there is no GUI for the category data, but SPARQL queries can be sent to the namespace at https://query.wikidata.org/bigdata/namespace/categories/sparql?query=SPARQL, where SPARQL stands for the URL-encoded query text. This SPARQL endpoint works in the same way as the main WDQS SPARQL endpoint.

Note that while each wiki has its own data set, they are all stored in the same namespace.
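As a minimal sketch of what sending a query to this endpoint could look like from a script, the Python below builds the request URL and, optionally, performs the request. The function names and the User-Agent string are placeholders invented for this example. The sketch also assumes that the endpoint honours the standard SPARQL JSON results media type in the Accept header, as the main WDQS endpoint does.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the text above.
CATEGORIES_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/categories/sparql"


def build_query_url(sparql: str) -> str:
    """Return a GET URL with the URL-encoded SPARQL text in the `query` parameter."""
    return CATEGORIES_ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql})


def run_query(sparql: str) -> dict:
    """Send the query and decode the JSON response (performs a network call).

    Assumption: the endpoint accepts the standard SPARQL JSON results
    media type. The User-Agent value is a placeholder for this example.
    """
    request = urllib.request.Request(
        build_query_url(sparql),
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "category-graph-example/0.1 (placeholder)",
        },
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Only `run_query` touches the network; `build_query_url` can be used on its own to produce a link that is pasted into a browser or passed to another HTTP client.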

Example query that returns the subcategories of Category:Ducks on English Wikipedia:

PREFIX gas: <http://www.bigdata.com/rdf/gas#>
PREFIX mediawiki: <https://www.mediawiki.org/ontology#>

SELECT * WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    gas:program gas:linkType mediawiki:isInCategory .
    gas:program gas:traversalDirection "Reverse" .
    gas:program gas:in <https://en.wikipedia.org/wiki/Category:Ducks> . # one or more times, specifies the initial frontier.
    gas:program gas:out ?out . # exactly once - will be bound to the visited vertices.
    gas:program gas:out1 ?depth . # exactly once - will be bound to the depth of the visited vertices.
    gas:program gas:maxIterations 8 . # optional limit on breadth first expansion.
  }
} ORDER BY ASC(?depth)

This query does not work in the default GUI. For now, it has to be run manually against the SPARQL endpoint above. The data set includes only categories, not the pages that belong to them (the latter would be a much larger data set).
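To illustrate what the traversal in the query above computes, here is a small in-memory sketch of a breadth-first search that follows isInCategory links in reverse, from a category down to its subcategories. It is not how Blazegraph implements the GAS program. The toy graph and its category names are made up for the example, and `max_iterations` is treated simply as a depth bound, which only approximates gas:maxIterations.

```python
from collections import deque


def reverse_bfs(is_in_category, start, max_iterations=8):
    """Breadth-first search over reversed isInCategory edges.

    `is_in_category` maps each category to the list of its parent categories.
    Returns {category: (depth, predecessor)} for `start` and every
    subcategory reachable within `max_iterations` levels.
    """
    # Invert child -> parents into parent -> children ("Reverse" direction).
    children = {}
    for child, parents in is_in_category.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    visited = {start: (0, None)}  # the initial frontier
    queue = deque([start])
    while queue:
        node = queue.popleft()
        depth = visited[node][0]
        if depth >= max_iterations:
            continue  # do not expand beyond the bound
        for child in children.get(node, []):
            if child not in visited:
                visited[child] = (depth + 1, node)
                queue.append(child)
    return visited


# Hypothetical toy data, not taken from any real wiki.
toy_graph = {
    "Category:Mallards": ["Category:Ducks"],
    "Category:Domestic ducks": ["Category:Ducks"],
    "Category:Duck breeds": ["Category:Domestic ducks"],
    "Category:Ducks": ["Category:Anatidae"],
}
subcategories = reverse_bfs(toy_graph, "Category:Ducks")
```

The parent Category:Anatidae is absent from the result because the reversed traversal only moves from a category to the categories contained in it.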

Simplified query

A simpler form of the query above is available through the mediawiki:categoryTree service:

SELECT ?out ?depth WHERE {
  SERVICE mediawiki:categoryTree {
    bd:serviceParam mediawiki:start <https://en.wikipedia.org/wiki/Category:Ducks> .
    bd:serviceParam mediawiki:direction "Reverse" .
    bd:serviceParam mediawiki:depth 5 .
  }
} ORDER BY ASC(?depth)

The categoryTree service produces three output values (the query above selects only the first two):

  • ?out: the category found
  • ?depth: the depth for the category
  • ?predecessor: the parent category
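The sketch below shows one way to generate the categoryTree query for an arbitrary starting category and to flatten a response into plain rows. It assumes that results arrive in the standard SPARQL JSON results format and that ?predecessor may be unbound for the starting category. The function names are invented for this example, and only the "Reverse" direction is documented above, so other direction values are untested assumptions.

```python
def category_tree_query(category_url: str, depth: int = 5, direction: str = "Reverse") -> str:
    """Build a categoryTree query that selects all three output values."""
    return f"""PREFIX mediawiki: <https://www.mediawiki.org/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>
SELECT ?out ?depth ?predecessor WHERE {{
  SERVICE mediawiki:categoryTree {{
    bd:serviceParam mediawiki:start <{category_url}> .
    bd:serviceParam mediawiki:direction "{direction}" .
    bd:serviceParam mediawiki:depth {int(depth)} .
  }}
}} ORDER BY ASC(?depth)"""


def parse_tree_results(results: dict) -> list:
    """Turn a SPARQL JSON results document into a list of plain dicts."""
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append({
            "out": binding["out"]["value"],
            "depth": int(binding["depth"]["value"]),
            # Assumed to be unbound for the starting category.
            "predecessor": binding.get("predecessor", {}).get("value"),
        })
    return rows
```

The returned query string can be sent to the SPARQL endpoint with any HTTP client, and the decoded JSON response passed to `parse_tree_results`.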

Data format

The data about a category describes its URL, its name, and its page and subcategory counts, e.g.:

<https://test2.wikipedia.org/wiki/Category:Test> a mediawiki:Category ; 
    rdfs:label "Test" ;
    mediawiki:pages "74"^^xsd:integer ;
    mediawiki:subcategories "19"^^xsd:integer .

Links between categories are represented by the mediawiki:isInCategory relationship, e.g.:

<https://test2.wikipedia.org/wiki/Category:Test> mediawiki:isInCategory <https://test2.wikipedia.org/wiki/Category:Parent> .

Hidden categories have the class mediawiki:HiddenCategory.
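Given the properties described in this section, a query for one category's metadata and direct parents might be assembled as in the sketch below. The function name is hypothetical, and the query deliberately omits a class constraint so that it does not depend on whether a hidden category also carries the mediawiki:Category class.

```python
def category_info_query(category_url: str) -> str:
    """Build a query for a category's label, counts and direct parent categories."""
    return f"""PREFIX mediawiki: <https://www.mediawiki.org/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?pages ?subcategories ?parent WHERE {{
  <{category_url}> rdfs:label ?label ;
      mediawiki:pages ?pages ;
      mediawiki:subcategories ?subcategories .
  # A top-level category may have no parent, hence OPTIONAL.
  OPTIONAL {{ <{category_url}> mediawiki:isInCategory ?parent . }}
}}"""


query = category_info_query("https://test2.wikipedia.org/wiki/Category:Test")
```

A category with several parents yields one result row per parent, each repeating the label and counts.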

Prefix

The prefix mediawiki: is defined as <https://www.mediawiki.org/ontology#>. The full ontology can be found at https://www.mediawiki.org/ontology/ontology.owl.

Dump header

The dump header contains information about the dump, e.g.:

<https://test2.wikipedia.org/categoriesDump> a schema:Dataset,
    owl:Ontology ;
    cc:license <https://creativecommons.org/licenses/by-sa/3.0/> ;
    schema:softwareVersion "1.0" ;
    schema:dateModified "2017-09-09T20:00:05Z"^^xsd:dateTime ;
    schema:isPartOf <https://test2.wikipedia.org/> ;
    owl:imports <https://www.mediawiki.org/ontology/ontology.owl> .


Data dumps

Data dumps are stored in https://dumps.wikimedia.org/other/categoriesrdf/. Full dumps are performed weekly. Each wiki has its own dump file.

https://dumps.wikimedia.org/other/categoriesrdf/lastdump/ stores timestamps of the last dump performed.

Updating

To update categories, the following can be used:

  1. Create the categories namespace: bash createNamespace.sh categories
  2. Load the data: bash forAllCategoryWikis.sh loadCategoryDump.sh categories

Adding wikis

For now, if you want a wiki added, please comment on the talk page. The exception is Commons: it has by far the largest set of categories, so it is not covered for now, until everything is confirmed to work as planned with smaller data sets.

TODO