Wikidata Query Service/Blank Node Skolemization
For information about skolemization in the RDF context, please read RDF 1.1 Concepts, section 3.5 Replacing Blank Nodes with IRIs.
Why skolemize the blank nodes?
As part of the work to improve the performance of the Wikidata Query Service update process, we decided to adopt a patch approach. In the same vein as what was proposed in rdf-patch or TurtlePatch, the idea is to mutate the graph with a set of trivial INSERT DATA and DELETE DATA operations. Blank nodes cannot be used within these operations because they are by nature unidentifiable. Skolemizing the blank nodes gives them an identity and allows such mutations to be applied on any triple store.
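The patch approach can be illustrated with a minimal Python sketch. It is not the actual WDQS updater: `build_patch` and `is_blank_node` are hypothetical helpers, triples are assumed to be tuples of already serialized terms, and the prefixes (`wd:`, `wdt:`) are assumed to be declared by whoever sends the update. The sketch rejects blank nodes because SPARQL 1.1 Update forbids them in DELETE DATA, and in INSERT DATA a blank node label would create a fresh node that a later patch could not address.

```python
from typing import Iterable, Tuple

# A triple of already serialized terms, e.g. "wd:Q3" or "<http://...>".
Triple = Tuple[str, str, str]


def is_blank_node(term: str) -> bool:
    """Blank node labels are written with a leading '_:'."""
    return term.startswith("_:")


def build_patch(old: Iterable[Triple], new: Iterable[Triple]) -> str:
    """Return SPARQL Update text that turns the old triples into the new ones."""
    old_set, new_set = set(old), set(new)
    removed = sorted(old_set - new_set)
    added = sorted(new_set - old_set)

    # A patch can only name triples whose terms all have an identity.
    for triple in removed + added:
        if any(is_blank_node(term) for term in triple):
            raise ValueError(f"blank node cannot be patched: {triple}")

    def block(keyword: str, triples) -> str:
        body = "".join(f"  {s} {p} {o} .\n" for s, p, o in triples)
        return f"{keyword} DATA {{\n{body}}}"

    operations = []
    if removed:
        operations.append(block("DELETE", removed))
    if added:
        operations.append(block("INSERT", added))
    return " ;\n".join(operations)


# Illustrative data only: wd:Q5 as the new value is an assumption.
old = [("wd:Q3", "wdt:P2",
        "<http://www.wikidata.org/.well-known/genid/e39d2a834262fbd171919ab2c038c9fb>")]
new = [("wd:Q3", "wdt:P2", "wd:Q5")]
print(build_patch(old, new))
```

With skolem IRIs the SomeValue triple can be deleted by name; with the blank node form `_:e39d2a834262fbd171919ab2c038c9fb` the same call would raise `ValueError`.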
How does this affect my SPARQL query?
Queries using isBlank()
Queries using isBlank(?o) will stop working and must be rewritten to use the wikibase:isSomeValue(?o) function.
SELECT ?human WHERE {
?human wdt:P21 ?gender .
FILTER isBlank(?gender)
}
This query must be rewritten as:
SELECT ?human WHERE {
?human wdt:P21 ?gender .
FILTER wikibase:isSomeValue(?gender)
}
wikibase:isSomeValue is already usable on WDQS and will work even if blank nodes have not yet been skolemized on this service.
Queries using isIRI()
Because the skolem form is an IRI, isIRI() also returns true for SomeValue nodes and may conflate them with ordinary IRIs. To eliminate possible ambiguities, !wikibase:isSomeValue(?o) can be used. For example:
select ?entity ?id {
?entity wdt:P2520 ?id .
FILTER isIRI(?id)
} LIMIT 10
can be rewritten as:
select ?entity ?id {
?entity wdt:P2520 ?id .
FILTER(isIRI(?id) && !wikibase:isSomeValue(?id))
} LIMIT 10
Form of the skolem IRI in results
The form of the IRI will be compliant with the RDF recommendations, for example:
http://www.wikidata.org/.well-known/genid/a8d14fa93486370345412093add8f50c
These IRIs will replace blank node labels such as t9283749 in the result sets. For example, the query:
SELECT ?human ?someValue WHERE {
?human wdt:P21 ?someValue .
FILTER wikibase:isSomeValue(?someValue)
} LIMIT 2
instead of returning:
human | someValue |
---|---|
wd:Q10613691 | t38348832 |
wd:Q15626781 | t38348832 |
will return:
human | someValue |
---|---|
wd:Q10613691 | http://www.wikidata.org/.well-known/genid/85cdf09ea8537248cb28182c131b623f |
wd:Q15626781 | http://www.wikidata.org/.well-known/genid/2a4afecd4ba3d3bbeb35e32a19fd179d |
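Client code that inspects result sets may need the same adjustment as queries. The following Python sketch assumes results are read in the standard SPARQL 1.1 JSON results format, where each binding carries a `type` (`uri`, `bnode` or `literal`) and a `value`. The helper name `is_some_value` is hypothetical; it accepts both the old blank node form and the skolemized form, so it can be used during the transition.

```python
SKOLEM_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def is_some_value(binding: dict) -> bool:
    """True if a result binding is a SomeValue node, old or skolemized form."""
    if binding["type"] == "bnode":
        # Form returned before skolemization, e.g. value "t38348832".
        return True
    # Form returned after skolemization: an IRI under the well-known prefix.
    return binding["type"] == "uri" and binding["value"].startswith(SKOLEM_PREFIX)


before = {"type": "bnode", "value": "t38348832"}
after = {
    "type": "uri",
    "value": "http://www.wikidata.org/.well-known/genid/85cdf09ea8537248cb28182c131b623f",
}
regular = {"type": "uri", "value": "http://www.wikidata.org/entity/Q10613691"}

for binding in (before, after, regular):
    print(binding["value"], is_some_value(binding))
```

Code that only checked for the `bnode` type would silently treat skolemized SomeValue nodes as ordinary IRIs, which is the client-side counterpart of the isIRI() ambiguity.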
Changes to the RDF model (RDF dumps and Special:EntityData)
In order to limit the differences between what is served by the Wikidata Query Service and the RDF representation of Wikidata entities, the RDF model used in the RDF dumps and Special:EntityData may change to include skolem IRIs instead of labelled blank nodes.
For example a statement including a SomeValue snak will be changed from:
wd:Q3 a wikibase:Item ; wdt:P2 _:e39d2a834262fbd171919ab2c038c9fb .
wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
ps:P2 _:fd30b9e2840921156210596f03414b05 ;
wikibase:rank wikibase:NormalRank .
to
wd:Q3 a wikibase:Item ; wdt:P2 <http://www.wikidata.org/.well-known/genid/e39d2a834262fbd171919ab2c038c9fb> .
wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c a wikibase:Statement ;
ps:P2 <http://www.wikidata.org/.well-known/genid/fd30b9e2840921156210596f03414b05> ;
wikibase:rank wikibase:NormalRank .
The skolemization function is trivial: it reuses the blank node label as the suffix of the skolem IRI. Note that blank node labels generated by Wikibase now make it possible to retain the identity of the blank node.
For consumers who want to keep blank node semantics, the function to generalize the skolemized graph is also trivial: every well-known IRI prefixed with http://www.wikidata.org/.well-known/genid/ can be transformed back into a blank node labelled with the suffix of the skolem IRI.
In other words, given:
- G: a Wikidata graph or subgraph containing properly labelled blank nodes
- sk: the skolemization function described above
- unsk: the function described above that transforms skolem IRIs back to blank nodes
It is guaranteed that G = unsk(sk(G)).
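The round trip can be sketched in Python under a simplifying assumption: a graph is modelled as a set of triples whose terms are already serialized strings, with blank nodes written as `_:label` and IRIs as `<...>`. The function names mirror sk and unsk from the text; the per-term helpers `sk_term` and `unsk_term` are hypothetical.

```python
SKOLEM_PREFIX = "http://www.wikidata.org/.well-known/genid/"


def sk_term(term: str) -> str:
    """Replace a blank node label with a skolem IRI that reuses the label."""
    if term.startswith("_:"):
        return f"<{SKOLEM_PREFIX}{term[2:]}>"
    return term


def unsk_term(term: str) -> str:
    """Turn a skolem IRI back into a blank node labelled with its suffix."""
    start = "<" + SKOLEM_PREFIX
    if term.startswith(start) and term.endswith(">"):
        return "_:" + term[len(start):-1]
    return term


def sk(graph):
    return {tuple(sk_term(t) for t in triple) for triple in graph}


def unsk(graph):
    return {tuple(unsk_term(t) for t in triple) for triple in graph}


# The SomeValue example from this section, as (subject, predicate, object).
G = {
    ("wd:Q3", "a", "wikibase:Item"),
    ("wd:Q3", "wdt:P2", "_:e39d2a834262fbd171919ab2c038c9fb"),
    ("wds:Q3-45abf5ca-4ebf-eb52-ca26-811152eb067c", "ps:P2",
     "_:fd30b9e2840921156210596f03414b05"),
}

assert unsk(sk(G)) == G  # the guarantee stated above
```

The guarantee holds in this sketch because the label is copied unchanged into the IRI suffix and the prefix is reserved for skolem IRIs, so no other IRI in G is affected by unsk.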
When this breaking change is applied (following a proper announcement on the Wikidata mailing lists), RDF dumps and Special:EntityData will start to emit G′ where G′ = sk(G).
The specification of the RDF model will be changed accordingly.