Topic on Extension talk:WikibaseLexeme/RDF mapping

Smalyshev (WMF)
  1. I'm not sure I see the point of having wikibase:lemma, since rdfs:label would do exactly the same thing.
  2. I would still emit schema:inLanguage with the language code, even if that is derived rather than primary data. It may simplify querying a lot, and we already have the configuration for it anyway. Of course, if the language has no ISO code, that value would be empty.
  3. dct:language has a range of dcterms:LinguisticSystem. We don't have this class on our language items, so using it might be wrong.
  4. Given the two points above, I think we need to use schema:inLanguage and wikibase:language, unless we find a good language predicate with an unrestricted range.
  5. Maybe we should change wikibase:grammaticalFeature to wikibase:partOfSpeech, to keep it similar to lexinfo?
  6. I'm not super happy about having both ontolex:representation and rdfs:label, but I see how it could be useful.
  7. wikibase:grammaticalFeature sounds fine.
  8. skos:definition seems closer to a description than to a label, but that depends on usage.
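Point 2 can be sketched as follows: the primary triple links a Lexeme to its language item, and schema:inLanguage is derived from a configured item-to-ISO-code mapping and simply left out when no code is known. This is a minimal Python sketch, not the actual Wikibase implementation; the mapping, the lexeme IDs, and the use of dct:language for the primary link are assumptions for illustration (Q1860 and Q188 are the Wikidata items for English and German).

```python
# Assumed configuration: language item ID -> ISO language code (illustrative only).
LANGUAGE_CODES: dict[str, str] = {
    "Q1860": "en",  # English
    "Q188": "de",   # German
}

Triple = tuple[str, str, str]


def language_triples(lexeme_id: str, language_item: str) -> list[Triple]:
    """Emit the primary language link, plus derived schema:inLanguage when a code is configured."""
    # Primary data: the Lexeme points at its language item.
    triples = [(f"wd:{lexeme_id}", "dct:language", f"wd:{language_item}")]
    code = LANGUAGE_CODES.get(language_item)
    if code is not None:
        # Derived data: a plain literal, so queries can match "en"
        # without joining through the language item.
        triples.append((f"wd:{lexeme_id}", "schema:inLanguage", f'"{code}"'))
    # No ISO code configured: the derived triple is simply omitted.
    return triples


if __name__ == "__main__":
    for t in language_triples("L1", "Q1860"):
        print(t)
```

A language item without a configured code (for example one not in LANGUAGE_CODES) only gets the primary dct:language triple.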
Tpt

Thank you very much for your feedback. Some comments:

  1. wikibase:lemma has the advantage of being specific to lemmas, so it allows queries such as getting all lemmas with the value "foo"@en without having to filter on entity types. I would keep wikibase:lemma in the Query Service and filter out rdfs:label.
  2. Big +1. I'm adding it to the document as derived data.
  3. I don't think it's a big problem. The triple dct:language rdfs:range dcterms:LinguisticSystem means that RDFS entailment over our data derives that every item used as the language of a Lexeme is also a dcterms:LinguisticSystem. That does not look wrong to me. We already have similar behaviour with, e.g., cc:license, whose rdfs:domain is cc:Work. If we want to be safe and avoid this term, we should probably also avoid reusing ontolex: terms, which come with a fairly expressive OWL ontology: http://www.w3.org/ns/lemon/ontolex
  4. See 3.
  5. If we use wikibase:partOfSpeech, we move a bit away from the wording used by the abstract data model and the JSON specification. It also seems to me that "grammatical feature" is a bit broader than "part of speech", but I am not very familiar with computational linguistics, so I may be wrong.
  6. I believe we should just drop rdfs:label from the SPARQL endpoint.
  7. Thanks.
  8. It is the property that the ontolex: specification suggests for encoding glosses. The SKOS Primer presents it this way: "skos:definition supplies a complete explanation of the intended meaning of a concept."
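To illustrate point 1 of this reply: if rdfs:label is attached to every entity type while wikibase:lemma only appears on Lexemes, a lookup by lemma needs no type filter. The minimal Python sketch below models triples as tuples in an in-memory list; the entity IDs, the rdf:type values, and the helper names are hypothetical. In SPARQL terms, the label lookup needs an extra pattern such as ?l a ontolex:LexicalEntry, while the lemma lookup is just ?l wikibase:lemma "foo"@en.

```python
Triple = tuple[str, str, str]

# Toy graph; IDs, types, and literals are hypothetical.
GRAPH: list[Triple] = [
    ("wd:Q1", "rdf:type", "wikibase:Item"),
    ("wd:Q1", "rdfs:label", '"foo"@en'),
    ("wd:L1", "rdf:type", "ontolex:LexicalEntry"),
    ("wd:L1", "rdfs:label", '"foo"@en'),
    ("wd:L1", "wikibase:lemma", '"foo"@en'),
]


def subjects(graph: list[Triple], predicate: str, obj: str) -> set[str]:
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def lexemes_by_label(graph: list[Triple], literal: str) -> set[str]:
    # rdfs:label is shared by items and lexemes, so an explicit type check is needed.
    candidates = subjects(graph, "rdfs:label", literal)
    return {s for s in candidates if (s, "rdf:type", "ontolex:LexicalEntry") in graph}


def lexemes_by_lemma(graph: list[Triple], literal: str) -> set[str]:
    # wikibase:lemma only occurs on lexemes, so no type check is needed.
    return subjects(graph, "wikibase:lemma", literal)


if __name__ == "__main__":
    print(subjects(GRAPH, "rdfs:label", '"foo"@en'))
    print(lexemes_by_label(GRAPH, '"foo"@en'))
    print(lexemes_by_lemma(GRAPH, '"foo"@en'))
```

Both lookups return the same Lexeme; the difference is only whether the type check has to be written into every query.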
Denny

+1 on points 3 and 4, as described by Tpt.

Smalyshev (WMF)

If the range is not an issue for dct:language, then maybe we should use it. Having an extra prefix is not a huge deal: once we add one, we can add more. So I don't think it is an issue.
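The entailment Tpt describes in point 3 is the standard RDFS range rule (rdfs3): from p rdfs:range C and s p o, infer o rdf:type C. The domain rule (rdfs2) does the same for subjects, which is the cc:license / cc:Work case. Below is a minimal Python sketch of just these two rules, applied until no new triples appear, over a toy graph; the ex: resources and the helper name are hypothetical, and a real reasoner would apply the full RDFS rule set.

```python
Triple = tuple[str, str, str]


def rdfs_domain_range_closure(graph: set[Triple]) -> set[Triple]:
    """Apply rdfs2 (domain) and rdfs3 (range) repeatedly until a fixpoint is reached."""
    closed = set(graph)
    while True:
        domains = {(p, c) for p, rel, c in closed if rel == "rdfs:domain"}
        ranges = {(p, c) for p, rel, c in closed if rel == "rdfs:range"}
        new: set[Triple] = set()
        for s, p, o in closed:
            for prop, cls in domains:
                if p == prop:
                    new.add((s, "rdf:type", cls))  # rdfs2: subject gets the domain class
            for prop, cls in ranges:
                if p == prop:
                    new.add((o, "rdf:type", cls))  # rdfs3: object gets the range class
        if new <= closed:
            return closed
        closed |= new


# Toy graph: two schema triples plus two data triples (ex: resources are hypothetical).
GRAPH: set[Triple] = {
    ("dct:language", "rdfs:range", "dct:LinguisticSystem"),
    ("cc:license", "rdfs:domain", "cc:Work"),
    ("wd:L1", "dct:language", "wd:Q1860"),
    ("ex:L1-data", "cc:license", "ex:some-license"),
}

if __name__ == "__main__":
    for t in sorted(rdfs_domain_range_closure(GRAPH) - GRAPH):
        print(t)
```

Only the object of dct:language (the language item) gains the dct:LinguisticSystem class; the Lexeme itself is not typed by the range rule.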

ArthurPSmith

Just a note: all the concerns I had were already expressed here by Smalyshev. I also prefer just using rdfs:label, but from the discussion there seem to be other people who don't like it. I'm not sure we can fully resolve this, but count me as another vote to at least keep rdfs:label, as noted, for Lexemes and Forms (you could drop the Senses case, though, as it's not really a label). On language labeling, Dublin Core has a very common problem of inconsistent usage, but I suppose if Wikidata uses it consistently it's not a problem here. Still, count me as another vote for a wikibase:language predicate if you're considering one. Otherwise, congratulations on this proposal; it seems pretty simple and well thought out!
