User:Brion VIBBER/2015 scratchpad

Sentence claims

Migrate from "foo bar <ref>baz</ref>" to wrapping the entire sentence in the reference claim?

Migrate claims from ad-hoc references to Wikidata claims?
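
A rough sketch of what the first migration step might look like: pulling the inline references out of a sentence so that the whole sentence, not just a trailing marker, can be tied to them. The helper name `split_refs` is made up for illustration, and the regex assumes plain `<ref>...</ref>` tags only (no named refs, no attributes, no nesting), which real wikitext does not guarantee.

```python
import re
from typing import List, Tuple

# Assumption: only plain <ref>...</ref> pairs, no attributes or nesting.
REF_RE = re.compile(r"<ref>(.*?)</ref>")


def split_refs(sentence: str) -> Tuple[str, List[str]]:
    """Return the sentence text without refs, plus the ref bodies found in it."""
    refs = REF_RE.findall(sentence)
    text = REF_RE.sub("", sentence).strip()
    return text, refs


text, refs = split_refs("foo bar <ref>baz</ref>")
```

After this step the sentence and its references are separate values, so the references can be attached to the sentence as a whole.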

Data-based text document model

instead of storing a giant text blob:

  '''Such-and-such''' is a [[thing]]<ref>foo</ref>.
  It is located in [[Anytown, USA]]<ref>bar</ref>.
  It is frequented by [[Famous Celebrity]]<ref>baz</ref>.
  It is 4 stories tall.<ref>quux</ref>

store a hierarchy:

  • [doc]
    • [para]
      • [sentence]
        • [text]: '''Such-and-such''' is a [[thing]].
        • [claim]: Q12345 P678
      • [sentence]
        • [text]: It is located in [[Anytown, USA]].
        • [claim]: Q12345 P789
      • [sentence]
        • [text]: It is frequented by [[Famous Celebrity]].
        • [claim]: Q12345 P890
      • [sentence]
        • [text]: It is 4 stories tall.
        • [claim]: Q12345 P901
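
One possible way to express this hierarchy as data, as a minimal sketch only. The class names (`Doc`, `Paragraph`, `Sentence`, `Claim`) and the `render` helper are illustrative assumptions, not an existing MediaWiki or Wikibase API, and the Q/P ids are the placeholder values from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Claim:
    item: str  # Wikidata item id, e.g. "Q12345" (placeholder)
    prop: str  # Wikidata property id, e.g. "P678" (placeholder)


@dataclass
class Sentence:
    text: str                      # wikitext of a single sentence
    claim: Optional[Claim] = None  # the statement this sentence asserts, if any


@dataclass
class Paragraph:
    sentences: List[Sentence] = field(default_factory=list)


@dataclass
class Doc:
    paragraphs: List[Paragraph] = field(default_factory=list)


def render(doc: Doc) -> str:
    """Flatten the hierarchy back into a wikitext blob."""
    return "\n\n".join(
        " ".join(s.text for s in p.sentences) for p in doc.paragraphs
    )


doc = Doc([Paragraph([
    Sentence("'''Such-and-such''' is a [[thing]].", Claim("Q12345", "P678")),
    Sentence("It is located in [[Anytown, USA]].", Claim("Q12345", "P789")),
])])
blob = render(doc)
```

Rendering joins sentences with spaces and paragraphs with blank lines, so the flat text can still be produced on demand while the claims stay attached to their sentences.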

Notes:

  • we could give persistent ids to individual sentences/claims and version them separately. Version history and authorship could then be retained when sentences/claims are moved around the article.
  • by mapping the entire sentence to a statement claim in Wikidata, we scope the reference to that sentence and allow the reference data itself to live elsewhere.
  • potentially we could attach mappings from claims to links or data as well?
  • this could aid translation by more closely tying together the claims that are identical across languages (++ translation memory!)
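
The persistent-id idea from the first note could look roughly like this. It is a sketch under assumptions: `SentenceStore`, `Revision`, and the use of random UUIDs as ids are hypothetical choices for illustration. The article only holds an ordered list of ids, so moving a sentence changes the order without touching its history.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Revision:
    author: str
    text: str


@dataclass
class SentenceRecord:
    sentence_id: str
    revisions: List[Revision] = field(default_factory=list)


class SentenceStore:
    """Keeps a separate revision history per sentence id."""

    def __init__(self) -> None:
        self._records: Dict[str, SentenceRecord] = {}

    def create(self, author: str, text: str) -> str:
        sid = uuid.uuid4().hex  # assumption: any stable unique id would do
        self._records[sid] = SentenceRecord(sid, [Revision(author, text)])
        return sid

    def edit(self, sid: str, author: str, text: str) -> None:
        self._records[sid].revisions.append(Revision(author, text))

    def history(self, sid: str) -> List[Revision]:
        return list(self._records[sid].revisions)


store = SentenceStore()
a = store.create("alice", "It is 4 stories tall.")
b = store.create("bob", "It is located in [[Anytown, USA]].")
article = [a, b]  # the article is just an ordered list of sentence ids

store.edit(a, "carol", "It is four stories tall.")
article.reverse()  # moving sentences only reorders ids; histories are untouched
```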

Issues to consider:

  • aggregating thousands of individual sentences into a page could be slow; performance needs to be considered.
  • how to reconnect the history of sentences/claims that lose their claims through bad cut-and-paste?
  • how to make VisualEditor (VE) keep the data most of the time?
  • if we connect sentence contents to the claim, how do we maintain that link and make it usable?
  • tools to take Wikidata info and produce stub pages?
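
For the last point, a stub generator might be little more than a table of per-property sentence templates. This is only a sketch: the `TEMPLATES` table, the `make_stub` function, and the property ids (reused from the placeholder example above) are all hypothetical, and real claims would have to be fetched from Wikidata rather than passed in as literals.

```python
from typing import Dict, List, Tuple

# Hypothetical templates keyed by the placeholder property ids used above.
TEMPLATES: Dict[str, str] = {
    "P678": "'''{subject}''' is a [[{value}]].",
    "P789": "It is located in [[{value}]].",
}


def make_stub(subject: str, claims: List[Tuple[str, str]]) -> str:
    """Turn (property id, value label) pairs into stub wikitext."""
    sentences = []
    for prop, value in claims:
        template = TEMPLATES.get(prop)
        if template is None:
            continue  # no template for this property: skip the claim
        sentences.append(template.format(subject=subject, value=value))
    return " ".join(sentences)


stub = make_stub(
    "Such-and-such",
    [("P678", "thing"), ("P789", "Anytown, USA"), ("P999", "unmapped")],
)
```

Claims without a template are silently dropped here; a real tool would more likely report them so templates can be added.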