Wikidata/2005 proposal

The relaunched project is located at Meta:Wikidata.

See also: meta:Wikidata (2)

This proposal is outdated. For recent developments, see Wikidata. For previous efforts, see Wikidata/Archive.

Wikidata is a proposed wiki-like database for various types of content. This project as proposed here requires significant changes to the software (or possibly completely new software) but has the potential to centrally store and manage data from all Wikimedia projects, and to radically expand the range of content that can be built using wiki principles.

Imagine that you can edit the content of an infobox on Wikipedia (e.g. Germany) with one click, that you get an edit form specific to the infobox you are editing, and that other Wikipedias automatically and immediately use the same content (unless it is specific to your locale).

Imagine that some data in an article can be automatically updated in the background, without any work from you: whether it is a company's stock price or the number of lines of code in an open source project.

Imagine that you can easily search wiki-databases on a variety of subjects, without knowing anything about wikis.

This project is separate from the Wikimedia Commons, because a Wikidata database does not necessarily have to be useful for another Wikimedia project, and because it is larger in scope.

Applications

Astronomy - space.wikidata.org (spc.wikidata.org)

astronomical objects
constellations
craters
observatories and telescopes
surveys
space missions

Economy - economy.wikidata.org

Products
Corporations, companies
Governments and local administrative bodies, complete with analysis of statistical/aggregate parameters
Macroeconomic data (current and historical), indexes
Currency exchange rates
Oil prices & other commodities prices
Stock exchange indices
Interest rates

Events - time.wikidata.org

News
Biographies
Timetables

Languages - language.wikidata.org

Translation tables (multilingual)
Dictionaries (multilingual)

Society - society.wikidata.org

Schools and universities
Cities, Countries, Subdivisions
Ethnic groups
Radio and television stations

Military - military.wikidata.org

Battles
Army divisions
Air Squadrons

Technology - tech.wikidata.org

Planes
Rockets
Ships
Weapons
Computer hardware

Nature - nature.wikidata.org

Plants
Animals
Species
Mountains
Rivers
Protected areas
Weather

Chemistry - chemistry.wikidata.org

Elements
Rocks and minerals, compounds

Content - works.wikidata.org

Books
Journal articles
Newspaper articles
Movies (IMDB is not open content)
Music

Locations - geo.wikidata.org

Cities
Countries
Regions
Geo-located pictures

See Wikimaps

Physics - physics.wikidata.org

Physical constants
Physics equations
Tables of wavefunctions etc.

Science (various) - science.wikidata.org

Pharmacology

Stamps, Coins and bank notes

Postage stamps
Coins
Bank notes

Calendar - calendar.wikidata.org

Events
Births
Deaths
Holidays and observances
- National
- Religious
- Secular

The project should allow for translations. This would help to bring together the calendar events of all the different Wikipedias and make the calendars easier to update: existing events only need to be translated, and only genuinely new events have to be added, which saves a lot of time. OmegaT is a useful tool for the translation work.

Requirements

Wikidata has the following technical requirements to be useful:

easy setup of data groups, and of new structures within a group
data structure editor
- tables
- fields
  - field types (text, number, textarea, localizable enumerations, etc.)
  - field constraints (required, unique etc.)
- relationships between fields (parents, brothers)
edit mechanisms
- modify more than one cell at once (e.g. search/replace)
- export of data in suitable formats (html, xml, csv)
- import from suitable formats
search mechanisms
- limit the table to the interesting subset
- use nested and/or/not requests
- make use of field types (date ranges, number ranges)
sort mechanisms
- by one or more fields, up or down
- make use of field types (numbers, user-defined sort orders)
wiki-style syntax for describing view layouts and edit layouts
- placement of fields in a form
per-field difference engine to show changes to fields in a more precise manner
per-field history, recent changes etc.
transclusion of content from other Wikimedia projects
- default link destination, so that, for example, any link in an entry on movies points to Wikipedia
easy localization
- flag certain types of data as international (with possible auto-conversion routines) and not in need of localization
single login and Wikimedia Commons functionality should be in operation before this project goes live
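To make the data structure editor requirements more concrete, here is a minimal Python sketch of field definitions carrying a type and the "required" and "unique" constraints, plus a check that reports violations. The `Field` class, its `choices` tuple (a crude stand-in for an enumeration) and the `validate_rows` helper are hypothetical; the proposal does not define any storage format for structures.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Field:
    """Hypothetical field definition, as a data structure editor might store it."""
    name: str
    kind: type                       # e.g. str or int
    required: bool = False
    unique: bool = False
    choices: Optional[tuple] = None  # assumption: a simple enumeration

def validate_rows(fields: list[Field], rows: list[dict[str, Any]]) -> list[str]:
    """Return human-readable constraint violations (an empty list means valid)."""
    errors = []
    seen = {f.name: set() for f in fields if f.unique}
    for i, row in enumerate(rows):
        for f in fields:
            value = row.get(f.name)
            if value is None:
                if f.required:
                    errors.append(f"row {i}: {f.name} is required")
                continue
            if not isinstance(value, f.kind):
                errors.append(f"row {i}: {f.name} must be {f.kind.__name__}")
                continue
            if f.choices is not None and value not in f.choices:
                errors.append(f"row {i}: {f.name} must be one of {f.choices}")
            if f.unique:
                if value in seen[f.name]:
                    errors.append(f"row {i}: {f.name} must be unique")
                seen[f.name].add(value)
    return errors

country_fields = [
    Field("name", str, required=True, unique=True),
    Field("population", int),
    Field("continent", str, choices=("Europe", "Asia")),
]
rows = [
    {"name": "Germany", "population": 80000000, "continent": "Europe"},
    {"name": "Germany", "population": "many"},
]
print(validate_rows(country_fields, rows))
```

Reporting all violations at once, rather than stopping at the first, fits form-based editing, where a user would want to see every problem in the submitted form.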

Licensing considerations

Share-alike is not very fair when a much larger work includes only a very small piece of data. Individual pieces of data are not copyrightable. Claiming copyright on the database itself and on the structure we create could help to bolster similar copyright claims by corporations (which in turn could harm Wikimedia), and such a claim could be difficult to enforce.

A very simple attribution license or the public domain may be a better option for data-projects.

Graphical mock-ups

m:Image:Wikidata-mockup.png: This mock-up illustrates form-based editing. Note that we need easy ways to enter relations; in this illustration, the movie-actor relation must be parsed by the backend after saving. Autolinking means that on viewing, an autolinked word gets a link both to Wikipedia and to Wikidata itself (e.g. a link to the Wikipedia article about the United States, and a link to Wikidata showing movies made in the United States).
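As a rough sketch of the two backend steps the mock-up implies, the Python below extracts wiki links from a saved form field and turns them into movie-actor relation rows, then builds the two autolinks for a word. The relation code `ACTS_IN`, the function names and the Wikidata listing URL are all hypothetical (only the `works.wikidata.org` host comes from the list of applications above); rows use page names rather than page ids to keep the example short.

```python
import re

# Hypothetical relation code for "actor appears in movie"; the proposal
# leaves the numbering of relation types open.
ACTS_IN = 10
# Matches [[Target]] and [[Target|label]], capturing only Target.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

def parse_relations(movie: str, field_value: str, relation_type: int = ACTS_IN):
    """Turn the wiki links in a saved form field into (source, destination, type) rows."""
    return [(movie, target.strip(), relation_type) for target in LINK.findall(field_value)]

def autolinks(word: str) -> dict:
    """Return the two links shown for an autolinked word on viewing."""
    slug = word.replace(" ", "_")
    return {
        "wikipedia": f"https://en.wikipedia.org/wiki/{slug}",
        # Hypothetical URL scheme for "movies made in <country>" on Wikidata.
        "wikidata": f"https://works.wikidata.org/movies?country={slug}",
    }

print(parse_relations("Sleepless in Seattle", "[[Tom Hanks]], [[Meg Ryan|Ryan]]"))
print(autolinks("United States"))
```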

For an idea of how this could be done in Kendra Base, see the wikidata mockup in kendra base.

Implementation strategies within MediaWiki

Fixed set of tables

We distinguish between wiki-pages and data through the namespace. We can define certain namespaces to be pages, and other namespaces to be data. In the following examples, namespace 0 is for articles, and namespace 402 is for data on countries.

We presume that we have a revisions table that is used for both regular wiki pages and pieces of data:

    revision_id   revision_comment    user_id    page_id
    ----------------------------------------------------
    2042          created monkey      52         300
    2043          added monkey info   203        300
    2044          created country     593        301
    ...

A pages table:

    page_id    page_name    page_namespace  top_revision
    ----------------------------------------------------
    300        Monkey       0               2043
    301        Germany      402             2044
    302        Poland       402             4893
    => an article on Monkeys, two sets of country data
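The point of sharing the revisions and pages tables is that one history query serves articles and data alike; only the namespace tells them apart. The sketch below uses Python's `sqlite3` as a stand-in database and loads the example rows above (Poland is left out because its revision 4893 is not listed). `DATA_NAMESPACES`, `page_kind`, `history` and the `UNIQUE (page_name, page_namespace)` constraint are assumptions for illustration.

```python
import sqlite3

# Assumption: namespace 402 (country data) is the only data namespace here.
DATA_NAMESPACES = {402}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE revisions (
    revision_id INTEGER PRIMARY KEY,
    revision_comment TEXT,
    user_id INTEGER,
    page_id INTEGER
);
CREATE TABLE pages (
    page_id INTEGER PRIMARY KEY,
    page_name TEXT,
    page_namespace INTEGER,
    top_revision INTEGER,
    UNIQUE (page_name, page_namespace)  -- assumed: one page per name per namespace
);
""")
conn.executemany("INSERT INTO revisions VALUES (?, ?, ?, ?)", [
    (2042, "created monkey", 52, 300),
    (2043, "added monkey info", 203, 300),
    (2044, "created country", 593, 301),
])
conn.executemany("INSERT INTO pages VALUES (?, ?, ?, ?)", [
    (300, "Monkey", 0, 2043),
    (301, "Germany", 402, 2044),
])

def page_kind(namespace: int) -> str:
    """The namespace alone decides whether a page is a wiki page or data."""
    return "data" if namespace in DATA_NAMESPACES else "wiki page"

def history(page_name: str, namespace: int) -> list[tuple[int, str]]:
    """All revisions of one page, oldest first, for articles and data alike."""
    return conn.execute(
        """SELECT r.revision_id, r.revision_comment
           FROM revisions r JOIN pages p ON r.page_id = p.page_id
           WHERE p.page_name = ? AND p.page_namespace = ?
           ORDER BY r.revision_id""",
        (page_name, namespace),
    ).fetchall()

print(page_kind(402), history("Monkey", 0))
```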

A relations table:

    source_page_id   destination_page_id   relation_type
    ----------------------------------------------------
    301              302                   1
    => Germany is a neighbour of Poland

relation_types: 0=parent, 1=brothers/neighbours, 3=aunt ... whatever is useful
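A neighbour relation is symmetric, so a single stored row should answer "who are Poland's neighbours?" as well as "who are Germany's?". The `sqlite3` sketch below uses code 1 for brothers/neighbours, following the relation_types list above, and reads symmetric relations in both directions. Treating neighbours as stored once and read both ways, and the names `SYMMETRIC` and `related`, are assumptions; the proposal does not say how symmetry is handled.

```python
import sqlite3

# Codes from the relation_types list: 0 = parent, 1 = brothers/neighbours.
PARENT, NEIGHBOUR = 0, 1
# Assumption: symmetric relations are stored once and read in both directions.
SYMMETRIC = {NEIGHBOUR}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, page_name TEXT);
CREATE TABLE relations (
    source_page_id INTEGER,
    destination_page_id INTEGER,
    relation_type INTEGER
);
""")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [(301, "Germany"), (302, "Poland")])
conn.execute("INSERT INTO relations VALUES (301, 302, ?)", (NEIGHBOUR,))

def related(page_id: int, relation_type: int) -> list[str]:
    """Names of pages linked to page_id by relation_type."""
    sql = """SELECT p.page_name FROM relations r
             JOIN pages p ON p.page_id = r.destination_page_id
             WHERE r.source_page_id = ? AND r.relation_type = ?"""
    params = [page_id, relation_type]
    if relation_type in SYMMETRIC:
        # Also follow the stored row backwards, from destination to source.
        sql += """ UNION
             SELECT p.page_name FROM relations r
             JOIN pages p ON p.page_id = r.source_page_id
             WHERE r.destination_page_id = ? AND r.relation_type = ?"""
        params += [page_id, relation_type]
    return sorted(name for (name,) in conn.execute(sql, params))

print(related(302, NEIGHBOUR))
```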

A data-longtext table:

    page_id  revision_id   name           value                   
    -------------------------------------------------------------------------------
    300      2042          article_text   A monkey is an animal...

A data-shorttext table:

    page_id   revision_id  name           value
    ---------------------------------------------------------------
    301       2044         country_flag   [[Image:Germany-flag.png]]

A data-numbers table:

    page_id  revision_id   name                      value
    ------------------------------------------------------
    301      2044          country_population     80000000
    301      2040          country_population     75000000

And so on, for the different types.
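With one table per data type, every write has to pick the right table. The sketch below shows one way to dispatch on the value's type using `sqlite3`. The 255-character cut-off between short and long text, and the names `table_for` and `store`, are assumptions; the table names use underscores because unquoted SQL names cannot contain hyphens.

```python
import sqlite3

# Assumption: strings up to 255 characters count as short text.
SHORTTEXT_LIMIT = 255
TYPED_TABLES = ("data_numbers", "data_shorttext", "data_longtext")

conn = sqlite3.connect(":memory:")
for table in TYPED_TABLES:
    # All typed tables share one layout; only what "value" holds differs.
    conn.execute(f"CREATE TABLE {table} (page_id INTEGER, revision_id INTEGER, name TEXT, value)")

def table_for(value) -> str:
    """Pick the typed data table for a value."""
    if isinstance(value, bool):
        raise TypeError("booleans have no table in this sketch")
    if isinstance(value, (int, float)):
        return "data_numbers"
    if isinstance(value, str):
        return "data_shorttext" if len(value) <= SHORTTEXT_LIMIT else "data_longtext"
    raise TypeError(f"no data table for {type(value).__name__}")

def store(page_id: int, revision_id: int, name: str, value) -> str:
    table = table_for(value)
    # The table name always comes from TYPED_TABLES, never from user input.
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)",
                 (page_id, revision_id, name, value))
    return table

store(301, 2044, "country_population", 80000000)
store(301, 2044, "country_flag", "[[Image:Germany-flag.png]]")
store(300, 2042, "article_text", "A monkey is an animal... " * 20)
```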

Now we can structure our data in arbitrary ways and do smart SELECTs:

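    -- the data-numbers table is written data_numbers: unquoted SQL names cannot contain hyphens
    SELECT page_id, top_revision FROM pages WHERE page_name = 'Germany' AND page_namespace = 402;
    => 301, 2044
    SELECT value FROM data_numbers WHERE page_id = 301 AND revision_id = 2044 AND name = 'country_population';
    => 80000000 - the country population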
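The two lookups above can be run end to end. In the sketch below, Python's `sqlite3` stands in for whatever database MediaWiki would actually use, and only the rows needed for the example are loaded. The final query shows that the two steps can also be folded into a single JOIN on `top_revision`, so that the older 75000000 value at revision 2040 is ignored without a second round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (page_id INTEGER PRIMARY KEY, page_name TEXT,
                    page_namespace INTEGER, top_revision INTEGER);
CREATE TABLE data_numbers (page_id INTEGER, revision_id INTEGER,
                           name TEXT, value NUMERIC);
INSERT INTO pages VALUES (301, 'Germany', 402, 2044), (302, 'Poland', 402, 4893);
INSERT INTO data_numbers VALUES
    (301, 2044, 'country_population', 80000000),
    (301, 2040, 'country_population', 75000000);
""")

# Step 1: resolve the page and its current revision.
page_id, top_revision = conn.execute(
    "SELECT page_id, top_revision FROM pages WHERE page_name = ? AND page_namespace = ?",
    ("Germany", 402)).fetchone()

# Step 2: read the field as of that revision.
(population,) = conn.execute(
    "SELECT value FROM data_numbers WHERE page_id = ? AND revision_id = ? AND name = ?",
    (page_id, top_revision, "country_population")).fetchone()

# The same lookup as a single JOIN on the page's top revision.
joined = conn.execute("""
    SELECT d.value FROM pages p
    JOIN data_numbers d ON d.page_id = p.page_id AND d.revision_id = p.top_revision
    WHERE p.page_name = 'Germany' AND p.page_namespace = 402
      AND d.name = 'country_population'""").fetchone()

print(page_id, top_revision, population, joined)
```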

Dynamic table creation

We could create a sophisticated data manager application that allows tables to be created without much technical know-how. It could automatically manage revision storage and the association of data with revisions. Advantage: more efficient, with constraints enforced at the database level. Disadvantage: less flexible, since all code has to be aware of which tables exist.
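A data manager of this kind would essentially translate a user's field list into DDL and add the revision bookkeeping itself. The sketch below does that with `sqlite3`; the `SQL_TYPES` mapping, the field-spec dictionary format and `create_data_table` are hypothetical. A "required" field becomes `NOT NULL`, which is the database-level constraint mentioned above. A per-field `UNIQUE` is deliberately left out, because the same value legitimately repeats across revisions of one page.

```python
import sqlite3

# Hypothetical mapping from data-manager field types to SQL column types.
SQL_TYPES = {"text": "TEXT", "number": "NUMERIC", "date": "TEXT"}

def create_data_table(conn: sqlite3.Connection, table: str, fields: dict[str, dict]) -> str:
    """Create one table per data group; revision columns are added automatically."""
    if not table.isidentifier():
        raise ValueError(f"bad table name: {table!r}")
    columns = ["page_id INTEGER NOT NULL", "revision_id INTEGER NOT NULL"]
    for name, spec in fields.items():
        if not name.isidentifier():
            raise ValueError(f"bad field name: {name!r}")
        column = f"{name} {SQL_TYPES[spec['type']]}"
        if spec.get("required"):
            column += " NOT NULL"  # constraint enforced by the database itself
        columns.append(column)
    # One row per page per revision.
    columns.append("PRIMARY KEY (page_id, revision_id)")
    ddl = f"CREATE TABLE {table} ({', '.join(columns)})"
    conn.execute(ddl)
    return ddl

conn = sqlite3.connect(":memory:")
print(create_data_table(conn, "country", {
    "population": {"type": "number"},
    "flag": {"type": "text", "required": True},
}))
conn.execute("INSERT INTO country VALUES (301, 2044, 80000000, '[[Image:Germany-flag.png]]')")
```

The disadvantage named above shows up directly here: any code that reads country data must know that a `country` table with these columns exists.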

m:Category:Proposed projects

Notes

Everything should be Wikidata. Liquid Threads comments, wiki pages, movies, everything. Abstract as much as possible.--Eloquence
A data-workflow table to store workflow properties (publication status/date for Wikinews, etc.)? --Eloquence
I think you want something like graph serialization, but with the concept of hyperlinks. That is, pointers to graph nodes specified in other files, or anywhere on the web, or whatever. Like nLSD. -- LionKimbro
It would be nice to manage bibliographic references as Wikidata. See also these proposals in the German Wikipedia. --Lambo 15:08, 20 Mar 2005 (UTC)

Related projects

The Semantic MediaWiki extension to MediaWiki extends wiki link syntax to represent two kinds of properties of articles: relations between articles and attribute values of articles. It supports inline query of these properties and export of them as RDF.
Kendra Initiative is developing a semantic data publishing/querying system called Kendra Base.
- Currently, input is possible via two methods: wiki-style free text and more structured form input.
- Also reviewed at m:Kendra evaluation
w:TWiki is a wiki which features form-based input as well as metadata which adds a structure to the entered data.
According to Jimbo, who has seen beta screenshots, jot.com seems to be doing something similar.
w:Wikipedia:Proposal for intuitive table editor and namespace
I have yet to see software that does this, but I always thought that Wikipedia would be an excellent project for an OODB (object-oriented database). Instead of only allowing certain data, have a generic article object. Then classify each article as a person or a place. These would be subclasses of the article object and would have fixed fields (start of believed birthdate range, end of believed birthdate range, bio, believed birthplace latitude/longitude, etc.). This information could be persisted across all articles, making different languages simply different chunks of text on an object with a unique ID. The reference potential would change drastically, as you could definitively refer to a person or place regardless of language. It would also allow POVs to be added, as they could just be another block of text associated with the unique ID (I like to call these lenses). Finally, you could have the object inherit from actor, with even more specific fields, or, with multiple inheritance, have the object inherit from both actor and director. I've looked at OODB software and found that the commercial products allow multiple inheritance, though I would assume the performance would be terrible.
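The object model described in this comment can be sketched with Python dataclasses standing in for an OODB: one object per subject with a unique ID, per-language texts and "lenses" (POV texts) attached to it, fixed fields on subclasses, and multiple inheritance for someone who is both actor and director. All class and field names are illustrative choices, not part of the proposal, and the sketch says nothing about how an OODB would store or index these objects.

```python
import itertools
from dataclasses import dataclass, field
from typing import Optional

_ids = itertools.count(1)  # hypothetical source of unique object IDs

@dataclass
class Article:
    """Generic article object shared by all languages."""
    uid: int = field(default_factory=lambda: next(_ids), init=False)
    texts: dict[str, str] = field(default_factory=dict)   # language code -> text
    lenses: dict[str, str] = field(default_factory=dict)  # point of view -> text

@dataclass
class Person(Article):
    birth_earliest: Optional[str] = None  # start of believed birthdate range
    birth_latest: Optional[str] = None    # end of believed birthdate range
    birthplace: Optional[tuple[float, float]] = None  # (lat, lon)

@dataclass
class Actor(Person):
    films_acted: list[str] = field(default_factory=list)

@dataclass
class Director(Person):
    films_directed: list[str] = field(default_factory=list)

@dataclass
class ActorDirector(Actor, Director):
    """Multiple inheritance: one object carries both sets of fields."""

welles = ActorDirector(
    texts={"en": "Orson Welles was an American actor and director.",
           "de": "Orson Welles war ein US-amerikanischer Schauspieler und Regisseur."},
    films_acted=["Citizen Kane"],
    films_directed=["Citizen Kane"],
)
print(welles.uid, sorted(welles.texts), type(welles).__mro__[1:4])
```

Any language edition would refer to `welles.uid` rather than to a title, which is what makes references independent of language.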