Wikibase/Introduction to modeling data
Overview
This guide is for you if:
- You have an empty Wikibase before you
- You have data you need to model using your Wikibase
Model your own data
There's one thing this guide can't do for you: it can't help you model your data.
- You and the other people who work with this data are the only ones with the information necessary to decide how best to model this data.
- There's no one right answer to how to model your data. Modeling is a series of choices about how to organize your data, and those choices depend on the data itself, which may change while you work with it. The same choices may also look different once you're halfway through the process.
- Data models sometimes need to change. This can be a painful process if the new model entails recreating a significant portion of your data, yet some users have found this unavoidable as their data model evolved. Thus, it's advisable to take as much time as you need to create your data model before even logging into your Wikibase instance.
What this guide can do is point at some of the decisions you will need to make and offer you a starting point.
Concepts
In Wikibase, you'll need to think of your data in terms of the concepts Wikibase uses to store data: items, properties, statements, and so forth.
In the above example you see an item page. It contains a statement: that Jimmy Wales (item) is an instance of (property) a human (item).
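As a rough illustration of how these concepts fit together, the statement above can be sketched as a subject item, a property, and a value item. The class names `Entity` and `Statement` are hypothetical and not part of any Wikibase API. The IDs are the ones Wikidata uses for these entities; your own Wikibase will assign its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """An item ("Q...") or a property ("P..."); hypothetical helper class."""
    id: str
    label: str


@dataclass(frozen=True)
class Statement:
    """A single claim: subject item, property, value item."""
    subject: Entity
    prop: Entity
    value: Entity

    def render(self) -> str:
        # Render the statement using human-readable labels.
        return f"{self.subject.label} -> {self.prop.label} -> {self.value.label}"


# IDs as used on Wikidata; a fresh Wikibase numbers its entities independently.
jimmy_wales = Entity("Q181", "Jimmy Wales")
instance_of = Entity("P31", "instance of")
human = Entity("Q5", "human")

statement = Statement(jimmy_wales, instance_of, human)
print(statement.render())
```

Note that both the subject and the value are items, while the property describes how they relate.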
It may be fairly straightforward to think of how data might be modeled as items, but the moment you start to consider modeling properties, you face some crucial decisions.
If you've never modeled data before or if you just don't know where to start, dip your toe in by looking at a robust, established Wikibase instance such as Wikidata. Seeing how someone else modeled their data will help you model yours, if only by showing you an approach that you can immediately see is unsuitable for your data.
For more advanced concepts in data modeling, this primer on Wikibase data modeling is a must-read.
More properties, or more items?
If you intend, for example, to model familial relationships, creating a property called "parent of" whose data type is item is a perfectly fine decision. You will then also need properties like "child of", "sibling of", "cousin of", and so forth, yielding a model that has many properties and fewer items.
But you might also choose to create a single property: "has relationship". Then you would need to create one item for each type of relationship ("father", "sister") and have those constitute the list of valid values for that property ("has relationship: father"), and then conceive of a way to relate the entire statement to another item.
Adding more properties to your data model will lead to fewer items in your Wikibase and compel a certain way of thinking about every future item and statement you plan; defining fewer properties will lead to more items and an entirely different way of thinking.
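The trade-off described above can be sketched by writing the same three facts in both styles. This is only an illustration with plain Python data, not Wikibase syntax: the people are made up, and the `related_to` field is one assumed way of tying a "has relationship" statement to the other person.

```python
# Model A: one property per relationship type (many properties, fewer items).
MODEL_A = [
    ("Alice", "parent of", "Bob"),
    ("Bob", "child of", "Alice"),
    ("Bob", "sibling of", "Carol"),
]

# Model B: a single property whose values are relationship-type items.
# "related_to" is an assumption: some mechanism is needed to link the
# whole statement to the other person.
MODEL_B = [
    {"subject": "Alice", "property": "has relationship", "value": "parent", "related_to": "Bob"},
    {"subject": "Bob", "property": "has relationship", "value": "child", "related_to": "Alice"},
    {"subject": "Bob", "property": "has relationship", "value": "sibling", "related_to": "Carol"},
]


def properties_in_model_a(statements):
    """Collect the distinct properties Model A needs."""
    return {prop for _, prop, _ in statements}


def properties_in_model_b(statements):
    """Collect the distinct properties Model B needs."""
    return {s["property"] for s in statements}


def relationship_items_in_model_b(statements):
    """Collect the extra relationship-type items Model B needs."""
    return {s["value"] for s in statements}


print(sorted(properties_in_model_a(MODEL_A)))
print(sorted(properties_in_model_b(MODEL_B)))
print(sorted(relationship_items_in_model_b(MODEL_B)))
```

Model A needs a new property for every relationship type, while Model B keeps a single property and grows its list of relationship items instead.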
Interoperability
Do you plan someday to have your data relate to (or map onto) the data in Wikidata or another Wikibase instance? Consider starting your Wikibase by creating some of the same properties that are in Wikidata. That way you know you will be able to map statements with those properties to Wikidata in the future.
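One way to picture such a mapping is a small lookup table from your local property IDs to their Wikidata counterparts. The sketch below is hypothetical: the local IDs `P1`, `P2`, and `P3` and the item IDs are made up for illustration, while `P31` (instance of) and `P40` (child) are the Wikidata properties they are assumed to mirror.

```python
# Hypothetical mapping from local property IDs to Wikidata property IDs.
LOCAL_TO_WIKIDATA = {
    "P1": "P31",  # local "instance of" -> Wikidata "instance of"
    "P2": "P40",  # local "child" -> Wikidata "child"
}


def map_statement(statement):
    """Translate a (subject, local property, value) triple to Wikidata's
    property ID, or return None when no counterpart has been mapped."""
    subject, local_prop, value = statement
    wikidata_prop = LOCAL_TO_WIKIDATA.get(local_prop)
    if wikidata_prop is None:
        return None
    return (subject, wikidata_prop, value)


# Made-up local statements; "P3" has no Wikidata counterpart in the mapping.
local_statements = [
    ("Q10", "P1", "Q2"),
    ("Q10", "P2", "Q11"),
    ("Q10", "P3", "Q12"),
]

mapped = [map_statement(s) for s in local_statements]
print(mapped)
```

Statements that use a property with no counterpart come back as `None`, which shows how much of the data would not carry over.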
To get a better sense of how interoperable we hope to make the entire Linked Open Data web, including your Wikibase if possible, peruse Wikimedia Deutschland's strategy documents.
Resources
One excellent way to learn about modeling data is to see how others did it. We’ve supplied a few good examples as well as some reference materials for your edification.
- Andra Waagmeester’s presentation to the Wikibase User Group: https://docs.google.com/presentation/d/14jyMTRkuZwQ9RQ5pLCrxqedrZYCBSYZOeApk_wr52-M
- Andra Waagmeester's example taxonomy: https://trex-taxonomy.wikibase.cloud/wiki/Main_Page
- A presentation on data modeling from Michelle Pfeiffer and Jose Emilio Labra Gayo: https://www.youtube.com/watch?v=MDjyiYrOWJQ
- Enslaved.org project papers: