Talk:Structured Data Across Wikimedia

Use Multi-Content-Revisions to replace the sitelink system on Wikidata

Lectrician1 (talkcontribs)

I recently developed the idea of creating dedicated Wikidata items for Wikipedia articles.

This stems from currently unfixable Wikidata item conflation problems created by Wikipedia articles across languages describing different things and then being sitelinked together on one item.

I described and shared this solution and problem with the Wikidata and Wikipedia communities.

However, as part of the discussion on Wikipedia, I came to the conclusion that using Multi-Content-Revisions, as this project plans to do, might be a better solution than creating items for every one of the 57 million+ individual Wikipedia articles. Just as Commons stores its own structured data, Wikipedia probably should as well. Wikidata should only act as a repository that stores data about the general universe, and not anything specific to Wikimedia.

How this would work is Wikipedia editors using the proposed Wikipedia article metadata editor would add a main subject (P921) statement to describe the overall subject of the Wikipedia article. Then, the article would sitelink to other language Wikipedia articles by finding articles that also have the same main subject (P921) statement.

Let me know your thoughts!

Alsee (talkcontribs)

I agree, although it's painful.

We have a big problem with the interwiki link system, and Wikidata is unwilling or unable to fix it. Wikidata is designed on a premise of a unique 1-to-1 mapping of concepts. At best that is simply wrong, and at worst it is effectively culturally imperialistic. The fact is that different languages can and do divide up concepts in different ways. In practice Wikidata generally treats English as the One True Language, and effectively screws over any foreign language that doesn't conform to English. For example:

There's a language, I think Polish(?), with a word for "children's playground ride, with a seat connected to a pivot point so it can move back and forth". In English we have one word for such a device in a vertical orientation - we call it a swing. We have a different word for such a device in a horizontal orientation - we call it a see saw. Both English articles need to link to the same foreign article, and the foreign article needs links to both English articles. (Preferably with some brief clarifying text attached.) Wikidata can't or won't do that.

It's known as the "Bonnie and Clyde" problem, because some Wikis have one article on Bonnie and Clyde as a famous couple, while other wikis have a separate biography for each person. However in my opinion that is an unhelpful and misleading name for the problem. It can lead people to think it's merely some issue with the Wikis, that it could be fixed if the wikis simply agreed to work together and do the articles the same way. No no no. Languages do not always divide up concepts the same way. Languages can divide up colors differently - for example considering Blue and Green to merely be different shades of a single color. Colors shade smoothly into each other, and defining the borders of what constitutes a "different" color is a completely arbitrary human definition. Different languages can and do divide up concepts differently. It is abusive and broken to tell some other language that their concept-divisions are not equally legitimate, to tell them that they are somehow "wrong".

I called it painful because it will kinda suck to maintain both the existing Wikidata item links in addition to a new interwiki link system. But this has been a long term problem, and Wikidata can't or won't solve it.

P.S. Lectrician1 suggested "add a main subject (P921) statement [] sitelink to other language Wikipedia articles [with] the same main subject (P921) statement." That doesn't work, that is essentially the same functionality we have now with the same problem we have now. We need greater functionality. We need the English articles on Swing and See Saw to both link to the same Polish(?) article. And there are likely English articles that need to link to two (or more) Polish articles.

Lectrician1 (talkcontribs)

But it does work... If the English article on See Saw has "main subject: see saw", the English article on Swing has "main subject: swing", and the Polish article has "main subject: see saw, swing" or "main subject: pivoting playground device" (which see saw and swing are subclasses of), they should all link together.
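The matching rule described in this reply can be sketched roughly as follows. This is only an illustration under stated assumptions: the article titles, the plain-string subjects standing in for Wikidata items, and the `ARTICLES` and `SUBCLASS_OF` tables are all hypothetical, and nothing here reflects an existing implementation.

```python
# Hypothetical per-article metadata: (wiki, title) -> set of main subject (P921) values.
# Plain strings stand in for Wikidata item IDs.
ARTICLES = {
    ("en", "Swing"): {"swing"},
    ("en", "Seesaw"): {"seesaw"},
    ("pl", "Placeholder article"): {"pivoting playground device"},
}

# Hypothetical "subclass of" relations between the subjects.
SUBCLASS_OF = {
    "swing": "pivoting playground device",
    "seesaw": "pivoting playground device",
}


def expand(subjects, subclass_of):
    """Return the subjects plus every ancestor class reachable from them."""
    expanded = set(subjects)
    for subject in subjects:
        parent = subclass_of.get(subject)
        # Stop at the top of the chain, or when an ancestor was already added
        # (this also guards against cycles in the table).
        while parent is not None and parent not in expanded:
            expanded.add(parent)
            parent = subclass_of.get(parent)
    return expanded


def linked_articles(article, articles, subclass_of):
    """Articles on other wikis whose expanded subjects overlap with this one's."""
    wiki, _ = article
    own = expand(articles[article], subclass_of)
    return sorted(
        other
        for other, subjects in articles.items()
        if other[0] != wiki and own & expand(subjects, subclass_of)
    )
```

With this rule, both English articles link to the placeholder Polish article and it links back to both. Note that expanding through a shared parent class also links sibling topics across wikis, which may or may not be desired.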

I explain this in more depth with an example of an article that combines topics on the Japanese Wikipedia here

Responding to call for Feedback. We need stuff, but not this.

Alsee (talkcontribs)

I happen to be a programmer, so I understand better than most what you are trying to achieve, and why. However...

First, there is the issue of the labor to do all this work. Where is that supposed to come from? We are already struggling to keep up with our existing work. Our queue of new articles sitting unreviewed in draftspace, waiting for possible promotion to mainspace, is currently backlogged by over three months, with 3,195 pending submissions. New users are waiting up to THREE MONTHS to find out whether their article will make it into the encyclopedia. Another review queue currently has a backlog of nearly 7000 unreviewed pages in mainspace. Our page has literally millions of tasks open. That barely scratches the surface of our work. If you make this new system - if the community were to accept it - it means we would spend less time creating new articles. It means we would spend less time expanding and improving existing articles. It means we would spend less time cleaning up bias and propaganda and misinformation and other crap that gets into articles. It means we would have less time available to assist new users.

Second, look at your mockup images:

I have one question. Look at that and tell me, are you making the wiki simpler and easier and more inviting for new users? Or does this make the wiki vastly more complex, more difficult, more confusing and overwhelming for a new user?

Third, related to the above point, English Wikipedia is currently stalled in an unresolved conflict on whether we want to ban use of Wikidata-in-Wikipedia. Structured data is designed by programmers for programmers. I'm a programmer, I understand why you like structured data, and how it's better suited for computer consumption. We understand the selling points for structured data, but Wikipedia is made by people, for people. The two systems are fundamentally incompatible. The Foundation's push to serve computers, rather than to serve people, is making things more complicated and more difficult. A substantial portion of the EnWiki community is ready to ban this kind of thing as disruptive to our work. Wikipedia is a human and volunteer project that happens to use some technology, not the other way around.

P.S. There is so much that we do want and do need. The Foundation has not been devoting enough time to maintaining and improving our core platform, and the Community Tech team is badly understaffed. There are countless projects there that aren't getting done.

Sannita (WMF) (talkcontribs)

Hello @Alsee, and thank you for your thorough feedback!

I also wanted to let you know that the proposal you read on the project page is actually a bit outdated - we've heard similar feedback in the past from other users, and so we are in the process of working on a revision to the plan described that will take all of your feedback into account.

We planned on updating the page once we were done with the new plan, but in order to avoid further misunderstandings, I will put a notice on the relevant section right now to mark it as "outdated". Thanks again for your time, and we look forward to your review of our new plan in the future!

Weather charts

Martinvl (talkcontribs)

When I first saw the MediaWiki announcement, my thoughts went to weather charts. I foresaw a two-level product. The top level would be the general code for a weather chart and the lower level would be charts for specific localities. Thus, if I am using a small Wikipedia, I would ensure that the upper template was compatible with my language, and I could then call up the weather chart with a simple call: {{weather|la|Rome}} would include the weather chart for Rome in Latin (provided of course that someone had generated the Rome chart in the first place).

Was I on the right track here?

Sannita (WMF) (talkcontribs)

Hi @Martinvl! Unfortunately, I don't think we can help you there. Your idea is good, if you ask me, but it's not in the scope of this project. What you're planning is something more along the lines of a combination of Wikidata + the Global templates initiative, but SDAW would not do any good to your proposal. It would help on the fringes of your idea, maybe. Sorry for the bummer. :(

Martinvl (talkcontribs)

@Sannita (WMF): Thank you for your reply. I feared as much, but I felt that it was worth a try. Martinvl (talk) 16:53, 29 December 2021 (UTC)

Reply to "Weather charts"

Your expectations about the project

Sannita (WMF) (talkcontribs)

This thread will address two themes about the general expectations about the project and the way it will impact on users' workflows:

  1. What do users expect from this project? What are the necessary actions to be addressed?
  2. How do you envision this metadata being used? Can you think of ways it would aid in your workflows?
Oursana (talkcontribs)

See above, the project doesn't trigger me. I can imagine support with wikilinks. On wd one gets suggestions for properties, qualifiers, values etc., which is very helpful. So then I would like on wp a suggestion for e.g. birthdate and reference... From the example I do not get where you want to go. Perhaps I would like more examples. We need more data on wd, and editing there could be even easier.

Lucas Werkmeister (talkcontribs)

I don’t know what I expect from this project. The description feels pretty vague to me – I don’t really have an idea what kind of metadata or structured data could be collected. The first action area, tagging sections, makes some sense, but I don’t think anything is stopping interested Wikipedias from implementing something fairly similar today, using RDFa:

== Oldest living organisms ==
<p about="wd:Q590039">
Bristlecone pines are known for attaining great ages. The oldest bristlecone pine in the White Mountains is <span property="schema:name">[[Methuselah]]</span>, which has a verified age of 4,852 years. It is located in the <span property="schema:location">[[Inyo National Forest]]</span> in Eastern California. The specific location of Methuselah is a very closely guarded secret.
</p>

(Based on Bristlecone pine, Wikipedia contributors, CC BY-SA 3.0.)

CParle (WMF) (talkcontribs)

AIUI initially the metadata will simply be a list of topics associated with a section, the idea being that it's a step on the road to being able to search for and serve smaller pieces of content than a whole article

Interested wikipedias can do what you describe, indeed, but it's pretty awkward

Framawiki (talkcontribs)

Hi, I'm interested in where the results of the work will be stored. Would it be in a separate database on a project's server, or directly somewhere in the wikis?

Perhaps it would be interesting to serve the results linked to Wikidata items. Looking at the mockup image, we can imagine properties (bold items with icons) and values (Q items). Thanks!

CParle (WMF) (talkcontribs)

We're planning to store the topical metadata in a "slot" on the page - similar to how we store structured data for images on commons. Ultimately it'll also be injected into elasticsearch, but we haven't figured out the details of that yet
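For readers unfamiliar with slots, here is a rough sketch of how slot content can be requested through the MediaWiki Action API, using the existing Commons `mediainfo` slot as the example. The helper name `slot_query_url` is made up for illustration, and the thread does not say what the slot for section topics would be called, so no name is assumed for it here.

```python
from urllib.parse import urlencode


def slot_query_url(api_url, title, slot):
    """Build an Action API URL asking for the latest revision content of one slot."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content",
        "rvslots": slot,  # e.g. "main" for wikitext, "mediainfo" on Commons
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"


# Example: structured data stored alongside a Commons file page.
url = slot_query_url(
    "https://commons.wikimedia.org/w/api.php", "File:Example.jpg", "mediainfo"
)
```

The point of the design is that the wikitext stays in the `main` slot while the structured data lives beside it in the same revision, so both are versioned together.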

Billinghurst (talkcontribs)

Obviously I want Wikisource reproductions to be more findable, both internally and externally, and better and more easily usable.

I am hoping that the project will be able to tell me what is possible for my wishes for an integrated ability to interact and inject pertinent detail cross-wiki, in the easiest, most reliable, and referenced form.


Ultimately I am also hoping for better input/workflow tools to make things easier and better, and to stop having to add the same data at multiple places.

  • Creating a commons file
  • Creating the wikisource components, transcluding, then creating the WD item for the edition, then needing to create the work item for the parent
  • Then having to update the Commons file to link back to the WD item
  • then edit the WD item to add the IA details
  • Then I head over to Wikipedia to find the right template and re-add the data parameter by parameter, when I should be able to just add a contextual template, or use a lookup form or a template, add the WD item, and have the data inhaled and added.

(all the around and around and around)

At the moment I use the WEF framework tool though while it is pretty good, it doesn't completely suit needs at creating/editing items from enWS. I would love to be able to have a customisable tool to create/edit WD from the sister wikis rather than having to go to WD and manually edit the item. Similarly FROM something like s:en:A catalogue of notable Middle Templars, with brief biographical notices/Bayley, Sir Edward Clive I want to be able to directly apply to the related main subject's item (not its own item) the data able to be pushed in and referenced.

No idea exactly how much of that is in scope, but that is what I hope we can think about, or poke somewhere as a way to interlink and overlay data between the sisters.  :-)

Hjfocs (talkcontribs)

Wow, my first thoughts are:

  1. the Wikidata vision is coming true! Structured knowledge to serve as the backbone of the Wikimedia landscape;
  2. sounds like a more fine-grained for the Wikimedia ecosystem.

Besides automatic enrichment of Wikipedia infoboxes, I fully support the categorization of page sections: one clear expectation or outcome is to see rich snippets of Wikipedia articles when I run a search, ideally regardless of the engine I use.

I'm not sure how the project would cater for the Abstract Wikipedia, though, so I'd like to hear more on this.

Hjfocs (talkcontribs)

I see that @Lucas Werkmeister actually explains how the page sections effort could also be implemented following the fashion with RDFa markup.

Belteshassar (talkcontribs)

Unlike most files on Commons or the concepts described by Wikidata items, Wikipedia articles evolve with time and, depending on the implementation, metadata could fall out of sync with the sections. I’m sure there are ways to address this, for example by automatically identifying heavily edited sections.
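One possible way to flag heavily edited sections is to compare each section's text between two revisions and report those whose similarity falls below a cut-off. The sketch below is only an assumption about how that might look: it handles level-2 headings only, the function names are invented, and the `0.5` threshold is arbitrary.

```python
import difflib
import re

# Level-2 headings only ("== Title =="); deeper levels are ignored in this sketch.
HEADING = re.compile(r"^==(?!=)[ \t]*(.+?)[ \t]*(?<!=)==[ \t]*$", re.MULTILINE)


def split_sections(wikitext):
    """Map each level-2 heading to the text that follows it."""
    sections = {}
    matches = list(HEADING.finditer(wikitext))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[match.group(1)] = wikitext[match.end():end].strip()
    return sections


def stale_sections(old_text, new_text, threshold=0.5):
    """Headings whose body changed so much that attached metadata may be outdated."""
    old, new = split_sections(old_text), split_sections(new_text)
    stale = []
    for heading, old_body in old.items():
        new_body = new.get(heading)
        if new_body is None:
            stale.append(heading)  # heading was removed or renamed
            continue
        ratio = difflib.SequenceMatcher(None, old_body, new_body).ratio()
        if ratio < threshold:
            stale.append(heading)
    return stale
```

Sections flagged this way could be queued for human review rather than having their metadata removed automatically.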

Eposthumus (talkcontribs)

How do we query the Structured Data on Wikimedia Commons?

Are there plans to provide something similar to the WDQS on Wikidata, or does the structured data eventually also filter through to Wikidata? (if my understanding that these are actually separate datastores is correct)

BTW, I find the idea of structured data on Commons very exciting, and plan on adding more support and integration for the ICONCLASS classification system. Similar to what is already done for Wikidata.

Sannita (WMF) (talkcontribs)

Hello @Eposthumus, there is a plan for a separate query service for Structured Data on Commons. The plan will be announced soon™, as the team is still reviewing some aspects. I'll be sure to ping you personally when the announcement goes live.

Personally speaking, I share your excitement and your expectations, but it seems we'll need to wait a bit longer. :)

Metadata moderation

Sannita (WMF) (talkcontribs)

This thread will address the theme of metadata moderation:

  1. In your opinion, is moderation necessary to avoid vandalism and/or bias?
  2. If moderation is necessary, how can it be effectively managed?
Nintendofan885 (talkcontribs)

Metadata moderation would probably be needed (at least in its early stages), given that many Wikipedia editors don't know how to edit Wikidata, which leads to vandalism being left behind; a higher-profile structured data system would likely have the same problem.

Reply to "Metadata moderation"

The need for schema

Andy Dingley (talkcontribs)

Wikidata has "issues" (or has failed, depending on your viewpoint) in part because it has no schema. This makes its data effectively unprocessable by automated tools, at least those with any embedded intelligence.

Will this project take the same route?

CBogen (WMF) (talkcontribs)

As mentioned above, this project will use Wikidata items. Therefore any issues or improvements to the Wikidata schema would be reflected here.

Reply to "The need for schema"

Why a new project?

Andy Dingley (talkcontribs)

Don't we have this already? What has happened to Wikidata?

What is Wikidata failing to do? Why is this needed?

What will SDAW provide to solve that?

CBogen (WMF) (talkcontribs)

The SDAW section topics will use Wikidata items to describe sections of text in Wikipedia articles. There is no intent to replace or supplement Wikidata. Whereas we already have Wikidata items attached to articles at the highest level, this will attach them to sections of text.

Reply to "Why a new project?"

Inform the communities

Hogü-456 (talkcontribs)


I think it is important that you discuss structured data for Wikimedia further within a language version before you introduce it there. I don't know how open, for example, the German Wikipedia is to this topic. Can you please tell me how this concept differs from a system using categories? From my point of view this is an important question, as there will only be acceptance if the advantages are understood. I am interested in metadata, and I think it is good if it becomes possible to add it to wiki pages, which currently exists in the form of different templates. I prefer a concept where it is possible to write the metadata in wikitext, and I wish there were also a form-based tool that helps create metadata for pages; I am thinking of a kind of content creator. The Wikidata Query Service is an example of an interface where it is possible to type queries and to search for a property in the text, if the functionality is known, without using a search form. This is something that I like.

Sannita (WMF) (talkcontribs)

Hi @Hogü-456:, thank you very much for your comment! I'm sorry to reply to you this late, but we've been busy planning an initiative for reaching out to communities in order to get feedback. In a way, this answers your first request: we actually want the communities to be not just informed, but to take an active part in our project and help us shape it better.

About the metadata, as @CParle (WMF): said in another thread at the moment «We're planning to store the topical metadata in a "slot" on the page - similar to how we store structured data for images on Commons».

This system is not going to replace the category system: like Structured Data on Commons, this is going to add another layer to wiki pages, in order to make them more easily findable and to help users associate content between Wikimedia projects (e.g. unused images that can illustrate articles).

If you have more questions or suggestions, please let us know!

Hogü-456 (talkcontribs)

Hello @Sannita (WMF), what I meant by informing the communities was that you post information about your plans on pages of the language versions, such as the German Wikipedia. Not all people read pages on Meta-Wiki or MediaWiki, so you reach more people if you go to the language versions. I haven't read anything about this project on the central German discussion page, called Kurier. Please think about writing texts in the languages of the language versions and publishing them directly there.

Sannita (WMF) (talkcontribs)

Hi @Hogü-456:, of course we will reach the communities on their own pages; we just haven't started our outreach activities yet. It's a matter of days though.

About translations, we're still working on it, but we'll try our best to reach out as much as possible in the local languages. Also, users can give feedback in their native language, if they feel more comfortable doing so. And of course, any help in translation and reaching out is welcome. :)

Wikitext

ChristianKl (talkcontribs)

The mockup shows an editing interface but it doesn't say anything about the structure of the underlying metadata. In previous discussions about the Visual Editor, Flow and Structured Discussions there was the sentiment that it's important that everything is editable as wikitext. That leaves the question:

  1. Is it important that Structured data on Wikimedia is editable as Wikitext?
  2. If so how would it be best structured?
Sannita (WMF) (talkcontribs)

Hi Christian, thank you for bringing this up! Sorry for the delay in answering, but I wanted to be sure of the technical details of my answer (which won't be short, so apologies for this too).

So, we did consider adding the section metadata as editable wikitext, but ultimately we think there are a lot of potential difficulties with that, and that on the contrary we would have many advantages in keeping it separate.

Conceptually, separating metadata about a document or text from the document itself makes sense. It also allows machines to read and understand it separately. This is the way we did it with Structured Data on Commons, via Extension:WikibaseMediaInfo. It has several advantages, such as:

  • the ability to refine user permissions for accessing and editing this type of data, and
  • reducing cognitive burden on editors who are not interested in working with this data (as we said in the main page, we don't want to "overwhelm users with too much new content to moderate").

We would also like to try not to complicate wikitext further, by adding additional markup from a technical perspective, because this would create more maintenance burden on a critical piece of the MediaWiki environment (again, we don't want to "introduce too much complexity into our systems").

Also, in the end, information in wikitext is not fully structured data that can be consistently read and understood by machines. We want to ensure that the metadata created by this project will be accessible in a robust, consistent, structured and linked format.

Anyway, I would like to hear more about your reasons why you think it is important that everything be editable as wikitext. Perhaps we can think of other ways to solve those problems.

Lucas Werkmeister (talkcontribs)

I’m not convinced by this yet. If the metadata isn’t tied in with the wikitext, how will the connection between the two be established? It seems to me that it would break each time a heading is renamed in the wikitext.

CParle (WMF) (talkcontribs)

Indeed it would. As far as I know the Platform team is looking into creating stable identifiers for sections, which will allow us to properly associate metadata with a section ... it is possible, however, that the initial PoC will just use the section title text
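The difference between the two association strategies can be illustrated with a small sketch. Everything in it is hypothetical: the `Section` class, the `section_id` field, and the placeholder topic IDs are assumptions made for the example, since the thread does not describe what stable section identifiers would actually look like.

```python
from dataclasses import dataclass


@dataclass
class Section:
    section_id: str  # hypothetical stable identifier that survives renames
    title: str       # heading text, which editors can change at any time


# The same topics stored under two different keys (placeholder topic IDs).
TOPICS_BY_TITLE = {"Oldest living organisms": ["Q_TOPIC_A", "Q_TOPIC_B"]}
TOPICS_BY_ID = {"sec-1": ["Q_TOPIC_A", "Q_TOPIC_B"]}


def topics_via_title(section):
    """Proof-of-concept style lookup: keyed by heading text."""
    return TOPICS_BY_TITLE.get(section.title, [])


def topics_via_id(section):
    """Lookup keyed by the stable identifier."""
    return TOPICS_BY_ID.get(section.section_id, [])


before = Section("sec-1", "Oldest living organisms")
after_rename = Section("sec-1", "Longevity")
```

After the rename, `topics_via_title(after_rename)` finds nothing while `topics_via_id(after_rename)` still returns the topics, which is the breakage raised in the previous comment.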

Lokal Profil (talkcontribs)
Sannita (WMF) (talkcontribs)

Hi Lokal, I'm not sure it would introduce that service as it is described, but I think that would probably turn into a use case for our project, so thanks for bringing this up! Do you have any additional insight or suggestion about this that you want to share with us?

Lokal Profil (talkcontribs)

I think most of it is in the notes for that meeting. Not sure whether @Cscott has looked at this further since then. @Sebastian Berlin (WMSE) might be able to point you to some of the early notes from m:Wikispeech about which changes we determined should invalidate our planned annotations.

This post was hidden by Sannita (WMF) (history)
Cscott (talkcontribs)

We're working on annotations (of a sort) in the process of implementing support for the Translate extension in Parsoid, so we should probably touch base about unifying the various efforts.

Reply to "Wikitext"

New proposal for topic metadata

Sannita (WMF) (talkcontribs)

Based on your feedback on this page, we are re-evaluating the possibilities for human intervention in managing section topic metadata.

Our first proposal relied on humans to approve or reject a given set of machine-generated section metadata, but this can easily become too much of a burden for users (especially in small communities).

So our current proposal is to instead insert the machine-generated topical metadata into the sections of Wikipedia articles, relying on blue interwiki links and concept relationships, without requiring users to accept or reject that metadata - but leaving users the possibility to revise it (by adding new concepts or deleting machine-generated concepts). This would result in a much faster way of populating section metadata and in a lesser burden for users.

Here you can visualise the differences between the two proposals. What is your take about this? Do you consider it feasible?
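As a rough illustration of the link-based part of this proposal, candidate topics for a section could be derived from the links in its wikitext. This is only a sketch under assumptions: the `title_to_item` lookup table is hypothetical (a real system would resolve titles through sitelinks), title normalisation is ignored, and the "concept relationships" part of the proposal is not covered.

```python
import re

# Captures the target page of [[Target]], [[Target|label]] and [[Target#Section|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def candidate_topics(section_wikitext, title_to_item):
    """Item IDs for linked titles that resolve to a known item, in first-seen order.

    Titles missing from `title_to_item` (red links, files, categories) are skipped.
    """
    topics = []
    for match in WIKILINK.finditer(section_wikitext):
        title = match.group(1).strip()
        item = title_to_item.get(title)
        if item is not None and item not in topics:
            topics.append(item)
    return topics
```

Users could then add or delete entries in the resulting list, as the proposal describes.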
