Talk:Structured Data Across Wikimedia

Your expectations about the project

This thread will address two themes: the general expectations about the project, and the way it will affect users' workflows.

  1. What do users expect from this project? What are the necessary actions to be addressed?
  2. How do you envision this metadata being used? Can you think of ways it would aid in your workflows? Sannita (WMF) (talk) 22:21, 1 March 2021 (UTC)Reply
See above, the project doesn't trigger me. I can imagine support with wikilinks. On wd one gets suggestions for properties, qualifiers, values etc., which is very helpful. So then I would like a suggestion on wp for e.g. birthdate and reference... From the example I do not get where you want to go. Perhaps I would like more examples. We need more data on wd, and editing there could be even easier. Oursana (talk) 20:30, 20 March 2021 (UTC)
I don’t know what I expect from this project. The description feels pretty vague to me – I don’t really have an idea what kind of metadata or structured data could be collected. The first action area, tagging sections, makes some sense, but I don’t think anything is stopping interested Wikipedias from implementing something fairly similar today, using RDFa:
== Oldest living organisms ==
<p about="wd:Q590039">
Bristlecone pines are known for attaining great ages. The oldest bristlecone pine in the White Mountains is <span property="schema:name">[[Methuselah]]</span>, which has a verified age of 4,852 years. It is located in the <span property="schema:location">[[Inyo National Forest]]</span> in Eastern California. The specific location of Methuselah is a very closely guarded secret.
</p>
(Based on Bristlecone pine, Wikipedia contributors, CC BY-SA 3.0.) Lucas Werkmeister (talk) 16:42, 21 March 2021 (UTC)Reply
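To illustrate what a machine could read back from markup like the above, here is a minimal sketch in Python using only the standard library. It is not a full RDFa processor: the class and function names are made up for this example, it assumes that annotated spans are not nested, and it simply strips the surrounding wikilink brackets from the text.

```python
from html.parser import HTMLParser


class RdfaCollector(HTMLParser):
    """Collect (subject, property, text) triples from very simple RDFa markup.

    Hypothetical helper for illustration only; assumes property spans are not nested.
    """

    def __init__(self):
        super().__init__()
        self.subject = None   # value of the most recent about="..." attribute
        self.current = None   # property name of the span currently being read
        self.buffer = []      # text fragments collected inside that span
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.current = attrs["property"]
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.current is not None and tag == "span":
            # Drop the [[ ]] wikilink brackets and surrounding spaces.
            text = "".join(self.buffer).strip("[] ")
            self.triples.append((self.subject, self.current, text))
            self.current = None


def extract_triples(markup):
    """Return the list of triples found in the given markup string."""
    parser = RdfaCollector()
    parser.feed(markup)
    parser.close()
    return parser.triples
```

Fed the paragraph from the example, this would yield one triple per annotated span, each with wd:Q590039 as its subject.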
AIUI, initially the metadata will simply be a list of topics associated with a section, the idea being that it's a step on the road to being able to search for and serve smaller pieces of content than a whole article.
Interested Wikipedias can indeed do what you describe, but it's pretty awkward. CParle (WMF) (talk) 10:18, 24 March 2021 (UTC)
Hi, I'm interested in where the results of the work will be stored. Would it be in a separate database on a project's server, or directly somewhere in the wikis?
Perhaps it would be interesting to serve the results linked to Wikidata items. Looking at the mockup image, we can imagine properties (bold items with icons) and values (Q items). Thanks! Framawiki (talk) 07:14, 1 April 2021 (UTC)
We're planning to store the topical metadata in a "slot" on the page - similar to how we store structured data for images on Commons. Ultimately it'll also be injected into Elasticsearch, but we haven't figured out the details of that yet. CParle (WMF) (talk) 09:06, 2 April 2021 (UTC)
Obviously I am wanting Wikisource reproductions to be better findable, both internally and externally. Better and more easily usable.
I am hoping that the project will be able to tell me what is possible for my wishes for an integrated ability to interact and inject pertinent detail cross-wiki, in the easiest, most reliable, and referenced form.
...
Ultimately I am also hoping for better input/workflow tools to make things easier and better, and to stop having to add the same data at multiple places.
  • Creating a commons file
  • Creating the wikisource components, transcluding, then creating the WD item for the edition, then needing to create the work item for the parent
  • Then having to update the Commons file to link back to the WD item
  • then edit the WD item to add the IA details
  • Then I head over to Wikipedia to find the right template and re-add the data parameter by parameter, when I should be able to just add a contextual template, use a lookup form, or add the WD item to a template, and have the data inhaled and added.
(all the around and around and around)
At the moment I use the WEF framework tool though while it is pretty good, it doesn't completely suit needs at creating/editing items from enWS. I would love to be able to have a customisable tool to create/edit WD from the sister wikis rather than having to go to WD and manually edit the item. Similarly FROM something like s:en:A catalogue of notable Middle Templars, with brief biographical notices/Bayley, Sir Edward Clive I want to be able to directly apply to the related main subject's item (not its own item) the data able to be pushed in and referenced.
No idea exactly how much of that is in scope, but that is what I hope we can think about, or poke somewhere as a way to interlink and overlay data between the sisters.  :-) — billinghurst sDrewth 00:19, 15 April 2021 (UTC)Reply
Wow, my first thoughts are:
  1. the Wikidata vision is coming true! Structured knowledge to serve as the backbone of the Wikimedia landscape;
  2. sounds like a more fine-grained https://schema.org/ for the Wikimedia ecosystem.
Besides automatic enrichment of Wikipedia infoboxes, I fully support the categorization of page sections: one clear expectation or outcome is to see rich snippets of Wikipedia articles when I run a search, ideally regardless of the engine I use.
I'm not sure how the project would cater for the Abstract Wikipedia, though, so I'd like to hear more on this. Hjfocs (talk) 17:24, 4 May 2021 (UTC)Reply
I see that @Lucas Werkmeister actually explains how the page sections effort could also be implemented following the schema.org fashion with RDFa markup. Hjfocs (talk) 17:29, 4 May 2021 (UTC)Reply
Unlike most files on Commons or the concepts described by Wikidata items, Wikipedia articles evolve with time and, depending on the implementation, metadata could fall out of sync with the sections. I’m sure there are ways to address this, for example by automatically identifying heavily edited sections. Belteshassar (talk) 20:44, 5 August 2021 (UTC)Reply
How do we query the Structured Data on Wikimedia Commons?
Are there plans to provide something similar to the WDQS on Wikidata, or does the structured data eventually also filter through to Wikidata? (If my understanding that these are actually separate datastores is correct.)
BTW, I find the idea of structured data on Commons very exciting, and plan on adding more support and integration for the ICONCLASS classification system. Similar to what is already done for Wikidata. Eposthumus (talk) 12:54, 15 December 2021 (UTC)Reply
Hello @Eposthumus, there is a plan for a separate query service for Structured Data on Commons. The plan will be announced soon™, as the team is still reviewing some aspects. I'll be sure to ping you personally when the announcement goes live.
Personally speaking, I share your excitement and your expectations, but it seems we'll need to wait a bit longer. :) Sannita (WMF) (talk) 13:38, 15 December 2021 (UTC)Reply

Metadata moderation

This thread will address the theme of metadata moderation:

  1. In your opinion, is moderation necessary to avoid vandalism and/or bias?
  2. If moderation is necessary, how can it be effectively managed? Sannita (WMF) (talk) 22:29, 1 March 2021 (UTC)Reply
Metadata moderation would probably be needed (at least in its early stages), given that many Wikipedia editors don't know how to edit Wikidata, which leads to vandalism being left behind, so a higher-profile structured data project would likely have the same problem. Nintendofan885 (talk) 17:47, 20 March 2021 (UTC)

Adding and confirming metadata

This thread will address the theme of the basic functions related to topical metadata:

  1. Do users want to be able to approve or reject metadata suggested by the automated system?
  2. Do users want to be able to add additional metadata beyond what is suggested by the automated system?
  3. Do you think it may just be sufficient for users to have the opportunity to send feedback with suggestions on how to improve the machine generated metadata, when necessary? Sannita (WMF) (talk) 22:41, 1 March 2021 (UTC)Reply
1)2) strong yes
3)noooo Oursana (talk) 20:16, 20 March 2021 (UTC)Reply
Dropping my two cents here.
From some lessons learnt on data curation tools like Mix'n'match (Q28054658) and Primary Sources Tool (Q20656106), my general (and very personal) feeling is: we are talking about a tedious task, so the less you ask people to approve or reject suggestions, the better.
As mentioned in Structured_data_across_Wikimedia#What_is_changing, I totally agree that we should pick blue Wikipedia links as the low-hanging fruits; I'd also be confident enough to consider corresponding Wikidata items as the ground truth.
In conclusion: why bother about metadata curation at all? Let's take those links for granted! They are already the outcome of human love.
Instead, I'd focus on automatically coming up with a summarized representation, read one high-level Wikidata item that best categorizes a given Wikipedia section, read topic categorization. Hjfocs (talk) 16:57, 4 May 2021 (UTC)Reply

Privileges for visualising and editing

This thread will address the theme of the privileges related to topical metadata:

  1. Do we want metadata to be visible for all users or only for certain classes of users?
  2. Do we want metadata to be editable for all users or only for certain classes of users? Sannita (WMF) (talk) 22:42, 1 March 2021 (UTC)Reply
1) I do not see any advantages of visualising only for certain users
2) depending on the risks of mistakes and vandalism one could use a two-step system for inexperienced users, as we use in de:wp: editable, but approval needed until a certain number of edits is reached. Oursana (talk) 20:14, 20 March 2021 (UTC)

Wikitext

The mockup shows an editing interface but it doesn't say anything about the structure of the underlying metadata. In previous discussions about the Visual Editor, Flow and Structured Discussions there was the sentiment that it's important that everything is editable as Wikitext. That leaves the question:

  1. Is it important that Structured data on Wikimedia is editable as Wikitext?
  2. If so how would it be best structured? ChristianKl (talk) 17:51, 5 March 2021 (UTC)Reply
Hi Christian, thank you for bringing this up! Sorry for the delay in answering, but I wanted to be sure of the technical details of my answer (which won't be short, so apologies for this too).
So, we did consider adding the section metadata as editable wikitext, but ultimately we think there are a lot of potential difficulties with that, and that on the contrary we would have many advantages in keeping it separate.
Conceptually, separating metadata about a document or text from the document itself makes sense. It also allows machines to read and understand it separately. This is the way we did it with Structured Data on Commons, via Extension:WikibaseMediaInfo. It has several advantages, such as:
  • the ability to refine user permissions for accessing and editing this type of data, and
  • reducing cognitive burden on editors who are not interested in working with this data (as we said in the main page, we don't want to "overwhelm users with too much new content to moderate").
We would also like to try not to complicate wikitext further, by adding additional markup from a technical perspective, because this would create more maintenance burden on a critical piece of the MediaWiki environment (again, we don't want to "introduce too much complexity into our systems").
Also, in the end, information in wikitext is not fully structured data that can be consistently read and understood by machines. We want to ensure that the metadata created by this project will be accessible in a robust, consistent, structured and linked format.
Anyway, I would like to hear more about your reasons why you think it is important that everything be editable as wikitext. Perhaps we can think of other ways to solve those problems. Sannita (WMF) (talk) 14:50, 9 March 2021 (UTC)Reply
I’m not convinced by this yet. If the metadata isn’t tied in with the wikitext, how will the connection between the two be established? It seems to me that it would break each time a heading is renamed in the wikitext. Lucas Werkmeister (talk) 16:26, 21 March 2021 (UTC)Reply
Indeed it would. As far as I know the Platform team is looking into creating stable identifiers for sections, which will allow us to properly associate metadata with a section ... it is possible, however, that the initial PoC will just use the section title text CParle (WMF) (talk) 10:07, 24 March 2021 (UTC)Reply
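The difference between the two approaches mentioned here can be shown with a small Python sketch. Everything in it is an assumption made for illustration: the `Section` class, the `section_id` field and the lookup helpers are hypothetical and do not describe how the Platform team's stable identifiers would actually work.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Section:
    title: str
    # Hypothetical stable identifier: assigned once, unchanged when the heading is renamed.
    section_id: str = field(default_factory=lambda: uuid.uuid4().hex)


def lookup_by_title(metadata, section):
    """Find topics when the metadata is keyed by the heading text."""
    return metadata.get(section.title, [])


def lookup_by_id(metadata, section):
    """Find topics when the metadata is keyed by the stable identifier."""
    return metadata.get(section.section_id, [])


section = Section("Oldest living organisms")
by_title = {section.title: ["Q590039"]}
by_id = {section.section_id: ["Q590039"]}

# An editor renames the heading in the wikitext.
section.title = "Longevity"

print(lookup_by_title(by_title, section))  # [] : the association is lost
print(lookup_by_id(by_id, section))        # ['Q590039'] : the association survives
```

A proof of concept keyed on the section title would behave like `lookup_by_title` here, which is why a rename breaks the link.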
Would this introduce the long talked about annotation service? Lokal_Profil (talk) 22:45, 22 March 2021 (UTC)Reply
Hi Lokal, I'm not sure it would introduce that service as it is described, but I think that would probably turn into a use case for our project, so thanks for bringing this up! Do you have any additional insight or suggestion about this that you want to share with us? Sannita (WMF) (talk) 15:29, 24 March 2021 (UTC)Reply
I think most of it is in the notes for that meeting. Not sure whether @Cscott has looked at this further since then. @Sebastian Berlin (WMSE) might be able to point you to some of the early notes from m:Wikispeech about which changes we determined should invalidate our planned annotations. Lokal_Profil (talk) 20:59, 16 April 2021 (UTC)Reply
We're working on annotations (of a sort) in the process of implementing support for the Translate extension in Parsoid, so we should probably touch base about unifying the various efforts. cscott (talk) 15:07, 28 July 2021 (UTC)Reply

Some ideas

Hi, I just ran into this effort and I am still trying to understand it. I had some thoughts which might or might not be relevant:
  1. I spend a lot of time on Commons on file-level infoboxes like c:template:Information, c:template:Artwork, or c:template:Book and the whole ecosystem of helper templates used for internationalization. A lot of those templates use an invisible, language-independent messaging system, so that templates like c:template:Artwork know a lot about the content of the values provided through various fields, and they can allow people to quickly copy data from Commons to Wikidata, search Wikidata for matching items or create new Wikidata items. This system of machine-readable metadata passing could be useful in your project, either to tap into directly or as a model for something similar elsewhere.
  2. I was looking very closely at HTML-level tags and machine-readable data added to images on Commons by templates like c:template:Information or c:template:En, c:template:It, etc. If anybody has any ideas on how to improve those, I would be happy to talk. Jarekt (talk) 18:43, 5 March 2021 (UTC)
Noting that the Book template is utilised by some of the Wikisources to populate the creation of Index: namespace pages which are the basis of proofreading transcriptions. There is no ready ability to sit at Commons and push all that edition data easily into a Wikidata item. Noting that the File: at Commons does not have a direct interface with Commons item beyond a link back to Commons as a document file. See https://www.wikidata.org/wiki/Q105102792 for an example and the "wikidata" parameter at c:File:A catalogue of notable Middle Templars, with brief biographical notices.djvu needs to be manually added after item creation. — billinghurst sDrewth 01:59, 15 April 2021 (UTC)Reply
I really like the idea of the project! I have a specific use case in mind personally.
The Wikidocumentaries project (currently not actively in development) aims eventually to create narratives from the kind of building blocks you describe. Often such a building block equals a section in a Wikipedia article. When such a block is made independent of the article and allowed to be displayed more visually than textually, it will be possible to create new audiovisual narratives based on the reconfigured article blocks. I have documented some vague ideas to use for brainstorming further https://wikidocumentaries.wmflabs.org/wiki/Wikidocumentary
I also see that this is crucially important for enabling the translation of existing articles across languages, and for being able to bridge across content in different languages. Susannaanas (talk) 13:47, 8 March 2021 (UTC)
Hey Jarekt and Susanna, thank you very much for your comments! Your feedback is really appreciated.
I'll pass Jarekt's considerations on to the technical team, so that they can take them into account. Susanna, I think Wikidocumentaries' "candidature" as a use case is more than welcome! I'll keep you both posted about any news, as soon as it happens. Sannita (WMF) (talk) 14:56, 9 March 2021 (UTC)
as to Jarekt
as I can copy data from commons to wikidata, it would be helpful to copy data (birthdates, technique, dimensions, institution of an artwork, image, husband, father, university, study, profession, residence, state) to wp. Perhaps we do not so much need suggestions but possibilities for those who want to connect and copy information to other projects as well as the exchange between language versions. Family trees could be used easily. Oursana (talk) 20:42, 20 March 2021 (UTC)Reply
We have been doing automatic copying for years. Try it: go to c:Category:Artworks with Wikidata item: quick statements, open any image, and in the artwork title bar click the up-arrow with the Wikidata logo. That will open QuickStatements preconfigured for copying technique, dimensions, and institution of an artwork. Jarekt (talk) 01:54, 21 March 2021 (UTC)
Yes, I know and do so regularly. Therefore my idea was to copy this information to Wikipedia language versions as well, at least to get the information to edit on one's own. Oursana (talk) 02:09, 21 March 2021 (UTC)
Wikipedia infoboxes could do it as well. The problem I see is that there is a very vocal minority on English Wikipedia actively sabotaging any attempts to work with Wikidata, making it a very toxic place to contribute to if you try to work with Wikidata. Jarekt (talk) 03:45, 21 March 2021 (UTC)
I think even worse on German Wikipedia. This is what Multichill mentions further below. Even for commons users there is a problem using wikidata Oursana (talk) 03:51, 21 March 2021 (UTC)Reply

Who wants this?

I have a very bad feeling with this project. The current proposal is extremely fuzzy, fluffy and broad. What is the user story here? Is this focused on editors or readers? At least with structured data on Commons we had a community need (like Commons:User:Multichill/Next generation categories).

To me this feels like you asked for a follow-up grant for Structured data on Commons, the only way you would get it was to do something with Wikipedia, and this is the result. The Commons community hates the computer-aided junk that is being added. If you try that kind of thing with the bigger language Wikipedias, get ready to get your head bitten off.

Why are you starting a new project when you have barely started with structured data on Commons and you already have a hard time getting the right people to work on that? Structured data on Commons is far from done; it's currently in the state of a barely functional prototype. Why take on another project that will water down your focus even more? Multichill (talk) 18:13, 7 March 2021 (UTC)

Hi Multichill, thanks for your feedback. As for the current proposal, the first potential use cases will be to improve search on Wikipedias, in a similar way to MediaSearch for Commons, and to improve image recommendations. Susanna has also brought up some very promising potential use cases.
We are hoping to gather more like this, and have others in mind ourselves, but we mostly want to hear from users at this stage. We do not plan on doing anything that the community doesn't want, or on placing unwanted additional burdens on it. Sannita (WMF) (talk) 20:54, 8 March 2021 (UTC)
I am afraid Multichill is not wrong, especially that "Structured data on Commons is far from done", so it is not a good moment for further steps on Wikipedia, especially because Wikipedians are even more suspicious of data than Commons users. To find images, the sorting functions within cats on Commons should be easier, and that could also be supported not via title data but by recognizing the similarity of the photographed subjects. We also need a function like Cat-a-lot on Commons to switch to a better image in e.g. 40 or more files on wp language versions; doing this by hand does not encourage editors. Oursana (talk) 20:07, 20 March 2021 (UTC)

Do we need that?

Do we need that or does the Wikidata community need that? Juandev (talk) 09:37, 21 March 2021 (UTC)Reply

Hi Juandev! We hope to explore the functionalities of using structured metadata to improve search on Wikipedias as our first case, like we did for MediaSearch on Commons. We also hope to use it to improve image recommendations, by allowing users to automatically find images that belong in a given section.
Other users here have also brought up some promising use cases, such as those mentioned by Susanna (with her project Wikidocumentaries) and Lokal (with the "annotation service" proposal). We also hope that this metadata will help external re-users maximize the ways in which they can make Wikipedia information available to more people.
Again, we do not plan on doing anything that the community doesn't want, or on placing unwanted additional burdens on it. Sannita (WMF) (talk) 15:42, 24 March 2021 (UTC)
Thanks for clarifying, it wasn't so obvious to me from the text. So is there an intention at the Wikimedia Foundation to improve search? Years ago the WMF was stating "we are not a search engine and we won't do that". But these efforts look like the WMF has changed its mind. That would only be good, since Google's algorithm hides several useful pages. Juandev (talk) 20:06, 11 April 2021 (UTC)
There is some playing ... https://global-search.toolforge.org/ billinghurst sDrewth 23:55, 14 April 2021 (UTC)
We do not intend to dramatically change the plans stated years ago by the WMF for search. However, we do think that structured data might help improve search in some ways. One example is the new MediaSearch interface on Commons, which uses the Structured Data that describes images on Commons to improve search results.
There is potential for something similar to work beyond Commons, although we have a lot of work to do to determine whether that is the right direction to go in. For this project, the goal is to experiment with the structured data we create to see if it does make worthwhile improvements to search, and make decisions about next steps based on those experiments. CBogen (WMF) (talk) 22:20, 14 April 2021 (UTC)Reply

Data mining

From time to time we do some data mining in Wikipedia to evaluate our projects. But as of now, we mostly count things via Excel sheets; sometimes PetScan helps. I wonder if structuring Wikipedia data may help us to do it in an easier, more automated way. Juandev (talk) 20:07, 11 April 2021 (UTC)

We definitely think that structuring the Wikipedia data can help with data mining, evaluation and statistics! If you look at the Potential Use Cases, “better statistics” is noted as one way we think we could potentially use this data.
We’d love to hear more about how you think you could use topical metadata on sections to improve data mining. What do you envision using that data for? What types of things are you currently counting via Excel and Petscan that we might be able to automate in an easier way? CBogen (WMF) (talk) 22:19, 14 April 2021 (UTC)Reply

Structured templates

On Czech Wikipedia there is a WikiProject on improving templates with structured data, so that they are easier to use in VisualEditor. But there are 15k templates, which is an enormous task. Moreover, some templates do not have documentation, so we don't know how to describe certain lines. Would this project somehow help? Juandev (talk) 20:09, 11 April 2021 (UTC)

This project does not currently aim to address templates directly, but it’s definitely interesting to think about how structured data could help improve templates. Do you have ideas about how topical metadata that describes individual sections of articles might help solve the problem you’re describing? CBogen (WMF) (talk) 22:18, 14 April 2021 (UTC)Reply

New proposal for topic metadata

Coming from your feedback in this page, we are re-evaluating the possibilities for human intervention in managing section topic metadata.

Our first proposal relied on humans to approve or reject a given set of machine-generated section metadata, but this can easily become too much of a burden for users (especially in small communities).

So our current proposal is instead to insert the machine-generated topical metadata into Wikipedia articles' sections, relying on blue interwiki links and concept relationships, without requiring users to accept or reject that metadata - but leaving users the possibility to revise it (by adding new concepts or deleting machine-generated concepts). This would result in a much faster way of populating section metadata and in a lesser burden for users.

Here you can visualise the differences between the two proposals. What is your take about this? Do you consider it feasible? Sannita (WMF) (talk) 17:40, 24 May 2021 (UTC)Reply
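As a rough illustration of what "relying on blue links" could look like, here is a minimal Python sketch. It is not the project's implementation: the function name, the regular expressions and the `title_to_item` mapping (page title to Wikidata item ID) are assumptions made for this example, and the item IDs in the usage note are placeholders. It treats every heading level as a section and skips links whose target has no known item, which stands in for red links.

```python
import re

# Matches "== Title ==", "=== Title ===", etc.; the same number of "=" on both sides.
HEADING = re.compile(r"^(={2,})\s*(.+?)\s*\1\s*$")
# Captures the link target of [[Target]], [[Target|label]] or [[Target#anchor]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def topics_by_section(wikitext, title_to_item):
    """Map each section heading to the items of the pages linked from that section.

    `title_to_item` is a hypothetical lookup from page title to Wikidata item ID.
    Text before the first heading is ignored in this sketch.
    """
    topics = {}
    current = None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            current = heading.group(2)
            topics[current] = []
            continue
        if current is None:
            continue
        for target in WIKILINK.findall(line):
            item = title_to_item.get(target.strip())
            # Skip targets without an item and avoid listing an item twice.
            if item is not None and item not in topics[current]:
                topics[current].append(item)
    return topics
```

For example, with a lookup such as `{"Methuselah (tree)": "Q_METHUSELAH", "Inyo National Forest": "Q_INYO"}` (placeholder IDs), a section linking to both pages would be tagged with both items, and users could then add or remove entries from that list.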

Inform the communities

Hello,

I think it is important that you discuss structured data for Wikimedia further within a language version before you introduce it there. I don't know how open, for example, the German Wikipedia is to that topic. Can you please tell me how this concept differs from a system using categories? From my point of view this is an important question, since there will only be acceptance if there is an understanding of the advantages. I am interested in metadata, and I think it is good if it is possible to add it to wiki pages, as is currently done with different templates. I prefer a concept where it is possible to write the metadata in wikitext, and I wish that there were also a form-based tool that helps with creating metadata for pages. So I am thinking here of a kind of content creator. The Wikidata Query Service is an example of an interface where it is possible to type queries and to search for a property in the text, if the functionality is known, without using a form for search. This is something that I like. Hogü-456 (talk) 17:49, 16 July 2021 (UTC)

Hi @Hogü-456, thank you very much for your comment! I'm sorry to reply to you this late, but we've been busy planning an initiative for reaching out to communities in order to get feedback. In a way, this answers your first request: actually we want the communities to be not just informed, but to take an active part in our project, to help us shape it better.
About the metadata, as @CParle (WMF): said in another thread at the moment «We're planning to store the topical metadata in a "slot" on the page - similar to how we store structured data for images on Commons».
This system is not going to replace the category system: like Structured Data on Commons, this is going to add another layer to wiki pages, in order to make them more easily findable and to help users associate content between Wikimedia projects (e.g. unused images that can illustrate articles).
If you have more questions or suggestions, please let us know! Sannita (WMF) (talk) 15:41, 28 July 2021 (UTC)Reply
Hello @Sannita (WMF), what I meant by informing the communities was that you post information about what you plan on pages of the language versions, like German Wikipedia. Not all people read pages on Meta-Wiki or MediaWiki, so you reach more people if you go to the language versions. I haven't read anything about this project on the central German discussion page, called Kurier. Please think about writing texts in the different languages of the language versions and publishing them directly there. Hogü-456 (talk) 19:12, 28 July 2021 (UTC)
Hi @Hogü-456, of course we will target the communities on their own pages; we just haven't started our outreach activities yet. It's a matter of days though.
About translations, we're still working on it, but we'll try our best to reach out as much as possible in the local languages. Also, users can give feedback in their native language, if they feel more comfortable. And of course, any help with translation and outreach is welcome. :) Sannita (WMF) (talk) 15:48, 29 July 2021 (UTC)

Why a new project?

Don't we have this already? What has happened to Wikidata?


What is Wikidata failing to do? Why is this needed?


What will SDAW provide to solve that? Andy Dingley (talk) 10:56, 2 August 2021 (UTC)Reply

The SDAW section topics will use Wikidata items to describe sections of text in Wikipedia articles. There is no intent to replace or supplement Wikidata. Whereas we already have Wikidata items attached to articles at the highest level, this will attach them to sections of text. CBogen (WMF) (talk) 13:51, 2 August 2021 (UTC)Reply

The need for schema

Wikidata has "issues" (or has failed, depending on your viewpoint) in part because it has no schema. This makes its data effectively unprocessable by automated tools, at least those with any embedded intelligence.


Will this project take the same route? Andy Dingley (talk) 10:57, 2 August 2021 (UTC)Reply

As mentioned above, this project will use Wikidata items. Therefore any issues or improvements to the Wikidata schema would be reflected here. CBogen (WMF) (talk) 13:52, 2 August 2021 (UTC)Reply

Weather charts

When I first saw the MediaWiki announcement, my thoughts went to weather charts. I foresaw a two-level product. The top level would be the general code for a weather chart and the lower level would be charts for specific localities. Thus, if I am using a small Wikipedia, I would ensure that the upper template was compatible with my language, and I could then call up the weather chart with a simple call: {{weather|la|Rome}} would include the weather chart for Rome in Latin (provided of course that someone had generated the Rome chart in the first place).

Was I on the right track here? Martinvl (talk) 21:06, 22 December 2021 (UTC)Reply

Hi @Martinvl! Unfortunately, I don't think we can help you there. Your idea is good, if you ask me, but it's not in the scope of this project. What you're planning is something more along the lines of a combination of Wikidata + the Global templates initiative, but SDAW would not do any good to your proposal. It would help on the fringes of your idea, maybe. Sorry for the bummer. :( Sannita (WMF) (talk) 10:18, 29 December 2021 (UTC)Reply
@Sannita (WMF): Thank you for your reply. I feared as much, but I felt that it was worth a try. Martinvl (talk) 16:53, 29 December 2021 (UTC)

I recently developed the idea of creating dedicated Wikidata items for Wikipedia articles.

This stems from currently unfixable Wikidata item conflation problems created by Wikipedia articles across languages describing different things and then being sitelinked together on one item.

I described and shared this solution and problem with the Wikidata and Wikipedia communities.

However, as part of the discussion on Wikipedia, I came to the conclusion that using Multi-Content-Revisions like this project plans to do might be a better solution than creating items for every one of the 57 million+ individual Wikipedia articles. Just as Commons stores its own structured data, Wikipedia probably should as well. Wikidata should only act as a repository that stores data about the general universe, and not anything specific to Wikimedia.

How this would work is Wikipedia editors using the proposed Wikipedia article metadata editor would add a main subject (P921) statement to describe the overall subject of the Wikipedia article. Then, the article would sitelink to other language Wikipedia articles by finding articles that also have the same main subject (P921) statement.

Let me know your thoughts! Lectrician1 (talk) 21:51, 23 December 2021 (UTC)Reply

I agree, although it's painful.
We have a big problem with the interwiki link system, and Wikidata is unwilling or unable to fix it. Wikidata is designed on a premise of a unique 1-to-1 mapping of concepts. At best that is simply wrong, and at worst it is effectively culturally imperialistic. The fact is that different languages can and do divide up concepts in different ways. In practice Wikidata generally treats English as the One True Language, and effectively screws over any foreign language that doesn't conform to English. For example:
There's a language, I think Polish(?), with a word for "children's playground ride, with a seat connected to a pivot point so it can move back and forth". In English we have one word for such a device in a vertical orientation - we call it a swing. We have a different word for such a device in a horizontal orientation - we call it a see saw. Both English articles need to link to the same foreign article, and the foreign article needs links to both English articles. (Preferably with some brief clarifying text attached.) Wikidata can't or won't do that.
It's known as the "Bonnie and Clyde" problem, because some Wikis have one article on Bonnie and Clyde as a famous couple, while other wikis have a separate biography for each person. However, in my opinion that is an unhelpful and misleading name for the problem. It can lead people to think it's merely some issue with the Wikis, that it could be fixed if the wikis simply agreed to work together and do the articles the same way. No no no. Languages do not always divide up concepts the same way. Languages can divide up colors differently - for example considering Blue and Green to merely be different shades of a single color. Colors shade smoothly into each other, and defining the borders of what constitutes a "different" color is a completely arbitrary human definition. Different languages can and do divide up concepts differently. It is abusive and broken to tell some other language that their concept-divisions are not equally legitimate, to tell them that they are somehow "wrong".
I called it painful because it will kinda suck to maintain the existing Wikidata item links in addition to a new interwiki link system. But this has been a long term problem, and Wikidata can't or won't solve it.
P.S. Lectrician1 suggested "add a main subject (P921) statement [] sitelink to other language Wikipedia articles [with] the same main subject (P921) statement." That doesn't work, that is essentially the same functionality we have now with the same problem we have now. We need greater functionality. We need the English articles on Swing and See Saw to both link to the same Polish(?) article. And there are likely English articles that need to link to two (or more) Polish articles. Alsee (talk) 15:32, 13 February 2022 (UTC)Reply
But it does work... If the English article on See Saw has "main subject: see saw", the English article on Swing has "main subject: swing", and the Polish article has "main subject: see saw, swing" or "main subject: pivoting playground device" (which see saw and swing are subclasses of), they should all link together.
I explain this in more depth, with an example of an article that combines topics on the Japanese Wikipedia, here. Lectrician1 (talk) 17:11, 13 February 2022 (UTC)Reply
Okay, I made a dedicated page describing exactly what I'm proposing: wikidata:User:Lectrician1/Sitelinks_2.0 Lectrician1 (talk) 21:45, 20 January 2023 (UTC)Reply
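The matching rule discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design on the Sitelinks 2.0 page: the article titles, the plain-text subject labels (standing in for Wikidata item IDs), and the `sitelinks` function are all hypothetical. It links two articles in different languages whenever their sets of main subject (P921) values overlap, which covers the "main subject: see saw, swing" case. The subclass-based variant ("pivoting playground device") would need an extra lookup that is not shown.

```python
from collections import defaultdict

# Hypothetical data: (wiki, title) -> set of main subject (P921) values.
# Plain labels stand in for Wikidata item IDs, which the thread does not give.
ARTICLES = {
    ("en", "See saw"): {"see saw"},
    ("en", "Swing"): {"swing"},
    ("pl", "Pivoting ride"): {"see saw", "swing"},  # placeholder title
}


def sitelinks(articles):
    """Link articles in different languages whose main subjects overlap."""
    # Index: subject -> every article that declares it as a main subject.
    by_subject = defaultdict(set)
    for key, subjects in articles.items():
        for subject in subjects:
            by_subject[subject].add(key)

    # For each article, collect other-language articles sharing a subject.
    links = {key: set() for key in articles}
    for key, subjects in articles.items():
        for subject in subjects:
            for other in by_subject[subject]:
                if other[0] != key[0]:  # skip articles on the same wiki
                    links[key].add(other)
    return links


LINKS = sitelinks(ARTICLES)
for key in sorted(LINKS):
    print(key, "->", sorted(LINKS[key]))
```

Under these assumptions both English articles link to the single combined article and it links back to both, while the two English articles are not linked to each other.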

Responding to call for Feedback. We need stuff, but not this.


I happen to be a programmer, so I understand better than most what you are trying to achieve, and why. However...

First, there is the issue of the labor to do all this work. Where is that supposed to come from? We are already struggling to keep up with our existing work. Our EN:WP:WikiProject_Articles_for_creation/Backlog_drive_header - new articles sitting unreviewed in draftspace waiting for possible promotion to mainspace article - is currently backlogged by over three months with 3,195 pending submissions. New users are waiting up to THREE MONTHS to find out whether their article will make it into the encyclopedia. EN:WP:NPP currently has a backlog of nearly 7000 unreviewed pages in mainspace. Our EN:WP:Backlog page has literally millions of tasks open. That barely scratches the surface of our work. If you make this new system - if the community were to accept it - it means we would spend less time creating new articles. It means we would spend less time expanding and improving existing articles. It means we would spend less time cleaning up bias and propaganda and misinformation and other crap that gets into articles. It means we would have less time available to assist new users.

Second, look at your mockup images:

I have one question. Look at that and tell me, are you making the wiki simpler and easier and more inviting for new users? Or does this make the wiki vastly more complex, more difficult, more confusing and overwhelming for a new user?

Third, related to the above point, English Wikipedia is currently stalled in an unresolved conflict on whether we want to ban use of Wikidata-in-Wikipedia. Structured data is designed by programmers for programmers. I'm a programmer, I understand why you like structured data, and how it's better suited for computer consumption. We understand the selling points for structured data, but Wikipedia is made by people, for people. The two systems are fundamentally incompatible. The Foundation's push to serve computers, rather than to serve people, is making things more complicated and more difficult. A substantial portion of the EnWiki community is ready to ban this kind of thing as disruptive to our work. Wikipedia is a human and volunteer project that happens to use some technology, not the other way around.

P.S. There is so much that we do want and do need. The Foundation has not been devoting enough time to maintaining and improving our core platform, and the Community Tech team is badly understaffed. There are countless projects there that aren't getting done. Alsee (talk) 08:14, 14 February 2022 (UTC)Reply

Hello @Alsee, and thank you for your thorough feedback!
I also wanted to let you know that the proposal you read on the project page is actually a bit outdated - we've heard similar feedback in the past from other users, and so we are in the process of working on a revision to the plan described that will take all of your feedback into account.
We planned on updating the page once we were done with the new plan, but in order to avoid further misunderstandings, I will put a notice on the relevant section right now to mark it as "outdated". Thanks again for your time, and we look forward to your review of our new plan in the future! Sannita (WMF) (talk) 14:49, 14 February 2022 (UTC)Reply