Talk:Structured Data Across Wikimedia

Martinvl (talkcontribs)

When I first saw the MediaWiki announcement, my thoughts went to weather charts. I foresaw a two-level product: the top level would be the general code for a weather chart, and the lower level would be charts for specific localities. If I were using a small Wikipedia, I would make sure the upper template was compatible with my language, and I could then include a weather chart with a simple call: for example, {{weather|la|Rome}} would include the weather chart for Rome in Latin (provided, of course, that someone had generated the Rome chart in the first place).

Was I on the right track here?

Sannita (WMF) (talkcontribs)

Hi @Martinvl! Unfortunately, I don't think we can help you there. Your idea is good, if you ask me, but it's not in the scope of this project. What you're planning is more along the lines of a combination of Wikidata and the Global templates initiative; SDAW would not do much for your proposal, except maybe help on the fringes of your idea. Sorry for the bummer. :(

Martinvl (talkcontribs)

@Sannita (WMF): Thank you for your reply. I feared as much, but I felt it was worth a try. Martinvl (talk) 16:53, 29 December 2021 (UTC)

Reply to "Weather charts"

Use Multi-Content-Revisions to replace the sitelink system on Wikidata

Lectrician1 (talkcontribs)

I recently developed the idea of creating dedicated Wikidata items for Wikipedia articles.

This stems from Wikidata item conflation problems that currently cannot be fixed: Wikipedia articles in different languages describe different things but are sitelinked together on one item.

I described this problem and solution to the Wikidata and Wikipedia communities.

However, during the discussion on Wikipedia, I came to the conclusion that using Multi-Content-Revisions, as this project plans to do, might be a better solution than creating items for every one of the 57 million+ individual Wikipedia articles. Just as Commons stores its own structured data, Wikipedia probably should as well. Wikidata should act only as a repository that stores data about the general universe, not anything specific to Wikimedia.

Under this scheme, Wikipedia editors would use the proposed Wikipedia article metadata editor to add a main subject (P921) statement describing the overall subject of the article. The article would then sitelink to articles in other language Wikipedias that have the same main subject (P921) statement.
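A minimal sketch of that matching step, assuming each article's own metadata exposes exactly one main subject (P921) value as a Wikidata QID. The `main_subject` dictionary and the `derive_sitelinks` helper are hypothetical stand-ins for data the proposed editor would produce; Q220 (Rome) and Q90 (Paris) are just example subjects.

```python
from collections import defaultdict

# Hypothetical input: (wiki, article title) -> main subject (P921) QID,
# as it might be read from each article's own metadata.
main_subject = {
    ("enwiki", "Rome"): "Q220",
    ("itwiki", "Roma"): "Q220",
    ("lawiki", "Roma"): "Q220",
    ("enwiki", "Paris"): "Q90",
}


def derive_sitelinks(subjects):
    """Link every article to the other articles that share its main subject."""
    groups = defaultdict(list)
    for article, qid in subjects.items():
        groups[qid].append(article)

    links = {}
    for members in groups.values():
        for article in members:
            # Every other article with the same subject becomes an interlanguage link.
            links[article] = sorted(m for m in members if m != article)
    return links


links = derive_sitelinks(main_subject)
print(links[("enwiki", "Rome")])  # [('itwiki', 'Roma'), ('lawiki', 'Roma')]
```

Nothing in this sketch stops two articles on the same wiki from claiming the same subject; a real design would have to decide how to resolve that case.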

Let me know your thoughts!


Your expectations about the project

Sannita (WMF) (talkcontribs)

This thread will address two themes: the general expectations about the project, and how it will impact users' workflows:

  1. What do users expect from this project? What are the necessary actions to be addressed?
  2. How do you envision this metadata being used? Can you think of ways it would aid in your workflows?
Oursana (talkcontribs)

See above; the project doesn't appeal to me. I can imagine support with wikilinks. On Wikidata one gets suggestions for properties, qualifiers, values, etc., which is very helpful, so I would like Wikipedia to suggest, e.g., a birth date and a reference... From the example I don't understand where you want to go; perhaps I would like more examples. We need more data on Wikidata, and editing there could be even easier.

Lucas Werkmeister (talkcontribs)

I don’t know what I expect from this project. The description feels pretty vague to me – I don’t really have an idea what kind of metadata or structured data could be collected. The first action area, tagging sections, makes some sense, but I don’t think anything is stopping interested Wikipedias from implementing something fairly similar today, using RDFa:

== Oldest living organisms ==
<p about="wd:Q590039">
Bristlecone pines are known for attaining great ages. The oldest bristlecone pine in the White Mountains is <span property="schema:name">[[Methuselah]]</span>, which has a verified age of 4,852 years. It is located in the <span property="schema:location">[[Inyo National Forest]]</span> in Eastern California. The specific location of Methuselah is a very closely guarded secret.
</p>

(Based on Bristlecone pine, Wikipedia contributors, CC BY-SA 3.0.)
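To illustrate how a machine could read such markup, here is a rough sketch using Python's standard `html.parser` on a simplified rendering of the example. It assumes every element is explicitly closed (a bare `<br>` would unbalance the stack) and leaves the `wd:` and `schema:` prefixes unexpanded; `SimpleRDFaExtractor` is a hypothetical helper, not a full RDFa processor.

```python
from html.parser import HTMLParser


class SimpleRDFaExtractor(HTMLParser):
    """Collects (subject, property, text) triples from about= / property= attributes."""

    def __init__(self):
        super().__init__()
        self.stack = []    # (about, property) for each open element
        self.text = []     # text seen since the innermost property element opened
        self.triples = []  # (subject, property, literal value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop:
            self.text = []
        self.stack.append((attrs.get("about"), prop))

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        about, prop = self.stack.pop()
        if prop:
            # The subject is the nearest enclosing element that set about=.
            subject = next((a for a, _ in reversed(self.stack) if a), None)
            self.triples.append((subject, prop, "".join(self.text).strip()))


html = """<p about="wd:Q590039">The oldest bristlecone pine is
<span property="schema:name"><a href="/wiki/Methuselah">Methuselah</a></span>, located in the
<span property="schema:location"><a href="/wiki/Inyo_National_Forest">Inyo National Forest</a></span>.</p>"""

parser = SimpleRDFaExtractor()
parser.feed(html)
for triple in parser.triples:
    print(triple)
```

Each `property` span yields one triple whose subject is `wd:Q590039`, which is the kind of section-level statement the first action area describes.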

CParle (WMF) (talkcontribs)

AIUI, initially the metadata will simply be a list of topics associated with a section, the idea being that it's a step on the road to being able to search for and serve smaller pieces of content than a whole article.

Interested Wikipedias can indeed do what you describe, but it's pretty awkward.

Framawiki (talkcontribs)

Hi, I'm interested in where the results of this work will be stored. Would it be in a separate database on a project server, or directly somewhere in the wikis?

It might also be interesting to serve the results linked to Wikidata items. Looking at the mockup image, we can imagine properties (bold items with icons) and values (Q items). Thanks!

CParle (WMF) (talkcontribs)

We're planning to store the topical metadata in a "slot" on the page, similar to how we store structured data for images on Commons. Ultimately it'll also be injected into Elasticsearch, but we haven't figured out the details of that yet.
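The thread does not describe the slot's format. Purely as an illustration, assuming a JSON payload that keys topics by section heading and lists Wikidata QIDs, a revision with the usual `main` wikitext slot plus a hypothetical `sectiontopics` slot could be modelled like this. The slot name, field names, and the `Revision`/`topics_for` helpers are invented for the sketch; Q590039 is reused from the RDFa example in the previous thread.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One revision holding several named content slots (Multi-Content Revisions)."""
    slots: dict = field(default_factory=dict)


wikitext = "== Oldest living organisms ==\nBristlecone pines are known for attaining great ages."

# Hypothetical slot payload: section heading -> list of Wikidata QIDs.
section_topics = {
    "sections": [
        {"heading": "Oldest living organisms", "topics": ["Q590039"]},
    ]
}

rev = Revision(slots={"main": wikitext, "sectiontopics": json.dumps(section_topics)})


def topics_for(revision, heading):
    """Read one section's topics without parsing the main wikitext slot."""
    data = json.loads(revision.slots["sectiontopics"])
    for section in data["sections"]:
        if section["heading"] == heading:
            return section["topics"]
    return []


print(topics_for(rev, "Oldest living organisms"))  # ['Q590039']
```

Keeping the topics in their own slot means the wikitext in `main` is untouched, much as Commons keeps MediaInfo data apart from the file description page.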

Billinghurst (talkcontribs)

Obviously I want Wikisource reproductions to be more findable, both internally and externally, and easier to use.


I am hoping the project will be able to tell me what is possible for my wish for an integrated way to interact and inject pertinent detail cross-wiki, in the easiest, most reliable, and referenced form.


...


Ultimately I am also hoping for better input/workflow tools to make things easier and better, and to stop having to add the same data at multiple places.

  • Creating a commons file
  • Creating the wikisource components, transcluding, then creating the WD item for the edition, then needing to create the work item for the parent
  • Then having to update the Commons file to link back to the WD item
  • Then edit the WD item to add the IA details
  • Then I head over to Wikipedia to find the right template and re-add the data parameter by parameter, when I should be able to just add a contextual template (or use a lookup form or template to add the WD item) and have the data pulled in automatically.

(all the around and around and around)


At the moment I use the WEF framework tool; while it is pretty good, it doesn't completely suit my needs for creating/editing items from enWS. I would love to have a customisable tool to create/edit WD items from the sister wikis rather than having to go to WD and edit the item manually. Similarly, from a page such as s:en:A catalogue of notable Middle Templars, with brief biographical notices/Bayley, Sir Edward Clive, I want to be able to push data, with references, directly to the related main subject's item (not the page's own item).


No idea exactly how much of that is in scope, but that is what I hope we can think about, or poke somewhere as a way to interlink and overlay data between the sisters.  :-)

Hjfocs (talkcontribs)

Wow, my first thoughts are:

  1. the Wikidata vision is coming true! Structured knowledge to serve as the backbone of the Wikimedia landscape;
  2. sounds like a more fine-grained https://schema.org/ for the Wikimedia ecosystem.

Besides automatic enrichment of Wikipedia infoboxes, I fully support the categorization of page sections: one clear expectation or outcome is to see rich snippets of Wikipedia articles when I run a search, ideally regardless of the engine I use.

I'm not sure how the project would cater to Abstract Wikipedia, though, so I'd like to hear more about this.

Hjfocs (talkcontribs)

I see that @Lucas Werkmeister actually explains how the page sections effort could also be implemented following the schema.org fashion with RDFa markup.

Belteshassar (talkcontribs)

Unlike most files on Commons or the concepts described by Wikidata items, Wikipedia articles evolve with time and, depending on the implementation, metadata could fall out of sync with the sections. I’m sure there are ways to address this, for example by automatically identifying heavily edited sections.
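One way to approach "automatically identifying heavily edited sections", sketched here as an assumption rather than anything the project has proposed: split two revisions on their level-2 `==` headings, compare each section's text with `difflib.SequenceMatcher`, and flag sections whose similarity falls below a threshold (0.6, chosen arbitrarily) or that have no counterpart in the old revision. The function names are hypothetical.

```python
import difflib
import re

# Level-2 headings only ("== Title =="); deeper headings stay inside their parent section.
HEADING = re.compile(r"^==(?!=)\s*(.+?)\s*(?<!=)==\s*$", re.MULTILINE)


def split_sections(wikitext):
    """Map each level-2 heading to the text beneath it; the lead section is keyed ''."""
    parts = HEADING.split(wikitext)
    sections = {"": parts[0]}
    for heading, body in zip(parts[1::2], parts[2::2]):
        sections[heading] = body
    return sections


def flag_changed_sections(old_text, new_text, threshold=0.6):
    """Return headings whose text changed enough that their metadata may be stale."""
    old_sections = split_sections(old_text)
    flagged = []
    for heading, new_body in split_sections(new_text).items():
        old_body = old_sections.get(heading)
        if old_body is None:
            flagged.append(heading)  # new or renamed section: nothing to compare with
        elif difflib.SequenceMatcher(None, old_body, new_body).ratio() < threshold:
            flagged.append(heading)
    return flagged


old_rev = "Intro.\n== History ==\nFounded in 1900.\n== Geography ==\nA river runs through it.\n"
new_rev = ("Intro.\n== History ==\nFounded in 1900.\n== Geography ==\n"
           "Mountains dominate the northern half of the region, with glaciers.\n")

print(flag_changed_sections(old_rev, new_rev))  # ['Geography']
```

Flagged sections could then be queued for a human or machine re-check of their topics instead of re-reviewing the whole article.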

Eposthumus (talkcontribs)

How do we query the Structured Data on Wikimedia Commons?

Are there plans to provide something similar to WDQS on Wikidata, or does the structured data eventually also filter through to Wikidata? (That is, if I understand correctly that these are separate datastores.)


BTW, I find the idea of structured data on Commons very exciting, and plan on adding more support and integration for the ICONCLASS classification system, similar to what is already done for Wikidata.

Sannita (WMF) (talkcontribs)

Hello @Eposthumus, there is a plan for a separate query service for Structured Data on Commons. The plan will be announced soon™, as the team is still reviewing some aspects. I'll be sure to ping you personally when the announcement goes live.

Personally speaking, I share your excitement and your expectations, but it seems we'll need to wait a bit longer. :)

Sannita (WMF) (talkcontribs)

This thread will address the theme of metadata moderation:

  1. In your opinion, is moderation necessary to avoid vandalism and/or bias?
  2. If moderation is necessary, how can it be effectively managed?
Nintendofan885 (talkcontribs)

Metadata moderation would probably be needed (at least in its early stages). Many Wikipedia editors don't know how to edit Wikidata, so vandalism there gets left behind, and higher-profile structured data would likely have the same problem.

Reply to "Metadata moderation"
Andy Dingley (talkcontribs)

Wikidata has "issues" (or has failed, depending on your viewpoint) in part because it has no schema. This makes its data effectively unprocessable by automated tools, at least those with any embedded intelligence.


Will this project take the same route?

CBogen (WMF) (talkcontribs)

As mentioned above, this project will use Wikidata items. Therefore any issues or improvements to the Wikidata schema would be reflected here.

Reply to "The need for schema"
Andy Dingley (talkcontribs)

Don't we have this already? What has happened to Wikidata?


What is Wikidata failing to do? Why is this needed?


What will SDAW provide to solve that?

CBogen (WMF) (talkcontribs)

The SDAW section topics will use Wikidata items to describe sections of text in Wikipedia articles. There is no intent to replace or supplement Wikidata. Whereas we already have Wikidata items attached to articles at the highest level, this will attach them to sections of text.

Reply to "Why a new project?"
Hogü-456 (talkcontribs)

Hello,

I think it is important that you discuss structured data for Wikimedia further within each language version before you introduce it there. I don't know how open, for example, the German Wikipedia is to this topic. Can you please tell me how this concept differs from a system using categories? From my point of view this is an important question, because there will only be acceptance if people understand the advantages. I am interested in metadata, and I think it is good if it can be added to wiki pages, which is currently done with various templates. I prefer a concept where the metadata can be written in wikitext, and I would also like a form-based tool that helps create metadata for pages, a kind of content creator. The Wikidata Query Service is an example of an interface where, if you know the functionality, you can type queries and search for a property in the text without using a search form. This is something I like.

Sannita (WMF) (talkcontribs)

Hi @Hogü-456:, thank you very much for your comment! Sorry for the late reply, but we've been busy planning an initiative for reaching out to communities in order to get feedback. In a way, this answers your first request: we want the communities not just to be informed, but to take an active part in our project and help us shape it better.

About the metadata: as @CParle (WMF) said in another thread, at the moment «We're planning to store the topical metadata in a "slot" on the page - similar to how we store structured data for images on Commons».

This system is not going to replace the category system: like Structured Data on Commons, it is going to add another layer to wiki pages, to make them more easily findable and to help users associate content between Wikimedia projects (e.g. unused images that could illustrate articles).

If you have more questions or suggestions, please let us know!

Hogü-456 (talkcontribs)

Hello @Sannita (WMF), what I meant by informing the communities was that you post information about your plans on pages of the language versions, such as the German Wikipedia. Not everyone reads pages on Meta-Wiki or MediaWiki.org, so you reach more people if you go to the language versions. I haven't read anything about this project on the central German talk page called Kurier. Please consider writing texts in the languages of the different language versions and publishing them directly there.

Sannita (WMF) (talkcontribs)

Hi @Hogü-456:, of course we will reach out to the communities on their own pages; we just haven't started our outreach activities yet. It's a matter of days, though.

As for translations, we're still working on them, but we'll try our best to reach out in local languages as much as possible. Users can also give feedback in their native language if they feel more comfortable doing so. And of course, any help with translation and outreach is welcome. :)

Reply to "Inform the communities"
ChristianKl (talkcontribs)

The mockup shows an editing interface, but it doesn't say anything about the structure of the underlying metadata. In previous discussions about VisualEditor, Flow and Structured Discussions, there was a sentiment that it's important for everything to be editable as wikitext. That leaves two questions:

  1. Is it important that Structured Data on Wikimedia be editable as wikitext?
  2. If so, how would it best be structured?
Sannita (WMF) (talkcontribs)

Hi Christian, thank you for bringing this up! Sorry for the delay in answering, but I wanted to be sure of the technical details of my answer (which won't be short, so apologies for this too).

So, we did consider making the section metadata editable as wikitext, but ultimately we think that approach has a lot of potential difficulties, whereas keeping the metadata separate has many advantages.

Conceptually, separating metadata about a document or text from the document itself makes sense. It also allows machines to read and understand it separately. This is the way we did it with Structured Data on Commons, via Extension:WikibaseMediaInfo. It has several advantages, such as:

  • the ability to refine user permissions for accessing and editing this type of data, and
  • reducing cognitive burden on editors who are not interested in working with this data (as we said in the main page, we don't want to "overwhelm users with too much new content to moderate").

From a technical perspective, we would also like to avoid complicating wikitext further with additional markup, because this would create more maintenance burden on a critical piece of the MediaWiki environment (again, we don't want to "introduce too much complexity into our systems").

Also, in the end, information in wikitext is not fully structured data that can be consistently read and understood by machines. We want to ensure that the metadata created by this project will be accessible in a robust, consistent, structured and linked format.

Anyway, I would like to hear more about your reasons why you think it is important that everything be editable as wikitext. Perhaps we can think of other ways to solve those problems.

Lucas Werkmeister (talkcontribs)

I’m not convinced by this yet. If the metadata isn’t tied in with the wikitext, how will the connection between the two be established? It seems to me that it would break each time a heading is renamed in the wikitext.

CParle (WMF) (talkcontribs)

Indeed it would. As far as I know, the Platform team is looking into creating stable identifiers for sections, which will allow us to properly associate metadata with a section. It is possible, however, that the initial proof of concept (PoC) will just use the section title text.
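A small sketch of the failure mode raised here, and of why a stable identifier helps. The `Section` class and its `section_id` field are hypothetical; how such an ID would actually survive edits to the wikitext is the open question and is not modelled in this sketch.

```python
import uuid
from dataclasses import dataclass, field

# Metadata keyed by heading text, as an initial proof of concept might do.
topics_by_title = {"Oldest living organisms": ["Q590039"]}
renamed = "Oldest known specimens"
print(topics_by_title.get(renamed, []))  # [] because the rename orphans the topics


@dataclass
class Section:
    heading: str
    # Hypothetical stable identifier: assigned once, never derived from the heading.
    section_id: str = field(default_factory=lambda: uuid.uuid4().hex)


section = Section("Oldest living organisms")
topics_by_id = {section.section_id: ["Q590039"]}

section.heading = renamed  # editing the heading leaves the ID untouched
print(topics_by_id.get(section.section_id, []))  # ['Q590039']
```

The design choice is simply that the key must not depend on anything an editor routinely changes; the heading text fails that test.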

Lokal Profil (talkcontribs)
Sannita (WMF) (talkcontribs)

Hi Lokal, I'm not sure it would introduce that service as it is described, but I think that would probably turn into a use case for our project, so thanks for bringing this up! Do you have any additional insight or suggestion about this that you want to share with us?

Lokal Profil (talkcontribs)

I think most of it is in the notes for that meeting. Not sure whether @Cscott has looked at this further since then. @Sebastian Berlin (WMSE) might be able to point you to some of the early notes from m:Wikispeech about which changes we determined should invalidate our planned annotations.

Cscott (talkcontribs)

We're working on annotations (of a sort) in the process of implementing support for the Translate extension in Parsoid, so we should probably touch base about unifying the various efforts.

Reply to "Wikitext"

New proposal for topic metadata

Sannita (WMF) (talkcontribs)

Based on your feedback on this page, we are re-evaluating the possibilities for human intervention in managing section topic metadata.

Our first proposal relied on humans to approve or reject a given set of machine-generated section metadata, but this can easily become too much of a burden for users (especially in small communities).

So our current proposal is instead to insert the machine-generated topical metadata into Wikipedia articles' sections, relying on blue interwiki links and concept relationships, without requiring users to accept or reject that metadata, while still leaving users the possibility to revise it (by adding new concepts or deleting machine-generated ones). This would populate section metadata much faster and place a lighter burden on users.
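A rough sketch of the second proposal under simplifying assumptions: topics are taken from the section's blue links (a link counts as blue if it appears in a title-to-item lookup), nothing has to be approved, and users can only delete suggested concepts or add new ones. The regex is a simplification of wikilink syntax, and `title_to_qid`, `suggest_topics`, `apply_user_edits` and the `Q-…` IDs are hypothetical placeholders, not real Wikidata items.

```python
import re

# [[Target]], [[Target|label]] or [[Target#Anchor|label]]. Targets containing ":" are
# skipped, which drops File:/Category: links but also some real article titles.
WIKILINK = re.compile(r"\[\[([^\[\]|#:]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def suggest_topics(section_wikitext, title_to_qid):
    """Machine-generated topics: the items behind the section's blue links."""
    topics = []
    for match in WIKILINK.finditer(section_wikitext):
        title = match.group(1).strip()
        title = title[:1].upper() + title[1:]  # most wikis capitalize the first letter
        qid = title_to_qid.get(title)  # red links have no entry and are ignored
        if qid and qid not in topics:
            topics.append(qid)
    return topics


def apply_user_edits(suggested, added=(), removed=()):
    """Users never approve anything; they may only delete or add concepts."""
    removed = set(removed)
    result = [q for q in suggested if q not in removed]
    for q in added:
        if q not in result:
            result.append(q)
    return result


# Placeholder IDs, not real Wikidata items.
title_to_qid = {"Methuselah": "Q-methuselah", "Inyo National Forest": "Q-inyo"}
section = ("The oldest is [[Methuselah]], in the [[Inyo National Forest|forest]], "
           "near [[Some red link]].")

suggested = suggest_topics(section, title_to_qid)
final = apply_user_edits(suggested, added=["Q-bristlecone"], removed=["Q-inyo"])
print(suggested)  # ['Q-methuselah', 'Q-inyo']
print(final)      # ['Q-methuselah', 'Q-bristlecone']
```

The suggested list is usable as soon as it is generated; user edits only refine it, which is where the lighter burden comes from.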

Here you can visualise the differences between the two proposals. What is your take on this? Do you consider it feasible?


Adding and confirming metadata

Sannita (WMF) (talkcontribs)

This thread will address the theme of the basic functions related to topical metadata:

  1. Do users want to be able to approve or reject metadata suggested by the automated system?
  2. Do users want to be able to add additional metadata beyond what is suggested by the automated system?
  3. Do you think it may just be sufficient for users to have the opportunity to send feedback with suggestions on how to improve the machine generated metadata, when necessary?
Oursana (talkcontribs)

1)2) strong yes

3)noooo

Hjfocs (talkcontribs)

Dropping my two cents here. From some lessons learnt on data curation tools like Mix'n'match (Q28054658) and Primary Sources Tool (Q20656106), my general (and very personal) feeling is: we are talking about a tedious task, so the less you ask people to approve or reject suggestions, the better.

As mentioned in Structured_data_across_Wikimedia#What_is_changing, I totally agree that we should pick blue Wikipedia links as the low-hanging fruit; I'd also be confident enough to consider the corresponding Wikidata items as the ground truth.

In conclusion: why bother with metadata curation at all? Let's take those links for granted! They are already the outcome of human love. Instead, I'd focus on automatically coming up with a summarized representation, i.e. the one high-level Wikidata item that best categorizes a given Wikipedia section; in other words, topic categorization.
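One possible reading of "one high-level Wikidata item that best categorizes a section", sketched under loud assumptions: walk up a toy parent map standing in for instance of (P31) / subclass of (P279) edges, and pick the concept reached by the most linked items, breaking ties by total distance so the most specific shared concept wins. The hierarchy below is made up for illustration (labels instead of QIDs) and is not how Wikidata actually models these items.

```python
from collections import Counter, deque

# Toy hierarchy standing in for P31 / P279 edges; labels instead of QIDs for readability.
parents = {
    "Methuselah": ["Great Basin bristlecone pine"],
    "Prometheus": ["Great Basin bristlecone pine"],
    "Great Basin bristlecone pine": ["pine"],
    "pine": ["tree"],
    "Inyo National Forest": ["national forest"],
    "national forest": ["protected area"],
}


def ancestors(item):
    """Breadth-first walk up the hierarchy: {concept: distance from item}."""
    seen = {item: 0}
    queue = deque([item])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen[parent] = seen[node] + 1
                queue.append(parent)
    return seen


def summarize(linked_items):
    """Pick the concept covering the most links; break ties by total distance."""
    coverage, distance = Counter(), Counter()
    for item in linked_items:
        for concept, d in ancestors(item).items():
            coverage[concept] += 1
            distance[concept] += d
    return min(coverage, key=lambda c: (-coverage[c], distance[c]))


print(summarize(["Methuselah", "Prometheus", "Inyo National Forest"]))
# Great Basin bristlecone pine
```

"pine" and "tree" cover the same two links, but the tie-break prefers the closer concept; other scoring rules (for example, weighting by link position) would be equally plausible.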
