Structured Data Across Wikimedia/Section Topics

This page describes work by the Structured Data Across Wikimedia team to design and build features that identify section topics in a Wikipedia article. These features are currently in development.

Background

For more information about the current tool architecture, see: Structured Data Across Wikimedia/Section Topics/Data Pipeline.

The Section Topics project will identify sections in an article and create topics accordingly for those sections, drawing on several elements, such as:

  • an algorithm that detects Wikidata items based on the section’s blue links (which will be developed in partnership with the Structured Data, Research, and Data Platform teams);
  • the ability to automatically identify sections in an article (which will be developed in partnership with the Structured Data and Data Platform teams).
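
As a rough illustration of how these two elements could fit together, the following minimal sketch splits an article's wikitext into top-level sections and collects the link targets found in each one. It is not the team's implementation: the regular expressions, the function names, and the `title_to_item` lookup table are all assumptions made for this example, and the item identifiers are placeholders rather than real Wikidata IDs. Because wikitext alone cannot tell blue links from red links, the sketch simply drops any target that has no entry in the lookup table.

```python
import re

# Level-2 headings only, e.g. "== History =="; deeper headings stay in the parent body.
SECTION_SPLIT_RE = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# Captures the target of [[Target]], [[Target|label]] and [[Target#Anchor|label]].
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def split_sections(wikitext):
    """Return {heading: body} for each level-2 section (the lead is ignored)."""
    parts = SECTION_SPLIT_RE.split(wikitext)
    # parts looks like [lead, heading1, body1, heading2, body2, ...]
    return {parts[i].strip(): parts[i + 1] for i in range(1, len(parts) - 1, 2)}


def link_targets(section_text):
    """Return the distinct link targets of a section, in order of appearance."""
    targets = []
    for match in LINK_RE.finditer(section_text):
        title = match.group(1).strip().replace("_", " ")
        # Simplification: treat any colon as a namespace prefix (File:, Category:, ...).
        if not title or ":" in title:
            continue
        title = title[0].upper() + title[1:]  # first letter is case-insensitive
        if title not in targets:
            targets.append(title)
    return targets


def section_topics(wikitext, title_to_item):
    """Map each section heading to the items its links resolve to.

    title_to_item is a hypothetical page-title-to-item lookup supplied by the caller.
    """
    return {
        heading: [title_to_item[t] for t in link_targets(body) if t in title_to_item]
        for heading, body in split_sections(wikitext).items()
    }


# Invented sample text and placeholder item identifiers, for illustration only.
SAMPLE = """Lead text with [[New Zealand]].

== History ==
Sealers hunted [[Pinniped|seals]]; a [[brig]] sank in [[Perseverance Harbour]].
[[File:Example.jpg|thumb|A view]]

== Climate ==
Cold, with [[Pinniped]] colonies.
"""
LOOKUP = {"Pinniped": "item-pinniped", "Brig": "item-brig"}
topics = section_topics(SAMPLE, LOOKUP)
```

In this sketch the file link and the unresolved target are skipped, so each section keeps only the targets that the lookup table can resolve.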

One of the first use cases we envision for section topics is section-level image suggestions, which will use the blue-links algorithm and the section identification infrastructure described above. They will be delivered both via the newcomer experience and via notifications for experienced contributors. This will build upon the work done on image suggestions and will be developed in partnership with the Structured Data, Data Platform, Research, Search, Android, and Growth teams.

These elements will neither change nor affect the current editing experience for users. All these activities will be automatic and will not depend on any action from editors. The project is currently in its development phase, and some aspects may still require further investigation and/or feedback from users.

Examples of possible section topics

The following are some examples of section topics extracted from Wikipedia articles during a test run on the English and Russian Wikipedias. We are currently working on a way to determine the most relevant topics for any given section through a custom TF-IDF weight function.
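
The custom weight function itself is not described here, so the sketch below uses plain textbook TF-IDF as a stand-in to show the general idea: each section is treated as a document, each linked topic as a term, and a topic scores higher when it is linked often in one section but rarely in others. The function name, the input shape, and the sample data are assumptions made for this example.

```python
import math
from collections import Counter


def rank_topics(sections):
    """Rank the topics of each section by a standard TF-IDF score.

    sections maps a section identifier to the list of topics linked in it;
    a topic appears once per link, so repeats are allowed.
    """
    n_sections = len(sections)
    # Document frequency: in how many sections does each topic occur?
    doc_freq = Counter()
    for topics in sections.values():
        doc_freq.update(set(topics))

    ranked = {}
    for section_id, topics in sections.items():
        counts = Counter(topics)
        total = len(topics)
        scores = {
            topic: (count / total) * math.log(n_sections / doc_freq[topic])
            for topic, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically for stable output.
        ranked[section_id] = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked


# Invented link lists, for illustration only.
sections = {
    "History": ["Brig", "Brig", "Pinniped", "World War II"],
    "Climate": ["Pinniped"],
    "Fauna": ["Pinniped", "Albatross"],
}
ranked = rank_topics(sections)
```

With this toy input, a topic linked from every section receives a score of zero and sinks to the bottom of each ranking, while a topic linked twice in a single section rises to the top of that section.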

Example 1 (English Wikipedia)
Article: Campbell Island, New Zealand
Section: History
Example section topics: “World War II”, “Pinniped”, “Brig”, “Great Depression”, “Perseverance Harbour”
Example 2 (English Wikipedia)
Article: Dorothy E. Smith
Section: Biography
Example section topics: “Toronto”, “University of British Columbia”, “London School of Economics”, “Vancouver”, “University of California, Berkeley”
Example 3 (English Wikipedia)
Article: Battle of Surabaya
Section: Background
Example section topics: “Sukarno”, “Mohammad Hatta”, “Jakarta”, “Proclamation of Indonesian Independence”, “East Java”
Example 4 (English Wikipedia)
Article: Tour of Greece
Section: Past winners (note: the whole section is a table)
Example section topics: names of the tour’s winners, ordered by relevance score: “Ioannis Tamouridis”, “Valeriy Dmitriyev”, “Henri Manders”, “Thomas Liese”, “Assan Bazayev”, etc.
Example 5 (Russian Wikipedia)
Article: Адлон (отель)
Section: История
Example section topics: “Дитрих, Марлен”, “Вторая мировая война”, “Чаплин, Чарльз”, “Вильгельм II (император Германии)”, “Шинкель, Карл Фридрих”, “Первая мировая война”
Example 6 (Russian Wikipedia)
Article: Военная стратегия
Section: История
Example section topics: “Сунь-цзы”, “Наполеон I”, “Первая мировая война”, “Искусство войны”, “Блицкриг”, “Александр Македонский”, “Вторая мировая война”

Further planned development

Depending on the viability of the work described above, the project also aims to use section topics to improve our SEO[1] reach with external search engines, as a follow-up to the experiment conducted in task T302735.

References

  1. SEO — Search Engine Optimization