Wikimedia Research/Showcase/Archive/2018/02

February 2018


21 February 2018 Video: YouTube

Visual Enrichment of Collaborative Knowledge Bases
 
slides
By Miriam Redi, Wikimedia Foundation
Images allow us to explain, enrich and complement knowledge without language barriers.[1] They can help illustrate the content of an item in a language-agnostic way to external data consumers. Images can be extremely helpful in multilingual collaborative knowledge bases such as Wikidata.
However, a large proportion of Wikidata items lack images. More than 3.6M Wikidata items are about humans (Q5), but only 17% of them have an associated image. Overall, only 2.2M of the 40M Wikidata items have an image. A wider presence of images in such a rich, cross-lingual repository could enable a more complete representation of human knowledge.
In this talk, we will discuss the challenges and opportunities of using machine learning and computer vision tools for the visual enrichment of collaborative knowledge bases. We will share research that helps Wikidata contributors make Wikidata more “visual” by recommending high-quality Commons images for Wikidata items. We will present our first results on free-licence image quality scoring and recommendation, and discuss future work in this direction.


Backlogs—backlogs everywhere
Using machine classification to clean up the new page backlog
 
slides
By Aaron Halfaker, Wikimedia Foundation
If there's one insight that I've had about the functioning of Wikipedia and other wiki-based online communities, it's that self-directed work eventually breaks down and some form of organization becomes important for task routing. In Wikipedia specifically, the notion of "backlogs" has become dominant. There are backlogs of articles to create, articles to clean up, articles to assess, new editor contributions to review, manual of style rules to apply, etc. For a community of people working on a backlog, the state of that backlog deeply affects their emotional well-being. A backlog that only grows is frustrating and exhausting.
Backlogs aren't inevitable, though, and they can take many shapes. In my presentation, I'll tell the story of how English Wikipedia editors defined a process and a set of roles that formed a backlog around new page creations. I'll argue that this formalization of quality control practices has created a choke point and that alternatives exist. Finally, I'll present a vision for such an alternative using models that we have developed for ORES, the open machine prediction service my team maintains.
1. Van Hook, Steven R. "Modes and models for transcending cultural differences in international classrooms." Journal of Research in International Education 10.1 (2011): 5-27.