Augmentation

Overview

The Wikimedia movement wants the sum of all knowledge to be available to everyone in the world. We also want the process of assembling that knowledge to be inclusive, balanced, and safe for all participants. But too much knowledge is needed, in too many languages, for humans to do this alone. As an example, if we assume that a Wikipedia covering a substantial amount of knowledge has 2 million articles (likely a low estimate), and we believe that speakers of 300 languages should have access to that knowledge, we should expect there to be 600 million articles. There are currently only 48 million articles,^[1] which is 8% of the way there. There are simply not enough potential human editors, especially in smaller languages, to close that gap. Whether or not we believe that long-form articles will be the medium of the future, this illustrates the problem we face.
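
The arithmetic behind that estimate can be checked with a short sketch. The three input figures come from the paragraph above; the variable and function names are illustrative only.

```python
# Figures taken from the paragraph above; names are illustrative assumptions.
ARTICLES_PER_WIKIPEDIA = 2_000_000   # assumed size of a "substantial" Wikipedia
TARGET_LANGUAGES = 300               # languages that should have that coverage
CURRENT_ARTICLES = 48_000_000        # articles that exist today, across all languages


def coverage_gap(per_wiki, languages, current):
    """Return (target articles, missing articles, fraction of target reached)."""
    target = per_wiki * languages
    missing = target - current
    return target, missing, current / target


target, missing, fraction = coverage_gap(
    ARTICLES_PER_WIKIPEDIA, TARGET_LANGUAGES, CURRENT_ARTICLES
)
print(f"target:   {target:,}")      # 600,000,000
print(f"missing:  {missing:,}")     # 552,000,000
print(f"coverage: {fraction:.0%}")  # 8%
```

Under these assumptions, roughly 552 million articles are still missing, which is the gap the rest of this page is concerned with.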

Augmentation for contribution activities is our path to closing these gaps. Augmentation refers to any technology that helps humans do their work, and wikis have been using augmentation almost since their beginnings: Rambot created 34,000 articles from Census data in 2002,^[2] Twinkle has been automating repetitive tasks since 2007,^[3] ClueBot has been reverting vandalism since 2011,^[4] and the Content Translation tool has employed machine translation to help generate content since 2016. Over the next three to five years, human editors will increasingly need to wield augmentation tools, especially those that incorporate artificial intelligence, to create content, curate content, and maintain a safe environment on the wikis. Artificial intelligence will not replace human editors. Instead, it will allow human editors to focus on the most impactful and fulfilling work, and, if used correctly, will open up more avenues for more contributors.^[5]

Although artificial intelligence is a powerful editing aid, it also has the potential to magnify the problems of bias and unfairness^[6]^[7] that already exist in the wikis, and to discourage new editors.^[8] The role of human editors will therefore change to focus on wielding these tools safely, guarding the wiki values that only humans understand.^[5] In pursuing any augmentation technology, we should stick to the principles we apply to code and content: transparency and the ability for anyone to contribute. We should build closed-loop systems that essentially make augmentation “editable” by community members, even non-technical ones. By making it possible for members of all communities to audit augmentation tools, contribute training data, flag errors, and tailor tools to their wikis, we will ensure that wikis are not unduly influenced by the smaller set of people who build the tools, while also opening up a new avenue of contribution.

In terms of capabilities we need to build, the Wikimedia Foundation should do two main things:

Build an infrastructure platform for many people to contribute augmentation tools.
Provide interfaces that make it possible for non-technical editors to apply, adjust and contribute to those tools.

The former would likely be pursued by the Technology department, while the latter would be pursued by the Audiences department. The Audiences work will create on-wiki tools that allow non-technical editors to record training data, identify errors in existing algorithms, and tune algorithms to fit their wiki’s culture, surfacing those tasks as first-class wiki work that other editors can see. Through these interfaces, the shepherding of augmentation tools will become a new, major way of contributing that will ensure that machines are fair and healthy contributors to every wiki.
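
As a rough illustration of the feedback loop described above, the sketch below shows how editor judgments might be recorded in a publicly readable log, used to find cases where a model and a human disagree, and used to tune a per-wiki threshold. Everything in it is hypothetical: the class names, fields, wiki code, editor names, and scores are invented for illustration and do not describe ORES, JADE, or any existing Wikimedia interface.

```python
from dataclasses import dataclass, field


@dataclass
class Judgment:
    """One editor's label for an edit that a model has scored (hypothetical schema)."""
    wiki: str                  # wiki the edit belongs to, e.g. "cswiki"
    revision_id: int
    model_score: float         # model's estimated probability that the edit is damaging
    human_says_damaging: bool  # the editor's own assessment
    editor: str


@dataclass
class FeedbackLog:
    """An append-only record of judgments that any editor could read and audit."""
    judgments: list = field(default_factory=list)

    def record(self, judgment):
        """Contribute training data: store one human label."""
        self.judgments.append(judgment)

    def disagreements(self, wiki, threshold):
        """Flag errors: judgments where the model, at this threshold, contradicts the human."""
        return [
            j for j in self.judgments
            if j.wiki == wiki
            and (j.model_score >= threshold) != j.human_says_damaging
        ]

    def tune_threshold(self, wiki, candidates):
        """Tune to the wiki: pick the candidate threshold with the fewest disagreements."""
        return min(candidates, key=lambda t: len(self.disagreements(wiki, t)))


# Made-up example data for a single wiki.
log = FeedbackLog()
log.record(Judgment("cswiki", 101, 0.62, False, "EditorA"))
log.record(Judgment("cswiki", 102, 0.91, True, "EditorB"))
log.record(Judgment("cswiki", 103, 0.40, False, "EditorA"))
log.record(Judgment("cswiki", 104, 0.75, True, "EditorC"))

flagged = log.disagreements("cswiki", 0.5)
best = log.tune_threshold("cswiki", [0.5, 0.7, 0.9])
print(len(flagged), best)
```

The point of the sketch is the shape of the loop rather than the scoring itself: labels, flagged errors, and tuning decisions all live in one shared record, so shepherding the tool becomes visible wiki work rather than something only the tool's builders can do.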

Assembling the platform and the interfaces that allow a feedback loop is the most important part of this strategy, more important than the particular applications of augmentation. That said, particular augmentation tools will generally fall under three aspects: content generation, content curation, and community conduct. We will need to develop design principles in each of these aspects that ensure augmentation tools are transparent and editable, and that ensure augmentation respects the boundaries between human work and machine work. These principles should also govern the ways we incorporate augmentation resources from third parties not controlled by the Wikimedia movement, such as machine translation services.

Finally, to be successful with this strategy, we will need to continuously recognize and embrace augmentation as a major way to contribute to the wikis. We can do this through community capacity building, holding events, providing training, and encouraging discussion in the community.

Aspects

Examples

Rambot (content generation)
Twinkle (content curation)
ClueBot (content curation)
SuggestBot (content generation)
HostBot (governance)
Bot approval process (governance)
ORES models in RecentChanges and Watchlist (content curation)
Content Translation tool (content generation)
Article Placeholder (content generation)

Areas of Impact

Wikidata^[9]
ORES^[10]
Experienced editors^[11]
Volunteer developers^[12]

Key External Factors

The rate of improvement to artificial intelligence, especially machine translation.^[13]
Efforts by other tech companies to automatically translate English Wikipedia, or to otherwise make massive amounts of information available.^[14]
The movement’s ability to get top talent to work on these issues as staff or volunteers.^[15]

White Paper

Resources

Bohannon, John and Dharnidharka, Vedant (2018). Quicksilver: Training an ML system to generate draft Wikipedia articles and Wikidata entries simultaneously. [Video from Wikimedia Research Showcase August 2018]. Retrieved from https://youtu.be/OGPMS4YGDMk.
Chisholm, A., Radford, W., & Hachey, B. (2017). Learning to generate one-sentence biographies from Wikidata. EACL.
Halfaker, Aaron. 2017. Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, Article 19, 9 pages. DOI: https://doi.org/10.1145/3125433.3125475
Halfaker, Aaron, et al. ORES: Facilitating re-mediation of Wikipedia’s socio-technical problems. From Wikimedia Commons.
Kaffee LA. et al. (2018) Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In: Gangemi A. et al. (eds) The Semantic Web. ESWC 2018. Lecture Notes in Computer Science, vol 10843.

References

[1] Wiki segmentation 2018
[2] Lih, Andrew (2009). The Wikipedia Revolution: How a Bunch of Nobodies Created the World's Greatest Encyclopedia. p. 102. New York: Hyperion. ISBN 978-1-4013-0371-6.
[3] History of Twinkle
[4] History of ClueBot
[5] The Second Machine Age examines the likely economic implications of artificial intelligence by looking at the effects of the Industrial Revolution. It makes the case that in the near and medium terms, artificial intelligence can create more jobs for humans than it replaces. Rather than replacing humans with machines, smart businesses will overhaul the way they work to incorporate new technologies as tools wielded by humans with increasingly sophisticated skill sets. This is already the case with companies that have successfully adopted modern IT practices. Brynjolfsson, Erik and McAfee, Andrew (2016). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. New York: W. W. Norton and Company. ISBN 978-0-393-35064-7.
[6] JADE/Intro blog/Short story, by Aaron Halfaker. Gives a short description of how models can reinforce bias in the Wikimedia setting. Academic sources also exist.
[7] Buolamwini, Joy (2018). The Dangers of Supremely White Data and The Coded Gaze [Video from Wikimania 2018]. Retrieved from https://www.youtube.com/watch?v=ZSJXKoD6mA8
[8] Halfaker, A., Geiger, R. S., Morgan, J., & Riedl, J. (2013). The Rise and Decline of an Open Collaboration System: How Wikipedia's reaction to sudden popularity is causing its decline. American Behavioral Scientist 57(5) 664-688. Identifies automated curation systems as a key factor in de-personalizing the wikis and driving away new contributors.
[9] Wikidata has the potential to be the abstract database of facts from which artificial intelligence could create content. We should decide whether we want this to be the case, and if so, put resources behind Wikidata.
[10] ORES, and the way it is architected, is the proof of concept for an open and auditable artificial intelligence abstraction in the wikis. It could either continue to grow to encompass more tasks, or it could serve as a model for future systems.
[11] Experienced editors will need to continuously adjust their perception of what it means to do wiki work, as technology gives them increasingly powerful tools for content generation, content curation, and governance.
[12] Volunteer developers will have a new way to contribute to the wikis beyond just software and content. They will be able to contribute algorithms.
[13] There are many varying estimates for how quickly artificial intelligence will be able to take on human tasks. It is possible that capabilities will increase so quickly that the wikis are operating fundamentally differently within five years, or that may not happen for 30 years. We should err on the side of expecting changes sooner; otherwise the wikis may be eclipsed by other, less open and fair, projects.
[14] As machine translation improves, major tech companies and startups may attempt to make information, such as English Wikipedia, automatically available across all languages. The risk is that those companies would not have the same inclinations toward openness and fairness as the Wikimedia movement. If companies become suppliers of information before Wikimedia projects do, the world may wind up with an inferior dominant source of information.
[15] If we see artificial intelligence as a critical path toward our movement’s goals, we should be mindful of the difficulty of getting top talent to work on it. People who work on artificial intelligence are in high demand at the most elite high-paying companies in the world, but we will need them as volunteers and staff for Wikimedia projects.
