Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity
A community member has proposed a change to address the definition of "reach" and changes to potential privacy practices. |
Program Manager: | Jake Orlowitz | |
Executive Sponsor (C-level): | Erika Bjune |
CDP Goals and Outcomes
Annual Plan FY18-19 topline goals | |
#2) Knowledge as a Service - increase reach
How does your program affect the annual plan topline goal: Facts matter—but they do not live alone. Facts are part of a chain of verifiability, providing every person with the tools to learn how we know what we know. References to reliable sources are the foundation of Wikipedia’s trustworthiness and its broad adoption as a global source for knowledge. The Wikimedia movement can only fulfil its mission of distributing free knowledge globally and effectively if it acts as a gateway for readers to reach reliable sources that underpin our content. To support this goal, we need an infrastructure built on fact provenance, open citation standards, and interoperability that will empower readers with the best possible tools for critically consuming information. The yearlong plan described below will lay the foundations for this longer arc of work: strengthening our reference infrastructure, expanding our network of knowledge partners, building the public’s awareness of how Wikimedians vet facts and sources, and centering Wikimedia in the pressing conversation about combating misinformation. Together, this program sets a course to establish Wikimedia as the backbone of the trustworthy web while expanding Wikimedia’s reach to a broader knowledge ecosystem. | |
Program Goal | |
Wikimedia sites provide the most trustworthy, comprehensive, neutral information across topics and languages by referencing this information to vetted reliable sources and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to access citable information in the knowledge ecosystem. | |
Outcome 1 (Research) | |
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy. | |
Outcome 2 (Infrastructure and tooling) | |
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories. | |
Outcome 3 (Access and preservation) | |
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization. | |
Outcome 4 (Outreach) | |
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects. | |
Outcome 5 (Awareness) | |
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems. |
CDP Targets
Outcome 1 (Research) | Target 1 | Measurement method |
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy. | Output 1 & 2
A study on the state of sourcing and verifiability of Wikimedia projects is delivered. |
Output 1 & 2
Availability of the study. |
Outcome 2 (Infrastructure & tooling) | Target 2 | Measurement method |
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories. | Output 3
An event stream tracking the creation and modification of external links and references across Wikimedia projects is delivered. |
Output 3
Availability of the service. |
Output 4
A study on the impact of algorithmic methods on citation gaps is delivered. |
Output 4
Availability of the study. | |
Output 5 & 11
End-to-end integration of Citoid in Wikidata. |
Output 5 & 11
The integration is completed. | |
Outcome 3 | Target 3 | Measurement method |
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization. | Output 7
Compared to last year, archive links from all Wikipedia languages. Reduce the time between link creation and link archiving to less than 1 minute. Increase by 50% the volume of links recovered in Wikipedia outside of the English language edition. |
Output 7
Currently, link archiving via InternetArchiveBot is active on only 12 projects. We’ll measure the lag by using the new event stream. Links recovered to date by language are: als 935; bar 6,588; ckb 1,359; en 2,776,866; es 129,276; it 107,646; nl 86,559; no 139,735; ru 173,380; species 31,766; sv 201,660; zh 325,676. |
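As a rough illustration of what the 50% target implies, the sketch below sums the links recovered outside the English edition from the figures listed above and applies the increase. It assumes the cumulative totals serve as the baseline, which the plan does not state explicitly, and it leaves open whether Wikispecies ("species") counts toward a Wikipedia-only target. The function names are hypothetical and exist only for this example.

```python
# Links recovered to date per project, copied from the measurement note above.
LINKS_RECOVERED = {
    "als": 935, "bar": 6_588, "ckb": 1_359, "en": 2_776_866,
    "es": 129_276, "it": 107_646, "nl": 86_559, "no": 139_735,
    "ru": 173_380, "species": 31_766, "sv": 201_660, "zh": 325_676,
}


def non_english_baseline(counts, exclude=("en",)):
    """Sum recovered links for every project not listed in `exclude`."""
    return sum(n for code, n in counts.items() if code not in exclude)


def target_after_increase(baseline, increase=0.50):
    """Return the baseline raised by `increase` (0.50 means +50%)."""
    return baseline + round(baseline * increase)


if __name__ == "__main__":
    # Assumption: cumulative totals are the baseline for the 50% target.
    with_species = non_english_baseline(LINKS_RECOVERED)
    # Assumption: a Wikipedia-only reading excludes Wikispecies.
    wikipedia_only = non_english_baseline(LINKS_RECOVERED, exclude=("en", "species"))
    for label, baseline in (("all non-English projects", with_species),
                            ("non-English Wikipedias only", wikipedia_only)):
        print(f"{label}: baseline {baseline:,}, target {target_after_increase(baseline):,}")
```

Under these assumptions the baseline is 1,204,580 links with Wikispecies included and 1,172,814 without it, so the two readings of the target differ by fewer than 50,000 links.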
Outcome 4 | Target 4 | Measurement method |
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects. | Output 6 & 8
Increase attendance of WikiCite outreach through satellite events by 50% over the current participation baseline. |
Output 6 & 8
Historical attendance statistics. |
Output 9
#1Lib1Ref increases the number of edits from librarians by 50% and the number of editors by 25%; 5 more languages participate and are tracked. OAbot raises the number of links added and increases unique participants by 50%; 2 non-English Wikipedias participate. |
Output 9
Using the #Hashtag Tools data dump to analyze #1Lib1Ref participants and contributions vs. last year; doing the same with #OAbot’s internal data. | |
Outcome 5 | Target 5 | Measurement method |
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems. | Output 10
10,000 people read or receive blogs, tweets, or presentations about Knowledge Integrity, Wikipedia’s reliability, citation infrastructure, and fact-checking power. 4 high-profile press stories drawing from Knowledge Integrity narratives |
Output 10
WMF Blog, Medium.com, and Twitter.com stats; conference attendance numbers. Press coverage. |
CDP Budget Segment 1
Team: Research (Technology) | |
Outcome 1 (Research) | |
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy. | |
Output 1: A map of verifiability of information in Wikimedia projects | |
We will conduct and publish research to map the “state of verifiability” of free knowledge by analyzing which content in Wikipedia and Wikidata is unsourced or in need of citations, and which existing sources cited across Wikimedia projects are accessible to the general public. | |
Output 2: Research to understand how readers use citations | |
We will conduct research to understand how readers use citations by combining quantitative and qualitative analysis to identify information quality and sourcing gaps, in order to determine to what degree readers’ learning goals are met by consuming Wikimedia content alone rather than requiring references and links to external resources. | |
Outcome 2 (Infrastructure and tooling) | |
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories. | |
Output 3: A public reference event stream | |
We will create a robust, real-time event stream tracking the creation and modification of external links and references across Wikimedia projects. This data stream will provide tool developers, content partners, and other data consumers (libraries and GLAM institutions, metadata organizations, researchers, altmetrics providers) with a canonical data source to track and contribute to the sourcing work of Wikimedia volunteers. This is a dependency for the link rot initiative (Segment 2 (Programs) • Output 7). | |
Output 4: Smarter tools and recommender systems to add citations | |
We will improve tools to identify unsourced statements, such as Citation Hunt, that are heavily relied upon by outreach events and campaign organizers, with algorithmically generated recommendations. We will continue to develop and test algorithmic methods to help volunteers identify and fill citation gaps, amplifying the reach and impact of initiatives such as 1Lib1Ref and extending the pilot we conducted in FY 2018. | |
Output 5: More usable interfaces to source Wikidata statements | |
We will conduct research on integrating Citoid in Wikidata, aiming to drastically reduce the number of unsourced statements in Wikidata at risk of deletion and to facilitate their reuse across other Wikimedia projects. (dependent on Segment 3 (WMDE) • Output 11) | |
Outcome 4 (Outreach) | |
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects. | |
Output 6: Funding the WikiCite event series | |
Fundraise for the annual meeting in the WikiCite series and a set of satellite events, to improve the sustainability and global reach of the initiative. |
Resources
Baseline
- 1.75 FTE (4 researchers, 1 software engineer – Output 1 & 4)
- 0.25 FTE (1 design researcher, 1 software engineer – Output 2 & 5)
- 0.50 FTE (1 software engineer – Output 3)
Other costs
- 100K (WikiCite 2018 restricted grant, year 1 – Output 7)
Dependencies
- Analytics, Reading Infrastructure, Services (Output 3)
- Audiences, Wikimedia Deutschland (Output 5)
- Programs (CE) (Output 1, 4 & 5)
- Advancement (Output 7)
CDP Budget Segment 2
Team: Programs (Community Engagement) | |
Outcome 3 (Access and preservation) | |
Resources cited across Wikimedia projects are accessible to readers in perpetuity, thanks to technical partnerships securing their preservation and digitization. | |
Output 7: Initiatives to prevent link rot | |
We will deepen a partnership with the Internet Archive to facilitate the immediate and widespread caching of resources linked from Wikimedia projects, and to prioritize external efforts to digitize sources cited in Wikimedia projects. | |
Outcome 4 (Outreach) | |
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects. | |
Output 8: Hosting the WikiCite event series | |
Host the annual WikiCite gathering and extend promotion efforts over a broader timeframe to include a set of satellite events in order to improve the sustainability and global reach of the event. | |
Output 9: Contribution campaigns | |
1Lib1Ref runs twice a year in January and May (instead of once) and is expanded to include references for statements on Wikidata; the second #OAbot campaign during Open Access Week adds more links to free-to-read versions alongside closed access sources. | |
Outcome 5 (Awareness) | |
The public has increased awareness and understanding of the processes Wikimedians follow to verify and fact-check information, and of the benefits of open, auditable, linked information ecosystems. | |
Output 10: An audience map and communication plan | |
We will establish an audience map of our ecosystem and develop unique communications strategies based on each audience we would like to reach. Strategies may include writing for our blog, writing for other outlets, social media, and creating an events messaging strategy (the story we tell to each audience) which could include a blog series, Twitter tactics, and presentations at key conferences/events. |
Resources
Baseline
- 0.5 FTE (Program manager - supports all outputs, lead on 7, 8, & 9)
- 0.25 FTE (1 Library Specialist – Output 7 & 9)
- 0.1 FTE (Lead Programs Manager - Output 6, 7, 8, 9)
- 100 hrs (Contractor - Output 9)
Growth
- 0.25 FTE (1 Product and Metrics Analyst – Output 6 & 8)
Dependencies
- Analytics, Reading Infrastructure, Services (Output 3 & 4)
- Audiences, Wikimedia Deutschland (Output 5)
- Communications (Output 9 & 10)
- Research, Advancement (Output 6)
CDP Budget Segment 3
Team: Wikimedia Deutschland | |
Outcome 2 (Infrastructure and tooling) | |
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories. | |
Output 11: More usable interfaces to source Wikidata statements | |
We will work on the integration of Citoid in Wikidata, aiming to drastically reduce the number of unsourced statements in Wikidata at risk of deletion and to facilitate their reuse across other Wikimedia projects. |
Resources
(Output 5)
Baseline
Experimental Citoid integration in Wikidata:
- 0.25 FTE (contracts) on WMF
- 0.25 FTE on WMDE
Growth
Full Citoid integration in Wikidata (assuming additional funding):
- 0.25 FTE (contracts) on WMF for maintenance and consulting
- 2 FTE on WMDE (development, design, PM)
Dependencies
- Design Research support from Research (see Segment 1 (Research) • Output 5)