Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity/Goals

Program Goals and Status for FY18/19

  • Goal Owner: Leila Zia
  • Program Goals for FY18/19: Wikimedia sites provide the most trustworthy, comprehensive, neutral information across topics and languages by referencing this information to vetted reliable sources and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to access citable information in the knowledge ecosystem.
  • Annual Plan: Segment 1 - Research

Outcome 1 / Output 1


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects


  • Design and test an end-to-end machine learning framework to identify statements in need of a citation.   Done
  • Improving the taxonomy of reasons why editors add citations to Wikipedia statements   Done
  • Design the experiment and collect larger-scale data about reasons why people add citations   Done



  Note: July 2018

  In progress

  Note: August 21, 2018

  In progress

  Note: September 13, 2018

  In progress. Details: we expect this goal to be fully done before the end of Q1. The first bullet point is expected to be done by the end of the month. The third bullet point is done, and we have done extensive extra work on it as well. What remains is documentation, which we expect to finish by 2018-09-18.
Update on Sept 18: all goals for this outcome are   Done

Outcome 1 / Output 2


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations


  • Prepare the data and do preliminary analysis on the first data collection on citation usage based on data gathered via Citation Usage schema   Done
  • Develop a survey to better understand the role of citations in Wikipedia readers' evaluations of Wikipedia articles and to identify opportunities for supporting their learning goals and increasing their digital literacy.   Done



  Note: July 2018

  In progress

  Note: August 21, 2018

Data collection is   Done; only the documentation still needs to be finished. Developing the survey is   In progress, and more information is in T199188.

  Note: September 18, 2018

The survey wording and goals are now   Done

Outcome 4 / Output 6


More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Funding the WikiCite event series


  • Fundraise for the annual meeting in the WikiCite series and set of satellite events, to improve the sustainability and global reach of the initiative.   Done
  • Organize the event, open the application process and design the program   Done



  Note: July 2018

  Done. Fundraising is completed!

  Note: August 22, 2018

  In progress. Organizing the event is underway.

  Note: September 18, 2018

  Done. The selection process is complete, and notifications to applicants are being sent out as of October 1. The chairs of individual days of the event are now collecting information from selected attendees to finalize the agenda.

Outcome 1 / Output 1


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects


  • Design a machine learning framework to identify why statements need a citation in English Wikipedia.   Done
  • [Stretch] Submit a paper summarizing the modeling work for unsourced statement detection   Done



  Note: November 13, 2018

This goal was finished up this week, yay!

Outcome 1 / Output 2


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations


  • Run the second round of data collection to understand Wikipedia citation usage  Done
  • Prepare the data and analyze the data collected in the second round.  In progress
  • Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164  Partially done
  • Analyze first round survey data of reader citation usage task T205165   To do



  Note: October 18, 2018

The survey work has been ported to Qualtrics and a privacy statement has been submitted to Legal for review.

  Note: November 13, 2018

Second round of data collection to understand Wikipedia citation usage is now   Done

  Note: December 14, 2018

Analyzing the data is still   In progress and will continue through Q3 for paper submissions. Further data collection is on hold until the end of fundraising; we will begin analyzing that data once it is ready.

Outcome 4 / Output 6


More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event


  • Host the WikiCite 2018 event in Berkeley, CA (November 27-29, 2018)   Done



  Note: November 2018

  Done. We hosted 115 librarians, developers, linked open data experts, and Wikimedia contributors for WikiCite 2018 in Berkeley. See the final program, live-stream, and online conversation from the event.

Outcome 1 / Output 1


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators


  • Generate a "map of verifiability" by quantifying citation need across article topics and Wikipedia language editions, adapting the machine learning model developed in Q2 to a multilingual context. task T213927   Done



  Note: January 16, 2019

Work is   In progress with our collaborators to assess the extensibility of the current model (trained on English Wikipedia) to other languages. The next steps also include identifying the set of languages we want to work with and the topic modeling approach to compare topics across them. We will also hear by the end of January about our paper submission on this line of research from Q2.

  Note: February 2019

Models have been developed for French and Italian. We will generate the map of verifiability next.

  Note: March 2019


Outcome 1 / Output 2


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations



Quantitative analysis

  • Perform first round of research to characterize readers' usage of citations based on the features extracted in the past quarter. task T212225   Done
  • Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage on medical content task T212937   Done

Qualitative analysis

  • Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164   Done
  • Analyze first round survey data of reader citation usage task T205165   Done
  • (Conditional on the result of the previous goal) Perform the second round of survey data collection of reader citation usage in one or more Wikipedia languages.   Done



  Note: January 16, 2019

Qualitative research: the first round of data collection is completed and analysis is currently in progress. Quantitative research: data analysis is underway. Fixes for the data quality issues identified in Q2 are being researched and will be deployed for a second round of data collection between January and February.

  Note: February 22, 2019

The first two goals are completed and we are working on the last one. Specifically, we're designing a fixed-response survey based on the free-form text responses from round 1. This last goal is also on track, and we will deploy the survey in early March.

  Note: March 14, 2019

  • Perform first round of research to characterize readers' usage of citations is now   Done
  • Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage is   Done

Outcome 2 / Output 3


Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

A public reference event stream

Dependencies on: Analytics, Citoid, Parsoid, Reading Infrastructure, Internet Archive





  Note: January 16, 2019

Analysis of the requirements for the MVP of the event stream is   Done.

  Note: February 22, 2019

This goal is in incredibly good shape. :) We expected to finish all the work for this output in Q4, but we have a working prototype, are addressing some bugs, and expect to wrap up the output fully by the end of Q3.

  Note: March 2019

The MVP is live and Internet Archive has been using it to archive links.   Done

Outcome 4 / Output 6


More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event

Dependencies on: Technical Engagement and Community Programs





  Note: January 16, 2019

We are in the process of preparing a survey for WikiCite participants, with the goal of incorporating their feedback in the annual report to the funder, which is due in February.

  Note: February 22, 2019

The report preparation is   In progress and on track.

  Note: March 2019

The goal is now tracked under Community Engagement (CE), as the Principal Investigator (PI) for WikiCite changed from Research to CE. The PI has learned that the deadline for the report is May 2019. Due to the transition in Research, medium-term planning, and annual planning work, completion of this report has been pushed to May 2019 (Q4).

Outcome 1 / Output 1


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

A map of verifiability of information in Wikimedia projects

Dependencies on: External collaborators


  • Release the code and models developed so far task T221006   Done
  • Finalize documentation to empower others to build on the results and/or expand the work to other languages task T221009   Done
  • (stretch) Improve and analyze the data further, time permitting task T221005   Declined



  Note: April 2019

Work is in progress and we expect to be able to meet the goals set.

  To do May 2019


  To do June 2019


Outcome 1 / Output 2


Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.

Research to understand how readers use citations



Quantitative analysis

  • Deeper research and analysis of the citation usage and user behavior considering user activities in the entire user session. task T212225  Done
  • (stretch) A model for categorizing external links Postponed

Qualitative analysis

  • Perform the analysis and write the documentation on the second round of survey data about reader citation usage in one or more Wikipedia languages   To do
  • Develop interview protocol and begin interview recruiting for contextual inquiry follow-up study   In progress

Dependency on formal collaborators



  Note: April 2019

Work in progress.

  To do May 2019


  To do June 2019


Outcome 2 / Output 3


Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.

A public reference event stream


  • Fix citation stream bugs that affect the minimum viable product   Done



  Note: April 2019

The major bug has been fixed: task T216249.

  To do May 2019


  To do June 2019


Outcome 4 / Output 6


More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.

Host the WikiCite 2018 event

Dependencies on: Technical Engagement and Community Programs





  Note: April 2019

As described in the last update from Q3, this goal is now scheduled to be accomplished by the end of May 2019.

  To do May 2019


  To do June 2019

The report has been prepared and delivered to the funder. The organizing committee is reviewing the report and iterating on it before it is publicly released.