Wikimedia Technology/Annual Plans/FY2019/CDP3: Knowledge Integrity/Goals
Program Goals and Status for FY18/19
- Goal Owner: Leila Zia
- Program Goals for FY18/19: Wikimedia sites provide the most trustworthy, comprehensive, neutral information across topics and languages by referencing this information to vetted reliable sources and linking it to external content providers and metadata repositories, making Wikimedia projects the central gateway to access citable information in the knowledge ecosystem.
- Annual Plan: Segment 1 - Research
- Primary Goal is Knowledge as a Service: increase reach
Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- A map of verifiability of information in Wikimedia projects
Goal(s)
- Design and test an end-to-end machine learning framework to identify statements in need of a citation. Done
- Improving the taxonomy of reasons why editors add citations to Wikipedia statements Done
- Design the experiment and collect larger-scale data about reasons why people add citations Done
Status
Note: July 2018
- In progress
Note: August 21, 2018
- In progress
Note: September 13, 2018
- In progress Details: we expect this goal to be fully done before the end of Q1. The first bullet point is expected to be done by the end of the month. The third bullet point is done, and we have done extensive extra work on it as well. All that remains is the documentation, which we expect to finish by 2018-09-18.
- Update on Sept 18: all goals for this outcome are Done
Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- Research to understand how readers use citations
Goal(s)
- Prepare the data and do preliminary analysis on the first data collection on citation usage based on data gathered via Citation Usage schema Done
- Develop a survey to better understand the role of citations in Wikipedia readers' evaluations of Wikipedia articles and to identify opportunities for supporting their learning goals and increasing their digital literacy. Done
Status
Note: July 2018
- In progress
Note: August 21, 2018
- Data collection is done and the documentation just needs to be finished up Done. Developing the survey is In progress and more information is in T199188
Note: September 18, 2018
- The survey wording and goals are now Done
Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
- Funding the WikiCite event series
Goal(s)
- Fundraise for the annual meeting in the WikiCite series and set of satellite events, to improve the sustainability and global reach of the initiative. Done
- Organize the event, open the application process and design the program Done
Status
Note: July 2018
- Done Fundraising is completed!
Note: August 22, 2018
- In progress Organizing the event is underway
Note: September 18, 2018
- Done The selection process is complete, and notifications to applicants are being sent out as of October 1. The chairs of individual days of the event are now collecting information from selected attendees to finalize the agenda.
Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- A map of verifiability of information in Wikimedia projects
Goal(s)
- Design a machine learning framework to identify why statements need a citation in English Wikipedia. Done
- [Stretch] Submit a paper summarizing the modeling work for unsourced statement detection Done
Status
Note: November 13, 2018
- This goal was finished up this week, yay!
Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- Research to understand how readers use citations
Goal(s)
- Run the second round of data collection to understand Wikipedia citation usage Done
- Prepare the data and analyze the data collected in the second round. In progress
- Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164 Partially done
- Analyze first round survey data of reader citation usage task T205165 To do
Status
Note: October 18, 2018
- The survey work has been ported to Qualtrics and a privacy statement has been submitted to Legal for review.
Note: November 13, 2018
- Second round of data collection to understand Wikipedia citation usage is now Done
Note: December 14, 2018
- Analyzing the data is still In progress and will continue through Q3 for paper submissions. Further data collection is on hold until the end of fundraising; once it resumes and the data is ready, we will begin the analysis.
Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
- Host the WikiCite 2018 event
Goal(s)
- Host the WikiCite 2018 event in Berkeley, CA (November 27-29, 2018) Done
Status
Note: November 2018
- Done We hosted 115 librarians, developers, linked open data experts, and Wikimedia contributors for WikiCite 2018 in Berkeley. See the final program, live-stream, and online conversation from the event.
Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- A map of verifiability of information in Wikimedia projects
Dependencies on: External collaborators
Goal(s)
- Generate a "map of verifiability" by quantifying citation need across article topics and Wikipedia language editions, adapting the machine learning model developed in Q2 to a multilingual context. task T213927 Done
Status
Note: January 16, 2019
- Work is In progress with our collaborators to assess the extensibility of the current model (trained on English Wikipedia) to other languages. The next steps also include identifying the set of languages we want to work with and the topic modeling approach to compare topics across them. We will also hear by the end of January about our paper submission on this line of research from Q2.
Note: February 2019
- Models have been developed for French and Italian. We will generate the map of verifiability next.
Note: March 2019
- Done.
Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- Research to understand how readers use citations
Goal(s)
Quantitative analysis
- Perform first round of research to characterize readers' usage of citations based on the features extracted in the past quarter. task T212225 Done
- Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage on medical content task T212937 Done
Qualitative analysis
- Perform first round of survey data collection of reader citation usage on English Wikipedia. task T205164 Done
- Analyze first round survey data of reader citation usage task T205165 Done
- (Conditional on the result of the previous goal) Perform the second round of survey data collection of reader citation usage in one or more Wikipedia languages. Done
Status
Note: January 16, 2019
- Qualitative research: the first round of data collection is completed and analysis is currently in progress. Quantitative research: data analysis is underway. Fixes for the data quality issues identified in Q2 are being researched and will be deployed for a second round of data collection between January and February.
Note: February 22, 2019
- The first two goals are completed and we are working on the last one. Specifically, we're designing a fixed-response survey based on the free-form text responses from round 1. This last goal is also on track and we will deploy the survey in early March.
Note: March 14, 2019
- Perform first round of research to characterize readers' usage of citations is now Done
- Fix the main bugs and rerun the CitationUsage schema to collect data for understanding citation usage is Done
Outcome 2 / Output 3
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
- A public reference event stream
Dependencies on: Analytics, Citoid, Parsoid, Reading Infrastructure, Internet Archive
Goal(s)
- A working prototype of the stream task T199189 Done
Status
Note: January 16, 2019
- Analysis of the requirements for the MVP of the event stream is Done.
Note: February 22, 2019
- This goal is in incredibly good shape. :) We expected to finish all the work for this output in Q4, but we have a working prototype, are addressing some bugs, and expect to wrap up the output fully by the end of Q3.
Note: March 2019
- The MVP is live and Internet Archive has been using it to archive links. Done
Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
- Host the WikiCite 2018 event
Dependencies on: Technical Engagement and Community Programs
Goal(s)
- Publish the WikiCite 2018 annual report In progress
Status
Note: January 16, 2019
- We are in the process of preparing a survey for WikiCite participants, with the goal of incorporating their feedback in the annual report to the funder, which is due in February.
Note: February 22, 2019
- The report preparation is In progress and on track.
Note: March 2019
- The goal is now tracked under Community Engagement, as the Principal Investigator (PI) for WikiCite changed from Research to CE. The PI has learned that the deadline for the report is May 2019. Because of the transition in Research, medium-term planning, and annual planning work, completion of the report has been pushed to May 2019 (Q4).
Outcome 1 / Output 1
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- A map of verifiability of information in Wikimedia projects
Dependencies on: External collaborators
Goal(s)
- Release the code and the models developed so far task T221006 Done
- Finalize documentation to empower others to build on the results and/or expand the work to other languages task T221009 Done
- (stretch) Improve and analyze the data further time permitting task T221005 Declined
Status
Note: April 2019
- Work is in progress and we expect to be able to meet the goals set.
To do May 2019
- Discussed...
To do June 2019
- Discussed...
Outcome 1 / Output 2
Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy.
- Research to understand how readers use citations
Goal(s)
Quantitative analysis
- Deeper research and analysis of the citation usage and user behavior considering user activities in the entire user session. task T212225 Done
- (stretch) A model for categorizing external links Postponed
Qualitative analysis
- Perform the analysis and write the documentation on the second round of survey data about reader citation usage in one or more Wikipedia languages To do
- Develop interview protocol and begin interview recruiting for contextual inquiry follow-up study In progress
Dependency on formal collaborators
Status
Note: April 2019
- Work in progress.
To do May 2019
- Discussed...
To do June 2019
- Discussed...
Outcome 2 / Output 3
Contributors, tool developers and partner organizations can understand and accelerate the referencing and linking of knowledge statements to external sources, catalogs, metadata providers, and content repositories.
- A public reference event stream
Goal(s)
- Fix citation stream bugs that affect the minimum viable product Done
Status
Note: April 2019
- The major bug has been fixed: task T216249.
To do May 2019
- Discussed...
To do June 2019
- Discussed...
Outcome 4 / Output 6
More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects.
- Host the WikiCite 2018 event
Dependencies on: Technical Engagement and Community Programs
Goal(s)
- Publish the WikiCite 2018 annual report Done
Status
Note: April 2019
- As described in the last update from Q3, this goal is now scheduled to be accomplished by the end of May 2019.
To do May 2019
- Discussed...
To do June 2019
- The report is prepared and delivered to the funder. The organizing committee is reviewing the report and iterating over it before it's publicly released.