Outreachy/Past projects

This page tries to keep up with the current status of all past Outreach Program for Women/Outreachy projects.

See also Google Summer of Code/Past projects.

Quantitative summary of past Outreachy projects

Completed Outreachy projects since 2013:

In the 18 Outreachy rounds between 2013 and 2021, contributors joined from 22 countries: India, United States, Brazil, United Kingdom, Sri Lanka, Canada, Israel, Romania, Germany, Turkey, Cameroon, Kenya, Nigeria, Vietnam, Taiwan, Nepal, Bangladesh, Russia, Malaysia, Uganda, France, Pakistan.

Create tool for informative infographics from structured information from Wikimedia projects

  • Mentees: James Okolie
  • Mentor(s): Éder Porto, Lucas Belo
  • Outcome: Wiki Infographics is an initiative from the Wiki Movimento Brasil user group. The idea is to leverage structured information within Wikimedia projects to create informative and visually engaging infographics in fixed and dynamic formats, under an open license. The success of the initiative will be measured by the production and dissemination of a methodology and platform for high-quality infographics derived from structured data on Wikimedia projects.
  • Tech stack: Familiarity with HTML, CSS, and JS is required; experience with Python 3 and Jupyter notebooks is preferred.
  • Relevant links: Phabricator issue
  • Blog: My Outreachy Journal - May 2024 to August 2024

Build a data visualization tool for the evolution of Wikipedia articles maintained by WikiProjects

  • Mentees: Mahima Agarwal
  • Mentor(s): Pablo Aragón, Isaac Johnson, Caroline Myrick
  • Outcome: Using a model that predicts the quality of Wikipedia articles, the WMF's Research team built a dataset with feature values and predicted quality scores for all revisions of all articles in more than 300 language editions of Wikipedia. This dataset provides key information on the expansion and quality of Wikipedia articles over time. In this project we built on this novel dataset to develop a visualization tool that allows anyone to explore the evolution of quality in articles maintained by WikiProjects. It provides insights into the quality and importance of articles within specific WikiProjects.
  • Tech stack: Familiarity with Python is required; experience with MediaWiki APIs and data visualization libraries is preferred.
  • Relevant links: Phabricator issue
  • Blog: My Outreachy Project

Improve documentation of Programs & Events Dashboard

Improve how Wiki Education Dashboard counts references added

Integrate Wikimedia Ecosystem within BUB2 tool

Multilingual Wikipedia Editor Survey

  • Mentees: Shriya Chaitanya Kamat Tarcar
  • Mentor(s): Mike Raish
  • Outcome: Help us conceive, write, build, administer, and analyze a survey that will help us better understand the people who edit and use translation to contribute across different language versions of Wikipedia. The intern will participate in all stages of research, from meeting with stakeholders to decide what questions to ask, all the way through data collection, analysis, and composition of a research report.
  • Skills: Design research, research, survey design, project management, knowledge of general language/culture topics, data analysis and presentation (all desirable)
  • Relevant links: Phabricator description
  • Blog: Shriya's Blog

Assist Capacity Exchange Development

Wikicurricula as a user interface for Wikidata for Education

  • Mentees: Boluwatife Adetayo
  • Mentor(s): Sailesh Patnaik, Nat Hernández Clavijo, Luca Martinelli
  • Outcome: Wikidata for Education is a curricula digitisation project aiming to align Wikimedia projects with school curricula with the help of Wikidata. It was piloted in Ghana and extended to Uruguay. A friendlier user interface is needed so that editors and educators can visualize and explore the curriculum topics and structure. We believe that Wikimedia Italy's Wikipedia e Scuola Italiana and its fork, Wikicurricula, are good starting points. We are now aiming to create a boilerplate project so that this visualization tool is easily reusable for new countries and languages. We also need to improve the integration between Wikidata and the visualization, and improve its user interface.
  • Skills: Required: HTML, CSS, JS, Python, Git. Helpful: SQL, Wikidata, Wikidata Query Service, d3js library, Spanish, Italian, UX design
  • Relevant links: Phabricator issue
  • Blog: Boluwatife's Blog

Addressing the Lusophone technological wishlist proposals

  • Mentees: Alwoch Sophia
  • Mentor(s): Éder Porto, Mike Peel, Albertoleoncio
  • Outcome: The Lista de desejos tecnológicos da lusofonia is a survey of the Lusophone communities to identify which technological innovations could be developed, and which tools and platforms could be changed to improve user experience. The communities proposed and later prioritized these ideas for developments and platform changes. We want to start tackling these proposals and improving user experience.
  • Skills: Required: Python and JavaScript
  • Relevant links: Phabricator description
  • Blog: Sophia's Blog

Create a Ruby Gem to analyze Wikidata Statistics

  • Mentees: Sulagna Saha
  • Mentors: Sage Ross, Will Kent
  • Outcome: Published a gem that parses the differences between Wikidata revisions and extracts statistics about the changes. It enables accurate analysis of Wikidata edits, such as counting the number of claims, qualifiers, references, aliases, labels, descriptions, and site links added, removed, and changed. The gem is integrated into the Programs & Events Dashboard and deployed.
  • Tech Stack: The gem is written in Ruby.
  • Relevant links: Wikidata-diff-analyzer, GitHub repo for the gem, Integration
  • Blog: Sulagna's Blog

Content Translation language imbalances

  • Mentee: Nathaly Toledo.
  • Mentors: Adam Wight, Kavitha Appakayala, Jan Dittrich.
  • Outcome: Two research questions were solved and one was advanced. The research questions and related reasoning were:
    • What is the content being translated the most? What patterns can be found? Try finding a dataset that will let you know what articles lack translations (calculate an average), and classify them to understand patterns that could lead to an answer.
    • RQ 3.1: What is the effect of MT availability on translation flow? Let’s consider three distinct types of events changing MT availability: enabling MT where there was none, changing the default MT engine, and disabling MT. The question explores whether the availability of machine translation leads to more translations being started or published because of its practicality, and whether those translations are less likely to be deleted soon after being created.
    • Do users prefer to translate content in their native language(s)? If so, what influences this behavior? The question is based on the assumption that the strongest communities also correspond to the larger languages, and that these communities tend to be under the “self-focus” bias, which prompts them to create and translate content in their first language first (and about their own culture first). It also assumes that the more comfortable someone is with their own language level, the more likely they are to translate into it.
  • Tech stack: Python, Jupyter Notebooks.
  • Relevant links:

Content Translation language imbalances

  • Mentee: Abhishek Bhardwaj.
  • Mentors: Adam Wight, Kavitha Appakayala, Jan Dittrich.
  • Outcome: We developed a reusable, editable Python package that extracts data from the Wikimedia database. The current version of our package contains modules to extract language proficiency data of translators from all the Wikipedia versions. We also warehoused the data, caching the parts that are costly to generate (saving hours of run-time for anyone who wishes to use it). We analyzed the generated data to find trends in user activity, to dig deeper into the relation between translation imbalances and the proficiency of translators, and to determine the optimal language pair for translation for each user based on their self-reported language proficiency.
  • Tech Stack: Python, PAWS, MariaDB.
  • Relevant Links: Research Page, GitHub Repository

Develop a web app for editing Toolhub records

  • Mentees: Nicole Barnabee-Burns, Hannah Waruguru Njoroge
  • Mentors: Slavina Stefanova, Damilare Adedoyin
  • Outcome: Over the course of the internship, we developed a full-stack web application that could be used to improve discoverability of other Wikimedia tools. The tool identifies gaps in the Toolhub records of other tools, and presents a user-friendly interface for filling in the missing information.
  • Tech stack: The application was built with Vue.js on the front-end and Flask on the back-end, and is connected to a MariaDB database. Task queuing is handled by Celery, with Redis as a broker.
  • Relevant links: Toolhunt, Phabricator workboard, Frontend repository, Backend repository
  • Blog: Nicole's blog, Hannah's blog

Hybrid event production for QueeringWikipedia 2023

  • Mentee: André Rodrigues
  • Mentors: Željko Blaće, Owen Blacker
  • Outcome: After investigating various FLOSS options and considering time commitments, we decided to use Zoom for regular meetings, Jitsi for unconference style sessions, and BigBlueButton for workshops and explanatory sessions. In addition, I conducted outreach and held office hours to promote the event during the internship period.
  • Relevant links: Phabricator page
  • Blog: André's blog

Develop features for Wiki Loves Monuments App

Develop a web app for patrolling based on the new ML-based service to predict reverts

Rewrite Imagebulk tool to scale up

  • Mentees: Enow97
  • Mentors: Jay Prakash and Sudhanshu
  • Outcome: The project involved rewriting the existing web app codebase using Vue.js and Flask, along with integrating Celery to improve the scalability and performance of the system. The resulting system will be able to handle large volumes of traffic and complex user interactions while remaining responsive and efficient. Although the code has been written, deployment is still pending and will be handled by the mentor (Jay Prakash).
  • Tech stack:
  1. Vue.js on the front-end
  2. Flask on the back-end
  3. Task queuing in Celery along with Redis as the broker
  4. Docker

Add support for tracking specific namespaces to Programs & Events Dashboard

  • Student: Vaidehi Atpadkar
  • Mentors: Sage Ross
  • Outcome: The Dashboard now supports selecting specific wiki namespaces for tracking and displays the statistics for them.
  • Relevant links: source code
  • Blog: Vaidehi's Blog

Build Python library to work with html-dumps

  • Student: Nazia Tasnim
  • Mentors: Martin Gerlach, Isaac Johnson
  • Outcome: mwparserfromhtml, a Python library to parse the Wikipedia HTML dumps.
  • Relevant links: source code
  • Blog: Nazia's Blog

What's in a name? Automatically identifying first and last author names for Wikicite and Wikidata

Automatically matching new Wikipedia articles with Wikidata items using Python

Automatically matching new Wikipedia articles with Wikidata items using Python

Develop learning toolkits and videos to demonstrate the use of essential tools for Wikimedia

Improve Wikidata support on Programs & Events Dashboard

  • Student: Ivana Novakovic-Lekovic
  • Mentors: Sage Ross
  • Outcome: Integrated Wikidata edit analysis into the Dashboard’s data update system; it now reports details of Wikidata edits, including merges, aliases, labels, claims, and more.
  • Relevant links: source code
  • Blog: Ivana's Blog

Refactor MediaWiki tests to use WebdriverIO Async

  • Student: Osama Tahir
  • Mentors: Soham Parekh, Željko Filipin
  • Outcome: Refactored MediaWiki tests in a wide range of extensions (such as Math, Newsletter, VisualEditor) to use WebdriverIO Async.
  • Relevant links: source code
  • Blog: Osama's Blog

WikiNav

  • Student: Muniza A.
  • Mentors: Martin Gerlach and Isaac Johnson
  • Outcome: Developed WikiNav, a tool that processes the Wikipedia clickstream data to generate statistics and visualizations that help make this data more accessible to folks with varying levels of programming and data wrangling experience.
  • Relevant links: Phabricator task, demo application
  • Blog: Muniza's Blog

Developing mwsql: A Python package for working with Wikimedia SQL dumps

Synchronising Wikidata and Wikipedias using pywikibot


Modules Research Tool

Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

Wiki Country Inference Tool: A Model that Infers countries from Wikipedia Articles

Developing a lightweight and efficient Content Filtration module for Wikimedia Commons

Review and improve Lua documentation on meta and mediawiki

Enhancements to gdrive-to-commons uploader tool

Productionize Wikidata-based Topic Model on ORES

WikiContrib: Gather and analyze user contributions on Wiki and GitHub

  • Student: Raymond Ndibe
  • Mentors: Srishti Sethi and Rammanoj Potla
  • Outcome: 1) Implemented a feature to count contributions made to Wikimedia repositories on GitHub 2) Implemented a contributions caching feature 3) Implemented a persistent URL feature 4) Fixed all outstanding issues and bugs 5) Improved the tool's UI/UX.

Converting Campaign pages to React

  • Student: Lalitha Reddy
  • Mentors: Sage Ross, Khyati Soneji
  • Outcome: Created the campaign navbar and the home tab component in React.
  • Relevant links: project task, bi-weekly reports

Improvements and User Testing of Wiki Education Dashboard Android App

A system for releasing data dumps from a classifier detecting unsourced sentences in Wikipedia

Documentation improvements to the ~20 top 100 most viewed MediaWiki Action API pages on-wiki

Create regression automated tests for Special:Homepage functionality testing

Improve MediaWiki Action API Integration Tests

Documentation improvements to the ~20 top 70 most viewed MediaWiki Action API pages on-wiki

Improve Programs & Events Dashboard for use in the #1lib1ref campaign

  • Student: Khyati Soneji
  • Mentors: Sage Ross, Wes Reid
  • Outcome: Added support for counting references added to English Wikipedia articles in Programs & Events Dashboard, along with improved data download options and support for scoping via PetScan PSIDs.
  • Relevant links: Internship blog posts, project task

Research project on the editing patterns of users of wiki CX translation tool

  • Student: Doris Zhou
  • Mentors: Isaac Johnson, Jonathan Morgan
  • Outcome: Analyzed the editing patterns, article selection, and article writing quality of users who initiated article translation using the CX Translation tool. Looked at English to French in depth and did some English to Chinese analysis.
  • Relevant links: bi-weekly reports, research meta page

Improve top 50 viewed pages of the MediaWiki Action API & create a demo app to educate users

Update MediaWiki Action API docs, add Python code to repo, create a demo app, and write a tutorial for the demo which showcases several APIs.

Write code in Parsoid to detect links inside links and in PHP Linter extension to add this category.

Provide Test Support for Various Wikimedia Projects

Apply exploratory testing principles to test weekly maintenance releases of Content Translation tool and Visual Editor.

QA: Testing Automation - port Echo Notification tests to Node.js

Created automated tests to check that changes made to the code base do not break existing components.

Create an event setup wizard for Programs & Events Dashboard

Design, create, and test a wizard that makes it easy for users to set up an event with exactly the settings they need: an interface that walks through all the main options and describes what they do and what they are for.

Improve support for photo/media contribution campaigns on Wikimedia Programs & Events Dashboard

Made media contributions a first-class citizen in the Wikimedia Programs & Events Dashboard. The project included building dedicated user-friendly pages for viewing and assessing the metadata of uploads from a specific campaign, and adding upload contribution statistics in other views alongside article statistics.

Automatically detect spambot registration using machine learning like invisible reCAPTCHA

Create a captcha which is friendlier to humans and harder for bots to crack

Improvements to Grants review and Wikimania scholarships web apps

Improve the scholarships and grant review applications through important bug fixes and feature additions

Refactoring of MassMessage Extension

Clean up technical debt in MassMessage

Translation outreach: User guides on MediaWiki.org

Create, test and document new strategies to recruit technical translators

User Contribution Summary Tool

Create a tool that's optimized for presenting one's activity on Wikipedia in a CV-like manner

Improve Programs & Events Dashboard support for Art+Feminism 2018

Improve the Programs & Events Dashboard from Wiki Education based on the feedback from the Art+Feminism campaign of 2018.

Remind me of this article in X days

Make it possible for a logged-in user to get a reminder of an article after a few days, with the possibility to enter a short comment.

Documentation on how to develop Zotero translators at translation-server

Document the process of writing Zotero web translators on the server side and in Scaffold, and how to get them into production.

Allow Programs & Events Dashboard to make automatic edits on connected wikis

  • Student: Medha Bansal
  • Mentors: Sage Ross and Jonathan Morgan
  • Status: All tasks as mentioned in the proposal and in the timeline have been completed. Project is live with all supporting documentation.
  • Link to project task on Phabricator: T158678
  • Link to weekly reports archives: Weekly reports

Creating User Profile Pages for Wiki Ed Dashboard and providing cumulative statistics for all programs a user has participated in.

Added customizable Profile pages to the Wiki Education Dashboard and generated contribution statistics of the users, providing them a brief overview of all the contributions they made to encourage them to do more.

Easier categorization of pictures in Upload to Commons Android app

This project improves the image categorization functionality of the app by offering relevant category suggestions based on geolocation, and making category search more flexible.

The objective of this project is to offer a search tool to empower translators to find messages they want to translate and maintain consistency between translations.

Wikipedia article translation metrics

"This project aims at building a model that would estimate whether a page is translated or not, using statistical analysis and machine learning tools."

Pywikibot compat to core migration

"The purpose of this project is to improve all the documentation including getting started guides and project documentation in Pywikibot."

Wikipedia Education Program need-finding research

"The task is to improve the function, usability and design of the course pages for both professors and students."

Collaborative spelling dictionary building tool

"The project aims at developing a collaborative dictionary which shall also have an additional feature of checking spellings of the words."

Adding Performance Instrumentation to Parsoid

"This project will develop a dashboard of metrics that will allow users to, at-a-glance, understand Parsoid's performance. It will provide a resource for application tuning, quick assessments of production readiness, and troubleshooting sources of performance problems."

  • Student: Christy Okpo
  • Mentors: Subramanya Sastry
  • Wrap-up blogpost: Link
  • Phabricator Evaluation task: T92244
  • Status: Dashboards have been created, here and here. A glossary of metrics and a guide to performance instrumentation using Graphite have also been created.

Extending PyWikiBot support to sites on IWM

"PyWikiBot currently supports only a few wiki projects. At the end of this project, the benefits of automation of tasks by PWB will be provided to all MediaWiki sites on the meta:Interwikimap, and provide the basis for support of non-MediaWiki wiki sites and non-wiki sites."

  • Student: Manpreet Kaur
  • Mentors: John Mark Vandenberg, Fabian Neundorf
  • Wrap-up blogpost: Link
  • Phabricator Evaluation task: T92246
  • Status: Final report can be found here. Further work to be done on non-mw sites.

Improving URL citations on Wikimedia

Aims to make citing sources in VisualEditor easier by generating a citation given a unique identifier such as a URL or DOI.

Enhancing Wikimaps/OpenHistoricalMaps Project

  • Student: Jaime Lyn
  • Mentors: Dr. Rob Warren
  • Wrap-up blogpost: Link
  • Final report:
  • Status:

Welcome to labs - Welcoming new contributors to Wikimedia Labs and Tool Labs

Finding the best and making them better: Evaluating, documenting, and improving MediaWiki web API client libraries

Feed the Gnomes - Wikidata Outreach

Template Matching for RDFIO

WikiHunt the 'Property': Wikidata Outreach Initiative
