Wikimedia Search Platform

The Search Platform team (part of Wikimedia Technology) is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used at the Wikimedia Foundation to support Wikimedia projects.

Current work by this team is tracked on the Discovery-Search workboard in Phabricator; a separate backlog board is also available.

Mission

Our mission is to help people easily discover knowledge on Wikipedia and its sister projects by providing tools and infrastructure for casual readers and expert users with precise needs, while maintaining a strong emphasis on privacy.

Overview

  • We operate and maintain a disparate collection of production services related to content discovery, enabling the wiki community to find information that is not available through simply following links. We also provide a platform on which other people can create tools to support editing and other workflows.
  • We provide an open-source search engine, backed by an inverted index for non-structured on-wiki data. We work to develop more sophisticated searching with machine learning and natural language processing.
  • We provide a SPARQL-based query service for Wikidata, encouraging users to capitalize on this vast store of computer-readable structured data for use on-wiki and in knowledge discovery.
  • We endeavor to support underserved wiki communities, and we rely on those communities to help us understand their needs and evaluate potential solutions, especially with respect to underserved languages.
  • We prioritize privacy for logged-in users and anonymity for logged-out users over almost everything else, even when it slows down or complicates development or hinders our ability to collect or use data.
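
As a rough illustration of the inverted index mentioned above, the sketch below maps each token to the set of documents that contain it and answers a query by intersecting those sets. It is a toy model only, not how CirrusSearch or Elasticsearch is implemented; the function names `build_index` and `lookup` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def lookup(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's postings, then narrow with each later term.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by an arbitrary id.
docs = {
    1: "Wikidata query service",
    2: "search query parser",
    3: "Wikidata search",
}
index = build_index(docs)
matches = lookup(index, "wikidata query")
```

A production search engine adds analysis steps (stemming, language-specific tokenization) and relevance ranking on top of this basic structure.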

Goals

The Search Platform team's goals are part of the Technology Department's overall goals. You can find links to the current quarterly goals here. (Note that Q1 is July–September for historical reasons.)

Other Projects

Wikidata Query Service (WDQS)

The Wikidata query service allows for searching structured data on Wikidata. It also provides an API through which tools can access Wikidata. Our current work is tracked on the Discovery-Search workboard (see also our WDQS backlog board) and weekly deployments of WDQS are documented on wikitech:Deployments. A public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
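
To give a sense of how a tool might use the query service, here is a minimal sketch that builds a SPARQL request against the public endpoint at https://query.wikidata.org/sparql and flattens the standard SPARQL JSON results format. The example query, the helper names, and the User-Agent string are illustrative assumptions; real tools should send a descriptive User-Agent of their own.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: five items that are an instance of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request(query, endpoint=WDQS_ENDPOINT):
    """Build a GET request that asks the endpoint for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    headers = {
        # Placeholder value; replace with a string identifying your tool.
        "User-Agent": "example-wdqs-sketch/0.1 (illustrative only)",
        "Accept": "application/sparql-results+json",
    }
    return urllib.request.Request(url, headers=headers)


def extract_bindings(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    rows = []
    for binding in result["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows


def run_query(query):
    """Send the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as resp:
        return extract_bindings(json.load(resp))
```

Calling `run_query(EXAMPLE_QUERY)` performs the network request; `build_request` and `extract_bindings` can be exercised offline.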

APIs

Application Programming Interfaces (APIs) provide developers ways to interact with the MediaWiki software.

API:Search and discovery lists the search APIs available and in development.
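
As one example of what such an API call can look like, the sketch below runs a full-text search through the MediaWiki Action API using `action=query` with `list=search`. The choice of English Wikipedia as the endpoint, the helper names, and the User-Agent string are assumptions made for illustration; API:Search and discovery remains the reference for the available modules and parameters.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; any MediaWiki site exposes a similar api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(term, limit=5, endpoint=API_ENDPOINT):
    """Build an Action API URL for a full-text search."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urllib.parse.urlencode(params)


def titles_from_response(payload):
    """Pull page titles out of a decoded list=search response."""
    return [hit["title"] for hit in payload.get("query", {}).get("search", [])]


def search_titles(term, limit=5):
    """Run the search over the network and return the matching titles."""
    request = urllib.request.Request(
        build_search_url(term, limit),
        # Placeholder value; replace with a string identifying your tool.
        headers={"User-Agent": "example-search-sketch/0.1 (illustrative only)"},
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return titles_from_response(json.load(resp))
```

For instance, `search_titles("inverted index", limit=3)` would return up to three page titles from the live site.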

The Team

This list was last updated on September 17, 2020.

Communications

Mailing lists

Search Platform - A public mailing list about the Wikimedia Search Platform team and projects (formerly Discovery Department). Examples of topics would include:

  • Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
  • Technical discussions and brainstorming regarding our work:
    • Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
    • Our dashboards or related analysis
  • Other team news, such as changes to team structure, significant changes to processes, or changes in how we use Phabricator or other tools like Gerrit

IRC channels

#wikimedia-discovery

Office Hours

The Search Platform Team usually holds office hours on the first Wednesday of each month. Come talk to us about anything related to Wikimedia search! Feel free to add your items to the Etherpad Agenda for the next meeting.

Weekly status updates

See Discovery weekly status updates for the archive of past team updates. Note: these updates are now part of the Scrum of Scrums weekly updates, as of September 26, 2019.

Meetup groups

Process

The Search Platform team uses a Scrumban process, which is a hybrid of Scrum and Kanban. It is described here: Search Platform Process.

Conferences, gatherings, and other events

Upcoming events

  • None scheduled, due to the COVID-19 global pandemic

Past events

Docs and Other Links

The Search Platform team was formerly part of the Discovery Department in Audiences, but as part of the re-organization (tune-up) of June 2017, it is now part of Technology. Pages of historical note:

Deployers

Useful reference for who can deploy code. It's nice to know whom to bug if you need something:

Person | MediaWiki Deployer | Elasticsearch Deployer | Maps Deployer | Graphoid Deployer | Portals Deployer
dcausse
ebernhardsen
jan_drewniak
gehel

Code

The Search Platform team supports the following code:

Repository | Phabricator/Diffusion | GitHub mirror
CirrusSearch extension | https://phabricator.wikimedia.org/diffusion/ECIR/ | mediawiki-extensions-CirrusSearch
Elastica extension | https://phabricator.wikimedia.org/diffusion/EELA/ | mediawiki-extensions-Elastica
GeoData extension | https://phabricator.wikimedia.org/diffusion/EGDA/ | mediawiki-extensions-GeoData
Wikidata Query Service | https://phabricator.wikimedia.org/diffusion/WDQR/ | wikidata-query-rdf
Wikidata Query Service GUI | https://phabricator.wikimedia.org/diffusion/WDQG/ | wikidata-query-gui
WDQS deployment | https://phabricator.wikimedia.org/diffusion/WDQD/ | wikidata-query-deploy
WDQS GUI deployment | | wikidata-query-gui-deploy
PHP textcat | https://phabricator.wikimedia.org/diffusion/WTEX/ | wikimedia-textcat
Relevance Forge | | wikimedia-discovery-relevanceForge
Discernatron | | wikimedia-discovery-discernatron
Discovery Analytics | https://phabricator.wikimedia.org/diffusion/WDAN/ | wikimedia-discovery-analytics
Lucene Explain Parser | https://phabricator.wikimedia.org/diffusion/WLEP/ | wikimedia-lucene-explain-parser