Wikimedia Search Platform/tr
Group: | Technology |
Team: | In teams: |
Since: | December 2017 |
The Search Platform team (part of Wikimedia Technology) is responsible for maintaining and enhancing the various Search features and APIs for MediaWiki. This includes the CirrusSearch extension; Elasticsearch, the search backend used to power Wikimedia projects at the Wikimedia Foundation; and the Wikidata Query Service, the SPARQL endpoint used for querying Wikidata.
This team's current work is tracked on the Discovery-Search workboard in Phabricator (the backlog board is here).
Mission
Our mission is to help people easily discover knowledge on Wikipedia and its sister projects by providing tools and infrastructure for casual readers and expert users with precise needs, while maintaining a strong emphasis on privacy.
Overview
- We operate and maintain a disparate collection of production services related to content discovery, enabling the wiki community to find information that is not available through simply following links. We also provide a platform on which other people can create tools to support editing and other workflows.
- We provide an open-source search engine, backed by an inverted index for non-structured on-wiki data. We work to develop more sophisticated searching with machine learning and natural language processing.
- We provide a SPARQL-based query service for Wikidata, encouraging users to capitalize on this vast store of computer-readable structured data for use on-wiki and in knowledge discovery.
- We endeavor to support underserved wiki communities, and we rely on those communities to help us understand their needs and evaluate potential solutions, especially with respect to underserved languages.
- We prioritize privacy for logged-in users and anonymity for logged-out users over almost everything else, even when it slows down or complicates development or hinders our ability to collect or use data.
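To make the term "inverted index" from the overview concrete, here is a minimal toy sketch in Python. It is not how CirrusSearch or Elasticsearch is implemented. The sample pages, the whitespace tokenizer, and the function names `build_inverted_index` and `search` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased token to the set of page titles whose text contains it."""
    index = defaultdict(set)
    for title, text in docs.items():
        for token in text.lower().split():
            index[token].add(title)
    return index


def search(index, query):
    """Return the titles that contain every token in the query (AND semantics)."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    # Start from the postings of the first token, then intersect with the rest.
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical sample pages, used only for illustration.
pages = {
    "Elasticsearch": "distributed search engine built on lucene",
    "SPARQL": "query language for structured data",
    "CirrusSearch": "mediawiki extension for search backed by a search engine",
}

index = build_inverted_index(pages)
print(sorted(search(index, "search engine")))  # ['CirrusSearch', 'Elasticsearch']
```

A production search engine adds language-aware analysis, ranking, and distributed storage on top of this basic lookup structure.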
Goals
The Search Platform team's goals are part of the entire Technology Department's goals. You can find links to the current quarterly goals here. (Note that Q1 is July–September for historical reasons.)
Other Projects
Wikidata Query Service (WDQS)
The Wikidata Query Service allows searching the structured data on Wikidata. It also provides an API through which tools can access Wikidata. Our current work is tracked on the Discovery-Search workboard (see also our WDQS backlog board), and weekly deployments of WDQS are documented on wikitech:Deployments. A public WDQS Analytics Dashboard is used to monitor and analyze the impact of our efforts.
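As a rough illustration of how a tool might call the query service, the sketch below builds a SPARQL request against the public endpoint at query.wikidata.org. The example query (five items that are an instance of house cat), the helper names `build_wdqs_url` and `run_query`, and the User-Agent string are assumptions for illustration. Only the URL is built when the block runs; the network call happens only if `run_query` is invoked.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Example query: five items that are an instance of (P31) house cat (Q146).
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_wdqs_url(sparql):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": sparql, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def run_query(sparql, user_agent="ExampleTool/0.1 (hypothetical contact)"):
    """Send the query and return the decoded JSON response (needs network access)."""
    request = urllib.request.Request(
        build_wdqs_url(sparql),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the URL does not touch the network.
print(build_wdqs_url(EXAMPLE_QUERY))
```

A real tool should replace the placeholder User-Agent with one that identifies the tool and its maintainer before calling `run_query`.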
APIs
Application Programming Interfaces (APIs) give developers ways to interact with the MediaWiki software.
API:Search and discovery lists the search APIs available and in development.
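For orientation, here is a hedged sketch of a full-text search request through the MediaWiki Action API (`action=query` with `list=search`). The choice of English Wikipedia as the endpoint, the helper names `build_search_params` and `search_titles`, and the User-Agent string are assumptions for illustration. API:Search and discovery remains the authoritative list of available modules and parameters.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Action API of English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_params(term, limit=5):
    """Return Action API parameters for a full-text search (list=search)."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "srlimit": str(limit),
        "format": "json",
    }


def search_titles(term, limit=5, user_agent="ExampleTool/0.1 (hypothetical contact)"):
    """Run the search and return matching page titles (needs network access)."""
    query_string = urllib.parse.urlencode(build_search_params(term, limit))
    request = urllib.request.Request(
        f"{API_ENDPOINT}?{query_string}",
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [hit["title"] for hit in data["query"]["search"]]


# Encoding the parameters does not touch the network.
print(urllib.parse.urlencode(build_search_params("inverted index")))
# action=query&list=search&srsearch=inverted+index&srlimit=5&format=json
```

Calling `search_titles("inverted index")` would send the request and return a list of page titles, assuming network access and a suitable User-Agent.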
The Team
This list was last updated on December 3, 2020.
- Erik Bernhardson, Tech Lead, Staff Software Engineer
- David Causse, Senior Software Engineer
- Trey Jones, Senior Computational Linguist
- Ryan Kemper, Site Reliability Engineer
- Peter Fischer, Senior Software Engineer
- Guillaume Lederrey, Engineering Manager
Communications
Mailing lists
Search Platform - A public mailing list about the Wikimedia Search Platform team and projects (formerly Discovery Department). Example topics include:
- Announcements, including major upcoming initiatives, completed major releases, quarterly or annual plans, requests for feedback or input
- Technical discussions and brainstorming regarding our work:
  - Search, Elastic, Cirrus, the Relevance Forge, and other relevant subjects
  - Our dashboards or related analysis
- Other team news, such as changes to team structure, significant changes to processes, or changes in how we use Phabricator or other tools like Gerrit
IRC channels
Office Hours
The Search Platform Team usually holds office hours on the first Wednesday of each month. Come talk to us about anything related to Wikimedia search! Feel free to add your items to the Etherpad Agenda for the next meeting.
Weekly status updates
See Discovery weekly status updates for the archive of past team updates. Note: these updates are now part of the Scrum of Scrums weekly updates, as of September 26, 2019.
Meetup groups
- San Francisco
  - Directly relevant
    - Bay Area NLP (natural language processing, not neuro-linguistic programming)
    - San Francisco text
    - Elasticsearch San Francisco
  - Indirectly related (these sorts of meetup groups attract smart/enthusiastic people who like to spend their free time learning and solving problems)
Process
The Search Platform team uses a Scrumban process, which is a hybrid of Scrum and Kanban. It is described here: Search Platform Process.
Conferences, gatherings, and other events
Upcoming events
- None scheduled due to the COVID-19 global pandemic
Past events
- All Hands - January 2018
- Hackathon 2018 - May 18-20, 2018
- Wikimania 2018 - July 18-22, 2018
- 17th International Semantic Web Conference (ISWC 2018) - October 8-12, 2018
- Wikimedia Technical Conference (WMTechConf, formerly known as DevSummit), Portland, Oregon - October 22-25, 2018
- All-Hands, San Francisco - late January / early February 2019
- Hackathon, Prague - May 2019
- All-Hands, San Francisco - late January / early February 2020
- Hackathon, Tirana, Albania (virtual due to the COVID-19 global pandemic) - May 2020
Docs and Other Links
- Help:CirrusSearch - Information on how the Wikimedia search works.
- Completion Suggester - incremental search
- Search Glossary - a place for definitions, context, and links for terms we use that other people may not be familiar with
- Testing Search - testing search changes is complicated!
- Elasticsearch stats on Grafana
- Data access and analysis guidelines, covering how the Search Platform team uses data sources and how other teams use Search Platform data sources, are documented on Meta
- BrowserBot - a browser test bot for search
- Top Unsuccessful Search Queries - The difficulties in creating a list of unsuccessful search results
- Cross-wiki Search Result Improvements
- TextCat - a software component used for language detection
- Data Analysis Archive
- Wikidata Query Service Analytics Dashboard
- External Traffic Analytics Dashboard
- Please note that these boards have not been updated since September 2019 and are kept only for historical purposes. The Search Analytics Dashboard and API Analytics Dashboard have been decommissioned.
The Search Platform team was formerly part of the Discovery Department in Audiences. As part of the June 2017 re-organization (tune-up), it became part of Technology. Pages of historical note:
- Discovery Department (April 2015 - December 2017)
- Search (prior to April 2015)
Deployers
Useful reference for who can deploy code. It's nice to know whom to bug if you need something:
Person | MediaWiki Deployer | Elasticsearch Deployer | Maps Deployer | Graphoid Deployer | Portals Deployer |
---|---|---|---|---|---|
dcausse | | | | | |
ebernhardsen | | | | | |
jan_drewniak | | | | | |
gehel | | | | | |
Code
The Search Platform team supports the following code:
This page or project is maintained by Wikimedia Search Platform.