Discovery/Status updates/2018-06-04

This is the weekly update for the week starting 2018-06-04.



  • After lots of talk about stemmers getting committed and plugins getting deployed, the Slovak-language wikis have finally been *reindexed*, and stemming [1] is now happening on the Slovak wikis!

Search—Time Machine Edition

A few things from May that got missed:

  • Trey wrote up some potential applications of natural language processing (NLP) to on-wiki search [2]. We're still going through them to pick out a couple that we'll turn into projects, probably next quarter. Right now, spelling correction and entity extraction are high on the list, but more questions, comments, and suggestions are welcome.
  • Erik pulled 90 days' worth of regular expression (regex) searches across all wikis, and Trey did a quick survey of the most common patterns [3]. There are a lot more regex searches than we thought (5.6 million in 90 days!), and three apparently automated processes (bots, apps, or tools of some kind) are responsible for more than 90% of the regex searches.