Discovery/Status updates/2018-10-01

This is the status update covering 2018-09-17 through 2018-10-05.



  • Implemented indexing of statement values as part of the main data in Wikidata, so that statement values are now searchable without special syntax [1]
  • Reindexed Wikidata, which also enables qualifier indexing [2]
  • Mathew worked on resolving an Elasticsearch shard size alert by doing an in-place reindex [3]
  • A lot of work went into investigating a brief outage of CirrusSearch (mw exception spike for api.php) [4]; it is resolved well enough for now.
  • Gehel and others worked on refactoring Puppet to support multiple Elasticsearch instances on the same node [5]
  • Erik worked on an issue where words in the text content of a wiki page could be merged together in the search index, making them unfindable [6]
  • Stas updated the search engine of Wikidata to enable searching by author name string [7]
  • David and Erik worked together on evaluating adding an image quality score to media search result ranking [8]
  • Stas added X-Search-Id to WikidataCompletionSearchClicks events [9]
  • David added a way to configure timeouts of autocomplete queries [10]
  • Erik upgraded saneitizer to constantly re-index documents [11]
  • David investigated why interwiki cache hits/misses had not been reported since 2017 and decided to drop support for caching interwiki queries [12]
  • Mathew and Gehel worked on raising the alert level on disk space for old Elasticsearch servers [13]
  • Erik worked to correct issues where the Cirrus MLT cache had a 0% hit rate on switchover [14]


  • Added new NTriples RDF dump (which makes it easier to do per-line processing) [15]
  • Internal cluster switched to Kafka events as change source, public cluster next [16]
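
The per-line processing that the NTriples format allows can be sketched roughly as follows. This is a minimal illustration, not the tooling used by the team: the function names `iter_triples` and `count_predicates` and the sample lines are made up for the example, and the splitting logic assumes well-formed input with one complete triple per line rather than implementing the full N-Triples grammar.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) from N-Triples lines.

    Hypothetical helper: assumes each non-empty, non-comment line holds
    exactly one well-formed triple terminated by " .".
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate never contain whitespace; the object may
        # (inside a quoted literal), so split at most twice.
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the terminating "."
        yield subject, predicate, obj


def count_predicates(lines: Iterable[str]) -> Counter:
    """Count how often each predicate occurs, one line at a time."""
    return Counter(predicate for _, predicate, _ in iter_triples(lines))


# Illustrative sample lines, not taken from an actual dump file.
SAMPLE = [
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> '
    '<http://www.wikidata.org/prop/direct/P31> '
    '<http://www.wikidata.org/entity/Q5> .',
    '# a comment line',
    '',
    '<http://www.wikidata.org/entity/Q5> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "human"@en .',
]

if __name__ == "__main__":
    for predicate, count in count_predicates(SAMPLE).items():
        print(count, predicate)
```

Because every line stands alone, the same generator can be fed a file object directly (for example one opened with `gzip.open(path, "rt", encoding="utf-8")`, where `path` is whatever dump file you have downloaded), so the dump never has to be loaded into memory as a whole.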

Did you know?

  • Different languages use different numbers of sounds; the set of sounds used in a particular language is called its “phonemic inventory”. [17] The number of sounds can range from 11 to over 140! Having more sounds than letters, or sounds that differ from the one usually associated with a letter, can be the source of unusual orthographies and/or transliteration schemes, including "q" formerly being used as a vowel in Natqgu (now Natügu), a language of the Solomon Islands.