Discovery/Status updates/2018-07-30

This is the weekly update for the week starting 2018-07-30.



  • David did a lot of work on the upgrade to Elasticsearch 6.3:
    • updated the extra plugin [1]
    • updated the LTR plugin [2]
    • worked on an older issue, which we had been waiting to fix, where an experimental plugin threw an exception on some requests [3]
    • upgraded the experimental highlighter [4]
    • upgraded the extra-analysis plugin [5]
  • Trey fixed a stemming issue in the Polish analyzer for search [6]; the fix will be fully in production after the next re-index [7]
  • Trey also reviewed the Esperanto morphological libraries after a volunteer who knows Esperanto offered to help with this effort [8]
  • David worked on a few CirrusSearch integration tests that were failing, some intermittently and some consistently (V => Venom > V:N) [9]
  • Trey took on reviewing the work of applying the Indonesian analysis chain to Malay (write-up and review here) [10]
  • With lots of help from others, David took on the fairly massive task of changing how SpecialSearch/SearchEngine handles the 'prefix' URI parameter set by the InputBox extension [11]
  • David also did quite a bit of work, across several related patches, on deprecating SearchEngine::replacePrefixes [12]
  • Trey explored potential applications of NLP in search (review and write-up here) [13]
  • Stas implemented full-text search for Lexemes when the Lexeme namespace is requested in the search [14]
  • Stas added collection of click data from Wikidata prefix search into event logs, so that we know what users select when using it [15]; this will reach production late this week with the train
  • Erik and David worked on the completion suggester code, which did not gracefully handle shard failures during the fetch phase; as a result, the response received by Cirrus lacked the necessary information [16]. The fix will reach production late this week with the train
  • After much discussion, Erik fixed an issue where intitle: searches did not match stop words [17]; the fix will reach production late this week with the train
  • Gehel and Erik worked on ensuring that discovery.query_clicks_* data is purged according to the privacy policy, adding support for dropping hourly or daily partitions [18]

Wikidata Query Service

  • Wikibase constraint violations are now loaded into the WDQS database and are queryable [19].