Discovery/Status updates/2019-01-07

This is the weekly update for the week starting 2019-01-07

Discussions

  • David discovered an issue with the click-through rate on one of the Search dashboards for mobile apps and enlisted Chelsy's help in fixing it quickly (done!) [1]
  • Mathew worked on increasing the number of shards for enwiki_general [2]
  • David helped augment the list of known clusters using cluster conf for Mjolnir [3]
  • David updated the completion suggester: TP50 was increased from 9ms to 24ms [4]
  • The Search team worked on supporting searching multiple filetypes at once, based on input from the Multimedia team [5]
  • David and Mathew worked on allowing ElasticSearch machines to communicate with each other on ports 9500 and 9700 [6]
  • We found that most of the dashboards in Grafana are designed for one cluster per DC, so we needed to refactor them to allow selecting a specific cluster (by adding chi, psi and omega selectors) [7]
  • The multi-instance support code added for ExternalIndex was designed without the group+replica concepts in mind, so we fixed ExternalIndex to support groups & replica topology [8]
  • There was a recent spike of fatal timeouts from API search suggestions (prefixsearch) that caused a number of user queries to stall for 60 seconds and then receive a generic error page without any results. We fixed this by merging a patch that skips language detection when rewriting is not enabled [9]

WDQS

  • We have added new keyboard shortcuts to the WDQS UI for systems where Ctrl-Space is already taken: Ctrl-Alt-Space and Alt-Enter [10]
  • Stas found an issue where the WDQS puppet/hiera configs were too distributed; Mathew and Gehel worked on it with assistance from SRE (thanks!) [11]
  • Our database in WDQS seems to be hitting Blazegraph internal limits, which requires carefully rearranging the data to stay away from the limit. This work has now started [12]
  • Stas fixed an issue where a large update could crash Updater [13]
  • Stas fixed an issue where, due to database replication delay, Updater could read an old version of the data from Wikidata [14]
  • Stas fixed an issue where the SERVICE SILENT construct was producing errors even though the standard says it should not [15]
