Wikimedia Discovery/Meetings/Search retrospective 2016-07-06

Format

This retrospective was conducted using the "Five finger retrospective" format.

Action Items from last month

  • Chris: post the link to his "what technical collaboration team does" presentation
    • Done
  • Chris: Chris and Erik should talk about implications of interwiki search indices
    • Done
  • Trey & Deb: Chris needs to be aware of the ? at the end of queries
  • Erik: Figure out a plan to reliably monitor GitHub (David and Guillaume have started to watch it)
    • Done; decided that volume is low enough that watching projects and getting emails from GitHub is sufficient
    • Got Discovery members admin rights on GitHub projects

What happened since the last retro (June 1)

  • new elasticsearch servers
  • Change in product ownership for Q1: Dan -> Deb
  • Wikimania and chatting with community and others about what the Discovery team is doing with search
  • Job offer extended and accepted for new analyst, starting around the end of July \o/^_^+1
  • Dan's vacation / Deb did a great job filling in

Thumb: Thumbs up--something that went well

  • Chris, Deb, and Trey chatting about question-mark handling +1+1
  • Fixes & improvements & relaunch of the TextCat A/B Test
  • Elasticsearch servers are SOO easy to install (if you don't count the required cluster restarts)+1
  • Deb did a great job filling in while Dan was out +1
  • phan is proving a useful addition to CI testing+1+1
    • phan is a static php code analyzer - https://github.com/etsy/phan
    • This is a Discovery initiative; would like to spread to other groups over time

Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)

  • somehow we partially own the production logging infrastructure (by being elasticsearch "experts") +1 (Guillaume gets quite a few questions on logstash, where I have no idea...)
    • was this "somehow ownership" transferred from Bryan Davis's "somehow ownership" of it previously? :-)
    • Questions for the future: Who will be responsible for new hardware? Should we become the official owners?

Middle finger: Something that did not go well

  • Issue that affects the elasticsearch cluster (being discussed here: https://github.com/elastic/elasticsearch/issues/19187 )
    • Generates a ton of logs; fills the disk
    • Might be fallout from upgrading the clusters
  • Maintaining the swift repo plugin is hard because we don't use it (https://github.com/wikimedia/search-repository-swift )
    • David has spent days trying to fix it when broken
    • We should look for a new maintainer—maybe add a disclaimer in the README
  • Initial run of the TextCat A/B Test +1 (alas)
    • A thorough analysis showed that the data we were collecting were unreliable ("visit pages" were completely wrong)
    • Contributing factor: No automated tests for the logging code
    • Contributing factor: No front-end engineer, so no expertise in browser-specific issues
    • Contributing factor: More than 20 ways to perform a search; complex code
    • Related factor: We already knew there was a mismatch in counts—this forced us to diagnose and fix it
  • Maps has a tendency to absorb a lot of my (Guillaume's) time. Prioritization needs to happen between different sub teams. Not sure how to make that happen.
    • If you need more of Guillaume, let him know, and he can try to reallocate his time.
    • Could shift more coding to developers, and leave Guillaume to review/finalize
    • If you see something is stuck, let him know. If he doesn't hear anything, he'll assume things are ok.
  • cindy (automated tests) has started acting up again, after the last round of fixes had it working well for a month or so+1
    • Mysterious errors; very common on local vagrant instance
    • Is integration testing worthwhile, from a cost/benefit basis?
      • "I can't live without Cindy now" +1
    • Some cindy errors are now being caught by phan
    • Other team runs tests as part of jenkins; we don't, partly because of the elastic dependency

Ring finger: Something about relationships--within the team, between teams, other

  • working with mobile team to implement geo features
    • seems to be going well so far
  • Thanks to Erik for answering all of Deb's 'newbie' search and overall team work questions
  • Weekly video chats with David, Erik, and Trey have been both productive and good "almost" face time
  • Thanks to Guillaume for answering "newbie" questions about web requests and caching (great help during the whole thing with Legal) - It was fun, I learned a lot as well!
  • Markus Krötzsch (with Technical University of Dresden) is about to run research on WDQS usage
  • Had very interesting talk with Fabian Suchanek from YAGO (Yet Another Great Ontology) http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
    • Potential future collaborations
  • Guillaume working hard on getting integrated in the Ops team. Thanks for your support in that.
    • Seems to be going well
    • I am trying to spend more time on "real" Ops stuff (clinic duty, looking at MediaWiki servers, ...)

Pinky: A little thing that would be easy to overlook

  • The "wrong keyboard" analysis turned up a *lot* of "Latin Russian" in ruwiki. There's a lot there (maybe 1% of queries) that could be improved.
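
As a rough illustration of the "wrong keyboard" case, here is a minimal Python sketch. It assumes that "Latin Russian" means Russian words typed while the keyboard was still set to a US QWERTY layout, and that the intended layout is the standard Russian ЙЦУКЕН one. The function name and the mapping tables are hypothetical and are not taken from the team's analysis or from any Discovery code.

```python
# Hypothetical sketch: re-map keystrokes typed on a US QWERTY layout
# to the letters the same keys produce on the standard Russian layout.
# Both strings list the keys in the same physical order (three letter rows).
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
JCUKEN = "йцукенгшщзхъфывапролджэячсмитьбю"

# str.maketrans requires both strings to have equal length (32 keys here).
LATIN_TO_CYRILLIC = str.maketrans(QWERTY, JCUKEN)


def fix_wrong_keyboard(query: str) -> str:
    """Return the query as it would read if typed on the Russian layout.

    Characters outside the mapping (digits, spaces, etc.) pass through
    unchanged. Case is dropped for simplicity.
    """
    return query.lower().translate(LATIN_TO_CYRILLIC)


if __name__ == "__main__":
    print(fix_wrong_keyboard("ghbdtn"))     # привет
    print(fix_wrong_keyboard("dbrbgtlbz"))  # википедия
```

A real fix would also need to decide when to apply the mapping, for example only when the original query returns few or no results, since many Latin-script queries on ruwiki are intentional.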

Action items

  • David: Look into getting out of maintaining the swift plugin
  • Deb: Look at prioritising/defining the "Latin Russian/Latin Hebrew" problem - https://phabricator.wikimedia.org/T138958
    • Resolved: put in the "This Quarter" column on the Discovery Search backlog
  • Kevin: Send reminder one day before next retro... except not to Guillaume? ;-) [He prefers to respond "in the moment"]