Wikimedia Discovery/Meetings/Search retrospective 2016-08-19

Format

This retrospective was conducted using the "Five finger retrospective" format: https://www.mediawiki.org/wiki/Team_Practices_Group/Five_finger_retrospective

Action Items from last month

  • Chris: post the link to his "what technical collaboration team does" presentation
    • Done
  • Chris: Chris and Erik should talk about implications of interwiki search indices
    • Done
  • Trey & Deb: Chris needs to be aware of the ? at the end of queries
    • Done, and material has been posted on several village pumps
      • https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search_queries & talk page
  • Erik: Figure out a plan to reliably monitor github (David and Guillaume have started to watch it)
    • Done; decided that volume is low enough that watching projects and getting emails from GitHub is sufficient
    • Got Discovery members admin rights on github projects

What happened since the last retro (July 6)

  • Deployed "? stripping" in queries
  • Set up RelForge cluster
  • Deployed textcat
  • Elasticsearch upgrade to 2.3.4 in progress
  • Discovery got a new analyst, who will help a lot going forward, especially if we start building that probabilistic bot classifier thingy :P
  • Deployed logstash/kibana upgrade
  • Completed refactoring of the search fields
  • WDQS servers for codfw approved and on their way
  • Searcher class in cirrus is now < 1000 lines +1 +1 :D
  • Did research on the top 100-ish unsuccessful queries and decided not to go further with it due to lack of interesting data
  • Analysis of ASCII folding and stemming

Thumb: Thumbs up--something that went well

  • We seem to have proven to many people's satisfaction that zero-results queries are not a good place to mine articles and redirects.
  • Trey's "?" blog post!
  • Blog post on textcat deployment +1
  • Addressed some technical debt and made code look saner +1
  • Trey's analysis of providing a list of search queries and the communication of the results
  • RelForge cluster has come into existence! And David has been able to index lots of data (enwiki & frwiki in two different ways!)

Index finger: The ONE thing you want people to know (about how this team has functioned over the last month)

  • (Things have generally been running smoothly. It doesn't feel like any ONE thing stands out.) +1 +1

Middle finger: Something that did not go well

  • elasticsearch not stable
    • 2 major issues in the last month and a half. One is a mystery; one we understand and have some ideas for fixes
  • logstash upgrade delayed multiple times due to lack of preparation / thoroughness
  • seemingly not enough time in the day +1+1
    • Just a lot going on; everything takes time. Last couple weeks have been atypically busy.
    • KH: Try not to put in extra hours, generally. Time-sensitive occasional things are understandable.
  • internet connections have been a bit weird / dropping at inconvinient times ( I spellz gud ) +1 (hangouts have been dodgy)
  • cindy (automated browser testing) was acting up again, and we still don't know why or have a final fix
    • Recurring item. Do we want to think about shifting testing to unit test level? Or to php level?
    • We really do need this to work; feel much more comfortable merging when automated tests are working
    • Devs should work w/PO to make sure some time gets allocated to work on this

Ring finger: Something about relationships--within the team, between teams, other

  • Good communication about "search across projects and across languages"
  • Trey says: working with Deb & Chris to get blog posts out about developments has been great. Thanks to Deb for driving the process! +1 (yay, thanks!)
  • Doing some good work with Graphs team to make visualizations easier (e.g. integrating w/WDQS)
  • Guillaume still split between multiple sub-teams, no one is complaining... (I feel ya!) +1 - he's doing great!
    • Seems to be improving since the last retrospective
    • We have multiple sub-teams that are fairly independent

Pinky: A little thing that would be easy to overlook (or was overlooked)

  • Elasticsearch garden is not cultivated as much as it should be (T109089) - for example: the problem of multiple alerts when the cluster is failing has been around for a fairly long time, but we still had that spam again today
    • Similar issue with maps: Small issues that are not critical; only get attention when they break. Could do better with that.
    • There are a lot of little issues, so it makes sense to prioritize them
    • Do you (GL) have knowledge/support to be able to prioritize your work?
      • GL: Would be interested in participating in a planning session
  • Some work ongoing with a new recommendation system that may need some help from cirrus developers (https://phabricator.wikimedia.org/T143197)
    • Offline article recommendation system (similar to "more like")
  • Some help needed to catch obvious problems with BM25
  • Stas working on upcoming lectures / demos
    • Internal talk about SPARQL and WDQS, mostly technical, partly aimed at analysis folks
    • Wiki conference San Diego: Less technical audience

Action items

  • Kevin: Invite GL to search planning meeting(s)
    • Done
  • David: Will send email to private list requesting BM25 testing; later to public list (and Discovery weekly status)
    • Done
  • Erik: work w/PO to make sure some time gets allocated to work on cindy problems