Wikimedia Discovery/Meetings/Search retrospective 2016-08-19

Format

This retrospective was conducted using the "Five finger retrospective" format: https://www.mediawiki.org/wiki/Team_Practices_Group/Five_finger_retrospective

Chris: post the link to his "what technical collaboration team does" presentation
- Done
Chris: Chris and Erik should talk about implications of interwiki search indices
- Done
Trey & Deb: Chris needs to be aware of the ? at the end of queries
- Done, and material has been posted on several village pumps
  - https://meta.wikimedia.org/wiki/Discovery/Handling_question_marks_in_search_queries & talk page
Erik: Figure out a plan to reliably monitor GitHub (David and Guillaume have started to watch it)
- Done, decided that volume is low enough that watching projects and getting emails from GitHub is sufficient
- Got Discovery members admin rights on GitHub projects

Deployed "? stripping" in queries
Set up RelForge cluster
Deployed textcat
Elasticsearch upgrade to 2.3.4 in progress
Discovery got a new analyst, who will help a lot going forward, especially if we start building that probabilistic bot classifier thingy :P
Deployed logstash/kibana upgrade
Completed refactoring of the search fields
WDQS servers for codfw approved and on their way
Searcher class in cirrus is now < 1000 lines +1+1:D
Did research on the top 100-ish unsuccessful queries and decided not to go further with it due to lack of interesting data
Analysis of ascii folding and stemming

We seem to have proven to many people's satisfaction that zero-results queries are not a good place to mine articles and redirects.
Trey's "?" blog post!
Blog post on textcat deployment+1
Addressed some technical debt and made code look saner+1
Trey's analysis of providing a list of search queries and the communication of the results
RelForge cluster has come into existence! And David has been able to index lots of data (enwiki & frwiki in two different ways!)

(Things have generally been running smoothly. It doesn't feel like any ONE thing stands out.)+1+1

Elasticsearch not stable
- 2 major issues in the last month and a half. One is a mystery; one we understand and have some ideas for fixes
logstash upgrade delayed multiple times due to lack of preparation / thoroughness
seemingly not enough time in the day +1+1
- Just a lot going on; everything takes time. Last couple weeks have been atypically busy.
- KH: Try not to put in extra hours, generally. Time-sensitive occasional things are understandable.
internet connections have been a bit weird / dropping at inconvinient times ( I spellz gud ) +1 (hangouts have been dodgy)
cindy (automated browser testing) was acting up again, and we still don't know why or have a final fix
- Recurring item. Do we want to think about shifting testing to unit test level? Or to PHP level?
- We really do need this to work; feel much more comfortable merging when automated tests are working
- Devs should work w/PO to make sure some time gets allocated to work on this

Good communication about "search across projects and across languages"
Trey says: working with Deb & Chris to get blog posts out about developments has been great. Thanks to Deb for driving the process! +1(yay, thanks!)
Doing some good work with Graphs team to make visualizations easier (e.g. integrating w/WDQS)
Guillaume still split between multiple sub teams, no one is complaining...(I feel ya!)+1 - he's doing great!
- Seems to be improving since the last retrospective
- We have multiple sub-teams that are fairly independent

Elasticsearch garden is not cultivated as much as it should be (T109089) - for example: the problem of multiple alerts firing when the cluster is failing has been there for a fairly long time, and we had that spam again today
- Similar issue with maps: Small issues that are not critical; only get attention when they break. Could do better with that.
- There are a lot of little issues, so it makes sense to prioritize them
- Do you (GL) have knowledge/support to be able to prioritize your work?
  - GL: Would be interested in participating in a planning session
Some work ongoing with a new recommendation system that may need some help from cirrus developers (https://phabricator.wikimedia.org/T143197 )
- Offline article recommendation system (similar to "more like")
Some help needed to catch obvious problems with BM25
- DC: Have created a place to test enwiki on BM25. (http://en-wp-bm25-relforge.wmflabs.org/wiki/Special:Search )
Stas working on upcoming lectures / demos
- Internal talk about SPARQL and WDQS, mostly technical, partly aimed at analysis folks
- Wiki conference San Diego: Less technical audience

Kevin: Invite GL to search planning meeting(s)
- Done
David: Will send email to private list requesting BM25 testing; later to public list (and Discovery weekly status)
- Done
Erik: work w/PO to make sure some time gets allocated to work on cindy problems