Discovery/Status updates/2017-01-16

This is the weekly update for the week starting 2017-01-16.


  • Finalized the second BM25 testing analysis and linked to the PDF here.
  • Migrated Phan for CirrusSearch to Jenkins. (phab:T153040) (technical debt)
  • Finished the write-up that summarizes and recommends extensive changes to TextCat for language identification. The mean improvement in F0.5 score was just under 5% across the corpora from nine Wikipedias. The two worst-performing corpora, from enwiki and nlwiki, each went up around 10%! All nine now have an F0.5 score above 90%. The next step is to deploy the recommended changes.
  • Completed (a round of) refactoring and cleanup of Special:Search code (task T150217, task T150390).
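
For readers unfamiliar with the F0.5 score cited in the TextCat item above: it is the F-beta measure with beta = 0.5, which weights precision more heavily than recall. The sketch below shows the standard formula only; the function name `f_beta` and the sample precision/recall values are illustrative assumptions, not figures from the TextCat analysis.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favors precision over recall.

    `f_beta` is a hypothetical helper name used for illustration only.
    """
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Illustrative values only (not taken from the TextCat write-up).
high_precision = f_beta(precision=1.0, recall=0.5)  # about 0.833
high_recall = f_beta(precision=0.5, recall=1.0)     # about 0.556
```

With beta = 0.5, the high-precision case scores noticeably higher than the high-recall case, even though the two are mirror images of each other. That asymmetry is the reason to pick F0.5 when false positives are considered more costly than misses.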