Discovery/Status updates/2018-04-09

This is the weekly update for the week starting 2018-04-09.



Events and News

  • Erik and Trey went to the OpenSource Connections Haystack Search Relevance Conference and the Tom Tom Founders Festival Machine Learning Conference, which were held back-to-back in Charlottesville, VA. At Haystack, Erik presented on how we use clickstream information to create training data for our learning-to-rank models. [1] Trey wrote up trip notes, with lots of links, on MediaWiki. [2]

Other Noteworthy Stuff

  • A fix for CirrusSearchCheckerJob errors was rolled out. [3]
  • Stas implemented indexing of Lexemes & Forms for the WikibaseLexeme extension. [4]

Did you know?

  • The English verb "to be" is kind of weird: the infinitive "be" and participles "being, been" start with "b-", while the preterite forms "was, were" start with "w-", and the present forms "am, is, are" start with vowels. The conjugations originally come from three or four different verbs! Why "three or four"? Wiktionary disagrees with itself a bit, listing four on the etymology of "is" [5] and three on the etymology of "be". [6] The conflation goes back at least to Proto-Germanic, [7] so German is similarly weird. [8] Dutch has a greatly simplified paradigm, but still shows some trace of the multiple sources. [9] Other languages, including ASL, Arabic, Bengali, Hawaiian, Hebrew, Indonesian, Japanese, Russian, Turkish, and Ukrainian, at least partly avoid this mess by having a zero copula. [10] For search on-wiki, we deal with this problem in part with stemming [11] and stop words. [12]
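
As a rough illustration of why stop words help with "to be", here is a minimal Python sketch. It is a toy, not the analysis chain used on-wiki: the stop-word list, the `naive_stem` suffix stripper, and the `analyze` function are all hypothetical names made up for this example. A simple suffix-stripping stemmer cannot map "was", "is", and "been" to a common root, but dropping them as stop words makes queries that differ only in the form of "to be" produce the same terms.

```python
# Hypothetical stop-word list: just the forms of "to be" discussed above.
BE_FORMS = {"be", "being", "been", "was", "were", "am", "is", "are"}


def naive_stem(token):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, stop_words=BE_FORMS):
    """Lowercase, split on whitespace, drop stop words, stem the rest."""
    tokens = text.lower().split()
    return [naive_stem(t) for t in tokens if t not in stop_words]


# Suffix stripping alone leaves every form of "to be" distinct.
stems_of_be = {naive_stem(form) for form in BE_FORMS}

# With stop words removed, these two phrasings yield the same terms.
past = analyze("The dogs were barking")
present = analyze("The dog is barking")
```

In this sketch the regular words are handled by stemming ("dogs" and "dog", "barking" and "bark"), while the irregular forms of "to be" are handled by simply not indexing them.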