Discovery/Status updates/2018-08-13

This is the weekly update for the week starting 2018-08-13.



  • We re-indexed wikis in Malay, Indonesian, and Polish, adding stemming to Malay and ICU normalization to Indonesian [1], and fixing the worst pathological stemming errors in Polish [2].
    • The reindexing tickets are [3] and [4].
  • Trey wrote a blog post about the difficulties of tokenization (breaking text up into words, more or less). [5] It's the first in a series covering the basics of search.
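
As a rough illustration of why "breaking text up into words" is only "more or less" simple, here is a minimal Python sketch. It is not taken from the blog post and is not how the search stack tokenizes text; the function names and the sample sentence are made up for this example. It compares two naive strategies, and neither gives an obviously right answer.

```python
import re

def whitespace_tokens(text):
    # Naive approach: split on runs of whitespace only.
    # Punctuation stays glued to the neighboring word.
    return text.split()

def word_tokens(text):
    # Slightly less naive: runs of word characters, allowing an
    # internal apostrophe (straight or curly) so "Don't" stays whole.
    # Hyphenated words are still broken apart.
    return re.findall(r"\w+(?:['’]\w+)*", text)

# Hypothetical sample sentence chosen to show the trouble spots.
sample = "Don't split state-of-the-art e-mail, please."

print(whitespace_tokens(sample))
print(word_tokens(sample))
```

The first strategy keeps "e-mail," and "please." with their trailing punctuation, while the second splits "state-of-the-art" into four tokens and "e-mail" into "e" and "mail". Which behavior is preferable depends on the language and on what searchers expect to match.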

Did you know?

  • Auto-antonyms (or contronyms, among other names) are words that have contradictory meanings. [6] Some cases arise from words that were originally distinct but came to sound the same through normal sound change, such as “cleave”, which means both to hold fast (from Old English “clifian”) and to cut in two (from Old English “clēofan”). In other cases, a word simply acquires a novel but contradictory sense, such as “fast”, which can mean both securely attached and moving quickly. Often the intended sense is clear from context, but when it is not, a word can become so confusing that careful writers avoid it; such words are sometimes called “skunked” words. [7]