Hi Trey, thanks for your recent blog post – it's a good overview of many of the challenges I encounter in multilingual text processing as a software engineer, not only in search.
Since you mentioned Vietnamese, I'd like to call your attention to phab:T78485: if the user enters search terms that contain diacritics, especially tone marks, MediaWiki should not direct the user to any other article title that matches only the base letters but not the diacritics. This is important because Vietnamese words pack a lot of meaning into diacritical marks. There are a great many minimal sets of 6–12 three-character words that differ only by diacritics, especially in proper names.
You have a point that we can't rely on users to always enter all the diacritical marks. But if they enter any diacritical marks, they expect those particular diacritical marks to be respected for the most part. The impact of folding diacritics in text that is already marked is similar to redirecting a query for "résumé" to "resume" in English. Sometimes users enter the wrong diacritics, especially when using the VNI input method or the "VIQR" keyboard on iOS, which both place all the diacritic keys next to each other. But such mistakes can be weighted less heavily than a base-character difference when calculating edit distance; these typos don't necessarily require diacritic folding.
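To illustrate the edit-distance idea, here is a minimal sketch of a Levenshtein distance in which substituting one diacritic variant of a letter for another costs less than substituting a different base letter. The function names and the `diacritic_cost` value of 0.25 are assumptions chosen for illustration, not anything MediaWiki or its search backend actually uses.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Return the character with all combining marks removed."""
    # đ/Đ have no canonical decomposition, so they are mapped by hand.
    ch = {"đ": "d", "Đ": "D"}.get(ch, ch)
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def weighted_distance(a: str, b: str, diacritic_cost: float = 0.25) -> float:
    """Levenshtein distance with a cheaper diacritic-only substitution."""
    # Compare precomposed characters so each letter plus its marks is one unit.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                sub = 0.0
            elif base_letter(ca) == base_letter(cb):
                sub = diacritic_cost  # same base letter, different marks
            else:
                sub = 1.0  # different base letter
            cur.append(min(prev[j] + 1.0,       # deletion
                           cur[j - 1] + 1.0,    # insertion
                           prev[j - 1] + sub))  # substitution or match
        prev = cur
    return prev[-1]
```

With this weighting, a query with a mistyped tone mark still ranks close to the intended title, without the diacritics ever being discarded outright.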
What's more, Vietnamese organizes the marks into two tiers: one tier (such as circumflexes) is considered part of the base letter, while another tier of tone marks is considered separate from the base letter. Traditionally, the tone marks apply to the word as a whole. Analytics bear out the fact that, in an autocompleting textbox, users commonly enter some diacritics while omitting the tone marks until after they spell out the whole word. So if anything, Vietnamese queries should be evaluated three times: first literally, then after folding the tone marks in the target text, then after folding all diacritics in the target text. But diacritics in the query should never be folded automatically.
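The three-pass evaluation described above could look roughly like the following sketch. It assumes the five Vietnamese tone marks are the combining grave, acute, tilde, hook above, and dot below, and that titles are available as a simple list; `fold_tones`, `fold_all`, and `tiered_match` are hypothetical names, and case handling is left out for brevity.

```python
import unicodedata

# Combining grave, acute, tilde, hook above, dot below (the tone tier).
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}


def fold_tones(text: str) -> str:
    """Remove tone marks but keep circumflex, breve, horn, and đ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def fold_all(text: str) -> str:
    """Remove every diacritic, including the stroke on đ."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(c for c in decomposed if not unicodedata.combining(c))
    return kept.replace("đ", "d").replace("Đ", "D")


def tiered_match(query: str, titles: list[str]) -> list[str]:
    """Return matches from the first pass that finds any.

    Only the titles are folded; the query is compared as typed.
    """
    query = unicodedata.normalize("NFC", query)

    def literal(title: str) -> str:
        return unicodedata.normalize("NFC", title)

    for fold in (literal, fold_tones, fold_all):
        hits = [t for t in titles if fold(t) == query]
        if hits:
            return hits
    return []
```

Because only the target text is folded, a query that already carries a tone mark can match only in the literal pass, while a query typed without tone marks can still reach a toned title in the later passes.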