Given Wikimedia's huge corpora, a lot of noise can be filtered out by simply excluding all unsuccessful queries that don't:
- Have more than a 95% fuzzy match to an existing article title
- Have more than a 95% match to a token in the search index
- Have at least one token of X length that is an exact or close match to an existing article
Then, to prevent possible exposure of private data, only show those matched tokens: at most there will be a few letters added beyond an existing title or index token, and given the volume of queries that is hardly enough to identify a single person. Alternatively, just expose the titles that are a close match to the search tokens. A rough sketch of such a filter follows.
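A minimal sketch of what that filter could look like, assuming the article titles and the tokens of the search index are already available in memory; the `0.95` cutoff comes from the list above, while `MIN_TOKEN_LEN` and the function name are placeholders (the "X length" is unspecified in the proposal):

```python
import difflib

FUZZY_CUTOFF = 0.95   # "more than 95% match" from the list above
MIN_TOKEN_LEN = 5     # placeholder for the unspecified "X length"

def publishable_tokens(query, article_titles, index_tokens):
    """Return the tokens of an unsuccessful query that are safe to expose,
    or None if the whole query should be excluded as noise."""
    q = query.lower().strip()

    # 1. More than a 95% fuzzy match against an existing article title.
    if difflib.get_close_matches(q, article_titles, n=1, cutoff=FUZZY_CUTOFF):
        return [q]

    safe_tokens = []
    for tok in q.split():
        # 2. More than a 95% match against a token in the search index.
        if difflib.get_close_matches(tok, index_tokens, n=1, cutoff=FUZZY_CUTOFF):
            safe_tokens.append(tok)
        # 3. A sufficiently long token that exactly or closely matches a title.
        elif len(tok) >= MIN_TOKEN_LEN and difflib.get_close_matches(
                tok, article_titles, n=1, cutoff=FUZZY_CUTOFF):
            safe_tokens.append(tok)

    # Only the matched tokens are ever exposed, never the raw query string.
    return safe_tokens or None
```

In practice the fuzzy matching would be done against the search index itself rather than flat Python lists, but the shape of the decision is the same: anything that doesn't resemble existing content is dropped, and only the resembling tokens are published.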
It is also a good idea to turn this around: top search queries that match existing articles are very useful to both editors and readers. For instance, as an editor one would be more interested in improving a stub article that a lot of people search for, or potentially deleting it if it is being used as a hoax or the like. It can also expose hot-spot areas (or categories) where a lot of people try to find more information.
As a reader, knowing that people who searched for X also searched for Y is very useful, as it may help me discover related information about a subject for which I couldn't find the right keyword to search.
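A rough sketch of how that "also searched for" signal could be derived from the same sanitised data, assuming queries can be grouped per anonymised session; the function name and example data are made up for illustration:

```python
from collections import Counter, defaultdict
from itertools import combinations

def related_queries(sessions, top_n=5):
    """sessions: iterable of per-session lists of sanitised queries.
    Returns, for each query, the queries most often seen in the same session."""
    co_counts = defaultdict(Counter)
    for queries in sessions:
        for a, b in combinations(set(queries), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {q: [other for other, _ in c.most_common(top_n)]
            for q, c in co_counts.items()}

# Readers who searched for "aurora borealis" also searched for "solar wind".
sessions = [
    ["aurora borealis", "solar wind"],
    ["aurora borealis", "solar wind", "magnetosphere"],
    ["aurora borealis", "northern lights"],
]
print(related_queries(sessions)["aurora borealis"])
```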
Page views can easily be distorted by bots, and so can search terms, but combined they are greater than the sum of their parts.
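One hedged illustration of how the two signals could be cross-checked: a term whose search volume explodes while page views for the matching article stay flat looks more like bot traffic than genuine interest. The thresholds and names below are invented for the example, not part of any existing tooling:

```python
def looks_like_bot_spike(search_counts, page_views, ratio_threshold=20.0):
    """Flag titles whose search volume vastly outpaces views of the article.

    search_counts: {article_title: number_of_searches}
    page_views:    {article_title: number_of_page_views}
    """
    flagged = []
    for title, searches in search_counts.items():
        views = page_views.get(title, 0)
        if searches > 100 and searches > ratio_threshold * max(views, 1):
            flagged.append(title)
    return flagged

# A term searched 5000 times but viewed only 30 times looks suspicious;
# one searched 5000 times and viewed 4000 times looks like genuine interest.
print(looks_like_bot_spike({"Foo": 5000, "Bar": 5000},
                           {"Foo": 30, "Bar": 4000}))
```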