Topic on Talk:Search/Old

SchreiberBike (talkcontribs)

I mostly find small problems in English Wikipedia and fix them. I have developed a variety of searches which find such errors. An example would be at this search for the ordinal error 1th which works with the old search, but not when the new search is turned on. I have a long series of similar searches set up at w:en:User:SchreiberBike/Workspace/Ordinals. I haven't found ways to do this in the new search. I think it is likely that I will have to rewrite my many queries, but right now I'm not even sure such things will be possible. Please suggest how to move forward.

NEverett (WMF) (talkcontribs)

I did some poking and think I found the root cause: -"quoted phrase" is being misinterpreted. I've filed that as Bugzilla:70301 and will work on it soon.

NEverett (WMF) (talkcontribs)

I proposed a fix for what I think caused your issue. In all likelihood it'll head to the Wikipedias next Thursday. You can work around it by replacing the - before the quotes with NOT. This won't be required after next Thursday. In addition, the <<-广声法师>> clause won't work until this Thursday; there is a fix for that heading out as well.
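For anyone with a long list of saved queries, the workaround above could be applied mechanically. The following is only a rough sketch: the function name is made up for illustration, and it assumes that NOT followed by a space is accepted wherever -"quoted phrase" was used.

```python
import re

# A minus sign directly in front of a double-quoted phrase, either at the
# start of the query or after whitespace. Plain negated words such as
# -foo are deliberately left alone.
NEGATED_PHRASE = re.compile(r'(^|\s)-(?="[^"]*")')


def use_not_for_negated_phrases(query: str) -> str:
    """Rewrite -"phrase" as NOT "phrase" (hypothetical helper).

    Assumption: the search accepts NOT followed by a space before the
    quoted phrase, as the workaround in this thread suggests.
    """
    return NEGATED_PHRASE.sub(r'\1NOT ', query)


if __name__ == "__main__":
    print(use_not_for_negated_phrases('"1th" -"11th"'))
```

Once the fix is deployed the rewrite should no longer be needed, so it is probably best kept as a temporary step rather than baked into the saved queries.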

NEverett (WMF) (talkcontribs)

And another thing: thanks for doing this work. It's something I really want to make sure we don't totally bust with Cirrus. I don't imagine it'll be a perfect switch but I want to make sure nothing becomes impossible.

SchreiberBike (talkcontribs)

Thanks for looking at it. I'll give it another try at the end of next week and let you know how it works.

SchreiberBike (talkcontribs)

I've tried turning New Search back on and the results I get look about the same as they did when I first posted above.

NEverett (WMF) (talkcontribs)

OK! I found another problem with Cirrus after trying your query again. It was kind of being masked by your first problem. I've proposed a fix for it (https://gerrit.wikimedia.org/r/#/c/161474/) but haven't yet got it reviewed. Once it's reviewed I'll try to get it released quickly and verify your query works. It might not be Monday but it should be soon.

NEverett (WMF) (talkcontribs)

Please have a look now!

SchreiberBike (talkcontribs)

This is great. It looks the way I think it's supposed to look. I'll play with it some more and see if I can turn up any problems. Also, I'm getting more results with Cirrus than with the old search; that's good. The new search also takes the portions in quotes more literally than the old one; e.g., a search for "1th" only returns articles with that exact string included. Thanks!

NEverett (WMF) (talkcontribs)

Awesome! Thanks for finding and reporting this. I'm glad it's working well for you now!

SchreiberBike (talkcontribs)

I've been playing with Cirrus search in English Wikipedia for a bit and one difference I'm finding, which is significant for the kind of work I do, is that it doesn't see the hidden text of references. For instance, if I run a search for "XIth century -368vebleninstinct" (not in quotes) in the old system, the article w:en:Workmanship does not come up because one of the references in that article has the string 368vebleninstinct in a web address. However, when I run that with the new search, it does come up. That also means that I can't search for other articles which use that same reference (although there may be a special search for that). If that's a deliberate choice, I can adjust my queries to not use the hidden text of references as exclusion criteria, but I'd rather not.

I've also noticed that the new search does not pick up the comment text of {{clarify}} templates in the form {{Clarify|date=October 2014|reason=Should this be '1st', '11th' or something else?}}. Again, that's something I can work around if needed, but if I don't have to that would be better.

Another possible problem: With the new search, a search for "XIXth century" (not in quotes) has w:en:Provisional Government of the French Republic in its results, but that article doesn't have the word century in it. Same for w:en:German military administration in occupied France during World War II. On the other hand, some articles, such as w:en:Cathar castles in a search for "XIth century", have come up in the new search that never did in the old search.

NEverett (WMF) (talkcontribs)

Thanks for trying it!

By default Cirrus tries hard to search on visible text so that results make more sense for casual readers. So not picking up hidden text in templates is totally intentional. It does have a syntax for searching in the article's source, though. Searching for <<XIth century -insource:368vebleninstinct>> leaves out w:en:Workmanship, just as lsearchd did with your old search. Is that a decent workaround?
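If the saved searches are kept as links, a query that mixes visible-text terms with -insource: exclusions could be assembled along these lines. This is a sketch only: the helper names are invented for illustration, and the URL parameters (title, search, fulltext) are copied from the Special:Search link that appears later in this thread rather than from any documented API.

```python
from urllib.parse import urlencode

# Assumption: the English Wikipedia index.php endpoint, as used by the
# Special:Search link quoted elsewhere in this thread.
BASE_URL = "https://en.wikipedia.org/w/index.php"


def build_query(visible_terms, excluded_source_strings):
    """Join visible-text terms with -insource: exclusions (hypothetical helper)."""
    parts = list(visible_terms)
    parts += [f"-insource:{s}" for s in excluded_source_strings]
    return " ".join(parts)


def search_url(query, base_url=BASE_URL):
    """Build a full-text Special:Search link for the given query string."""
    params = {"title": "Special:Search", "search": query, "fulltext": "Search"}
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    q = build_query(["XIth", "century"], ["368vebleninstinct"])
    print(q)
    print(search_url(q))
```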

I can explain w:en:Provisional Government of the French Republic as well. It does contain the word century, but it's hidden and Cirrus doesn't properly remove the text. It's in the navbox at the bottom of the page, which you can expand by clicking "French Topics". I've filed this as bugzilla:71562. I figured out what was up by adding ?action=cirrusdump to the page and searching for the word "century" in the result. It's in the "auxiliary_text" field, which is usually stuff like image captions and tables. In this case the navbox snuck in.
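The same check could be scripted for other pages that turn up unexpectedly. The sketch below makes two assumptions that are not confirmed in this thread: that the dump has already been fetched and parsed into a plain mapping of field names to strings (or lists of strings), and that the sample field contents shown are merely made-up placeholders. Both function names are hypothetical.

```python
def cirrusdump_url(page_url):
    """Append action=cirrusdump to a page URL, as described above."""
    separator = "&" if "?" in page_url else "?"
    return page_url + separator + "action=cirrusdump"


def fields_containing(fields, word):
    """Return the names of the fields whose text contains `word`.

    Assumption: `fields` maps field names to a string or a list of
    strings; the real dump layout may differ and may need unwrapping.
    """
    needle = word.lower()
    hits = []
    for name, value in fields.items():
        pieces = value if isinstance(value, list) else [value]
        if any(isinstance(p, str) and needle in p.lower() for p in pieces):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Made-up sample data, only to show the shape of the check.
    sample = {
        "text": "The provisional government was formed in 1944.",
        "auxiliary_text": ["French Topics: 19th century, 20th century"],
    }
    print(fields_containing(sample, "century"))
```

A hit only in "auxiliary_text" would suggest the word comes from a caption, table or navbox rather than from the main article text.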

SchreiberBike (talkcontribs)

I've been using the new search for a while and have liked it, especially the fact that it doesn't see things in the source text like linked URLs, but I ran into a possible issue with this search on English Wikipedia, for example. It looks for "11st", which is usually an error intended to be "11th". In the search above, it turns up places where one line ends in a "1" and the next line starts with a "1st". I don't think that is what is intended, so I thought I'd bring it to your attention.

SchreiberBike (talkcontribs)

Also, will "intitle" searches be available in the new search? They don't seem to work now. Thanks.

NEverett (WMF) (talkcontribs)

Filed the 11st issue here: https://bugzilla.wikimedia.org/show_bug.cgi?id=73558

I think it's caused by how we squash the extra text. You can work around it for now by searching for insource:/ 11st /. It's not as good because it wants spaces only rather than word breaks. You could also try insource:11st. It's _probably_ not as affected by the bug.
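To see what "spaces only rather than word breaks" costs in practice, here is a small illustration using Python's re module. It is only an analogy: the regex engine behind insource:/.../ is not Python's and its syntax may differ, and the sample lines are made up.

```python
import re

# Pattern equivalent to the suggested workaround: a literal space on
# each side of 11st.
SPACES_ONLY = re.compile(r" 11st ")

# What one would ideally want: 11st delimited by word boundaries.
WORD_BREAKS = re.compile(r"\b11st\b")

# Made-up sample lines for illustration.
SAMPLES = [
    "on the 11st of May",   # plain case
    "the 11st, then",       # followed by punctuation
    "(11st)",               # wrapped in parentheses
    "item 1\n1st place",    # the false positive reported above
    "211st",                # part of a longer number
]


def matching(pattern, lines):
    """Return the lines in which the pattern is found."""
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    print(matching(SPACES_ONLY, SAMPLES))
    print(matching(WORD_BREAKS, SAMPLES))
```

Neither pattern matches the line-break case, but the spaces-only version also misses occurrences next to punctuation, which is the trade-off mentioned above.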

Also, can you give me an example of intitle not working? It's working for me: https://en.wikipedia.org/w/index.php?title=Special%3ASearch&profile=advanced&search=intitle%3Atelecom&fulltext=Search&ns0=1&profile=advanced

SchreiberBike (talkcontribs)

I don't remember what I was working on when I ran into the problem, but I think this is similar. Even with "Falklands" in quotes, it returns items without that string in the title. I get similar results for "Thrushes". Thanks
