Topic on Talk:Search/Old/status

'.' considered a word character?

Junkyardsparkle (talkcontribs)

When using the following query on commons:

incategory:California_Historical_Society_Collection,_1860-1960 intitle:restoration.jpg

One result is found, but if "restoration.jpg" is truncated to "restoration" (as would normally be the case when searching for that term), no results are returned. This is highly problematic for title searches on commons, where most page titles include file extensions. Possibly related to this bug?
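
To illustrate the symptom, here is a minimal sketch of how treating '.' as a word character would produce it. This is not the actual Elasticsearch/Lucene analyzer used by the search backend; the two tokenizers, the `title_matches` helper, and the example file title are all hypothetical and only model the behavior described above.

```python
import re


def tokenize_dot_as_word_char(text):
    # Hypothetical analyzer: '.' stays inside a token,
    # so "restoration.jpg" is indexed as one single term.
    return re.findall(r"[a-z0-9.]+", text.lower())


def tokenize_dot_as_separator(text):
    # Hypothetical analyzer: '.' splits tokens,
    # so "restoration.jpg" is indexed as "restoration" and "jpg".
    return re.findall(r"[a-z0-9]+", text.lower())


def title_matches(term, title, tokenizer):
    # A title matches when every token of the search term
    # is also a token of the title under the same tokenizer.
    title_tokens = set(tokenizer(title))
    return all(tok in title_tokens for tok in tokenizer(term))


# Hypothetical file title, for illustration only.
title = "Church restoration.jpg"

for tokenizer in (tokenize_dot_as_word_char, tokenize_dot_as_separator):
    for term in ("restoration.jpg", "restoration"):
        print(tokenizer.__name__, repr(term), title_matches(term, title, tokenizer))
```

Under the first tokenizer only the full "restoration.jpg" matches, which is the behavior reported here; under the second, both forms of the term match.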

NEverett (WMF) (talkcontribs)

I believe this will be fixed by the fix for bugzilla:63861. That should hit commons tomorrow and I'll rebuild the index and see if that fixes it.

Junkyardsparkle (talkcontribs)

Great. In general, the new search works so well that I forget that it's an opt-in beta... in particular, it's nice for creating fairly tight heuristically-defined lists of files for acting on with cat-a-lot on commons. Most of the outliers are conveniently sorted to the end of the listing for easy unselecting. I have bumped my head on (apparently) some query complexity limits while doing crazy things, but otherwise it's a very powerful tool for more than just landing a casual user on the right article page... cheers to everybody working on it. :)

NEverett (WMF) (talkcontribs)

Thanks! It's just User:Demon and I working on it, but we're leaning on Elasticsearch and Lucene, which are pretty powerful. Would you mind posting examples of some of the neat queries that work for you? I'll add them to the regression test suite.

I did try to rebuild commons yesterday but bumped up against a timeout error during one of the rebuild steps an hour and a half into the process. I built a fix this morning and I'll try to squeeze it out to production today and try again.

NEverett (WMF) (talkcontribs)
Junkyardsparkle (talkcontribs)

Looks good, I'll get back to categorizing now... don't know if my actually used queries would be useful for regression testing, because they tend to obsolete themselves after I act on the results, at least as far as the (possibly negated) "incategory:" terms go, which is what I tend to end up with a lot of... but if I can abstract a good test that reflects my use case, I'll mention it here. Thanks again.

Junkyardsparkle (talkcontribs)

As of right now, this has regressed and is a problem again. Example case, this query no longer finds this file. Hope those new servers help enough to put the fix back in place. :)
