Help talk:CirrusSearch

Colin M (talkcontribs)

The filters section says: "A namespace or a prefix term is not a filter because a namespace will not run standalone, and a prefix will not negate." This seems empirically untrue. On EnWP I get the following number of results for each of these queries, as expected:

  • incategory:"LGBT-related musical films": 58
  • incategory:"LGBT-related musical films" prefix:"Hello": 2
  • incategory:"LGBT-related musical films" -prefix:"Hello": 56

So it seems like negating a prefix does work. Am I misunderstanding what this is trying to say? For now I added a {{dubious}} tag.
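For anyone who wants to reproduce these counts outside the search box, here is a minimal sketch that builds the corresponding API requests. It assumes the standard MediaWiki Action API (`list=search` with `srinfo=totalhits`) on English Wikipedia; the helper names `search_count_url` and `total_hits` are made up for illustration, and the counts will of course drift as the category changes.

```python
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's Action API.
API = "https://en.wikipedia.org/w/api.php"


def search_count_url(query: str, api: str = API) -> str:
    """Build a URL that asks the search API for the total hit count of a query."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srinfo": "totalhits",
        "srlimit": 1,  # we only care about the count, not the hits
        "format": "json",
    }
    return api + "?" + urlencode(params)


def total_hits(response: dict) -> int:
    """Extract the hit count from a decoded JSON response."""
    return response["query"]["searchinfo"]["totalhits"]


category = 'incategory:"LGBT-related musical films"'
queries = [
    category,
    category + ' prefix:"Hello"',
    category + ' -prefix:"Hello"',
]
urls = [search_count_url(q) for q in queries]
```

Fetching each URL and passing the decoded JSON to `total_hits` should give the three numbers to compare; if negation works, the second and third counts add up to the first.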

TJones (WMF) (talkcontribs)

I took a long hard look at this and I'm confused, too. I'm not sure I understand the definition of "filter" being used. I don't know if the documentation is out of date or using some model that I'm not able to wrap my head around. Similarly, I don't get this: "Insource ... is also a filter, but insource:/regexp/ is not a filter." insource:word and insource:/regex/ behave pretty much the same, other than the regex being much slower. Sounds like the documentation could use a thorough review to make sure all the advanced features and special cases are still described correctly.

Reply to "Prefixes don't negate?"

incategory parameter and white space

Summary by Tacsipacsi

Use quotation marks ("New cars") to ignore whitespace in parameter.

Loman87 (talkcontribs)

Hello everybody,

I am noticing an issue when using the incategory parameter: it doesn't work if there is white space in the query, e.g. incategory:New cars doesn't work, whereas incategory:New_cars works fine. This was always OK for me, but now I need to use this parameter with Extension:InputBox to limit the search to specific categories. In the call to these categories I also need to use variables, which output values containing white space, e.g. {{FULLPAGENAME}} gives something like Category:New cars. Is there any way to make CirrusSearch work with white space in category names? Or to "force" variables to output values with underscores instead of spaces?

I am not sure this is the right place to post this question, but any help is really appreciated.

Thanks,

Lorenzo

Tacsipacsi (talkcontribs)

Space is the separator between search terms, so incategory:New cars searches for pages mentioning cars in Category:New. You can explicitly mark search term boundaries with quotation marks, i.e. incategory:"New cars".

Loman87 (talkcontribs)

This is a wonderful workaround, thanks very much!

PerfektesChaos (talkcontribs)

Or incategory:New_cars since _ is regular replacement for spaces in page names.

Tacsipacsi (talkcontribs)

But it’s not so easy to convert the output of {{PAGENAME}} (not {{PAGENAMEE}}) to use underscores, and that’s what the question is about.
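If the query string is assembled by a script rather than in wikitext, both forms mentioned in this thread are easy to generate. The following is only a sketch: `incategory_term` is a hypothetical helper, it assumes the English `Category:` namespace prefix, and it does not handle category names that themselves contain quotation marks.

```python
def incategory_term(page_name: str, use_underscores: bool = False) -> str:
    """Turn a page name such as 'Category:New cars' into an incategory: search term."""
    # Drop the namespace prefix if the full page name was passed in.
    prefix = "Category:"
    name = page_name[len(prefix):] if page_name.startswith(prefix) else page_name
    if use_underscores:
        # Underscores stand in for spaces in page names.
        return "incategory:" + name.replace(" ", "_")
    # Quotation marks mark the boundaries of the search term explicitly.
    return 'incategory:"' + name + '"'


quoted = incategory_term("Category:New cars")
underscored = incategory_term("Category:New cars", use_underscores=True)
```

Here `quoted` is `incategory:"New cars"` and `underscored` is `incategory:New_cars`; either can be placed in the search box or in a default search string.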

T506 not clear about tilde position for easy translation - reformulate please

Wladek92 (talkcontribs)

In

<!--T:506--> 
A fuzzy-word or fuzzy-phrase search can suffix a tilde ~ character (and a number telling the degree).

We could understand this to mean that the fuzzy elements come AFTER the tilde, or we might guess that the fuzzy element has a tilde as its suffix (...???). Moreover, the next sentence, "A tilde ~ character prefixed to the first term of a query guarantees search results instead of any possible navigation.", reads like the first statement and seems repetitive. Can somebody reformulate, please? Thanks.

Christian FR (talk) 12:54, 1 November 2019 (UTC)

Ciencia Al Poder (talkcontribs)

I think T:506 refers to "phrase~".

About "A tilde ~ character prefixed to the first term of a query guarantees search results instead of any possible navigation.", I think it means, if you search for "MediaWiki", since a page with that name exists, it will redirect you straight to the page called MediaWiki. Searching for "~MediaWiki" will give you the search results page, even if a page with that name exists.
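To make the two tilde positions concrete, here is a small sketch of how the two kinds of query string could be built. The function names are hypothetical, and the snippet only covers the single-word fuzzy case (`word~` or `word~2`) as read from T:506; it says nothing about how the server interprets the degree.

```python
from typing import Optional


def force_results(query: str) -> str:
    """Prefix a tilde so the results page is shown even if a page with that title exists."""
    return query if query.startswith("~") else "~" + query


def fuzzy_word(word: str, degree: Optional[int] = None) -> str:
    """Suffix a tilde (and optionally a degree) to request a fuzzy match on one word."""
    return word + "~" + ("" if degree is None else str(degree))


results_page_query = force_results("MediaWiki")   # tilde BEFORE the first term
fuzzy_query = fuzzy_word("phrase", 2)             # tilde AFTER the fuzzy term
```

So the prefixed tilde affects navigation for the whole query, while the suffixed tilde belongs to the individual term it follows.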


T398 seems bad command for ignore Translations: in inlanguage:

Wladek92 (talkcontribs)

Is command T398 correct? The namespace Translations: should appear in the text, but it is not present, so the command looks similar to the one that selects pages in Japanese (T396), yet we are explaining how to ignore them (strange!). Can someone explain or correct it? Thank you.

<!--T:398-->
* to ignore Translate, and where English is the base language, add
</translate>
: <kbd>inlanguage:en</kbd>

<!--T:396-->
* to count all Japanese pages on the wiki
: <kbd>all: inlanguage: ja</kbd>

Christian FR (talk) 12:34, 1 November 2019 (UTC)

Ciencia Al Poder (talkcontribs)

I'm not sure about the "ignore Translate" intent, because that seems to be done with the selection of namespaces to search (the Translations: namespace is not marked by default). I'd remove the "ignore Translate" part and focus only on the language.


T549 should be command => "hastemplate: portal:contents/..." instead of ": hastemplate: portal:contents/..."

Summary by Wladek92

done, corrected; thanks.

Wladek92 (talkcontribs)

I think first colon should be removed; any advice ???

<!--T:549-->
* <tvar|hastportal><kbd>: hastemplate: portal:contents/tocnavbar</kbd></>, finds mainspace usage of a "<tvar|tocnavbar>Contents/TOCnavbar</>" template in the Portal namespace.
Speravir (talkcontribs)

Yes, this is most likely a typo.

Question about spelling corrections and "no results"

Equinox (talkcontribs)

For example: I put parimion into Wikipedia's search box. It says: "Showing results for pavilion. (LINK:) Search instead for parimion." I click that link and it says: "There were no results matching the query."

The spelling correction is (sometimes) useful, but in my experience, the "search instead" link never ever gives any results. Indeed that link only seems to be offered when your typed text is not present in the entire wiki, and then it does the best-guess spelling for you.

Am I right? If so, what's the point of that "search instead" link, which is guaranteed to produce no results?

TJones (WMF) (talkcontribs)

We do only replace your query with the suggestion if the original query got zero results. I think the "search instead" language pre-dates all of us who are currently working on the search platform team, so I can't give you the original justification for it—though mimicking Google's UI patterns generally makes search more understandable for most users. However, I can imagine that some people—particularly power users and editors of various sorts—would be upset if they searched for parimion, got results for pavilion, and then couldn't verify that parimion did in fact get zero results.

Google will override your intended search with their suggestion and give a link for your original search that gives fewer results. So, we are working in an environment where people might expect valid results to be overridden by a search engine; letting them see their original results even though there will be zero is goofy, but it's goofiness in the name of transparency.

197.235.220.190 (talkcontribs)

Seems rather simple to improve. Make it clear to the user that the query they chose will result in 0 entries, e.g.: "Showing results for pavilion. Search instead for parimion (Note: there are 0 results)".


>Am I right? If so, what's the point of that "search instead" link, which is guaranteed to produce no results?

No.

It is very important to keep the option allowing the user to search instead for whatever they typed. First, it allows them to verify the search engine's claim, it also makes it clear that the user isn't getting wild results because of some bug, and lastly, they can always check that it is accurate. After all, machines can and do make mistakes, and more importantly, the search engine can be wrong more often than not, especially, in a wiki where things can change. At the time of the query the search engine might be right, but just a few seconds later someone can create the page, or the new entry might simply be taking time to update the index despite the fact that the content was created right before your search.


Anyway, if a particular wiki doesn't like the message, I guess they could edit it using Mediawiki:search-rewritten.

Equinox (talkcontribs)

If there are no results then I think it would be better to say "no results for X; here are results for Y", and drop the pointless link. I take "197"'s point that there might be results if you search again a few seconds later, but if you want to do that you can just hit Refresh or F5 etc. Hardly a common use case.

Equinox (talkcontribs)

What is my next step? I have had bad experiences with bug trackers. How can I suggest this change without being shit on? Thanks.

TJones (WMF) (talkcontribs)

I'm sorry that you've had bad experiences with task trackers. It's a recurring problem for a lot of people, unfortunately. In this case, the people who would be working on it agree with you, so there shouldn't be any reason for unpleasant discussion.

I've uncovered some of the history of the message—turns out one person on our team was here when it was implemented—and the original thought was that we might allow suggestions to overwrite queries that got a non-zero number of results, but that never materialized.

The current plan is to create a new message that says there are no results for the original query, which we'll show when appropriate, and keep the existing message for a possible future case where we overwrite a query with non-zero results.

I've created a task: T236296

Equinox (talkcontribs)

Okay. Thanks. I really appreciate your help here as an "insider". Let's see how it goes :)

TJones (WMF) (talkcontribs)

Glad I could help. Please do keep in mind that we have to prioritize and work through lots of tasks, so while this is probably straightforward, it may take a while for us to get to it. But you definitely gave us a helpful push in the right direction. Thanks!

Summary last edited by TJones (WMF) 16:46, 29 October 2019

First posting was not trolling, but a question about a misunderstood meaning of a word. Solved by rephrasing of this part.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Warning: Do not run a bare <tvar|insreg>insource:/regexp/</> search. It will probably timeout after 20 seconds anyway, while blocking responsible users.

Is this a bad joke?

Speravir (talkcontribs)

No, it is a serious warning.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Am I missing something? But why should making a search query block someone???

Clump (talkcontribs)

Probably due to some serialization in handling the search request that results in the server being unable to respond to other requests in a timely fashion.

Speravir (talkcontribs)

@Clump, from Help:CirrusSearch#Insource:

Regex scan all the textual characters in a given list of pages; they don't have a word index to speed things up, […]

And in Help:CirrusSearch#Regular expression searches:

A regex search actually scours each page in the search domain character-by character. By contrast, an indexed search actually queries a few records from a database separately maintained from the wiki database, […]
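Since the cost of a regex search depends on how many pages it has to scan, one way to read the warning is "give the regex a smaller search domain". Below is a sketch of composing such a query; `narrowed_regex_query` is a made-up helper, the regex is passed through unchanged (no escaping is attempted), and which indexed terms make a useful filter depends on the wiki.

```python
def narrowed_regex_query(regex: str, *filters: str) -> str:
    """Join indexed filter terms with an insource regex so the regex scans fewer pages."""
    parts = list(filters)
    # The regex term goes last; the indexed terms before it limit the search domain.
    parts.append("insource:/" + regex + "/")
    return " ".join(parts)


bare = narrowed_regex_query("foo bar")
narrowed = narrowed_regex_query("foo bar", 'incategory:"New cars"')
```

`bare` is the kind of query the warning is about, while `narrowed` first restricts the candidates to one category through the index and only then scans those pages character by character.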
Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)
Speravir (talkcontribs)

Primarily, “to block” is just a verb, and it does not have only the meaning you apparently have in mind; cf. e.g. block - český překlad - slovník bab.la or, more verbose, Překladač Google (searched this for you). In this case it means that in the worst case the servers are unreachable for others, so these “responsible users” are blocked from using these servers.

Zabavuju flašku chlastu maskovanou jako zubní pastu (talkcontribs)

Generally speaking most terms with more meanings have a dominant interpretation. You can probably understand that the dominant interpretation of "USA" is not United Scenic Artists or United Soccer Association. I'm sure that in context of Mediawiki the dominant interpretation of the verb "to block" is not "just a verb" but "to remove edit right". (Accordingly the dominant interpretation of "image" is image file and not "A characteristic of a person, group or company etc., style, manner of dress, how one is, or wishes to be, perceived by others." Etc.)

PS: I'm most active at Wiktionary. No need to tell me what a verb means or how to find it.

Speravir (talkcontribs)

Well, my first language is not English, and I am sure I am far from perfect understanding, but I know that for every word we have to look at the context, and I think I understood it right here.

This said a tip for you: Next time you want to point to a potential wrong use of a word here or elsewhere do not caption it with “bad joke”. End of discussion for me.

TJones (WMF) (talkcontribs)

Zabavuju, like Speravir, I didn't interpret "block" in the admin sense, since I'm familiar with how regex searches work. They can be very computationally expensive, so they can tie up the search servers, and only a limited number are allowed to run at once. So even if the servers aren't too busy overall, running very expensive regex searches can temporarily "block" other users from running more reasonable regex searches.

I took your "bad joke" comment to refer to the somewhat hostile tone of the documentation; I run bare insource regex queries fairly regularly because there's no other way to get the info I need, plus they aren't super expensive on much smaller wikis. I generally try not to get involved in issues of style and tone in the documentation, only technical accuracy, so I didn't get involved in the discussion when it first started. It wasn't until reading over it today that I saw the further replies and now see the issue.

Anyway, as a sometime copyeditor, I think that any text that is potentially confusing should be edited for clarity. It doesn't matter whether 95% or 5% of people are going to misread "block" as an admin action. It's easy enough to use another word. I'll edit it from my volunteer account, since this isn't a technical issue.

Speravir (talkcontribs)

@TJones (WMF)/Trey314159: Thank you for your edit. And again: I think most of the discussion could have been avoided with a different title and first posting in a neutral tone. So, I misunderstood the intention of the thread starter.

185.124.231.251 (talkcontribs)

I installed CirrusSearch (0.2), Elastica (1.3.0.0) and Elasticsearch (5.6.9) according to the instructions at Extension:CirrusSearch.

After that I tried searching, and search worked well.

But when I create a new page in the wiki, search does not find it. How do I update the index in the Elasticsearch database? Is it automatic, or do I sometimes need to run the maintenance scripts from CirrusSearch to update the index?

DCausse (WMF) (talkcontribs)

CirrusSearch uses the jobqueue to index live updates. The jobqueue may be configured in many different ways so it depends on how you configured it. I think the default behavior is to run jobs while there are visits on your wiki using DeferredUpdates. To see if it's because the jobqueue is not properly running jobs, try to run mwscript runJobs.php. It may be other problems in which case you would have to inspect your log files to find an indication of something that is not working well. Unfortunately without more information it's hard for us to help you. Good luck!

24.176.93.67 (talkcontribs)

Manually running runJobs.php solved this for me. I will need to do some testing to see whether new content gets picked up without manually running this script, and if not, I will look into debugging why it's not being run automatically.
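One way to tell whether index updates are stuck in the job queue is to watch the queue length before and after creating a page. The sketch below assumes the Action API's `meta=siteinfo` module with `siprop=statistics`, which reports a `jobs` figure; the wiki URL is a placeholder, the helper names are invented, and the reported number can be approximate depending on how the job queue is configured.

```python
from urllib.parse import urlencode


def statistics_url(api: str) -> str:
    """Build a siteinfo request that includes the job queue length."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return api + "?" + urlencode(params)


def pending_jobs(response: dict) -> int:
    """Read the number of queued jobs from a decoded siteinfo response."""
    return int(response["query"]["statistics"]["jobs"])


# Placeholder endpoint; replace with your own wiki's api.php.
url = statistics_url("https://wiki.example.org/w/api.php")
```

If the number keeps growing while pages are edited and only drops after running runJobs.php by hand, the jobs are not being executed automatically.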

Reply to "How to update index"
Blinkingline (talkcontribs)

I have all of the relevant bits for CirrusSearch installed, CirrusSearch (0.2), Elastica (6.0.2), and ElasticSearch (7.4.0).


When I try to build my index, I get an error that the metastore isn't built. When I try to run php metastore, I get the following error/trace:

mw_cirrus_metastore missing, creating new metastore index.

Creating metastore index... mw_cirrus_metastore_firstScanning available plugins...none

[903244e47ec001068374178e] [no req]   Elastica\Exception\ResponseException from line 181 of /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Transport/Http.php: Root mapping definition has unsupported parameters:  [mw_cirrus_metastore : {dynamic=false, properties={mediawiki_version={type=keyword}, mapping_min={type=long}, analysis_maj={type=long}, cirrus_commit={type=keyword}, mapping_maj={type=long}, wiki={type=keyword}, shard_count={type=long}, type={type=keyword}, index_name={type=keyword}, mediawiki_commit={type=keyword}, analysis_min={type=long}, namespace_name={norms=false, analyzer=near_match_asciifolding, type=text, index_options=docs}}}] [reason: Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters:  [mw_cirrus_metastore : {dynamic=false, properties={mediawiki_version={type=keyword}, mapping_min={type=long}, analysis_maj={type=long}, cirrus_commit={type=keyword}, mapping_maj={type=long}, wiki={type=keyword}, shard_count={type=long}, type={type=keyword}, index_name={type=keyword}, mediawiki_commit={type=keyword}, analysis_min={type=long}, namespace_name={norms=false, analyzer=near_match_asciifolding, type=text, index_options=docs}}}]]

Backtrace:

#0 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Request.php(193): Elastica\Transport\Http->exec(Elastica\Request, array)

#1 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Client.php(688): Elastica\Request->send()

#2 /var/www/html/mediawiki/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Index.php(559): Elastica\Client->request(string, string, array, array)

#3 /var/www/html/mediawiki/extensions/CirrusSearch/includes/MetaStore/MetaStoreIndex.php(237): Elastica\Index->request(string, string, array, array)

#4 /var/www/html/mediawiki/extensions/CirrusSearch/includes/MetaStore/MetaStoreIndex.php(169): CirrusSearch\MetaStore\MetaStoreIndex->createNewIndex()

#5 /var/www/html/mediawiki/extensions/CirrusSearch/includes/MetaStore/MetaStoreIndex.php(176): CirrusSearch\MetaStore\MetaStoreIndex->createIfNecessary()

#6 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(71): CirrusSearch\MetaStore\MetaStoreIndex->createOrUpgradeIfNecessary()

#7 /var/www/html/mediawiki/maintenance/doMaintenance.php(96): CirrusSearch\Maintenance\Metastore->execute()

#8 /var/www/html/mediawiki/extensions/CirrusSearch/maintenance/metastore.php(166): require_once(string)

#9 {main}


Any suggestions on how to proceed?

EBernhardson (WMF) (talkcontribs)

You will need to downgrade Elasticsearch to 6.x; 7.x is not yet supported. The version of Elasticsearch that we use, and which is tested best, is 6.5.4.
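A quick check of the server version before building the index can catch this kind of mismatch. The sketch below assumes the JSON that Elasticsearch returns from its root endpoint (which contains `version.number`); the function names are hypothetical, and the supported major version of 6 simply reflects the statement above at the time of this thread.

```python
def elasticsearch_major(root_response: dict) -> int:
    """Return the major version from the JSON served at the Elasticsearch root URL."""
    return int(root_response["version"]["number"].split(".")[0])


def is_supported(root_response: dict, supported_major: int = 6) -> bool:
    """Compare the server's major version with the one CirrusSearch is said to support."""
    return elasticsearch_major(root_response) == supported_major


installed = {"version": {"number": "7.4.0"}}    # the version from the report above
recommended = {"version": {"number": "6.5.4"}}  # the version recommended in the reply
```

With these sample responses, `is_supported(installed)` is false and `is_supported(recommended)` is true.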

Reply to "Trouble installing CirrusSearch"

Can't visit result if page title is a Unicode combining form

Equinox (talkcontribs)

For example: The page title is a Unicode combining form. In the latest Google Chrome (77.0.3865.90) on Windows 7, and probably other browsers and platforms, the link to the result page cannot be clicked; I had to view the HTML source to be able to follow the link. This is unfortunate.

TJones (WMF) (talkcontribs)

Good example! I ran into this once before, but forgot about it until now. I did a quick test on this one example, and adding a no-break space before and after the combining character makes it clickable (Chrome on a Mac) and makes it look a little better. I'll open a ticket to fix this.
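The quick test described above can be sketched as follows. This is only an illustration of the padding idea, not how MediaWiki generates links: `pad_combining_title` is a hypothetical helper, and it treats any character in a Unicode "Mark" category (Mn, Mc, Me) at the start of the title as a combining form.

```python
import unicodedata

NBSP = "\u00a0"  # no-break space


def pad_combining_title(title: str) -> str:
    """Surround a title with no-break spaces if it starts with a combining mark."""
    if title and unicodedata.category(title[0]).startswith("M"):
        # Gives the mark something to attach to, so the link text has width.
        return NBSP + title + NBSP
    return title


millions_sign = pad_combining_title("\u0489")   # COMBINING CYRILLIC MILLIONS SIGN
perispomeni = pad_combining_title("\u0342")     # COMBINING GREEK PERISPOMENI
ordinary = pad_combining_title("Grave accent")  # left unchanged
```

Checking the general category rather than the combining class matters here, because enclosing marks such as U+0489 have a combining class of 0.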

TJones (WMF) (talkcontribs)

This is a problem for more than just search results, so I'm working on a more general ticket.

TJones (WMF) (talkcontribs)

I've created T233840 and found some other examples. This should probably be fixed in the low-level code that generates links, and not ad hoc in our code for title links on the search results page. I'm not sure what project tag to add over there, but that should get sorted out.

86.155.78.191 (talkcontribs)

Thank you for paying attention. I am a big terrible child on wikt but it's generally hard to get eyeballs on problems that we find. This seems like one of those "safe display" questions like converting everything to %xx before you render it to a Web page, so that you don't accidentally write HTML tags. Technically it's correct to render the Unicode combining form "as is" but Wiktionary is a bit strange in having these things as page titles (where Wikipedia never would). Let's see...

Erutuon (talkcontribs)

It's variable for me. For me this character (U+0342 COMBINING GREEK PERISPOMENI) is unclickable, but Equinox's character (U+0489 COMBINING CYRILLIC MILLIONS SIGN) has a narrow "hitbox" way over on the right of the displayed character that I can click. Either way it's not good.

TJones (WMF) (talkcontribs)

@86.155.78.191—Wikipedia does have this problem, too! It's not on the search page, but ҉ is a redirect to "Cyrillic numerals" and the combining form of ` redirects to "Grave accent". Where they say "Redirected from ..." the redirect titles are unclickable and hard to read. There are similar problems on the "what links here" page with those redirects. That's why I pushed for a lower-level fix in the Phab ticket.

@Erutuon, it does seem to vary by character and browser (I'd bet it varies by OS, too), but even when they are clickable, it shouldn't be like playing some puzzle game where you are trying to find that one clickable pixel!

If you have Phab accounts, please consider subscribing to the ticket and adding your comments over there, too.

