Extension talk:CirrusSearch

About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on Phabricator.

Darenwelsh (talkcontribs)

We are slowly accruing rows in the job table (for several wikis in a farm) where job_attempts=1, and they all seem to be related to CirrusSearch. Their job_cmd values are "cirrusSearchLinksUpdate", "cirrusSearchLinksUpdatePrioritized", and "cirrusSearchIncomingLinkCount".

I searched Phabricator and found T121560 but I don't think this is the same issue. If I run showJobs.php, the output indicates no jobs exist.

CKoerner (WMF) (talkcontribs)
Darenwelsh (talkcontribs)

Can you guide me on which logs would be helpful?

Revansx (talkcontribs)

Whatever became of this? We're seeing the same thing. Thanks!

cirrusSearchLinksUpdate jobs are getting stuck, but only for pages with subpages!

Reply to "Failed/Stuck jobs"

php UpdateSuggesterIndex.php quitting...

Summary by DCausse (WMF)

config $wgCirrusSearchUseCompletionSuggester must be set to 'yes' first.

Pooja2425 (talkcontribs)

Hi,

When I try to run php UpdateSuggesterIndex.php, I get "Completion suggester disabled", so autosuggest does not work in the search box.

I am using Elastica 6.3.1 and CirrusSearch 6.5.4.

Product Version
MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4
Lua 5.1.5

Please advise.

DCausse (WMF) (talkcontribs)

Please see docs/settings.txt, the completion suggester must be enabled with the config variable:

$wgCirrusSearchUseCompletionSuggester = 'yes';

Only minimal indexing, most pages are not indexed, almost as if ForceSearchIndex.php isn't populating

Matija.pu (talkcontribs)

Hi, I have the same problem that was already posted on this talk page, and I have tried every solution suggested here:

Product Version
MediaWiki 1.36.0
PHP 7.3.19 (cgi-fcgi)
MariaDB 10.5.11-MariaDB
ICU 64.2
Elasticsearch 6.5.4
CirrusSearch 6.5.4 (264629b)
Elastica 6.1.3 (9f6e66a)


I tried Kibana and eventually used the Elasticsearch Head Chrome extension to check the state of the indexes in Elasticsearch. I used php Saneitize.php to confirm that pages are not indexed, but that was already obvious: of 150 pages, only 15 are indexed.

I did notice something interesting while analysing this, but in the end I wasn't able to work out what is happening. After the standard set of steps for populating Elasticsearch with MediaWiki pages:

Step 0. -> $wgDisableSearchUpdate = true

Step 1. -> php UpdateSearchIndexConfig.php

Step 2. -> #$wgDisableSearchUpdate = true

Step 3. -> php ForceSearchIndex.php --skipLinks --indexOnSkip

Step 4. -> php ForceSearchIndex.php --skipParse

and restarting the Elasticsearch service, after some time a few more pages (2-4, usually fewer than 10) would become indexed.

So initially zero, then 4 pages after some steps with restarts, then 6 after some more steps, then 15, and so on, and then I would not be able to repeat it. Strange.

In the end I was not able to find a pattern, or to tell whether it is some Elasticsearch memory, cache, or sharding problem or an error in how Cirrus sends pages for indexing. I was not able to catch any PHP error, and the pages chosen for indexing seemed random.

Any suggestions? Thanks.

DCausse (WMF) (talkcontribs)

I would suggest checking the Elasticsearch logs, but restarting Elasticsearch should not have any impact on the number of indexed pages. Elasticsearch has no cache that could explain what you see; only a refresh_interval set to a high value on the index could, and CirrusSearch sets it to a low value by default.

The behavior you describe suggests a JobQueue issue. Please see Manual:Job_queue and check that it is set up properly.

Matija.pu (talkcontribs)

Yes! This was a JobQueue issue. After running php runJobs.php, every page and file was indexed in Elasticsearch.


Tnx!

Cirrus search not showing auto suggest/auto complete in search engine

Pooja2425 (talkcontribs)

Hi, I am using:

MediaWiki 1.35.1
PHP 7.4.15 (apache2handler)
MySQL 8.0.25
Elasticsearch 6.5.4

I have integrated CirrusSearch with Elasticsearch, and I now get results when I press Enter in the search box.

However, the search box gives no autosuggestions while I type, and the console shows this error:

{"error":{"code":"toomanyvalues","info":"Too many values supplied for parameter \"namespace\". The limit is 50.","limit":50,"lowlimit":50,"highlimit":500,

Please advise.

Ciencia Al Poder (talkcontribs)

This looks like the response to an API query. Can you post the API request URL here, or at least the parameters?

Pooja2425 (talkcontribs)

Request URL: http://wiki/api.php?action=opensearch&format=json&formatversion=2&search=test&namespace=0%7C2%7C4%7C12%7C202%7C208%7C300%7C302%7C304%7C306%7C308%7C310%7C312%7C314%7C316%7C318%7C320%7C322%7C324%7C326%7C328%7C330%7C332%7C334%7C336%7C338%7C340%7C342%7C346%7C348%7C350%7C352%7C354%7C356%7C358%7C360%7C364%7C368%7C370%7C372%7C374%7C378%7C380%7C382%7C384%7C386%7C388%7C390%7C392%7C396%7C400%7C402%7C404%7C406%7C408%7C410%7C418%7C422%7C424%7C428%7C434%7C436%7C440%7C442%7C444%7C448%7C450%7C452%7C454%7C456%7C458%7C460%7C462%7C464%7C466%7C468%7C470%7C472%7C474%7C476&limit=10

Pooja2425 (talkcontribs)

Actually, I get the desired results after pressing Enter in the search box, but there are no autosuggestions.

Please help.

Ciencia Al Poder (talkcontribs)

The list of namespaces in that query is: 0|2|4|12|202|208|300|302|304|306|308|310|312|314|316|318|320|322|324|326|328|330|332|334|336|338|340|342|346|348|350|352|354|356|358|360|364|368|370|372|374|378|380|382|384|386|388|390|392|396|400|402|404|406|408|410|418|422|424|428|434|436|440|442|444|448|450|452|454|456|458|460|462|464|466|468|470|472|474|476

You have a list of 80 namespaces to search :O That's not supported.
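
For later readers, the count is easy to verify from the request URL itself. The following is a minimal Python sketch, not part of MediaWiki or CirrusSearch: `count_namespaces` and `API_VALUE_LIMIT` are illustrative names, the example URL is a shortened stand-in, and the limit of 50 is taken from the error message quoted earlier in this thread.

```python
from urllib.parse import urlsplit, parse_qs

# Limit reported in the "toomanyvalues" error quoted in this thread.
API_VALUE_LIMIT = 50


def count_namespaces(url: str) -> int:
    """Return how many values the `namespace` parameter of an API URL carries."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs percent-decodes values, so "%7C" becomes the "|" separator.
    values = query.get("namespace", [""])[0]
    return len([v for v in values.split("|") if v])


if __name__ == "__main__":
    # Shortened, hypothetical URL; paste the full request URL to check a real one.
    url = "http://wiki/api.php?action=opensearch&search=test&namespace=0%7C2%7C4%7C12"
    n = count_namespaces(url)
    status = "over the limit" if n > API_VALUE_LIMIT else "within the limit"
    print(f"{n} namespaces, {status}")
```

Run against the full URL posted above, this should report the 80 values mentioned in this reply, well over the limit of 50.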

Pooja2425 (talkcontribs)

Hi @Ciencia Al Poder

How do I limit the namespaces used for search?

When I type anything in my search box, the request automatically includes the list below. How can I limit it, e.g. to 10?

0|2|4|12|202|208|300|302|304|306|308|310|312|314|316|318|320|322|324|326|328|330|332|334|336|338|340|342|346|348|350|352|354|356|358|360|364|368|370|372|374|378|380|382|384|386|388|390|392|396|400|402|404|406|408|410|418|422|424|428|434|436|440|442|444|448|450|452|454|456|458|460|462|464|466|468|470|472|474|476


Cirrus Search not giving auto suggestion/complete when typing into search bar

Summary by Ciencia Al Poder

Duplicate of Topic:Wbwh27c09uzyiq5s

Pooja2425 (talkcontribs)

Hi, I have integrated the Elasticsearch server with CirrusSearch. It gives me results, but autosuggest is not working; results appear only after pressing Enter.

Request URL of Cirrus search: http://wiki/api.php?action=opensearch&format=json&formatversion=2&search=test&namespace=0%7C2%7C4%7C12%7C202%7C208%7C300%7C302%7C304%7C306%7C308%7C310%7C312%7C314%7C316%7C318%7C320%7C322%7C324%7C326%7C328%7C330%7C332%7C334%7C336%7C338%7C340%7C342%7C346%7C348%7C350%7C352%7C354%7C356%7C358%7C360%7C364%7C368%7C370%7C372%7C374%7C378%7C380%7C382%7C384%7C386%7C388%7C390%7C392%7C396%7C400%7C402%7C404%7C406%7C408%7C410%7C418%7C422%7C424%7C428%7C434%7C436%7C440%7C442%7C444%7C448%7C450%7C452%7C454%7C456%7C458%7C460%7C462%7C464%7C466%7C468%7C470%7C472%7C474%7C476&limit=10


Request URL of Sphinx Search :

https://wikidev.equinor.com/wiki/api.php?action=opensearch&format=json&formatversion=2&search=rrr&namespace=0%7C2%7C4%7C12%7C202%7C208%7C300%7C302%7C304%7C306%7C308%7C310%7C312%7C314%7C316%7C318%7C320%7C322%7C324%7C326%7C328%7C330%7C332%7C334%7C336%7C338%7C340%7C342%7C346%7C348%7C350%7C352%7C354%7C356%7C358%7C360%7C364%7C368%7C370%7C372%7C374%7C378%7C380%7C382%7C384%7C386%7C388%7C390%7C392%7C396%7C400%7C402%7C404%7C406%7C408%7C410%7C418%7C422%7C424%7C428%7C434%7C436%7C440%7C442%7C444%7C448%7C450%7C452%7C454%7C456%7C458%7C460%7C462%7C464%7C466%7C468%7C470%7C472%7C474%7C476&limit=10&suggest=true

The CirrusSearch request does not include suggest=true.

Please advise; I need autosuggest/autocomplete while typing in the CirrusSearch search bar.

Ciencia Al Poder (talkcontribs)

What do you get in the error console?

Pooja2425 (talkcontribs)

{"error":{"code":"toomanyvalues","info":"Too many values supplied for parameter \"namespace\". The limit is 50.","limit":50,"lowlimit":50,"highlimit":500,"docref":"See http://wikipoc.equinor.com/wiki135/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/mailman/listinfo/mediawiki-api-announce> for notice of API deprecations and breaking changes."}}

I am getting a response, but no autocomplete/autosuggest while typing in the search box, unlike with the default wiki search engine and with the Sphinx search integration.

Please advise: if I have to reduce the number of namespaces in my request, how do I do that?

Ciencia Al Poder (talkcontribs)

Please don't open more than one thread for the same problem

"Index is unknown retrying..." error on index generation script

Summary by MyWikis-JeffreyWang

Don't use AWS Elasticsearch or this happens

MyWikis-JeffreyWang (talkcontribs)

When attempting to run php /var/www/mediawiki-1.35.2/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php, I'm getting the following output:

indexing namespaces...
mw_cirrus_metastore missing, creating new metastore index.
Creating metastore index... mw_cirrus_metastore_first Scanning available plugins...
none
ok
Index is unknown retrying...

I'm not sure how the index could be unknown when this is the script that's supposed to generate it. Here's my settings:

$wgCirrusSearchIndexBaseName = $wgDBname;
$wgDisableSearchUpdate = true;
$wgCirrusSearchClusters = [
    'default' => [
        [
            'host' => 'search-cirrus-randomcharsid.us-east-1.es.amazonaws.com',
            'port' => 443,
            'scheme' => 'https'
        ]
    ]
];

Any ideas on why this might be the case? Thanks in advance!

EBernhardson (WMF) (talkcontribs)

The message `Index is unknown retrying...` comes from the code that waits for the new metastore index to report that it is fully created and healthy. While not particularly clear, this seems to mean that the request to create the index was submitted, but the later requests asking about the status of the new index report that no such index exists. Critically, it appears that no check is performed on the response to the index creation request, so it seems likely that the creation request is failing and Cirrus then bails when checking that index.

Cirrus will need to be adjusted to do a better job at error reporting here, but that will only improve reporting; it won't fix your actual problem. My first question would be: which version of Elasticsearch are you using? I'm pretty sure the existing metastore index would be rejected by Elasticsearch >= 7.0.

MyWikis-JeffreyWang (talkcontribs)

Hi @EBernhardson (WMF), thanks for your reply! I am using Elasticsearch 6.5.4 on AWS's managed version of Elasticsearch, and using MediaWiki 1.35 (and CirrusSearch and Elastica are both on the REL1_35 branch of Git).

If it helps, I have "elasticsearch/elasticsearch": "6.7.2" in my composer.local.json because of T276854. So I guess I'm already laughing a bit at myself for picking two different versions. I'm using 6.5.4 because of the recommendation on the CirrusSearch extension page, but I'm wondering if it's safe to go up to 6.7 or 6.8.

Ciencia Al Poder (talkcontribs)

MediaWiki 1.33.x to 1.36.x require Elasticsearch 6.5.x (6.5.4 recommended).

Any other version (even newer ones than the requirement) will report ES is not compatible and fail instantly. (or at least that was happening some versions ago)
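
For later readers, the rule as worded in this reply can be written down as a small check. This is a Python sketch of that stated rule only (6.5.x accepted, anything else rejected); it is not CirrusSearch's actual version check, and `is_supported_es_version` is a hypothetical name.

```python
def is_supported_es_version(version: str) -> bool:
    """Apply the rule stated in this thread: only Elasticsearch 6.5.x is accepted."""
    parts = version.strip().split(".")
    # Require at least "major.minor" and compare both against 6.5.
    return len(parts) >= 2 and parts[0] == "6" and parts[1] == "5"


if __name__ == "__main__":
    for candidate in ("6.5.4", "6.7.2", "6.8.0"):
        verdict = "accepted" if is_supported_es_version(candidate) else "rejected"
        print(candidate, verdict)
```

The version string to test is the `version.number` field that the Elasticsearch root endpoint returns.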

MyWikis-JeffreyWang (talkcontribs)

@Ciencia Al Poder The problem is that when I don't add that line into my composer.local.json I get the issue reported at T276854.

MyWikis-JeffreyWang (talkcontribs)

Elasticsearch server version can't be found

Summary by MyWikis-JeffreyWang

AWS Elasticsearch won't work with CirrusSearch and shouldn't be used.

MyWikis-JeffreyWang (talkcontribs)

For some reason, the Elasticsearch server version isn't displaying.

Product Version
MediaWiki 1.35.3
PHP 7.4.16 (fpm-fcgi)
MariaDB 10.3.27-MariaDB-0+deb10u1
ICU 65.1
Lua 5.1.5
Elasticsearch

Running UpdateOneSearchIndexConfig.php causes:

php maintenance/UpdateOneSearchIndexConfig.php --indexType general --wiki internal

Fetching Elasticsearch version...unable to determine, aborting.

cURL returns:

{ "name" : "OJSpnVJ", "cluster_name" : "626799937287:mywikis-cirrus-dev", "cluster_uuid" : "E_kkBLrERAu0e_3k7okq1w", "version" : { "number" : "6.8.0", "build_flavor" : "oss", "build_type" : "zip", "build_hash" : "8169a24", "build_date" : "2021-04-21T19:26:55.782637Z", "build_snapshot" : false, "lucene_version" : "7.7.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }

Quite odd, any ideas?

MyWikis-JeffreyWang (talkcontribs)

Looks like AWS Elasticsearch 6.8 sucks! Don't use it.


UpdateSearchIndexConfig.php finishes running almost immediately

Summary by MyWikis-JeffreyWang

Turn on error reporting

MyWikis-JeffreyWang (talkcontribs)

I seem to continue having trouble with UpdateSearchIndexConfig.php, so I tried running it with the --startOver flag once. After this happened, I tried running UpdateSearchIndexConfig.php with and without this flag, and all I am getting now is this:

$ php /var/www/mediawiki/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php

indexing namespaces...

And then the script finishes executing 0.1 seconds after this is printed out. What's the reason behind this early termination? I have errors turned on, but nothing is reported.

Per what other people have done, I'm using Elasticsearch 6.8. (Please don't tell me to use 6.5.4, I've seen people using 6.8 and it works, I just don't know why it's so difficult to set it up on AWS Elasticsearch as opposed to a self-hosted solution.)

Ciencia Al Poder (talkcontribs)

Enable PHP error reporting. It looks like the script is finishing with an error, but error reporting is disabled and no error message is printed at all.

MyWikis-JeffreyWang (talkcontribs)

Hmm, I already said "I have errors turned on", but I guess you knew I didn't. Appreciate the reminder. There are so many moving parts to this, it's hard to even set up a simple MVP of it.

Search results empty, nothing seems to have been indexed

Summary by DCausse (WMF)

JobQueue related (CirrusSearch updates require the JobQueue to be set up properly)

AudentioJakeB (talkcontribs)

MediaWiki: 1.35.2

CirrusSearch: 6.5.4 (203237e) 03:44, 9 May 2021

Elastica: 6.1.3 (8af6b45) 21:04, 9 May 2021

Elasticsearch: 6.5.4


I've built the search index, and it shows that it ran without any errors:



root@test-snipped-wiki-1:/var/www/wikis/a# php /var/www/mediawiki_shared/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf ./LocalSettings.php 

[snipped] Indexed 9 pages ending at 10 at 53/second

[...]

Indexed a total of 1616 pages at 103/second

root@test-snipped-wiki-1:/var/www/wikis/a# php /var/www/mediawiki_shared/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf LocalSettings.php 

[snipped] Indexed 48 pages ending at 50 at 49/second

[...]
Indexed a total of 1616 pages at 72/second




However, all my indices are empty


```

# curl -XGET localhost:9200/_cat/indices

green open wiki_a_general_first     ZYIVvSdQS4C6czO9MajjqQ 4 0  0 0  1kb  1kb

green open mw_cirrus_metastore_first 33OLZIV4TfCiv4JmqcIDYA 1 0 27 6 16kb 16kb

green open wiki_a_content_first     uJkfN6K4S_ytm1cv__9QKw 4 0  0 0  1kb  1kb

green open wiki_a_archive_first     F_ZvMuQBR6SxTtvXtT232A 4 0  0 0  1kb  1kb

```
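
For later readers, the empty indices in a listing like the one above can be picked out programmatically. The following is a minimal Python sketch; `empty_indices` is an illustrative name, and it assumes the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, ...), which matches the output shown here.

```python
from typing import List


def empty_indices(cat_output: str) -> List[str]:
    """Return the names of indices whose docs.count column is 0."""
    empty = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Assumed default columns: health status index uuid pri rep docs.count ...
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        if fields[6] == "0":
            empty.append(fields[2])
    return empty


# Output copied from the post above.
SAMPLE = """
green open wiki_a_general_first     ZYIVvSdQS4C6czO9MajjqQ 4 0  0 0  1kb  1kb
green open mw_cirrus_metastore_first 33OLZIV4TfCiv4JmqcIDYA 1 0 27 6 16kb 16kb
green open wiki_a_content_first     uJkfN6K4S_ytm1cv__9QKw 4 0  0 0  1kb  1kb
green open wiki_a_archive_first     F_ZvMuQBR6SxTtvXtT232A 4 0  0 0  1kb  1kb
"""

if __name__ == "__main__":
    for name in empty_indices(SAMPLE):
        print("empty:", name)
```

With the sample above, every wiki_a_* index is reported as empty, while the metastore index is not.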


Elasticsearch is running on the same server as MediaWiki, and I have the following on my LocalSettings.php:


```

wfLoadExtension( 'Elastica' );

wfLoadExtension( 'CirrusSearch' );

$wgSearchType = 'CirrusSearch';

$wgCirrusSearchIndexBaseName =  'wiki_a';

```


Not sure what's going on: if I go to a page and add ?action=cirrusdump I get an empty JSON array `[]`, and all my search results are empty.

DCausse (WMF) (talkcontribs)

Hi,

CirrusSearch index updates rely on the MediaWiki jobqueue and it's possible that the index requests are kept there or even failed.

If the documents are kept in the jobqueue they should appear with mwscript showJobs.php --group.

If this is all empty then perhaps there were some failures and you should check the mediawiki and elasticsearch logs.

172.220.104.121 (talkcontribs)

Ah yes, seems you're correct! There was an issue in our deploy script that was preventing the job runner from booting up. Have only upgraded our test environment so didn't notice anything with lack of emails or anything else fired off by that system. Thanks!

Cirrus does not work on namespace

Kims79 (talkcontribs)

Hello,

On our wiki, Cirrus seems to systematically return an error when searching for files, templates, modules, etc.

For example: fallout-wiki.com/index.php?search=Template%3Aico&title=Sp%C3%A9cial%3ARecherche&profile=advanced&fulltext=1&ns0=1

Do you know where this problem could come from?

Sincerely,

DCausse (WMF) (talkcontribs)

Could you provide more details about the errors? Errors might be found in the mediawiki logs and in the elasticsearch logs.

If the problem occurs only when searching non-default namespaces, it is possible that the general index was not properly created. You can check this by listing all the indices from Elasticsearch: curl elastic_host:9200/_cat/indices. If you see only an index named mywiki_content and none named mywiki_general, that is probably the cause of your problem.

To solve this you must create and populate the general index running:

  • php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/UpdateOneSearchIndexConfig.php --indexType general
  • Rerun the ForceSearchIndex scripts like you did when setting up CirrusSearch (see the README file)
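
For later readers, the check described above (is there a general index next to the content index?) can be sketched in Python as follows. `missing_index_types` is an illustrative name, `mywiki` is the placeholder base name used in this reply, and the sample listing is made up for the example; the index name is assumed to be the third column of the `_cat/indices` output.

```python
from typing import List, Sequence


def missing_index_types(
    cat_output: str,
    base_name: str,
    expected: Sequence[str] = ("content", "general"),
) -> List[str]:
    """Return the expected index types that have no index in a _cat/indices listing."""
    present = set()
    for line in cat_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        index_name = fields[2]
        for index_type in expected:
            # Matches e.g. "mywiki_general" and "mywiki_general_first".
            if index_name.startswith(f"{base_name}_{index_type}"):
                present.add(index_type)
    return [t for t in expected if t not in present]


if __name__ == "__main__":
    # Hypothetical listing in which only the content index exists.
    listing = "green open mywiki_content_first abc123 4 0 150 0 2mb 2mb"
    print(missing_index_types(listing, "mywiki"))
```

If `general` shows up in the result, the two steps listed above are the ones to run.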