Extension talk:CirrusSearch

About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

No results when searching

Donxello (talkcontribs)

Hi,

MediaWiki 1.26.3
PHP 5.6.24 (apache2handler)
MySQL 5.5.46
Elasticsearch 1.7.5

CirrusSearch 0.2

Elastica 1.3.0.0

Installation was without problems according to https://phab.wmfusercontent.org/file/data/ouaq2ogud2xcltawkhvx/PHID-FILE-pyat6n73gno5m22bmo2r/README

LocalSettings

wfLoadExtension( 'Elastica' );

require_once "$IP/extensions/CirrusSearch/CirrusSearch.php";

#$wgDisableSearchUpdate = true;

$wgCirrusSearchServers = array( 'localhost' );

$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";

$wgSearchType = 'CirrusSearch';

But when I enter something into the search field, I only get a message saying there were no results matching the query.

Any ideas?

Sergezolotukhin (talkcontribs)

Hello. You need to run the scripts in the CirrusSearch maintenance directory:

Now run this script to generate your elasticsearch index:

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

Now remove $wgDisableSearchUpdate = true from LocalSettings.php.  Updates should start heading to Elasticsearch.

Next bootstrap the search index by running:

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse

https://phabricator.wikimedia.org/diffusion/ECIR/browse/master/README

103.70.129.74 (talkcontribs)

This is not mentioned anywhere! I installed MediaWiki and then the Elastica and CirrusSearch extensions, and kept getting the error: Fatal exception of type "RuntimeException".

After adding these lines in LocalSettings.php:

$wgCirrusSearchServers = array( 'localhost' );

$wgDebugLogGroups['CirrusSearch'] = "$IP/extensions/CirrusSearch/error.log";

$wgSearchType = 'CirrusSearch';

and running the commands suggested by Sergezolotukhin, the search started working. Nowhere in the documentation is this mentioned.

GFXDude2010 (talkcontribs)

I'm having the exact same issue. I currently have SphinxSearch installed, but it does not seem to find all the articles I believe it should find. So, I'm attempting to install Cirrus to see how it performs.

My Setup
Operating System CentOS 7
MediaWiki 1.27.0
PHP 7.0.10
Apache 2.4.23
MySQL Amazon AWS Aurora
ElasticSearch 1.7.5
Elastica REL1_27-4607acf
CirrusSearch REL1_27-dcb0cf9 (0.2)

Followed instructions here: Extension:CirrusSearch

Installed ElasticSearch via RPM; enabled and started service. Verified working via curl.

Tried enabling the Elastica extension using both wfLoadExtension and require_once. Both seem to have the same effect.

I ran all three maintenance scripts outlined in the README. It indexed nearly 10000 pages.


I've verified that elastic search is actually running:

[centos@ip-10-90-1-9 html]$ sudo systemctl status elasticsearch
● elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-09-01 15:12:48 UTC; 52min ago
     Docs: http://www.elastic.co
 Main PID: 8839 (java)
   CGroup: /system.slice/elasticsearch.service
           └─8839 /bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccu...


My LocalSettings.php has the following:

wfLoadExtension('Elastica');
require_once("$IP/extensions/CirrusSearch/CirrusSearch.php");
$wgCirrusSearchServers = array('localhost');
$wgSearchType = 'CirrusSearch';

When I uncomment the above settings in my LocalSettings, then attempt to search on my wiki, there are no results. So, for now, I'll have to deal with a half-working Sphinx setup, unless anyone here can suggest what I'm doing wrong?

Sergezolotukhin (talkcontribs)

root@openlist-www ~ # curl '********:9200/_cat/indices?v'
health status index                                 pri rep docs.count docs.deleted store.size pri.store.size
green  open   openlist_ua-wiki__content_first         4   0     193874        45941      2.6gb          2.6gb
green  open   openlist-wiki__content_first            4   0    2530251       723985     28.9gb         28.9gb
green  open   openlist-wiki__general_first            4   0       9792            7     35.8mb         35.8mb
green  open   mediawiki_cirrussearch_frozen_indexes   1   0          0            0       144b           144b
green  open   test                                    1   0          1            0      2.6kb          2.6kb
green  open   openlist_ru-formulars                   1   0    1648710            0    986.6mb        986.6mb
green  open   mw_cirrus_versions                      1   0          6            4     10.2kb         10.2kb
green  open   openlist_ua-wiki__general_first         4   0         63            7    727.4kb        727.4kb
green  open   openlist_ge-wiki__content_first         4   0       3514          953     68.5mb         68.5mb
green  open   openlist_ge-wiki__general_first         4   0         21            0     37.8kb         37.8kb

Sergezolotukhin (talkcontribs)

Okay. Are there documents in the Elasticsearch index? ( curl 'localhost:9200/_cat/indices?v' )

You can explore your Elasticsearch data with the query DSL:

{
    "query" : {
        "match_all" : {}
    }
}

The goal is to find out whether data is indexed or not.
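
As a rough sketch of that check, the `_cat/indices?v` output can be piped through a small filter that prints the document count of every index and flags the empty ones. The function name `report_index_docs` is made up for illustration and is not part of CirrusSearch or Elasticsearch; the filter finds the `index` and `docs.count` columns from the header row instead of assuming fixed positions, and it assumes Elasticsearch listens on localhost:9200.

```shell
#!/bin/bash
# Hypothetical helper (not part of CirrusSearch): reads `_cat/indices?v`
# output on stdin and prints "<index> <docs.count> <OK|EMPTY>" per index.
report_index_docs() {
  awk '
    NR == 1 {
      # Find the columns by name in the header row.
      for (i = 1; i <= NF; i++) {
        if ($i == "index") idx = i
        if ($i == "docs.count") cnt = i
      }
      next
    }
    NF == 0 { next }   # skip blank lines
    idx && cnt {
      print $idx, $cnt, ($cnt > 0 ? "OK" : "EMPTY")
    }
  '
}

# Usage against a local Elasticsearch (assumed to listen on localhost:9200):
#   curl -s 'localhost:9200/_cat/indices?v' | report_index_docs
```

If the wiki's content and general indexes show up as EMPTY, the pages were most likely never indexed, which points back to the forceSearchIndex.php runs rather than to the search configuration.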

Amitumar (talkcontribs)

Hi,

I have installed CirrusSearch in MediaWiki, but it does not find words inside uploaded documents.

Could someone review and advise whether any steps were missed?

Please note that the uploaded documents include MS Word, PowerPoint, PDF, Excel, MSG (Outlook email), and TXT files.

I followed these steps:

1. Installed MediaWiki successfully.

2. Installed Elastica in the extensions folder.

3. Installed CirrusSearch in the extensions folder.

4. Performed the steps described in the README.txt file.

--------------------------------------------------- Instructions in README.TXT file ----------------------------

All elastic versions prior to 5.3.1 have bugs that affect CirrusSearch:

- elastic versions before 5.3.x require the following config in your LocalSettings.php:

  $wgCirrusSearchElasticQuirks = [ 'query_string_max_determinized_states' => true ];

- elastic versions before 5.3.1 suffer from a bug that prevents an index from being reindexed properly without missing docs when using multiple elasticsearch machines

- when using elastic prior to 5.5.2 with the extra plugin and the super_detect_noop script you must activate the "super_detect_noop_enable_native" option (see docs/settings.txt)

Place the CirrusSearch extension in your extensions directory.

Make sure you have the curl php library installed (sudo apt-get install php5-curl in Debian.)

You also need to install the Elastica MediaWiki extension.

Add this to LocalSettings.php:

wfLoadExtension( 'Elastica' );

require_once( "$IP/extensions/CirrusSearch/CirrusSearch.php" );

$wgDisableSearchUpdate = true;

Configure your search servers in LocalSettings.php if you aren't running Elasticsearch on localhost:

$wgCirrusSearchServers = [ 'elasticsearch0', 'elasticsearch1', 'elasticsearch2', 'elasticsearch3' ];

There are other $wgCirrusSearch variables that you might want to change from their defaults.

Now run this script to generate your elasticsearch index:

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php

Now remove $wgDisableSearchUpdate = true from LocalSettings.php.  Updates should start heading to Elasticsearch.

Next bootstrap the search index by running:

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip

php $MW_INSTALL_PATH/extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse

Note that this can take some time.  For large wikis read "Bootstrapping large wikis" below.

Once that is complete add this to LocalSettings.php to funnel queries to ElasticSearch:

$wgSearchType = 'CirrusSearch';

---------------------------------------------------------------------------

Reply to "No results when searching"

Make installation simpler

2001:9E8:957:7200:C2A0:1CA5:258E:5914 (talkcontribs)

It all sounds kinda straightforward until you open that README file and realize that it requires a special kind of MW nerd to get this thing up and running. Can't this all be a little easier? I know MW hates GUI type installation and configuration but jeez... a graphical installer with several steps to go through would be great.

Spas.Z.Spasov (talkcontribs)

It only looks complicated the first few times :) Here is a script that I've been using for the past few years to create the index:

#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name      /usr/local/bin/mlw-maintenance-cirrusSearch-elasticsearch-create-index.sh
# @desc      Create an Elasticsearch index for a MediaWiki instance
#
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README

IP="/var/www/wiki.example.com"

# STEP 1
sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php
sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php
echo -e '\n\n\n*\n* $IP/LocalSettings.php\n*\n'
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 3
printf -- '\n\n*\n* Generate ElasticSearch Index for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php

# STEP 2
sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 3
printf -- '\n\n*\n*  Bootstrap the Search Index for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php

# STEP 3
sleep 3
printf -- '\n\n*\n*  Enable Cirrus Search for %s -----\n*\n\n' "$IP"
sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php
echo -e '\n\n\n*\n* $IP/LocalSettings.php\n*\n'
grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo

# Step 4
sleep 3
printf -- '\n\n*\n*  Update Cirrus Search Suggestions for %s -----\n*\n\n' "$IP"
/usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php
2001:9E8:958:7000:FC1D:AA47:52C6:681F (talkcontribs)

I bet practice helps, but it's a bummer how much manual fiddling is involved in getting an extension to work. The number of wikis not making the switch to CS because of that must be enormous.

Reply to "Make installation simpler"

How to know that elasticSearch and MW communicate ?

Nicolas senechal (talkcontribs)

Hello,

I'm trying to install CirrusSearch, and I have Elasticsearch running as a service on a Windows server. I think it's running; here is its health JSON:

{
  "cluster_name" : "elasticsearch",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 13,
  "active_shards" : 13,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

But when I try to search on my wiki, I get a search error saying it's a technical error. So I checked whether the search goes through CirrusSearch by adding &cirrusDumpQuery to the search URL, and I get this JSON:

{
    "__main__": {
        "description": "full_text search for 'sql'",
        "path": "wikig4_content\/page\/_search",
        "params": {
            "timeout": "20s",
            "search_type": "dfs_query_then_fetch"
        },
        "query": {
            "_source": [
                "namespace",
                "title",
                "namespace_text",
                "wiki",
                "redirect.*",
                "timestamp",
                "text_bytes"
            ],
            "stored_fields": [
                "text.word_count"
            ],
            "query": {
                "bool": {
                    "minimum_should_match": 1,
                    "should": [
                        {
                            "query_string": {
                                "query": "sql",
                                "fields": [
                                    "all.plain^1",
                                    "all^0.5"
                                ],
                                "phrase_slop": 0,
                                "default_operator": "AND",
                                "allow_leading_wildcard": true,
                                "fuzzy_prefix_length": 2,
                                "rewrite": "top_terms_boost_1024"
                            }
                        },
                        {
                            "multi_match": {
                                "fields": [
                                    "all_near_match^2",
                                    "all_near_match.asciifolding^1.5"
                                ],
                                "query": "sql"
                            }
                        }
                    ],
                    "filter": [
                        {
                            "terms": {
                                "namespace": [
                                    0
                                ]
                            }
                        }
                    ]
                }
            },
            "highlight": {
                "pre_tags": [
                    "\ue000"
                ],
                "post_tags": [
                    "\ue001"
                ],
                "fields": {
                    "title": {
                        "type": "fvh",
                        "number_of_fragments": 0,
                        "order": "score",
                        "matched_fields": [
                            "title",
                            "title.plain"
                        ]
                    },
                    "redirect.title": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 10000,
                        "matched_fields": [
                            "redirect.title",
                            "redirect.title.plain"
                        ]
                    },
                    "category": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 10000,
                        "matched_fields": [
                            "category",
                            "category.plain"
                        ]
                    },
                    "heading": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 10000,
                        "matched_fields": [
                            "heading",
                            "heading.plain"
                        ]
                    },
                    "text": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 150,
                        "no_match_size": 150,
                        "matched_fields": [
                            "text",
                            "text.plain"
                        ]
                    },
                    "auxiliary_text": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 150,
                        "matched_fields": [
                            "auxiliary_text",
                            "auxiliary_text.plain"
                        ]
                    },
                    "file_text": {
                        "type": "fvh",
                        "number_of_fragments": 1,
                        "order": "score",
                        "fragment_size": 150,
                        "matched_fields": [
                            "file_text",
                            "file_text.plain"
                        ]
                    }
                },
                "highlight_query": {
                    "query_string": {
                        "query": "sql",
                        "fields": [
                            "title.plain^20",
                            "redirect.title.plain^15",
                            "category.plain^8",
                            "heading.plain^5",
                            "opening_text.plain^3",
                            "text.plain^1",
                            "auxiliary_text.plain^0.5",
                            "title^10",
                            "redirect.title^7.5",
                            "category^4",
                            "heading^2.5",
                            "opening_text^1.5",
                            "text^0.5",
                            "auxiliary_text^0.25"
                        ],
                        "phrase_slop": 1,
                        "default_operator": "AND",
                        "allow_leading_wildcard": true,
                        "fuzzy_prefix_length": 2,
                        "rewrite": "top_terms_boost_1024"
                    }
                }
            },
            "suggest": {
                "text": "sql",
                "suggest": {
                    "phrase": {
                        "field": "suggest",
                        "size": 1,
                        "max_errors": 2,
                        "confidence": 2,
                        "real_word_error_likelihood": 0.95,
                        "direct_generator": [
                            {
                                "field": "suggest",
                                "suggest_mode": "always",
                                "max_term_freq": 0.5,
                                "min_doc_freq": 0,
                                "prefix_length": 2
                            }
                        ],
                        "highlight": {
                            "pre_tag": "\ue000",
                            "post_tag": "\ue001"
                        },
                        "smoothing": {
                            "stupid_backoff": {
                                "discount": 0.4
                            }
                        }
                    }
                }
            },
            "stats": [
                "suggest",
                "full_text",
                "full_text_querystring",
                "simple_bag_of_words"
            ],
            "rescore": [
                {
                    "window_size": 8192,
                    "query": {
                        "query_weight": 1,
                        "rescore_query_weight": 1,
                        "score_mode": "multiply",
                        "rescore_query": {
                            "function_score": {
                                "functions": [
                                    {
                                        "field_value_factor": {
                                            "field": "incoming_links",
                                            "modifier": "log2p",
                                            "missing": 0
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ],
            "size": 21
        },
        "options": {
            "timeout": "20s",
            "search_type": "dfs_query_then_fetch"
        }
    }
}

Here is a copy of Spécial:Version

Product Version
MediaWiki 1.37.1
PHP 8.1.2 (apache2handler)
MariaDB 10.4.22-MariaDB
ICU 70.1
Elasticsearch 6.8.23

Strange that the wiki shows the version of Elasticsearch...

So how can I tell whether Elasticsearch and MW communicate properly?

Any other ideas are appreciated.

Thank you,

Spas.Z.Spasov (talkcontribs)

Hello, when the search engine is set to CirrusSearch, you will get a red box with a warning message on the search results page if there is trouble with Elasticsearch.
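
Independently of the wiki, the Elasticsearch side can be checked from the command line. The sketch below extracts the `status` field from the JSON that `_cluster/health` returns. The function names `cluster_status` and `cluster_is_usable` are invented for this example, and it assumes Elasticsearch answers on localhost:9200. A green status only shows that the cluster itself is healthy; it does not prove that the wiki's pages were indexed or that `$wgCirrusSearchServers` points at this cluster.

```shell
#!/bin/bash
# Hypothetical helper: reads the JSON from `_cluster/health` on stdin and
# prints the value of its "status" field (green, yellow or red).
cluster_status() {
  # Flatten to one line, then capture the quoted value after "status".
  tr -d '\n' | sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Exit status 0 only when the cluster reports green or yellow.
cluster_is_usable() {
  case "$(cluster_status)" in
    green|yellow) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (host and port are assumptions):
#   curl -s 'http://localhost:9200/_cluster/health' | cluster_status
```

The document counts reported by `_cat/indices` for the wiki's own indexes are a useful second check, since an empty index produces no results even on a green cluster.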

Nicolas senechal (talkcontribs)

Thank you for your quick response. That's what I get, so what do I have to do with Elasticsearch, and how can I check whether it works properly? There are no errors in my logs. I compared them with the logs of another wiki that I use (and which works), and the Elasticsearch logs are the same except for this line: [2022-05-02T14:23:04,386][INFO ][o.e.c.r.a.AllocationService] [Y4F2XBY] Cluster health status changed from [RED] to [GREEN] (reason: [shards started [[test_content_first][2], [test_content_first][0], [mw_cirrus_metastore_first][0]] ...]).

Nicolas senechal (talkcontribs)

So, I went to http://localhost:9200/_cat/indices?format=json&pretty and my server is OK. The JSON has 4 entries, and my test wiki shows the same (and it works), so I don't know what else I can do or where to look to find the cause...

Here is the result of http://localhost:9200/_cat/indices?format=json&pretty:

[
  {
    "health" : "green",
    "status" : "open",
    "index" : "test_archive_first",
    "uuid" : "jQZYnyGUStWWqDVjfLxpHg",
    "pri" : "4",
    "rep" : "0",
    "docs.count" : "0",
    "docs.deleted" : "0",
    "store.size" : "1kb",
    "pri.store.size" : "1kb"
  },
  {
    "health" : "green",
    "status" : "open",
    "index" : "test_content_first",
    "uuid" : "x9Y9ACxWSg-oBLxvKbzpjw",
    "pri" : "4",
    "rep" : "0",
    "docs.count" : "5",
    "docs.deleted" : "1",
    "store.size" : "44.2kb",
    "pri.store.size" : "44.2kb"
  },
  {
    "health" : "green",
    "status" : "open",
    "index" : "mw_cirrus_metastore_first",
    "uuid" : "rIRWtNZ_T6GxuLrKH6lstw",
    "pri" : "1",
    "rep" : "0",
    "docs.count" : "25",
    "docs.deleted" : "6",
    "store.size" : "15.4kb",
    "pri.store.size" : "15.4kb"
  },
  {
    "health" : "green",
    "status" : "open",
    "index" : "test_general_first",
    "uuid" : "39kWAi7cSnyME6R0BWlhyQ",
    "pri" : "4",
    "rep" : "0",
    "docs.count" : "21",
    "docs.deleted" : "4",
    "store.size" : "192kb",
    "pri.store.size" : "192kb"
  }
]
Nicolas senechal (talkcontribs)

I tested with the production settings of my MediaWiki; on my test wiki everything is OK. So if it's not the server, not the wiki, and not the communication between server and wiki, the only things I can think of are a server response problem, or the server not indexing the pages from the database. How can I test that? How can I inspect the connection between the database and Elasticsearch? After searching on Google, I couldn't find any such test for MW.


So I followed UPGRADE and now I don't have any error (yeah), but I get no results. I think I should index, but doesn't the first part of the upgrade already do that?

I got a warning in the second part. I don't know whether it's important, so I skipped it.

# php metastore.php --upgrade
PHP Warning:  Undefined array key "REMOTE_ADDR" in D:\WikiG4\xampp\htdocs\WikiG4\LocalSettings.php on line 138
Warning: Undefined array key "REMOTE_ADDR" in D:\WikiG4\xampp\htdocs\WikiG4\LocalSettings.php on line 138
mw_cirrus_metastore is up and running with version 2.0

Here is the part of my LocalSettings.php that triggers the warning:

$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['interface-admin']['gadgets-edit'] = true;//config gadget
$wgGroupPermissions['interface-admin']['gadgets-definition-edit'] = true;//config gadget
if ( $_SERVER['REMOTE_ADDR'] == $serverAdress ) {
  $wgGroupPermissions['*']['read'] = true;
  $wgGroupPermissions['*']['edit'] = true;
  $wgGroupPermissions['*']['writeapi'] = true;
}

Sorry, I forgot why I added that, but I think it was a workaround for an error: my wiki is private, so some extension could run into a bug, and this was the solution...

Reply to "How to know that elasticSearch and MW communicate ?"

MW 1.39+ Elasticsearch version?

Spiros71 (talkcontribs)

I can see on the page: "MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x (6.8.23+ recommended)". Are there any plans to bump the Elasticsearch version in MW 1.39 or a future version (current Elasticsearch version is 8.13)? Also, is it likely that MW 1.39 will support PHP 8 (for a setup with CirrusSearch)?
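
The compatibility rule quoted above can be turned into a quick check. This is only a sketch: the function name `es_version_in_supported_range` is hypothetical, and the range is hard-coded from the quoted sentence (6.5.x to 6.8.x for MediaWiki 1.33.x to 1.38.x), so it says nothing about other MediaWiki releases.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given Elasticsearch version falls in
# the 6.5.x - 6.8.x range quoted above for MediaWiki 1.33.x - 1.38.x.
es_version_in_supported_range() {
  case "$1" in
    *.*) ;;            # need at least "major.minor"
    *) return 1 ;;
  esac
  local major="${1%%.*}"     # text before the first dot
  local rest="${1#*.}"       # text after the first dot
  local minor="${rest%%.*}"  # second version component
  [ "$major" = "6" ] && [ "$minor" -ge 5 ] 2>/dev/null && [ "$minor" -le 8 ]
}

# Usage: the running version is the "number" value under "version" in the
# JSON printed by `curl -s localhost:9200` (host and port are assumptions).
#   es_version_in_supported_range 6.8.23 && echo "in range"
```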

EBernhardson (WMF) (talkcontribs)

> Are there any plans to bump Elasticsearch version in MW 1.39 or future version

Doubtful that 1.39 will be updated. 1.40 will likely support 7.10.2. Due to licensing changes we will not be continuing with Elasticsearch beyond 7.10.2. There is a high probability that the OpenSearch project will replace Elasticsearch, but that is not currently decided.

> Also, is it likely that with MW 1.39 there will be php 8 support

IIRC, PHP 8 support is limited by the library used to talk to Elasticsearch 6. Similar to the above, it's most likely going to be in 1.40.

Spiros71 (talkcontribs)

Thanks for the replies! Apparently, since OpenSearch is a fork, the transition should not be much trouble. I have read some good things about Vespa too: https://vespa.ai/vespa-elastic-solr

Reply to "MW 1.39+ Elasticsearch version?"

Dependency questions

JacekGdanski (talkcontribs)

I'm trying to install "Elastic Search" on a new company wiki.

The description of the installation is confusing and ambiguous.

Questions:

1. Why are two extensions installed instead of one: "Elastica" and "CirrusSearch"?

2. In the description of the installation it is recommended to pay attention to the version of "Elastic Search". Was this not fixed when downloading the extension version?

3. There is a recommendation to install "Elastic Search" as a service in the Docker image container, but also as an extension. Are both steps required or only one of them?

Ciencia Al Poder (talkcontribs)
  1. I don't know. Elastica looks like a library to facilitate integration with ElasticSearch.
  2. Elastic Search is external standalone software that you must install. It's a database system that provides search and indexing functionality, and it is where the current text of all pages of your wiki will be indexed for faster search results. The communication between MediaWiki and ElasticSearch is done through web services. Every version of ElasticSearch changes how those web services work, which causes compatibility problems. You must install a version of Elastic Search compatible with the MediaWiki version you're currently using, and not the other way round.
  3. See the previous point. Cirrus Search enhances the search functionality by using Elastic Search, while the native MediaWiki search uses a table in the same database as the wiki, with very simple search functionality.
JacekGdanski (talkcontribs)

Thank you for your answer. That clears up many things.

1. Could someone add exactly these words [2] to the description of the extension? Can I do this?

2. Where is it described how external "Elastic Search" service is fed with Wiki data? In this description it is simply "magic" - there is no word about the basic mechanism.

Ciencia Al Poder (talkcontribs)

What I wrote was a TL;DR, but everything is explained if you follow the links on the page: the ElasticSearch link points to a page describing what ElasticSearch is, and since it's a dependency, when you go to its installation page you'll see it's a separate program. If you feel that this TL;DR is needed, feel free to add it to the page.

About how it's fed, this is part of the setup instructions (the "Now follow the setup instructions in the CirrusSearch README" step).

Reply to "Dependency questions"

elastic search using log4j 2.11.1.jar

Pooja2425 (talkcontribs)

Hi Team,

We are using the following:

MediaWiki 1.35.3
PHP 7.4.23 (apache2handler)
MySQL 8.0.26
Lua 5.1.5
Elasticsearch 6.5.4

/usr/share/elasticsearch/lib/log4j-1.2-api-2.11.1.jar

log4j-api-2.11.1.jar

log4j-core-2.11.1.jar

x-pack-security/log4j-slf4j-impl-2.11.1.jar


Please provide us with a patch that brings log4j to a version higher than 2.15.0.

Ciencia Al Poder (talkcontribs)

"We" don't provide ElasticSearch. You installed ElasticSearch yourself from an external source, so you should ask them.

DHillBCA (talkcontribs)
Pooja2425 (talkcontribs)

Thanks a lot for the help, @DHillBCA.

I checked this; it seems I need to add -Dlog4j2.formatMsgNoLookups=true to /etc/elasticsearch/jvm.options, because we are using Elasticsearch version 6.5.4.


Please let me know where I can ask questions about this.
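
A minimal sketch of that change, assuming the jvm.options file already exists at the path named above and that the script runs with write access to it. The function name `add_log4j_mitigation` is made up for this example. It appends the flag only when no identical line is present, so it is safe to run more than once; Elasticsearch has to be restarted afterwards for the option to take effect.

```shell
#!/bin/bash
# Hypothetical helper: appends the Log4j mitigation flag to an existing
# jvm.options file unless an identical line is already present.
add_log4j_mitigation() {
  local file="$1"
  local flag='-Dlog4j2.formatMsgNoLookups=true'
  # -x: match the whole line, -F: fixed string, -q: quiet.
  # "--" stops grep from reading the leading dash of the flag as an option.
  if grep -qxF -- "$flag" "$file"; then
    echo "already set in $file"
  else
    # Add a newline first if the file does not end with one.
    if [ -n "$(tail -c1 "$file")" ]; then
      echo >> "$file"
    fi
    printf '%s\n' "$flag" >> "$file"
    echo "added to $file"
  fi
}

# Usage (path as given in the post above):
#   add_log4j_mitigation /etc/elasticsearch/jvm.options
```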

Pooja2425 (talkcontribs)
DHillBCA (talkcontribs)

Removing JndiLookup is not recommended, per the article.

If my read of the article is correct, the step you took is a good patch in the absence of updating log4j to 2.16 (2.15 was found to have related issues, so a new version was released). 2.16 does this by default.

Updating to log4j 2.16 and ensuring you're using an up-to-date version of the Java SDK appears to be the best defense against this problem.

Realsalt (talkcontribs)

6.8.21 should be good. From the linked article: "[this version sets] -Dlog4j2.formatMsgNoLookups=true in the JVM options and remove the JndiLookup class for you"

From elastic itself: "As of December 13, 2021, we have released Elasticsearch 6.8.21 and 7.16.1 which set the JVM option identified below and remove the vulnerable JndiLookup class from Log4j out of an abundance of caution"

Realsalt (talkcontribs)

I guess the question I have is whether this should be flagged as a critical version on the main page. Something like the {{warning}} template? The current text says "MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x (6.8.21+ recommended)". Is that sufficient?

Reply to "elastic search using log4j 2.11.1.jar"

Search inside uploaded documents

Xaris~mediawikiwiki (talkcontribs)

Question: Can this extension search inside documents that have been uploaded to the wiki, such as PDFs?

Ricordisamoa (talkcontribs)
NEverett (WMF) (talkcontribs)

I've just backported most of the features in Cirrus' master branch to the REL1_22 branch, including this. If you want to try it make sure to get the new version of the Elastica plugin on its REL1_22 branch as well and rebuild your index.

2.82.64.19 (talkcontribs)

Do we need to force index the pdf files? I'm seeing no results from pdfs.

Nemo bis (talkcontribs)

First of all, try a null edit and wait some time (at most a few hours) for the job queue; report back if that wasn't enough.

Chris d edge (talkcontribs)

I've been working on a method that parses document files (PDFs, Word, PPT, etc.) using Tika to extract the document text, and then re-insert the extracted text into the file_text field of the WIKI_general_first index inside Elasticsearch. On this point, I have a couple of questions: 1) Does this sound like the proper method to provide searchable text from documents in CirrusSearch? 2) Has anyone else done anything similar?

On point 2, the reason I ask is that for some documents I'm extracting text from, the resulting text can be huge (100s of MBs) and can grind the search to a halt for some queries (mostly for terms that occur rarely in the index).

Any pointers would be greatly appreciated.

SmartK (talkcontribs)
173.164.76.121 (talkcontribs)

Hello Everyone,

I have just added the CirrusSearch extension with all its dependencies. I am not able to search through PDF, txt, and docx files. Please help!

Regards,

Dgennaro (talkcontribs)

I am also not able to index documents. I have Image Authorization configured.

Andreas Plank (talkcontribs)

I got PDF search working but not for *.doc files (PDF search works on MW 1.26.2 and MW 1.28.2). You need at least to

  1. Does anybody have a solution for searching inside *.doc files yet?
  2. Did I miss some configuration to set up?
  3. Would it need a FileHandler for doc files to get it working?
SmartK (talkcontribs)
S0ring (talkcontribs)
CtapMaddog (talkcontribs)

A new option: Extension:TikaAllTheFiles. This extension uses Tika to do content extraction (text and/or metadata), and provides the content to CirrusSearch for indexing.

Hamburg0815 (talkcontribs)

I installed CirrusSearch and I enabled the completion suggester. I get suggestions when I enter text into the search field.

What I haven't been able to figure out is how to enable "Did you mean" suggestions like WP has them. Also, I'd like to have the "Showing results for X. No results found for Y." feature. How do I do this?

DCausse (WMF) (talkcontribs)

Hi,

this feature should be enabled by default, but it uses the titles of the indexed pages as its "language model". It might be that on your wiki the titles themselves do not carry enough information to generate a useful "language model". I'd suggest increasing the information given to this feature by providing the opening_text or, if you can afford it, the whole text. The configuration should look like this:

$wgCirrusSearchPhraseSuggestUseOpeningText = true;
// or
// $wgCirrusSearchPhraseSuggestUseText = true;
// to use the whole text (will obviously require more space and memory on your elasticsearch cluster)

You will need to reindex your wiki using UpdateSearchIndexConfig after changing these configuration variables.

The "Showing results for X. No results found for Y." feature should work once CirrusSearch is able to detect typos properly.

Hamburg0815 (talkcontribs)

Hi DCausse,

thank you for your reply. I think it was "kind of" working all along because I realized that for some titles, I get a suggestion when I misspell it but for most titles I don't. For example, when I search for "parasymapthikus", it suggests the correctly spelled title "parasympathikus". When I search for "vagotnoie", it just says "no results" but it does not suggest "vagotonie" which is an existing title. Note that I switched two consecutive letters each time, but one of them was recognized as a misspelling while the other wasn't.

Is there a setting to make the search algorithm more fuzzy?

I get the same results with or without the wgCirrusSearchPhraseSuggestUseOpeningText parameter set to true. I did run UpdateSearchIndexConfig.php after I changed the param and in addition to that, I also ran ForceSearchIndex.php, runJobs.php, and UpdateSuggesterIndex.php.

DCausse (WMF) (talkcontribs)

Hi,

there are many parameters to tune this algorithm (see profiles/PhraseSuggesterProfiles.config.php).

You could create your own profile tuning based on the default one by adding:

// copied from profiles/PhraseSuggesterProfiles.config.php (doc removed here)
$wgCirrusSearchPhraseSuggestProfiles = [
    'my_profile' => [
        'total_hits_threshold' => 15000,
        'mode' => 'always',
        'confidence' => 2.0,
        'max_errors' => 2,
        'real_word_error_likelihood' => 0.95,
        'max_term_freq' => 0.5,
        'min_doc_freq' => 0.0,
        'prefix_length' => 2,
        'collate' => false,
        'collate_minimum_should_match' => '3<66%',
        'smoothing_model' => [
            'stupid_backoff' => [
                'discount' => 0.4,
            ],
        ],
    ],
];

// add a new fallback profile that uses these new settings
$wgCirrusSearchFallbackProfiles = [
    'my_phrasesuggest_profile' => [
        'methods' => [
            'phrase-default' => [
                'class' => \CirrusSearch\Fallbacks\PhraseSuggestFallbackMethod::class,
                'params' => [
                    'profile' => 'my_profile',
                ],
            ],
        ],
    ],
];

// Tell cirrus to use this fallback profile
$wgCirrusSearchFallbackProfile = 'my_phrasesuggest_profile';

But again, the algorithm depends heavily on the data it has been fed, and feeding it more text ($wgCirrusSearchPhraseSuggestUseText = true) is probably more helpful than fine-tuning all this.
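As a quick sanity check on the two misspellings mentioned earlier in this thread, the following Python sketch confirms that each one differs from the real title by a single swap of two adjacent letters, and reports where the swap sits relative to the 'prefix_length' of 2 from the profile above. The helper name is hypothetical, and the check is independent of CirrusSearch; it only shows that the two typos are the same kind of error, so the difference in behavior is more likely to come from the indexed data than from the shape of the typo.

```python
from typing import Optional


def adjacent_swap_position(typo: str, target: str) -> Optional[int]:
    """Return the index of the swap if typo is target with two adjacent
    letters exchanged, otherwise None."""
    if len(typo) != len(target):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(typo, target)) if a != b]
    if len(diffs) != 2 or diffs[1] != diffs[0] + 1:
        return None
    i, j = diffs
    if typo[i] == target[j] and typo[j] == target[i]:
        return i
    return None


PREFIX_LENGTH = 2  # value of 'prefix_length' in the profile quoted above

for typo, target in [
    ("parasymapthikus", "parasympathikus"),
    ("vagotnoie", "vagotonie"),
]:
    pos = adjacent_swap_position(typo, target)
    in_prefix = pos is not None and pos < PREFIX_LENGTH
    print(typo, "->", target, "| swap at index", pos, "| inside prefix:", in_prefix)
```

Both swaps fall after the first two characters, so the prefix setting alone would not be expected to separate the two cases.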

Hamburg0815 (talkcontribs)

Hi,

I tried the fallback profile as you suggested but unfortunately, it didn't make a difference. I also tried setting $wgCirrusSearchPhraseSuggestUseText = true but it didn't change anything, either.

$wgCirrusSearchPhraseSuggestProfiles = 'alternative' didn't improve things, either. I set $wgDebugLogFile and looked at the log, but there weren't any errors or warnings that caught my eye.

I noticed when I misspell a word in the search box, it always suggests the correct spelling in the as-you-type suggestions. But when I submit a search, it only works for some words but not for others.

DCausse (WMF) (talkcontribs)

Hi,

The profile I provided is just a copy of the default profile and is meant to be tuned but I have doubts you could greatly increase recall this way.

I agree that it is frustrating to see the typo being properly corrected during as-you-type but not after hitting the search button. This was brought up multiple times on WMF wikis as well; see task T135920 (albeit for slightly different reasons).

Unfortunately I don't see other ways to improve the behavior other than:

  • adding more text
  • implementing a completely new DYM component that works better on wikis without enough content
Hamburg0815 (talkcontribs)

Hi DCausse,

thanks, that makes sense. So as our wiki grows, this feature should work better.

Reply to ""Did you mean" not working"

CirrusSearch not showing autocomplete on Main Search Bar

2601:98A:4102:2530:CDCD:45BD:50B2:2B99 (talkcontribs)

Hello,


CirrusSearch will autocomplete when I am creating a page link and typing in the page name, but it will not autocomplete on the main search page. Here's the LocalSettings.php I have set up.

#CirrusSearch
wfLoadExtension( 'Elastica' );
wfLoadExtension( 'CirrusSearch' );
$wgSearchType = 'CirrusSearch';
$wgCirrusSearchUseCompletionSuggester = 'yes';
$wgJobRunRate = 2;
$wgCirrusSearchCompletionSettings = 'fuzzy-subphrases';
$wgCirrusSearchCompletionSuggesterSubphrases = [
  'build' => true,
  'use' => true,
  'type' => 'anywords',
  'limit' => 10,
];

The indexing appears to be complete, and the search results when I manually press Enter seem to be good. But it simply doesn't autocomplete when doing a proper search, despite working in VisualEditor when linking a page.

2601:98A:4102:2530:CDCD:45BD:50B2:2B99 (talkcontribs)

To make things more confusing, if I search, and then run a search in the secondary search bar that pops up above the results, autocomplete works there too. It's only on the wiki's main search bar at the top of the page.

Ciencia Al Poder (talkcontribs)

Hit F12 to open the browser's console and look for any JavaScript error that may break other scripts on the page. Also, type some characters and see whether it sends a request to the API to fetch search suggestions, and whether the response contains anything.
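The API side of that check can also be done outside the browser. Below is a minimal Python sketch that queries the standard opensearch API module and returns the suggested titles. The wiki URL in the usage note is a placeholder, and which endpoint the main search bar actually calls depends on the skin and MediaWiki version, so treat opensearch here as an assumption to be compared with what the browser's network tab shows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_suggest_url(api_url: str, term: str, limit: int = 10) -> str:
    """Build an action=opensearch request URL for the given search term."""
    params = {
        "action": "opensearch",
        "search": term,
        "limit": limit,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def fetch_suggestions(api_url: str, term: str) -> list:
    """Return the list of suggested titles for term."""
    with urlopen(build_suggest_url(api_url, term), timeout=10) as response:
        payload = json.load(response)
    # opensearch responses look like [query, [titles], [descriptions], [urls]]
    return payload[1]
```

For example, fetch_suggestions("https://wiki.example.org/w/api.php", "vago") should return a non-empty list if the suggester index is working; an empty list points at the index, while a working API with a silent search bar points at the front end.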

176.135.141.72 (talkcontribs)

Hello,

After switching to MediaWiki 1.37wmf22, it seems that Elasticsearch is now working correctly.

Nevertheless, an error persists when searching with a ":" character.


[ec3539461ed077b3d9b1303c] /index.php?search=%C3%A9v%C3%A8nement+%3A+&title=Sp%C3%A9cial%3ARecherche&go=Lire&ns0=1&ns1=1&ns2=1&ns3=1&ns4=1&ns5=1&ns6=1&ns7=1&ns8=1&ns9=1&ns10=1&ns11=1&ns12=1&ns13=1&ns14=1&ns15=1&ns200=1&ns201=1&ns202=1&ns203=1&ns274=1&ns275=1&ns710=1&ns711=1&ns828=1&ns829=1&ns2300=1&ns2301=1&ns2302=1&ns2303=1&ns3000=1&ns3001=1 ParseError: syntax error, unexpected token "match"

Backtrace:

from /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/MatchQuery.php(10)

#0 /var/www/vhosts/fallout-wiki.com/httpdocs/vendor/composer/ClassLoader.php(322): Composer\Autoload\includeFile()

#1 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/MetaStore/MetaNamespaceStore.php(94): Composer\Autoload\ClassLoader->loadClass()

#2 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/MetaStore/MetaNamespaceStore.php(78): CirrusSearch\MetaStore\MetaNamespaceStore->queryFilter()

#3 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(468): CirrusSearch\MetaStore\MetaNamespaceStore->find()

#4 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Util.php(124): CirrusSearch\Searcher->CirrusSearch\{closure}()

#5 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/poolcounter/PoolCounterWorkViaCallback.php(74): CirrusSearch\Util::CirrusSearch\{closure}()

#6 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/poolcounter/PoolCounterWork.php(162): PoolCounterWorkViaCallback->doWork()

#7 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Util.php(182): PoolCounterWork->execute()

#8 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(474): CirrusSearch\Util::doPoolCounterWork()

#9 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(700): CirrusSearch\Searcher->findNamespace()

#10 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Hooks.php(530): CirrusSearch\Searcher->updateNamespacesFromQuery()

#11 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookContainer.php(338): CirrusSearch\Hooks::onSearchGetNearMatch()

#12 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook()

#13 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookRunner.php(3192): MediaWiki\HookContainer\HookContainer->run()

#14 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/search/SearchNearMatcher.php(168): MediaWiki\HookContainer\HookRunner->onSearchGetNearMatch()

#15 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/search/SearchNearMatcher.php(70): SearchNearMatcher->getNearMatchInternal()

#16 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specials/SpecialSearch.php(341): SearchNearMatcher->getNearMatch()

#17 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specials/SpecialSearch.php(200): SpecialSearch->goResult()

#18 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specialpage/SpecialPage.php(646): SpecialSearch->execute()

#19 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specialpage/SpecialPageFactory.php(1366): SpecialPage->run()

#20 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath()

#21 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(925): MediaWiki->performRequest()

#22 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(559): MediaWiki->main()

#23 /var/www/vhosts/fallout-wiki.com/httpdocs/index.php(53): MediaWiki->run()

#24 /var/www/vhosts/fallout-wiki.com/httpdocs/index.php(46): wfIndexMain()

#25 {main}


Cordially

Ciencia Al Poder (talkcontribs)

This seems to be a compatibility issue with PHP 8. Apparently it's resolved already, looking at phab:T268861. Did you upgrade Elastica as well?

176.172.11.143 (talkcontribs)

It doesn't work, I'll try a 1.37 upgrade once it's officially released and I'll let you know :)

Ciencia Al Poder (talkcontribs)

If you download the current master of CirrusSearch, it should be fixed.

176.134.81.16 (talkcontribs)

Hello,


I confirm that the bug is still present (on the alpha MediaWiki version):


[99a80ba4319b973cca1c54a7] /index.php?search=%C3%A9v%C3%A9nement+%3A+c&title=Sp%C3%A9cial%3ARecherche&go=Lire&ns0=1 ParseError: syntax error, unexpected token "match"

Backtrace:

from /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/Elastica/vendor/ruflin/elastica/lib/Elastica/Query/MatchQuery.php(10)

#0 /var/www/vhosts/fallout-wiki.com/httpdocs/vendor/composer/ClassLoader.php(322): Composer\Autoload\includeFile()

#1 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/MetaStore/MetaNamespaceStore.php(95): Composer\Autoload\ClassLoader->loadClass()

#2 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/MetaStore/MetaNamespaceStore.php(79): CirrusSearch\MetaStore\MetaNamespaceStore->queryFilter()

#3 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(469): CirrusSearch\MetaStore\MetaNamespaceStore->find()

#4 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Util.php(125): CirrusSearch\Searcher->CirrusSearch\{closure}()

#5 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/poolcounter/PoolCounterWorkViaCallback.php(74): CirrusSearch\Util::CirrusSearch\{closure}()

#6 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/poolcounter/PoolCounterWork.php(162): PoolCounterWorkViaCallback->doWork()

#7 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Util.php(183): PoolCounterWork->execute()

#8 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(475): CirrusSearch\Util::doPoolCounterWork()

#9 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Searcher.php(701): CirrusSearch\Searcher->findNamespace()

#10 /var/www/vhosts/fallout-wiki.com/httpdocs/extensions/CirrusSearch/includes/Hooks.php(531): CirrusSearch\Searcher->updateNamespacesFromQuery()

#11 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookContainer.php(338): CirrusSearch\Hooks::onSearchGetNearMatch()

#12 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookContainer.php(137): MediaWiki\HookContainer\HookContainer->callLegacyHook()

#13 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/HookContainer/HookRunner.php(3192): MediaWiki\HookContainer\HookContainer->run()

#14 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/search/SearchNearMatcher.php(168): MediaWiki\HookContainer\HookRunner->onSearchGetNearMatch()

#15 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/search/SearchNearMatcher.php(70): SearchNearMatcher->getNearMatchInternal()

#16 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specials/SpecialSearch.php(341): SearchNearMatcher->getNearMatch()

#17 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specials/SpecialSearch.php(200): SpecialSearch->goResult()

#18 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specialpage/SpecialPage.php(647): SpecialSearch->execute()

#19 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/specialpage/SpecialPageFactory.php(1366): SpecialPage->run()

#20 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(314): MediaWiki\SpecialPage\SpecialPageFactory->executePath()

#21 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(925): MediaWiki->performRequest()

#22 /var/www/vhosts/fallout-wiki.com/httpdocs/includes/MediaWiki.php(559): MediaWiki->main()

#23 /var/www/vhosts/fallout-wiki.com/httpdocs/index.php(53): MediaWiki->run()

#24 /var/www/vhosts/fallout-wiki.com/httpdocs/index.php(46): wfIndexMain()

#25 {main}

Ciencia Al Poder (talkcontribs)