Rozšíření:CirrusSearch

This page is a translated version of the page Extension:CirrusSearch and the translation is 4% complete.
Základní informace k tomuto rozšíření MediaWiki
CirrusSearch
Stav rozšíření: stabilní
Zavádění Hledání, API , Háček
Popis Implementuje vyhledávání MediaWiki pomocí Elasticsearch
Napsal(i) Nik Everett, Chad Horohoe, Erik Bernhardson
Nejnovější verze průběžné aktualizace
Zásady kompatibility Vydání snímků následuje MediaWiki. Hlavní vývojová větev není zpětně kompatibilní.
MediaWiki >= 1.42.0
Composer mediawiki/cirrussearch
Licence GNU General Public License 2.0 nebo pozdější
Zdrojový kód
README
  • $wgCirrusSearchDeduplicateInQuery
  • $wgCirrusSearchLanguageWeight
  • $wgCirrusSearchAutomationCIDRs
  • $wgCirrusSearchUseIcuFolding
  • $wgCirrusSearchStemmedWeight
  • $wgCirrusSearchQueryStringMaxDeterminizedStates
  • $wgCirrusSearchCrossClusterSearch
  • $wgCirrusSearchExtraIndexSettings
  • $wgCirrusSearchAutomationUserAgentRegex
  • $wgCirrusSearchTalkNamespaceWeight
  • $wgCirrusSearchPrefixWeights
  • $wgCirrusSearchPrefixSearchRescoreProfile
  • $wgCirrusSearchDisableUpdate
  • $wgCirrusSearchActiveTest
  • $wgCirrusSearchExtraFieldsInSearchResults
  • $wgCirrusSearchMoreLikeThisMaxQueryTermsLimit
  • $wgCirrusSearchUseIcuTokenizer
  • $wgCirrusSearchCompletionBannedPageIds
  • $wgCirrusSearchOptimizeIndexForExperimentalHighlighter
  • $wgCirrusSearchRescoreProfiles
  • $wgCirrusSearchPhraseRescoreBoost
  • $wgCirrusSearchInterwikiProv
  • $wgCirrusSearchDefaultCluster
  • $wgCirrusSearchQueryStringMaxWildcards
  • $wgCirrusSearchElasticQuirks
  • $wgCirrusSearchMaxFileTextLength
  • $wgCirrusSearchFallbackProfiles
  • $wgCirrusSearchMoreLikeThisTTL
  • $wgCirrusSearchAllowLeadingWildcard
  • $wgCirrusSearchInterwikiPrefixOverrides
  • $wgCirrusSearchMaintenanceTimeout
  • $wgCirrusSearchReplicas
  • $wgCirrusSearchPhraseSlop
  • $wgCirrusSearchBoostOpening
  • $wgCirrusSearchWriteBackoffExponent
  • $wgCirrusSearchUserTesting
  • $wgCirrusSearchShardCount
  • $wgCirrusSearchUseCompletionSuggester
  • $wgCirrusSearchPhraseSuggestReverseField
  • $wgCirrusSearchFallbackProfile
  • $wgCirrusSearchFragmentSize
  • $wgCirrusSearchUnlinkedArticlesToUpdate
  • $wgCirrusSearchCustomPageFields
  • $wgCirrusSearchClientSideUpdateTimeout
  • $wgCirrusSearchIgnoreOnWikiBoostTemplates
  • $wgCirrusSearchRegexMaxDeterminizedStates
  • $wgCirrusSearchInterwikiHTTPConnectTimeout
  • $wgCirrusSearchExtraIndexes
  • $wgCirrusSearchCategoryDepth
  • $wgCirrusSearchMergeSettings
  • $wgCirrusSearchClusters
  • $wgCirrusSearchCrossProjectShowMultimedia
  • $wgCirrusSearchBannedPlugins
  • $wgCirrusSearchMoreLikeThisConfig
  • $wgCirrusSearchClusterOverrides
  • $wgCirrusSearchCrossProjectBlockScorerProfiles
  • $wgCirrusSearchEnableIncomingLinkCounting
  • $wgCirrusSearchNearMatchWeight
  • $wgCirrusSearchWriteIsolateClusters
  • $wgCirrusSearchIndexedRedirects
  • $wgCirrusSearchIndexAllocation
  • $wgCirrusSearchNumCrossProjectSearchResults
  • $wgCirrusSearchLanguageDetectors
  • $wgCirrusSearchUpdateShardTimeout
  • $wgCirrusSearchEnableCrossProjectSearch
  • $wgCirrusSearchFullTextQueryBuilderProfiles
  • $wgCirrusSearchCompletionDefaultScore
  • $wgCirrusSearchWriteClusters
  • $wgCirrusSearchCompletionSuggesterHardLimit
  • $wgCirrusSearchRecycleCompletionSuggesterIndex
  • $wgCirrusSearchLogElasticRequests
  • $wgCirrusSearchConnectionAttempts
  • $wgCirrusSearchElasticaWritePartitionCounts
  • $wgCirrusSearchWikiToNameMap
  • $wgCirrusSearchMaxFullTextQueryLength
  • $wgCirrusSearchLogElasticRequestsSecret
  • $wgCirrusSearchEnableRegex
  • $wgCirrusSearchClientSideSearchTimeout
  • $wgCirrusSearchDeduplicateInMemory
  • $wgCirrusSearchUseEventBusBridge
  • $wgCirrusSearchDeduplicateAnalysis
  • $wgCirrusSearchExtraBackendLatency
  • $wgCirrusSearchNamespaceMappings
  • $wgCirrusSearchWMFExtraFeatures
  • $wgCirrusSearchPreferRecentUnspecifiedDecayPortion
  • $wgCirrusSearchNamespaceResolutionMethod
  • $wgCirrusSearchDocumentSizeLimiterProfiles
  • $wgCirrusSearchSearchShardTimeout
  • $wgCirrusSearchCategoryMax
  • $wgCirrusSearchPrivateClusters
  • $wgCirrusSearchSimilarityProfiles
  • $wgCirrusSearchCategoryEndpoint
  • $wgCirrusSearchRescoreFunctionChains
  • $wgCirrusSearchPoolCounterKey
  • $wgCirrusSearchCompletionProfiles
  • $wgCirrusSearchMaxShardsPerNode
  • $wgCirrusSearchWeights
  • $wgCirrusSearchEnableArchive
  • $wgCirrusSearchSimilarityProfile
  • $wgCirrusSearchInterwikiThreshold
  • $wgCirrusSearchIndexDeletes
  • $wgCirrusSearchDocumentSizeLimiterProfile
  • $wgCirrusSearchFiletypeAliases
  • $wgCirrusSearchDevelOptions
  • $wgCirrusSearchPrefixSearchStartsWithAnyWord
  • $wgCirrusSearchUpdateConflictRetryCount
  • $wgCirrusSearchInterwikiHTTPTimeout
  • $wgCirrusSearchFetchConfigFromApi
  • $wgCirrusSearchBoostTemplates
  • $wgCirrusSearchExtraIndexBoostTemplates
  • $wgCirrusSearchPrefixIds
  • $wgCirrusSearchFullTextQueryBuilderProfile
  • $wgCirrusSearchStripQuestionMarks
  • $wgCirrusSearchMoreLikeThisFields
  • $wgCirrusSearchIndexBaseName
  • $wgCirrusSearchMasterTimeout
  • $wgCirrusSearchSanityCheck
  • $wgCirrusSearchTextcatConfig
  • $wgCirrusSearchNamespaceWeights
  • $wgCirrusSearchCrossProjectOrder
  • $wgCirrusSearchTextcatModel
  • $wgCirrusSearchRescoreProfile
  • $wgCirrusSearchMoreAccurateScoringMode
  • $wgCirrusSearchMaxPhraseTokens
  • $wgCirrusSearchCrossProjectSearchBlockList
  • $wgCirrusSearchCrossProjectProfiles
  • $wgCirrusSearchLanguageToWikiMap
  • $wgCirrusSearchMaxIncategoryOptions
  • $wgCirrusSearchEnableAltLanguage
  • $wgCirrusSearchWikimediaExtraPlugin
  • $wgCirrusSearchCompletionSuggesterUseDefaultSort
  • $wgCirrusSearchLinkedArticlesToUpdate
  • $wgCirrusSearchCompletionSuggesterSubphrases
  • $wgCirrusSearchPreferRecentDefaultHalfLife
  • $wgCirrusSearchSlowSearch
  • $wgCirrusSearchFunctionRescoreWindowSize
  • $wgCirrusSearchEnablePhraseSuggest
  • $wgCirrusSearchCompletionSettings
  • $wgCirrusSearchUseExperimentalHighlighter
  • $wgCirrusSearchFeedbackLink
  • $wgCirrusSearchUpdateDelay
  • $wgCirrusSearchRefreshInterval
  • $wgCirrusSearchInterleaveConfig
  • $wgCirrusSearchPhraseRescoreWindowSize
  • $wgCirrusSearchMoreLikeThisAllowedFields
  • $wgCirrusSearchPreferRecentDefaultDecayPortion
  • $wgCirrusSearchClientSideConnectTimeout
  • $wgCirrusSearchPhraseSuggestUseText
  • $wgCirrusSearchPhraseSuggestProfiles
  • $wgCirrusSearchDefaultNamespaceWeight
  • $wgCirrusSearchInterwikiSources
  • $wgCirrusSearchPhraseSuggestUseOpeningText
  • $wgCirrusSearchICUNormalizationUnicodeSetFilter
  • $wgCirrusSearchICUFoldingUnicodeSetFilter
  • $wgCirrusSearchReplicaGroup
Čtvrtletní stahování 373 (Ranked 12th)
Používání veřejných wikin 1,226 (Ranked 212nd)
Přeložte rozšíření CirrusSearch, používá-li lokalizaci z translatewiki.net
Vagrant role cirrussearch
Problémy Otevřené úkoly · Nahlásit chybu

The CirrusSearch extension implements searching for MediaWiki using Elasticsearch.

Elasticsearch is a standalone third-party software you must install as a requirement for this extension. It is a database system that provides search and indexing functionality, where the current text of your wiki pages gets indexed for faster and improved search results. The communication between MediaWiki and ElasticSearch is done through web services.

See also the help page on using this extension.

  • No native dependencies that would make this difficult to install
    • The only dependencies are pure-PHP MediaWiki extensions and Elasticsearch itself
  • Provide a near-real-time search index for wiki pages that's extendable by other MediaWiki extensions
  • Provide all of the query options MWSearch has given users, and more

Dependencies

PHP and cURL
  • Note that Elasticsearch versions prior to 6.8 are not compatible with PHP 8.
Elasticsearch

Every version of ElasticSearch change how web services work, and cause compatibility problems. You must install the version of Elastic Search compatible with the version of MediaWiki you are currently using:

  • MediaWiki 1.29.x - 1.30.x require Elasticsearch 5.3.x - 5.4.x.
  • MediaWiki 1.31.x - 1.32.x require Elasticsearch 5.5.x - 5.6.x.
  • MediaWiki 1.33.x - 1.38.x require Elasticsearch 6.5.x - 6.8.x. (6.8.23+ recommended)
  • MediaWiki 1.39+ require Elasticsearch 7.10.2 (6.8.23+ is possible using a compatibility layer )

Take note that a Java installation like OpenJDK is needed in addition. It's best to use the official Elasticsearch Docker image or a self-hosted version. A managed product like Amazon OpenSearch (formerly Amazon Elasticsearch) can work but may require additional configuration depending on its specifics. For example, Amazon OpenSearch only listens for Elasticsearch API requests over HTTPS on port 443 (i.e. it does not expose the default Elasticsearch port 9200), so a TLS-enabled proxy (e.g. Nginx) can enable CirrusSearch to communicate with an Amazon OpenSearch cluster.

Elastica
  • Elastica is a PHP library to talk to Elasticsearch. Install Elastica per the instructions below.

Other
  • Due to the actual handling of jobs by the CirrusSearch extension, it is advisable to set up jobs in Redis to prevent messages like Notice: unserialize(): Error at offset 64870 of 65535 bytes in JobQueueDB.php and subsequent errors like Unsupported operand types.

See úkol T157759.

Installation

Elastica

Even though the instructions below tell you to only run Composer when installing from git, it may be necessary to issue it anyway in order to install all PHP dependencies.

  • Stáhněte soubor/y a vložte je do adresáře pojmenovaného Elastica ve vaší složce extensions/.
    Vývojáři a přispěvatelé kódu by si místo toho měli nainstalovat rozšíření from Git pomocí:cd extensions/
    git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica
  • Při instalaci z Gitu spusťte Composer pro instalaci závislostí PHP zadáním composer install --no-dev v adresáři rozšíření. (Vyskytnou-li se nějaké komplikace, podívejte se na úkol T173141.)
  • Na konec vašeho souboru LocalSettings.php přidejte následující kód:
    wfLoadExtension( 'Elastica' );
    
  • Yes Dokončeno – Přejděte na stránku Special:Version vaší wiki a zkontrolujte, zda bylo rozšíření úspěšně nainstalováno.

CirrusSearch

  • Stáhněte soubor/y a vložte je do adresáře pojmenovaného CirrusSearch ve vaší složce extensions/.
    Vývojáři a přispěvatelé kódu by si místo toho měli nainstalovat rozšíření from Git pomocí:cd extensions/
    git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch
  • Při instalaci z Gitu spusťte Composer pro instalaci závislostí PHP zadáním composer install --no-dev v adresáři rozšíření. (Vyskytnou-li se nějaké komplikace, podívejte se na úkol T173141.)
  • Na konec vašeho souboru LocalSettings.php přidejte následující kód:
    wfLoadExtension( 'CirrusSearch' );
    
  • Now follow the setup instructions in the CirrusSearch README delivered with your extension i.e. $IP/extensions/CirrusSearch/README. Note that all info in it might not apply to your version of the extension, especially the version of Elasticsearch supported.
  • Configure as required.
  • Yes Dokončeno – Přejděte na stránku Special:Version vaší wiki a zkontrolujte, zda bylo rozšíření úspěšně nainstalováno.

Enable regex queries

This is an optional step. You will need to install the search-extra plugin for this. Do so by following these steps:

  1. execute the following command:
    /usr/share/elasticsearch/bin/elasticsearch-plugin/elasticsearch-plugin install org.wikimedia.search:extra:7.10.2-wmf12
    
  2. add the following line to you "LocalSettings.php" file:
    $wgCirrusSearchWikimediaExtraPlugin[ 'regex' ] = [ 'build', 'use', 'max_inspect' => 10000 ];
    
  3. restart Elasticsearch with the follwing command:
    systemctl restart elasticsearch
    
  4. recreate the search index by executing the following commands:
    1. php path/to/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
      
    2. php path/to/extensions/CirrusSearch/maintenance/ForceSearchIndex.php
      


Upgrading

Please follow the upgrade instructions in the CirrusSearch UPGRADE file.

Configuration

The configuration parameters of CirrusSearch are documented at the "settings.txt" file. See also documentation on CirrusSearch configuration profiles.

Elasticsearch will fail to index for CirrusSearch if one uses a database name for MySQL containing a capital character, e.g., "MyWikiDatabaseName." To mitigate this, CirrusSearch provides the $wgCirrusSearchIndexBaseName configuration parameter, which one needs to set, e.g., $wgCirrusSearchIndexBaseName = 'mywikidatabasename';.

Hooks

CirrusSearch extension defines a number of hooks that other extensions can make use of to extend the core schema and modify documents. The following hooks are available:

API

CirrusSearch features can be used in API queries. Searching happens via the normal search API, action=query&list=search; you can use CirrusSearch-specific features, such as the morelike: special prefix to find pages related to Marie Curie and radium:

api.php?action=query&list=search&srsearch=morelike:Marie_Curie%7Cradium&srlimit=10&srprop=size&formatversion=2

Custom APIs and parameters are provided for querying CirrusSearch configuration and debug information:

See also

General links
Debugging

Local development

Elastic Search service can be run with the Vagrant role (cirrussearch) and MediaWiki Vagrant.

For Docker, you can use a command like docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.2. Then follow the installation and configuration directions. If your web host is in a container you'll want to make sure the above container is on the same network, and in LocalSettings.php you will want to reference elasticsearch as the host name. This will not have the WMF plugins but can be sufficient for basic testing.