Extension:CirrusSearch/Profiles
CirrusSearch has many tunable parameters that influence various aspects of search, such as ranking and indexing. These parameters are organized into "profiles": named sets of data that define the settings for a given profile type and context. Each profile type and context pair has a default profile name, which can be overridden by configuration variables, URL parameters, or user settings.
Profile type
A profile type is a kind of data used for configuration or tuning, such as a rescore configuration, a similarity configuration, or a set of ranking functions. Different profile types contain different data and are usually not compatible with each other. The following profile types are defined in the code:
COMPLETION
- Defines settings for the completion suggester.
- Defaults in file[1]:
profiles/SuggestProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchCompletionProfiles
CROSS_PROJECT_BLOCK_SCORER
- Defines settings for merging results from cross-wiki searches.
- Defaults in file[1]:
profiles/CrossProjectBlockScorerProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchCrossProjectBlockScorerProfiles
FT_QUERY_BUILDER
- Defines settings for building the Elasticsearch query during full-text searches.
- Defaults in file[1]:
profiles/FullTextQueryBuilderProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchFullTextQueryBuilderProfiles
PHRASE_SUGGESTER
- Defines settings for building the Elasticsearch phrase suggest query ("Did you mean?" suggestions).
- Defaults in file[1]:
profiles/PhraseSuggesterProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchPhraseSuggestProfiles
RESCORE
- Defines configuration for ranking search results.
- Defaults in file[1]:
profiles/RescoreProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchRescoreProfiles
RESCORE_FUNCTION_CHAINS
- Defines functional expressions to be used in scoring search results.
- Defaults in file[1]:
profiles/RescoreFunctionChains.config.php
- Configuration variable[2]:
$wgCirrusSearchRescoreFunctionScoreChains
SANEITIZER
- Defines settings for the sanitization process that runs in the background and checks for missing updates.
- Defaults in file[1]:
profiles/SaneitizeProfiles.config.php
SIMILARITY
- Defines similarity configurations.
- Defaults in file[1]:
profiles/SimilarityProfiles.config.php
- Configuration variable[2]:
$wgCirrusSearchSimilarityProfiles
Note that profile names should be unique within a profile type, regardless of whether a profile is defined in a default file, in a configuration variable, or in another repository. Extensions can define their own profile types and can add profiles to existing types, either by using the variables above or by defining their own profile repositories.
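The uniqueness requirement can be illustrated with a minimal Python sketch. This is not CirrusSearch code (the extension itself is written in PHP): `merge_repositories`, the repository labels, and the profile contents are hypothetical names and placeholders chosen for illustration. The sketch merges several profile sources for a single profile type and reports any name that is defined more than once.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, object]


def merge_repositories(
    repositories: List[Tuple[str, Dict[str, Profile]]],
) -> Tuple[Dict[str, Profile], List[str]]:
    """Merge profile sources for ONE profile type.

    Returns the merged profiles plus a list of human-readable conflicts
    for names that appear in more than one source. Keeping the first
    definition is an assumption of this sketch, not documented behaviour.
    """
    merged: Dict[str, Profile] = {}
    origin: Dict[str, str] = {}
    conflicts: List[str] = []
    for label, profiles in repositories:
        for name, profile in profiles.items():
            if name in merged:
                conflicts.append(f"{name!r} defined in {origin[name]} and {label}")
                continue
            merged[name] = profile
            origin[name] = label
    return merged, conflicts


# Placeholder data: "my_custom" and the profile bodies are illustrative only.
default_file = {"classic": {"placeholder": 1}}
config_variable = {"my_custom": {"placeholder": 2}, "classic": {"placeholder": 3}}

profiles, conflicts = merge_repositories(
    [("default file", default_file), ("config variable", config_variable)]
)
print(sorted(profiles))
print(conflicts)
```

Here the name `classic` appears in both sources, so it is reported as a conflict; giving the second profile a different name avoids the clash.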
Wikibase
Wikibase extensions, such as WikibaseCirrusSearch, define their own profiles and also add some profiles to the types specified above:
RESCORE
- Adds rescore profiles that are used for Wikibase entities.
- Defaults in file:
src/config/ElasticSearchRescoreProfiles.php
- Configuration variable[2]:
$wgWBCSRescoreProfiles
RESCORE_FUNCTION_CHAINS
- Adds functional expressions to be used in scoring search results.
- Defaults in file[1]:
src/config/ElasticSearchRescoreFunctions.php
Wikibase types:
WIKIBASE_PREFIX_QUERY_BUILDER
- Configuration for the Wikibase prefix search query builder.
- Defaults in file:
src/config/EntityPrefixSearchProfiles.php
- Configuration variable[2]:
$wgWBCSPrefixSearchProfiles
Context
A context defines the kind of environment in which a profile is used. For example, a rescore profile can be applied to regular search, prefix search, Wikibase search, etc., each of which may require different settings (though still the same type of settings, and thus the same data structure). Context is secondary to profile type: the same profile type always uses the same data structure, but can use different profile names, and thus different settings, in different contexts.
The following contexts are defined out of the box:
CONTEXT_DEFAULT
- Default context, applied unless some other context is specified.
CONTEXT_PREFIXSEARCH
- Used when a prefix search is performed.
CONTEXT_WIKIBASE_PREFIX
- Wikibase prefix search (wbsearchentities).
Profile selection
The profile to use for a specific operation is determined by the following procedure:
- Determine the profile type and context of the operation (see above).
- For that profile type and context pair, scan the available overrides (URI overrides, user preference overrides, config overrides, etc.) in order of priority. By default, the URI override has the highest priority, followed by the user preference, then the config override.
- If an override is set, use its value as the profile name.
- Otherwise, use the default name for the profile type and context pair.
- Fetch the profile with this name. If no profile with the overridden name exists, use the default profile (i.e., the profile with the default name).
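The procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the extension's actual implementation: `select_profile`, `DEFAULTS`, and the profile name `experimental` are hypothetical, overrides are modelled as a plain list in priority order, and `None` stands for "not set". The default names are taken from the overrides table in this section.

```python
from typing import Dict, List, Optional, Tuple

# Default profile names per (type, context), as listed in the overrides table.
DEFAULTS: Dict[Tuple[str, str], str] = {
    ("RESCORE", "CONTEXT_DEFAULT"): "classic",
    ("RESCORE", "CONTEXT_PREFIXSEARCH"): "classic",
    ("COMPLETION", "CONTEXT_DEFAULT"): "fuzzy",
}


def select_profile(
    profile_type: str,
    context: str,
    overrides: List[Optional[str]],
    available: Dict[str, dict],
) -> Tuple[str, dict]:
    """Pick a profile name and its data for a (type, context) pair.

    `overrides` holds candidate names in priority order
    (URI override, user preference, config override).
    """
    default_name = DEFAULTS[(profile_type, context)]
    # The first override that is set wins; otherwise use the default name.
    name = next((o for o in overrides if o is not None), default_name)
    # An override that names a non-existent profile falls back to the default.
    if name not in available:
        name = default_name
    return name, available[name]


# Placeholder profile data; "experimental" is a made-up profile name.
rescore_profiles = {"classic": {"placeholder": True}, "experimental": {"placeholder": True}}

by_config = select_profile(
    "RESCORE", "CONTEXT_DEFAULT", [None, None, "experimental"], rescore_profiles
)
bad_uri = select_profile(
    "RESCORE", "CONTEXT_DEFAULT", ["no_such_profile", None, "experimental"], rescore_profiles
)
print(by_config[0])  # experimental
print(bad_uri[0])    # classic
```

In the second call the URI override is set but names a profile that does not exist, so the default profile is used rather than the lower-priority config override, matching the last step of the procedure.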
The set of overrides is as follows:
Type | Context | Default[3] | URI Override[4] | User override | Config override |
---|---|---|---|---|---|
COMPLETION | CONTEXT_DEFAULT | fuzzy | | cirrussearch-pref-completion-profile | $wgCirrusSearchCompletionSettings |
CROSS_PROJECT_BLOCK_SCORER | CONTEXT_DEFAULT | static | | | $wgCirrusSearchCrossProjectOrder |
FT_QUERY_BUILDER | CONTEXT_DEFAULT | default | cirrusFTQBProfile | | $wgCirrusSearchFullTextQueryBuilderProfile |
PHRASE_SUGGESTER | CONTEXT_DEFAULT | default | | | $wgCirrusSearchPhraseSuggestSettings |
RESCORE | CONTEXT_DEFAULT | classic | fulltextQueryIndepProfile, cirrusRescoreProfile | | $wgCirrusSearchRescoreProfile |
RESCORE | CONTEXT_PREFIXSEARCH | classic | cirrusRescoreProfile | | $wgCirrusSearchPrefixSearchRescoreProfile |
RESCORE_FUNCTION_CHAINS | CONTEXT_DEFAULT | n/a[5] | | | |
SANEITIZER | | n/a[6] | | | |
SIMILARITY | CONTEXT_DEFAULT | default | | | $wgCirrusSearchSimilarityProfile |
RESCORE | CONTEXT_WIKIBASE_PREFIX | wikibase_prefix | cirrusRescoreProfile | | $wgWBCSDefaultPrefixRescoreProfile |
WIKIBASE_PREFIX_QUERY_BUILDER | CONTEXT_WIKIBASE_PREFIX | default | cirrusWBProfile | | $wgWBCSDefaultPrefixProfile |
Note that the same profile type can use the same override setting in different contexts, as RESCORE does with its URI override. Sharing a setting is good practice for URI overrides, since they are per-request and therefore apply to only one context at a time, but not for persistent overrides such as user or config overrides.
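As a rough illustration of this point, the following Python sketch takes the RESCORE rows of the table and groups contexts by the override setting they use. The helper `contexts_sharing` and the dictionary layout are hypothetical; only the setting names and contexts come from the table.

```python
from collections import defaultdict
from typing import Dict, List

# URI parameters and config variables for RESCORE, per context (from the table).
URI_OVERRIDES: Dict[str, List[str]] = {
    "CONTEXT_DEFAULT": ["fulltextQueryIndepProfile", "cirrusRescoreProfile"],
    "CONTEXT_PREFIXSEARCH": ["cirrusRescoreProfile"],
    "CONTEXT_WIKIBASE_PREFIX": ["cirrusRescoreProfile"],
}
CONFIG_OVERRIDES: Dict[str, List[str]] = {
    "CONTEXT_DEFAULT": ["$wgCirrusSearchRescoreProfile"],
    "CONTEXT_PREFIXSEARCH": ["$wgCirrusSearchPrefixSearchRescoreProfile"],
    "CONTEXT_WIKIBASE_PREFIX": ["$wgWBCSDefaultPrefixRescoreProfile"],
}


def contexts_sharing(settings_by_context: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each setting name to the sorted list of contexts that use it."""
    usage: Dict[str, List[str]] = defaultdict(list)
    for context, names in settings_by_context.items():
        for name in names:
            usage[name].append(context)
    return {name: sorted(contexts) for name, contexts in usage.items()}


uri_usage = contexts_sharing(URI_OVERRIDES)
config_usage = contexts_sharing(CONFIG_OVERRIDES)

# One per-request URI parameter serves all three contexts...
print(uri_usage["cirrusRescoreProfile"])
# ...while every persistent config variable belongs to exactly one context.
print(all(len(contexts) == 1 for contexts in config_usage.values()))
```

A single request runs in one context, so reusing `cirrusRescoreProfile` is unambiguous. A shared config variable would instead force the same profile name onto every context at once.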
- [1] This file contains basic profiles of that type.
- [2] This configuration variable can contain additional profiles of this type.
- [3] The name of the profile used by default.
- [4] This is entered in the URI of the request, e.g., cirrusOverride=profileName.
- [5] These profiles are always referenced explicitly by RESCORE data, so there is no default.
- [6] The sanitizer chooses the best profile to use at runtime, based on wiki size.