Flow/Architecture/Search
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
There are 3 big parts in making search work:
- Manage ES config: getting the Elasticsearch configuration right (e.g. how to interpret datatypes: word stemming, highlighter config, ...) and managing the ES indices (validate, reindex, ...)
- Index & search Flow data: index Flow data in Elasticsearch and make it searchable
- Search front-end: how we'll present the search functionality to users.
The last is mostly blocked on nailing the mockups. Once we're happy with that, we can start building it.
Manage ES config
Patch: https://gerrit.wikimedia.org/r/#/c/161251/
Make CirrusSearch updateOneSearchIndexConfig.php reusable
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78786
There's been a bunch of refactoring in CirrusSearch so that we can reuse most of its code in Flow. For a list of those patches, see the Phabricator task.
Make ES configuration management maintenance script
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78787
How to use (the first four steps are handled by enabling the 'cirrussearch' role in MediaWiki-Vagrant). We should probably include all of this in MediaWiki-Vagrant, either by default as part of Flow or as an optional role (flow-search?).
- Install ElasticSearch, version >=1.4 (if your MediaWiki-Vagrant doesn't yet have it, see update instructions in Matt's comment on PS12 here: https://gerrit.wikimedia.org/r/#/c/184404/)
- Install Extension:Elastica
- Install Extension:CirrusSearch
- Configure connection to ES (if different from the default 'localhost'):
$wgFlowSearchServers = array( 'searchserver' );
- Flow & ES should now be in touch
- In CLI, run:
php maintenance/FlowSearchConfig.php
This will prepare the search index. If you are using MediaWiki-Vagrant, you need to use vagrant ssh, go to the /vagrant/mediawiki/extensions/Flow folder and run the script within the shell.
- (You could add any of the many options to that script, if you're looking to try out a particular piece)
- Should you need to quickly rebuild your index from scratch, delete it with
curl -XDELETE http://localhost:9200/\*_flow\*
(adjust the URL as needed) and re-run these steps.
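The first step asks for ElasticSearch version >=1.4. A minimal sketch of how that could be checked before running the maintenance script is shown here. It assumes the server is reachable at http://localhost:9200 (the address used in the curl command above) and that its root endpoint answers with JSON containing version.number. The function names are hypothetical and not part of Flow or CirrusSearch.

```python
import json
import urllib.request

# Minimum release required by the install step above.
MINIMUM_ES_VERSION = (1, 4)


def parse_major_minor(version):
    """Turn a version string such as '1.4.2' into a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_minimum(version, minimum=MINIMUM_ES_VERSION):
    """Return True if the version string is at least the required release."""
    return parse_major_minor(version) >= minimum


def fetch_es_version(base_url="http://localhost:9200"):
    """Ask the Elasticsearch root endpoint for its version number.

    Assumption: the server is reachable at base_url and replies with JSON
    that contains version.number.
    """
    with urllib.request.urlopen(base_url) as response:
        info = json.load(response)
    return info["version"]["number"]
```

With a running server, meets_minimum(fetch_es_version()) tells you whether the installed release is recent enough. Adjust base_url if your ES host differs from the default.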
Figure out how to deploy Flow search
- Status: To do
- Phabricator: https://phabricator.wikimedia.org/T78796
Index & search Flow data
Patch: https://gerrit.wikimedia.org/r/#/c/126996/
Index Flow data in ES
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78788
How to use
See #Make_ES_configuration_management_maintenance_script for more detailed instructions, including how to properly configure the search index.
- Do steps from #Make_ES_configuration_management_maintenance_script
- In CLI, run:
php maintenance/FlowFixWorkflowLastUpdateTimestamp.php
(to ensure workflow_last_update_timestamps are correct; may not be needed)
- In CLI, run:
php maintenance/FlowForceSearchIndex.php
- Flow data should be indexing, hopefully
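The maintenance scripts named in the steps above have to run in a fixed order: first the index configuration, then the optional timestamp fix, then the forced indexing. A minimal sketch of a wrapper that runs them in sequence follows. The helper names are hypothetical, and the default directory is the MediaWiki-Vagrant path mentioned earlier, so adjust it for other setups.

```python
import subprocess

# Maintenance scripts in the order given in the steps above.
MAINTENANCE_SCRIPTS = [
    "maintenance/FlowSearchConfig.php",
    "maintenance/FlowFixWorkflowLastUpdateTimestamp.php",
    "maintenance/FlowForceSearchIndex.php",
]


def indexing_commands(scripts=MAINTENANCE_SCRIPTS):
    """Build one 'php <script>' argument list per maintenance script."""
    return [["php", script] for script in scripts]


def run_indexing(flow_dir="/vagrant/mediawiki/extensions/Flow"):
    """Run each script from the Flow extension folder, stopping on failure.

    Assumption: flow_dir is the Flow extension folder; the default is the
    MediaWiki-Vagrant location and must be run inside the vagrant shell.
    """
    for command in indexing_commands():
        subprocess.run(command, cwd=flow_dir, check=True)
```

Calling run_indexing() inside the vagrant shell executes all three scripts. Because check=True raises on a non-zero exit code, a failed configuration step stops the run before any indexing is attempted.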
Search indexed Flow data
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78789
How to use
- See below, API endpoint is in place already ;)
Search API endpoint
- Status: Partially done
- Phabricator: https://phabricator.wikimedia.org/T78791
How to use
- Do steps from #Index_Flow_data_in_ES
- Set
$wgFlowSearchEnabled = true;
- Add
script.disable_dynamic: false
to your elasticsearch.yml (the search uses dynamic scripts to figure out the total number of matching terms)
- Do an API call, e.g.:
http://mediawiki.dev/api.php?page=Main_Page&action=flow&submodule=search&qterm=test
- See search results!
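The API call above can also be made from a script. The sketch below builds the same URL from its parts, so that search terms with spaces or special characters are encoded correctly, and fetches the result. The function names are hypothetical. The format=json parameter is the standard MediaWiki API output switch and is an addition not present in the example URL. The structure of the returned data is not documented here, so the sketch simply returns the parsed JSON.

```python
import json
import urllib.parse
import urllib.request


def build_search_url(term, page="Main_Page", api="http://mediawiki.dev/api.php"):
    """Build the Flow search URL shown in the steps above for a given term."""
    params = {
        "page": page,
        "action": "flow",
        "submodule": "search",
        "qterm": term,
    }
    return api + "?" + urllib.parse.urlencode(params)


def search(term, **kwargs):
    """Call the search endpoint and return the decoded JSON response.

    Assumption: the wiki is reachable at the api URL and search has been
    enabled as described in the steps above.
    """
    url = build_search_url(term, **kwargs) + "&format=json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

For example, build_search_url("test") reproduces the URL from the step above, and search("test") returns whatever the endpoint currently sends back.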
Search front-end
- Status: To do
- Phabricator: https://phabricator.wikimedia.org/T78790
For mockups, see Phabricator task.
There is a patch with a very barebones GUI - it's linked to in the Phabricator task.