Flow/Architecture/Search

Making search work involves three big parts:

  • Manage ES config: get the Elasticsearch configuration right (e.g. how to interpret data types: word stemming, highlighter config, ...) and manage the ES indices (validate, reindex, ...)
  • Index & search Flow data: index Flow data in Elasticsearch and make it searchable
  • Search front-end: how we'll present the search functionality to users

The last part is mostly blocked on finalizing the mockups. Once we're happy with those, we can start building it.

Manage ES config

Patch: https://gerrit.wikimedia.org/r/#/c/161251/

Make CirrusSearch updateOneSearchIndexConfig.php reusable

There's been a bunch of refactoring in CirrusSearch so that we can reuse most of its code in Flow. For a list of those patches, see the Phabricator task.

Make ES configuration management maintenance script

How to use (steps 1-4 are done for you when you enable the 'cirrussearch' role in MediaWiki-Vagrant). We should probably include all of this in MediaWiki-Vagrant, either by default as part of Flow or as an optional role (flow-search?).

  1. Install Elasticsearch, version 1.4 or later (if your MediaWiki-Vagrant doesn't have it yet, see the update instructions in Matt's comment on PS12 here: https://gerrit.wikimedia.org/r/#/c/184404/)
  2. Install Extension:Elastica
  3. Install Extension:CirrusSearch
  4. Configure connection to ES (if different from the default 'localhost'): $wgFlowSearchServers = array( 'searchserver' );
  5. Flow and ES should now be able to talk to each other
  6. In the CLI, run php maintenance/FlowSearchConfig.php to prepare the search index. If you are using MediaWiki-Vagrant, run vagrant ssh first, go to the /vagrant/mediawiki/extensions/Flow folder, and run the script from that shell.
  7. (You can pass any of the script's many options if you want to try out a particular piece)
  8. Should you need to quickly rebuild your index from scratch, delete it with curl -XDELETE http://localhost:9200/\*_flow\* (adjust the URL as needed) and re-run these steps
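
Steps 6 and 8 above can be wrapped in a small shell sketch. This is only an illustration, not part of Flow: the function names and the DRY_RUN switch are hypothetical, and ES_URL and FLOW_DIR are assumed defaults taken from the steps above that you will likely need to adjust.

```shell
#!/bin/sh
# Assumed defaults, taken from the steps above; adjust to your setup.
ES_URL="${ES_URL:-http://localhost:9200}"
FLOW_DIR="${FLOW_DIR:-/vagrant/mediawiki/extensions/Flow}"

# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 8: delete every index whose name matches *_flow*.
# The URL is quoted so the shell does not expand the wildcards.
delete_flow_indices() {
    run curl -XDELETE "$ES_URL/*_flow*"
}

# Step 6: prepare the search index from inside the Flow extension folder.
# The subshell keeps the caller's working directory unchanged.
prepare_flow_index() {
    (
        run cd "$FLOW_DIR" &&
        run php maintenance/FlowSearchConfig.php
    )
}
```

To rebuild from scratch, call delete_flow_indices and then prepare_flow_index. Setting DRY_RUN=1 beforehand prints the commands instead of running them, which is a cheap way to check the URL and path. Under MediaWiki-Vagrant, run this inside the vagrant ssh shell.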

Index & search Flow data

Patch: https://gerrit.wikimedia.org/r/#/c/126996/

Index Flow data in ES

How to use

First see #Make_ES_configuration_management_maintenance_script, which has the detailed instructions for configuring the search index properly.

  1. Do steps from #Make_ES_configuration_management_maintenance_script
  2. In the CLI, run: php maintenance/FlowFixWorkflowLastUpdateTimestamp.php (to ensure the workflow_last_update_timestamp values are correct; may not be needed)
  3. In the CLI, run: php maintenance/FlowForceSearchIndex.php
  4. Flow data should now be getting indexed, hopefully
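
As a rough sketch, steps 2 and 3 can be chained so that indexing only starts if the timestamp fix succeeded. The function name, the DRY_RUN switch, and the FLOW_DIR default are assumptions for illustration; only the two maintenance script names come from the steps above.

```shell
#!/bin/sh
# Assumed location of the Flow extension (the MediaWiki-Vagrant path).
FLOW_DIR="${FLOW_DIR:-/vagrant/mediawiki/extensions/Flow}"

# Hypothetical helper: print the command when DRY_RUN is set, else run it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run both maintenance scripts in order; stop at the first failure.
# Assumes the search index has already been prepared (step 1).
index_flow_data() {
    (
        run cd "$FLOW_DIR" &&
        run php maintenance/FlowFixWorkflowLastUpdateTimestamp.php &&
        run php maintenance/FlowForceSearchIndex.php
    )
}
```

Calling index_flow_data with DRY_RUN=1 set prints the three commands without touching the wiki.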

Search indexed Flow data

How to use

  1. See #Search_API_endpoint below; the API endpoint is already in place ;)

Search API endpoint

How to use

  1. Do steps from #Index_Flow_data_in_ES
  2. Set $wgFlowSearchEnabled = true;
  3. Add 'script.disable_dynamic: false' to your elasticsearch.yml (we use dynamic scripting to work out the total number of matching terms)
  4. Do an API call, e.g.: http://mediawiki.dev/api.php?page=Main_Page&action=flow&submodule=search&qterm=test
  5. See search results!
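
The API call in step 4 can also be made from the command line. The sketch below assumes the wiki lives at http://mediawiki.dev as in the example URL; the function names are hypothetical, and the search term is assumed to be URL-safe already since no encoding is done. The format=json parameter is the standard MediaWiki API output switch and is not part of the original example.

```shell
#!/bin/sh
# Assumed wiki base URL, taken from the example in step 4.
WIKI_URL="${WIKI_URL:-http://mediawiki.dev}"

# Print the search URL for a term ($1). No URL-encoding is performed,
# so the term must not contain spaces or reserved characters.
flow_search_url() {
    printf '%s/api.php?page=Main_Page&action=flow&submodule=search&qterm=%s\n' \
        "$WIKI_URL" "$1"
}

# Fetch the results as JSON.
flow_search() {
    curl -s "$(flow_search_url "$1")&format=json"
}
```

For example, flow_search test queries the same URL as step 4, with JSON output requested.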

Search front-end

For mockups, see Phabricator task.

There is a patch with a very barebones GUI; it is linked from the Phabricator task.