Extension:WikiSearch/api

With WikiSearchFront installed, you do not need to write API requests for your search yourself. Instead, use the #WikiSearchFrontend parser function on the page that contains #WikiSearchConfig; it automatically creates a search engine.

Performing a search requires:

  • A search configuration page on your wiki containing a search configuration created by #WikiSearchConfig;
  • An API request performing the actual search, pointing to the page ID of your page containing the configuration.

Configuration page


See Extension:WikiSearch/usage for this documentation.

Search API

Performs a search and returns the list of search results. If the API is in debug mode, this endpoint also returns the raw ElasticSearch query that was used to perform the search.

Parameters

  • pageid (integer): The MediaWiki page ID of the page from which the search configuration should be retrieved. Must be the ID of a page that contains a configuration.
  • term (string): The search term for the main free-text search, corresponding to the main search field on a search page. Defaults to the empty string; when no term is given, all results are returned.
  • from (integer): The offset used for pagination; results are returned starting at this position. Defaults to 0.
  • limit (integer): The maximum number of results to return. Defaults to 10.
  • filter (list): The filters to apply to the search. Defaults to the empty list. See Filters syntax below for the syntax.
  • aggregations (list): The aggregations to generate from the search. Defaults to the empty list. See Aggregations syntax below for how to specify them.
  • sortings (list): The sortings to apply to the search. Defaults to the empty list. See Sortings syntax below for how to specify them.
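When calling the API from your own code, it can help to assemble these parameters in one place. The following is a minimal sketch, assuming that the list-valued parameters are sent as JSON-encoded strings and that the parameter names are those documented here; buildSearchParams is a hypothetical helper, not part of WikiSearch.

```javascript
// Hypothetical helper (not part of WikiSearch): assembles the parameters
// for a WikiSearch query, applying the defaults documented above.
function buildSearchParams(options) {
  // pageid is the only parameter without a default.
  if (!Number.isInteger(options.pageid) || options.pageid <= 0) {
    throw new Error("pageid must be a positive integer");
  }
  return {
    action: "query",
    format: "json",
    meta: "WikiSearch",
    pageid: options.pageid,
    term: options.term ?? "",
    from: options.from ?? 0,
    limit: options.limit ?? 10,
    // Assumption: list-valued parameters are sent as JSON-encoded strings.
    filter: JSON.stringify(options.filter ?? []),
    aggregations: JSON.stringify(options.aggregations ?? []),
    sortings: JSON.stringify(options.sortings ?? []),
  };
}

const params = buildSearchParams({ pageid: 698, term: "manual" });
console.log(params.filter); // prints []
```

The resulting object can be passed to any HTTP client that sends form-encoded POST requests.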

Example request


JavaScript

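var params = {
  action: 'query',
  format: 'json',
  meta: 'WikiSearch',
  // List-valued parameters must be JSON-encoded: mw.Api joins plain arrays
  // with "|", which would turn these objects into "[object Object]".
  filter: JSON.stringify([{"value":"5","key":"Average rating","range":{"gte":5,"lte":6}}]),
  from: '0',
  limit: '10',
  pageid: '698',
  aggregations: JSON.stringify([{
    "type": "range",
    "ranges": [
      {
        "from": 1,
        "to": 6,
        "key": "1"
      },
      {
        "from": 2,
        "to": 6,
        "key": "2"
      },
      {
        "from": 3,
        "to": 6,
        "key": "3"
      },
      {
        "from": 4,
        "to": 6,
        "key": "4"
      },
      {
        "from": 5,
        "to": 6,
        "key": "5"
      }
    ],
    "property": "Average rating"
  }])
};

var api = new mw.Api();
api.post(params).done(function(data) {
  console.log(data);
}).fail(function(code, data) {
  // mw.Api rejects with an error code and the API response
  console.error(code, data);
});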

cURL

curl https://wiki.example.org/api.php \
-d action=query \
-d format=json \
-d meta=WikiSearch \
--data-urlencode 'filter=[{"value":"5","key":"Average rating","range":{"gte":5,"lte":6}}]' \
-d from=0 \
-d limit=10 \
-d pageid=698 \
--data-urlencode 'aggregations=[
    {"type":"range","ranges":[
        {"from":1,"to":6,"key":"1"},
        {"from":2,"to":6,"key":"2"},
        {"from":3,"to":6,"key":"3"},
        {"from":4,"to":6,"key":"4"},
        {"from":5,"to":6,"key":"5"}
    ],"property":"Average rating"}
]'
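The JSON values must be URL-encoded before they are sent in a form body. In JavaScript outside MediaWiki (for example with fetch in Node.js 18 or later), URLSearchParams performs that encoding. A minimal sketch; encodeFormBody is a hypothetical name and the endpoint is the placeholder URL from the example above.

```javascript
// Hypothetical helper: serialises WikiSearch parameters as an
// application/x-www-form-urlencoded body, JSON-encoding list values.
function encodeFormBody(params) {
  const body = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    // Arrays and objects (filter, aggregations, ...) are sent as JSON text.
    body.append(name, typeof value === "object" ? JSON.stringify(value) : String(value));
  }
  return body.toString();
}

const body = encodeFormBody({
  action: "query",
  format: "json",
  meta: "WikiSearch",
  pageid: 698,
  filter: [{ key: "Average rating", range: { gte: 5, lte: 6 } }],
});

// Sending it requires network access and a real wiki:
// fetch("https://wiki.example.org/api.php", {
//   method: "POST",
//   headers: { "Content-Type": "application/x-www-form-urlencoded" },
//   body,
// }).then((res) => res.json()).then(console.log);
```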

Example response

{
    "batchcomplete": "",
    "result": {
        "hits": "[<TRUNCATED, SEE BELOW FOR PARSING>]",
        "total": 1,
        "aggs": {
            "Average rating": {
                "meta": [],
                "doc_count": 1,
                "Average rating": {
                    "buckets": {
                        "1": {
                            "from": 1,
                            "to": 6,
                            "doc_count": 1
                        },
                        "2": {
                            "from": 2,
                            "to": 6,
                            "doc_count": 1
                        },
                        "3": {
                            "from": 3,
                            "to": 6,
                            "doc_count": 1
                        },
                        "4": {
                            "from": 4,
                            "to": 6,
                            "doc_count": 1
                        },
                        "5": {
                            "from": 5,
                            "to": 6,
                            "doc_count": 1
                        }
                    }
                }
            }
        }
    }
}

Parsing the response


This section assumes you have successfully made a request to the API using PHP and have stored the raw API result in the variable $response.

The $response variable holds a JSON-encoded string, which must be decoded before it can be used:

$response = json_decode($response, true);

Once decoded, $response is an associative array that usually contains two keys (three if debug mode is enabled):

  • batchcomplete (string): Added by MediaWiki; not relevant for API users.
  • result (object): The result of the performed search.
  • query (object): The raw ElasticSearch query used to perform the search. Only present when debug mode is enabled.
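For readers working in JavaScript rather than PHP, the same steps (decode the response, then separate the result from the debug-only query) can be sketched as follows. splitSearchResponse is a hypothetical name; the check for an "error" object reflects how the MediaWiki API reports failures.

```javascript
// Hypothetical helper: decodes a raw WikiSearch API response and separates
// the search result from the debug-only ElasticSearch query.
function splitSearchResponse(rawText) {
  const response = JSON.parse(rawText);
  // The MediaWiki API reports failures in an "error" object.
  if (response.error) {
    throw new Error(`API error: ${response.error.code}`);
  }
  return {
    result: response.result,
    // Only present when the API is in debug mode.
    debugQuery: response.query ?? null,
  };
}

const raw = '{"batchcomplete":"","result":{"hits":"[]","total":0,"aggs":{}}}';
const { result, debugQuery } = splitSearchResponse(raw);
console.log(result.total, debugQuery); // prints 0 null
```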

Usually only the result object is of interest, so it can be stored in a variable of its own:

$result = $response["result"];

$result will look something like this:

{
    "hits": "[<TRUNCATED, SEE BELOW FOR PARSING>]",
    "total": 1,
    "aggs": {
        "Average rating": {
            "meta": [],
            "doc_count": 1,
            "Average rating": {
                "buckets": {
                    "1": {
                        "from": 1,
                        "to": 6,
                        "doc_count": 1
                    },
                    "2": {
                        "from": 2,
                        "to": 6,
                        "doc_count": 1
                    },
                    "3": {
                        "from": 3,
                        "to": 6,
                        "doc_count": 1
                    },
                    "4": {
                        "from": 4,
                        "to": 6,
                        "doc_count": 1
                    },
                    "5": {
                        "from": 5,
                        "to": 6,
                        "doc_count": 1
                    }
                }
            }
        }
    }
}

The hits field


The hits field contains a JSON-encoded string of the ElasticSearch search results. It must be decoded with json_decode before it can be used. The field corresponds directly to the hits.hits field of the ElasticSearch response; see the ElasticSearch documentation for a detailed description of its structure.

To get the page name of a search result, concatenate its subject.namespacename and subject.title hit-fields with a colon, like so:

$hits = json_decode($result["hits"], true);

foreach ($hits as $hit) {
    $namespace_name = $hit["subject"]["namespacename"];
    $page_title = $hit["subject"]["title"];

    $page_name = sprintf("%s:%s", $namespace_name, $page_title);

    echo $page_name;
}

The subject.namespacename hit-field contains the name of the namespace in which the search result lives, and the subject.title hit-field contains the name of the matching page without its namespace prefix. To get the full URL of the page, prepend http://<wikiurl>/index.php/ to the page name; this assumes the wiki uses MediaWiki's default article path.
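The same page-name and URL construction can be sketched in JavaScript. This assumes the default article path (/index.php/) and follows MediaWiki's convention of writing spaces in page URLs as underscores. pageNameOf and pageUrlOf are hypothetical names, and omitting the colon when namespacename is empty is an assumption about how pages without a namespace prefix might be reported, not documented behaviour.

```javascript
// Hypothetical helpers for turning a WikiSearch hit into a page name and URL.
function pageNameOf(hit) {
  const namespaceName = hit.subject.namespacename;
  const title = hit.subject.title;
  // Assumption: an empty namespace name means the page has no prefix.
  return namespaceName ? `${namespaceName}:${title}` : title;
}

function pageUrlOf(wikiBase, pageName) {
  // Spaces become underscores; everything is percent-encoded except ":"
  // and "/", which MediaWiki keeps readable in page URLs.
  const encoded = encodeURIComponent(pageName.replace(/ /g, "_"))
    .replace(/%3A/g, ":")
    .replace(/%2F/g, "/");
  return `${wikiBase}/index.php/${encoded}`;
}

const hit = { subject: { namespacename: "Help", title: "Getting started" } };
const name = pageNameOf(hit);
console.log(name); // prints Help:Getting started
console.log(pageUrlOf("https://wiki.example.org", name)); // prints https://wiki.example.org/index.php/Help:Getting_started
```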

The hits field also contains the generated highlighted snippets, if they are available. These can be accessed through the highlight hit-field, like so:

$hits = json_decode($result["hits"], true);

foreach ($hits as $hit) {
    // Highlights are only present when ElasticSearch generated them
    $highlights = $hit["highlight"] ?? [];

    foreach ($highlights as $highlight) {
        // $highlight is an array of highlighted snippets for one field
        $highlight_string = implode("", $highlight);

        echo $highlight_string;
    }
}

See also the ElasticSearch Highlighting documentation.
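A JavaScript counterpart of the highlight loop above, which tolerates hits without a highlight field. It is a sketch: collectHighlights is a hypothetical name, the input shape (field name mapped to an array of snippets) follows ElasticSearch's highlight response, and the field name "content" in the sample data is made up.

```javascript
// Hypothetical helper: joins each field's highlighted fragments into one
// string per field, skipping hits that carry no highlight data.
function collectHighlights(hits) {
  const snippets = [];
  for (const hit of hits) {
    const highlight = hit.highlight ?? {};
    for (const fragments of Object.values(highlight)) {
      snippets.push(fragments.join(""));
    }
  }
  return snippets;
}

// Sample decoded hits; the second hit has no highlight field.
const hits = JSON.parse(
  '[{"highlight":{"content":["a <b>match</b>"," here"]}},{"subject":{}}]'
);
console.log(collectHighlights(hits));
```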

The aggs field


The aggs field directly corresponds to the aggregations field from the ElasticSearch response. See the ElasticSearch documentation for further details.
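As an illustration of reading aggs, the following sketch extracts bucket counts from a range aggregation. It assumes the nesting seen in the example response above (the aggregation name appears twice and buckets are keyed by their key); other aggregation types may be shaped differently. bucketCounts is a hypothetical name.

```javascript
// Hypothetical helper: maps each bucket key of a named aggregation to its
// document count, using the nesting seen in the example response.
function bucketCounts(aggs, name) {
  const buckets = aggs?.[name]?.[name]?.buckets ?? {};
  const counts = {};
  for (const [key, bucket] of Object.entries(buckets)) {
    counts[key] = bucket.doc_count;
  }
  return counts;
}

// Trimmed copy of the aggs field from the example response.
const aggs = {
  "Average rating": {
    meta: [],
    doc_count: 1,
    "Average rating": {
      buckets: {
        "1": { from: 1, to: 6, doc_count: 1 },
        "5": { from: 5, to: 6, doc_count: 1 },
      },
    },
  },
};
console.log(bucketCounts(aggs, "Average rating"));
```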

The total field


The total field contains the total number of results found by ElasticSearch. It is not affected by the from and limit parameters: it always contains the number of matching results, regardless of how many were actually returned.
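Because total does not depend on from and limit, it can drive pagination. A minimal sketch of the offset arithmetic; nextFrom and pageCount are hypothetical names.

```javascript
// Offset to pass as `from` for the next page, or null on the last page.
function nextFrom(from, limit, total) {
  const next = from + limit;
  return next < total ? next : null;
}

// Number of pages needed to show `total` results, `limit` per page.
function pageCount(total, limit) {
  return Math.ceil(total / limit);
}

console.log(nextFrom(0, 10, 25));  // prints 10
console.log(nextFrom(20, 10, 25)); // prints null
console.log(pageCount(25, 10));    // prints 3
```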

Filters syntax


The filter parameter takes a list of objects. Each object takes one of the following forms:

PropertyRangeFilter


This filter only returns pages that have the specified property with a value in the specified range.

{
    "key": "Age",
    "range": {
        "gte": 0,
        "lt": 100
    }
}

The above filter only includes pages where the property Age has a value that is greater than or equal to 0, but strictly less than 100.

The range field takes an object that allows the following properties:

  • gte: Greater-than or equal to
  • gt: Strictly greater-than
  • lte: Less-than or equal to
  • lt: Strictly less-than
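To make the inclusive and exclusive bounds concrete, the sketch below evaluates a range object against a single value: a value matches when it satisfies every bound present. It only illustrates the semantics of the operators; the actual filtering is done by ElasticSearch, and matchesRange is a hypothetical name.

```javascript
// Comparison for each of the four documented range operators.
const RANGE_CHECKS = {
  gte: (value, bound) => value >= bound,
  gt: (value, bound) => value > bound,
  lte: (value, bound) => value <= bound,
  lt: (value, bound) => value < bound,
};

// True when `value` satisfies every bound in `range`.
function matchesRange(value, range) {
  return Object.entries(range).every(([op, bound]) => {
    const check = RANGE_CHECKS[op];
    if (!check) {
      throw new Error(`Unsupported range operator: ${op}`);
    }
    return check(value, bound);
  });
}

// The "Age" filter from above: 0 <= Age < 100.
const ageRange = { gte: 0, lt: 100 };
console.log(matchesRange(0, ageRange));   // prints true
console.log(matchesRange(100, ageRange)); // prints false
```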

PropertyValueFilter


This filter only returns pages that have the specified property with the specified value.

{
    "key": "Class",
    "value": "Manual"
}

The above filter only includes pages where the property Class has the value Manual. The value may be any of the following data types:

  • string
  • boolean
  • integer
  • float
  • double

PropertyValuesFilter


This filter only returns pages that have the specified property with any of the specified values.

{
    "key": "Class",
    "value": ["foo", "bar"]
}

The above filter only includes pages where the property Class has the value foo or bar.

HasPropertyFilter


This filter only returns pages that have the specified property with any value.

{
    "key": "Class",
    "value": "+"
}

The above filter only includes pages that have the property Class. It does not take the value of the property into account.

PropertyTextFilter


This filter only returns pages that have the specified property with a value that matches the given search query string.

{
    "key": "Class",
    "value": "Foo | (Bar + Quz)",
    "type": "query"
}

The above filter only includes pages where the value of the property Class matches the given query. The query syntax is identical to the simple query string syntax used by ElasticSearch.
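The filter forms above can be produced by small constructor functions, which keeps misspelled field names out of requests. A sketch; all function names are hypothetical, and the resulting list is what would be JSON-encoded into the filter parameter. How WikiSearch combines several filters is not described in this section, so the example makes no claim about it.

```javascript
// Hypothetical constructors for the filter objects described above.
const filters = {
  range: (key, range) => ({ key, range }),
  value: (key, value) => ({ key, value }),
  values: (key, values) => ({ key, value: values }),
  hasProperty: (key) => ({ key, value: "+" }),
  text: (key, query) => ({ key, value: query, type: "query" }),
};

const filterParam = JSON.stringify([
  filters.range("Age", { gte: 0, lt: 100 }),
  filters.values("Class", ["foo", "bar"]),
  filters.text("Class", "Foo | (Bar + Quz)"),
]);
console.log(filterParam);
```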

Aggregations syntax


The aggregations parameter takes a list of objects. Each object takes one of the following forms:

PropertyRangeAggregation

{
    "type": "range",
    "ranges": [
        { "to": 50 },
        { "from": 50, "to": 100 },
        { "from": 100 }
    ],
    "property": "Price",
    "name": "Prices"
}

The from parameter is inclusive and the to parameter is exclusive. For example, to aggregate integer values from (and including) 1 up to and including 5, set from to 1 and to to 6 (!).
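The "n or more" buckets in the example request follow directly from this rule: each bucket starts at n and ends one past the largest value. A sketch that generates them for integer values; atLeastRanges is a hypothetical name.

```javascript
// Builds "n or more" range buckets for integer values min..max.
// `to` is exclusive, so it is set to max + 1 to include max itself.
function atLeastRanges(min, max) {
  const ranges = [];
  for (let n = min; n <= max; n++) {
    ranges.push({ from: n, to: max + 1, key: String(n) });
  }
  return ranges;
}

const aggregation = {
  type: "range",
  ranges: atLeastRanges(1, 5),
  property: "Average rating",
};
console.log(aggregation.ranges[0]); // prints { from: 1, to: 6, key: '1' }
```

For atLeastRanges(1, 5) this reproduces the five ranges used in the example request.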

PropertyAggregation

{
    "type": "property",
    "property": "Genre",
    "name": "Genres"
}

Sortings syntax


The sortings parameter takes a list of objects. Each object takes the following form:

PropertySort

{
    "type": "property",
    "property": "Genre",
    "order": "asc"
}

The above sort orders the results by the value of the property Genre in ascending order. To sort in descending order, set order to "desc".

Sorting on a property that does not exist will result in an error.
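A small constructor can reject invalid sort orders before a request is sent. This assumes the accepted orders are "asc" and "desc" (the values ElasticSearch uses; only "asc" appears in the example above), and propertySort is a hypothetical name. It cannot detect sorting on a non-existent property; only the server can report that error.

```javascript
// Hypothetical constructor for a PropertySort object.
// Assumption: "asc" and "desc" are the accepted orders.
function propertySort(property, order = "asc") {
  if (order !== "asc" && order !== "desc") {
    throw new Error(`Unknown sort order: ${order}`);
  }
  return { type: "property", property, order };
}

const sortings = JSON.stringify([propertySort("Genre", "desc")]);
console.log(sortings); // prints [{"type":"property","property":"Genre","order":"desc"}]
```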

Highlight API


The highlight API takes the following parameters:

  • query: The query to generate highlighted terms from
  • properties: The properties over which the highlights need to be calculated
  • page_id: The page ID of the page on which the highlights need to be calculated
  • limit: The number of highlighted terms to calculate; this does not always correspond directly with the number of terms returned, since duplicates are removed after the query to ElasticSearch
  • size: The (approximate) size of the snippets to generate; leave blank to highlight individual words
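A sketch of assembling these parameters in JavaScript. The API module that receives them is not named in this section, so the sketch only builds the parameter values. Joining properties with "|" follows MediaWiki's usual convention for multi-value parameters and is an assumption here, as is the helper name highlightParams.

```javascript
// Hypothetical helper: collects the documented highlight parameters.
// Assumption: `properties` is a multi-value parameter joined with "|".
function highlightParams({ query, properties, pageId, limit, size }) {
  const params = {
    query,
    properties: properties.join("|"),
    page_id: pageId,
    limit,
  };
  // Leaving size out stands in for "leave blank": individual words are highlighted.
  if (size !== undefined) {
    params.size = size;
  }
  return params;
}

console.log(highlightParams({ query: "manual", properties: ["Class", "Genre"], pageId: 698, limit: 5 }));
```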