User:Stefahn/Solr Docu

My own documentation about Solr and SolrStore.

General

Indexing and updating

  • You put documents into Solr (called "indexing") via XML, JSON, CSV or binary over HTTP.
  • You can modify a Solr index by POSTing XML documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
  • schema.xml can specify a "uniqueKey" field called "id". Whenever you POST instructions to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you.
  • Index changes are not visible until they are committed and a new searcher is opened.
  • Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end.
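
The "many changes, one commit" pattern from the list above could look roughly like this in Python (standard library only). This is just a sketch: the update URL, the field names and the helper names are my assumptions, not something taken from this installation.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed update URL; adjust host, port and core for your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build an <add> message; each doc is a dict of field name -> value."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def post_xml(xml_text, url=SOLR_UPDATE_URL):
    """POST one XML update message to Solr and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_in_batch(docs, url=SOLR_UPDATE_URL):
    """Send all documents in one <add>, then commit once at the end."""
    post_xml(build_add_xml(docs), url)
    post_xml("<commit/>", url)
```

Calling index_in_batch with a list of dicts sends two requests in total, no matter how many documents are in the batch. A document whose uniqueKey value already exists in the index replaces the old one.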

Query

Installation

Restarting Solr

Do the following as root (or sudo):

cd /opt
./tomcat/bin/shutdown.sh
./tomcat/bin/startup.sh

Attention: the plain command "shutdown" (without the path to Tomcat's shutdown.sh) turns off the whole server!

schema.xml

  • Info: http://wiki.apache.org/solr/SchemaXml
  • located in:
    • SolrStore: solr/core0/conf/
    • Solr example: solr/example/solr/conf
  • Defines the field types and fields of documents.
  • The schema defines the fields in the index and what type of analysis (field types) is applied to them.
    Example:
    <field name="subject" type="text_general" indexed="true" stored="true"/>
    "subject" = field, "text_general" = fieldtype / analyzer that is applied to the field called "subject"
  • The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.
  • Attention: a comment nested within another comment leads to a parse error (XML does not allow "--" inside a comment).
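
To catch the nested-comment problem before restarting Solr, the file can be run through any XML parser first. A minimal sketch in Python; the two schema fragments and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A comment nested inside another comment: not allowed in XML.
NESTED = """<schema>
  <!-- disabled for now
    <!-- old subject field -->
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  -->
</schema>"""

# The same fragment with only one comment level.
FLAT = """<schema>
  <!-- disabled for now:
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  -->
</schema>"""

print(is_well_formed(NESTED))
print(is_well_formed(FLAT))
```

For a real file, read its content with open(...).read() and pass it to is_well_formed.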

Tips and tricks

  • If you want to sort by an attribute with values like "1 - rookie", "2 - advanced", "3 - expert", don't choose "text_general" as the field type; use "string", for example. With text_general the results are sorted like this: advanced, expert, rookie (apparently because the value is tokenized and the leading "1 -" is skipped).

SolrStore

  • You don't need to define the SMW attributes as fields in your schema.xml. You only need to define fields if you want to do one of the following:
    • You want to sort results by an attribute.
    • You want to have a search input that searches in more than one attribute (for example search in wikitext and pagetitle at the same time).

multivalued

  • multiValued = this field may contain multiple values per document, i.e. it can appear multiple times in a document
  • With SolrStore one can sort by every field of the Solr system - only requirement: the field must not be multivalued. Usually all the fields that SolrStore generates out of the wiki are multivalued.

Trick to use multivalued fields for sorting: use copy fields (copyField in schema.xml) to copy the content of one or several fields into another field that is not multivalued.
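
A small Python sketch to check which fields of a schema are candidates for sorting and where the copy fields point to. The schema fragment and all field names in it (level, level_sort) are hypothetical, and the copy is assumed to work only when each document ends up with a single value in the target field. The check also ignores a multiValued default that a field might inherit from its field type.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment; all field names are made up for illustration.
SCHEMA = """<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="level" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="level_sort" type="string" indexed="true" stored="false"/>
  </fields>
  <copyField source="level" dest="level_sort"/>
</schema>"""


def sortable_fields(schema_xml):
    """Return names of fields that are not declared multiValued."""
    root = ET.fromstring(schema_xml)
    return [
        field.get("name")
        for field in root.iter("field")
        if field.get("multiValued", "false").lower() != "true"
    ]


def copy_targets(schema_xml):
    """Map each copyField destination to the list of its source fields."""
    root = ET.fromstring(schema_xml)
    targets = {}
    for copy in root.iter("copyField"):
        targets.setdefault(copy.get("dest"), []).append(copy.get("source"))
    return targets


print(sortable_fields(SCHEMA))
print(copy_targets(SCHEMA))
```

In this fragment one would sort by level_sort instead of the multivalued level.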

Changing and reindexing

When you change the schema.xml, you must not only restart Solr but also rebuild the index.

Way to go:

  1. Stop your application server
  2. Change your schema.xml file
  3. Delete the index directory in your data directory (Stefan: in the core directory)
  4. Start your application server (Solr will detect that there is no existing index and make a new one)
  5. Reindex your data
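
Step 3 can be scripted. The sketch below assumes that the index lives in data/index below the core directory; that layout and the function name delete_index are assumptions, so check the paths of your own installation first. Run it only while the application server is stopped.

```python
import shutil
from pathlib import Path


def delete_index(core_dir):
    """Remove <core_dir>/data/index and report whether it existed."""
    index_dir = Path(core_dir) / "data" / "index"  # assumed layout
    if not index_dir.is_dir():
        return False  # nothing to delete
    shutil.rmtree(index_dir)
    return True
```

The function leaves the data directory itself in place and only removes the index directory inside it.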

Ways to reindex:

  • For SMW: Use the following two commands on a shell:
php SMW_refreshData.php -ftpv
php SMW_refreshData.php -v
See [1] for more info.

Misc:

  • Quitting XAMPP seems to cause no problem - the data is still there the next time XAMPP is launched (reason: the index is saved)
  • In general, you need to be very careful when you change the schema without reindexing - see [2]
  • Alternative to stopping application server: use multi-core - see [3]

Misc

Multicore

  • Multicore means one has more than one Solr core
  • Purpose: you can have a single Solr instance with separate configurations and indexes - while having the convenience of unified administration. More info: http://wiki.apache.org/solr/CoreAdmin
  • Cores are defined in solr.xml
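
To see which cores a solr.xml defines, the file can be parsed with a few lines of Python. The sketch assumes the older solr.xml layout with a cores element that contains one core element per core; the core names below are only examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical solr.xml in the older <cores> layout; core names are examples.
SOLR_XML = """<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>"""


def list_cores(solr_xml):
    """Return (name, instanceDir) pairs for every <core> element."""
    root = ET.fromstring(solr_xml)
    return [(c.get("name"), c.get("instanceDir")) for c in root.iter("core")]


print(list_cores(SOLR_XML))
```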