User:Stefahn/Solr Docu
My own documentation about Solr and SolrStore.
General
Indexing and updating
- You put documents into Solr (called "indexing") via XML, JSON, CSV or binary over HTTP.
- You can modify a Solr index by POSTing XML Documents containing instructions to add (or update) documents, delete documents, commit pending adds and deletes, and optimize your index.
- schema.xml can specify a "uniqueKey" field called "id". Whenever you POST instructions to Solr to add a document with the same value for the uniqueKey as an existing document, it automatically replaces it for you.
- Index changes are not visible until they are committed and a new searcher is opened.
- Commit can be an expensive operation so it's best to make many changes to an index in a batch and then send the commit command at the end.
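The update messages described above can be sketched in Python like this. It is only a minimal sketch: the URL http://localhost:8983/solr/update and the field names are assumptions (with a multicore setup the path would rather look like /solr/core0/update), so adjust them to your installation.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# ASSUMPTION: update URL of a default single-core Solr; adjust host, port and core.
UPDATE_URL = "http://localhost:8983/solr/update"

COMMIT = "<commit/>"      # makes pending adds and deletes visible
OPTIMIZE = "<optimize/>"  # optimizes the index


def add_message(doc):
    """Build an <add> message for one document given as {field name: value}."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<add><doc>%s</doc></add>" % fields


def delete_message(doc_id):
    """Build a <delete> message for the document with the given uniqueKey value."""
    return "<delete><id>%s</id></delete>" % escape(str(doc_id))


def post(message, url=UPDATE_URL):
    """POST one XML update message to Solr and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def index_batch(docs, url=UPDATE_URL):
    """Send all adds first and only a single commit at the end of the batch."""
    for doc in docs:
        post(add_message(doc), url)
    post(COMMIT, url)
```

Calling index_batch([{"id": "1", "subject": "first"}, {"id": "1", "subject": "second"}]) against a running Solr should leave only one document with id 1, because the second add replaces the first one.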
Query
- You query Solr via HTTP GET and receive XML, JSON, CSV or binary results.
- Basics: http://lucene.apache.org/solr/api-3_6_2/doc-files/tutorial.html#Querying+Data
- Test and debug queries within your Solr: http://localhost:8080/solr/core0/admin/form.jsp
- Example search UI: http://localhost:8983/solr/browse
- Query syntax: http://wiki.apache.org/solr/SolrQuerySyntax
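A query over HTTP GET can be sketched in Python as follows. The base URL http://localhost:8983/solr/select and the field name "subject" are assumptions; q, rows and wt are standard Solr request parameters.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: select URL of a default single-core Solr; adjust to your core.
SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(query, rows=10, fmt="json", base=SELECT_URL):
    """Build the GET URL for a query; wt selects the response format."""
    params = {"q": query, "rows": rows, "wt": fmt}
    return base + "?" + urllib.parse.urlencode(params)


def search(query, rows=10):
    """Run the query against Solr and return the parsed JSON response."""
    with urllib.request.urlopen(build_query_url(query, rows)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With wt=json the matching documents should be found under result["response"]["docs"] of the value returned by search().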
Installation
- Extension:SolrStore/Install_Solr#Install_Apache_Solr_under_Windows
- http://www.icuriousmedia.com/blog/how-to-install-apache-solr-on-windows-xp-1439.php
- The folder solr in tomcat/webapps is generated automatically. One doesn't need to copy it from other locations.
Restarting Solr
Do the following as root (or with sudo):
cd /opt
./tomcat/bin/shutdown.sh
./tomcat/bin/startup.sh
Attention: the plain command "shutdown" (without the Tomcat path and ".sh") turns off the whole server!
schema.xml
- Info: http://wiki.apache.org/solr/SchemaXml
- located in:
- SolrStore: solr/core0/conf/
- Solr example: solr/example/solr/conf
- Defines the field types and fields of documents.
- The schema defines the fields in the index and what type of analysis (field types) is applied to them.
Example: <field name="subject" type="text_general" indexed="true" stored="true"/>
"subject" = field name, "text_general" = field type / analyzer that is applied to the field called "subject"
- The current schema your server is using may be accessed via the [SCHEMA] link on the admin page.
- Attention: a comment nested within another comment (<!-- ... <!-- ... --> ... -->) leads to an error.
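To check quickly which field uses which field type, the schema can be read with a few lines of Python. This is just a sketch: the schema snippet below is a made-up miniature, not the real SolrStore schema.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a tiny hand-written schema, only for illustration.
SCHEMA_SNIPPET = """
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true"/>
    <field name="subject" type="text_general" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
"""


def field_types(schema_xml):
    """Return {field name: field type} for all <field> elements."""
    root = ET.fromstring(schema_xml.strip())
    return {f.get("name"): f.get("type") for f in root.iter("field")}


def unique_key(schema_xml):
    """Return the name of the uniqueKey field, or None if none is defined."""
    root = ET.fromstring(schema_xml.strip())
    return root.findtext("uniqueKey")
```

For a real installation one would read the file instead, for example with ET.parse("solr/core0/conf/schema.xml").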
Tips and tricks
- If you want to sort by an attribute with values like "1 - rookie", "2 - advanced", "3 - expert", don't choose "text_general" as field type, but "string" for example. If you choose text_general, results are sorted in this way: advanced, expert, rookie (because "1 -" is skipped/tokenized somehow).
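The effect can be imitated with a toy simulation in Python. This is not Solr's actual implementation: it simply assumes that a tokenized field is split into lowercase alphanumeric tokens and that the sort key ends up being a single token (here the alphabetically greatest one) instead of the whole value.

```python
import re

values = ["3 - expert", "1 - rookie", "2 - advanced"]


def tokens(value):
    """Toy tokenizer: lowercase and split on everything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", value.lower())


def string_sort(values):
    """Sort like a "string" field: the whole untokenized value is the sort key."""
    return sorted(values)


def tokenized_sort(values):
    """ASSUMPTION: sort by one token only (the alphabetically greatest one)."""
    return sorted(values, key=lambda value: max(tokens(value)))
```

Under these assumptions string_sort keeps the order given by the leading numbers, while tokenized_sort orders the values by the words advanced, expert, rookie.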
SolrStore
- You don't need to define the SMW attributes as fields in your schema.xml. You only need to define fields if you want to do one of the following:
- You want to sort results by an attribute.
- You want to have a search input that searches in more than one attribute (for example search in wikitext and pagetitle at the same time).
multivalued
- multiValued = this field may contain multiple values per document, i.e. it can appear multiple times in a document
- With SolrStore one can sort by every field of the Solr System - only requirement: the field must not be multivalued. Usually all the fields that SolrStore generates out of the wiki are multivalued.
Trick to sort by multivalued fields anyway: use copy fields (copyField in schema.xml) to copy the content of one or several fields into another field that is not multivalued.
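The two schema.xml lines needed for such a sort field can be generated with a small Python helper. It is a sketch under assumptions: the names "level" and "level_sort" are hypothetical, and the trick presumably only works if each document really delivers a single value for the target field.

```python
import xml.etree.ElementTree as ET


def sort_field_snippet(source, dest, field_type="string"):
    """Return a <field> and a <copyField> line for a non-multivalued sort field."""
    field = ET.Element("field", {
        "name": dest,
        "type": field_type,
        "indexed": "true",
        "stored": "false",
        "multiValued": "false",  # required for sorting
    })
    copy = ET.Element("copyField", {"source": source, "dest": dest})
    return "\n".join(ET.tostring(e, encoding="unicode") for e in (field, copy))


# "level" and "level_sort" are made-up field names.
print(sort_field_snippet("level", "level_sort"))
```

The printed lines go into schema.xml; afterwards the index has to be rebuilt so that the new field gets filled.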
Changing and reindexing
When you change the schema.xml, you not only have to restart Solr, but also have to rebuild the index.
Way to go:
- Stop your application server
- Change your schema.xml file
- Delete the index directory in your data directory (Stefan: in the core directory)
- Start your application server (Solr will detect that there is no existing index and make a new one)
- Re-Index your data
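The steps above can be sketched as a Python script. The paths are assumptions: /opt/tomcat/bin is taken from the restart section, /opt/solr/core0 is a made-up core directory, and the index is assumed to live in data/index below it.

```python
import shutil
import subprocess
from pathlib import Path

TOMCAT_BIN = Path("/opt/tomcat/bin")  # from the restart section
CORE_DIR = Path("/opt/solr/core0")    # ASSUMPTION: adjust to your core directory


def delete_index(core_dir):
    """Delete <core_dir>/data/index; return True if something was deleted."""
    index_dir = Path(core_dir) / "data" / "index"
    if index_dir.is_dir():
        shutil.rmtree(index_dir)
        return True
    return False


def rebuild(core_dir=CORE_DIR, tomcat_bin=TOMCAT_BIN):
    """Stop Tomcat, drop the old index and start Tomcat again (run as root)."""
    subprocess.run([str(tomcat_bin / "shutdown.sh")], check=True)
    # schema.xml is assumed to have been edited by hand at this point.
    delete_index(core_dir)
    subprocess.run([str(tomcat_bin / "startup.sh")], check=True)
    # Afterwards re-index the data, e.g. with one of the ways listed below.
```

rebuild() is not called automatically here, because it stops the application server.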
Ways to reindex:
- For SMW: Use the following two commands on a shell:
php SMW_refreshData.php -ftpv
php SMW_refreshData.php -v
- See [1] for more info.
- Script (I don't know how to use it so far; Simon: doesn't work with SMW): http://www.jason-palmer.com/2011/05/how-to-reindex-a-solr-database/
- Modify articles and save afterwards
Misc
Multicore
- Multicore means one has more than one Solr core
- Purpose: you can have a single Solr instance with separate configurations and indexes - while having the convenience of unified administration. More info: http://wiki.apache.org/solr/CoreAdmin
- Cores are defined in solr.xml
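To see which cores are defined, solr.xml can be read with a few lines of Python. This is a sketch: the solr.xml content below is a typical multicore example written from memory, with core0 and core1 as assumed core names.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a typical multicore solr.xml; the real file may contain more attributes.
SOLR_XML = """
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
"""


def list_cores(solr_xml):
    """Return {core name: instance directory} for all <core> elements."""
    root = ET.fromstring(solr_xml.strip())
    return {core.get("name"): core.get("instanceDir") for core in root.iter("core")}
```

Each instance directory has its own conf folder, so every core can have its own schema.xml.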