Extension talk:SolrStore

Latest comment: 9 years ago by Kghbln in topic Using 'Range' input type for a field


ERROR: multiple values encountered for non multiValued field

While testing SolrStore 0.6 Beta (r114795), the system stopped with a fatal error dump. Since we couldn't find a SolrStore Bugzilla component, we are posting our findings here.

Our hypothesis is that whenever the property Categories is assigned more than one category value, e.g. [Pages with broken file links, Book], a dump like the one below is produced, whereas no error dump is produced when the property Categories (category) has no value assigned.

The request sent by the client was syntactically incorrect 
(ERROR: [doc=Porter/1986/Competition in Global Industries] 
multiple values encountered for non multiValued field category: 
[Pages with broken file links, Book]).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\WikiPage.php(2021): LinksUpdate->__construct(Object(Title), Object(ParserOutput))
#12 D:\xampp\htdocs\...\includes\WikiPage.php(1200): WikiPage->doEditUpdates(Object(Revision), Object(User), Array)
#13 [internal function]: WikiPage->doEdit('{{Book?|title=C...', '', 98)
#14 D:\xampp\htdocs\...\includes\Article.php(1934): call_user_func_array(Array, Array)
#15 D:\xampp\htdocs\...\includes\EditPage.php(1214): Article->__call('doEdit', Array)
#16 D:\xampp\htdocs\...\includes\EditPage.php(1214): Article->doEdit('{{Book?|title=C...', '', 98)
#17 D:\xampp\htdocs\...\includes\EditPage.php(2855): EditPage->internalAttemptSave(Array, false)
#18 D:\xampp\htdocs\...\includes\EditPage.php(478): EditPage->attemptSave()
#19 D:\xampp\htdocs\...\includes\EditPage.php(353): EditPage->edit()
#20 D:\xampp\htdocs\...\includes\Wiki.php(501): EditPage->submit()
#21 D:\xampp\htdocs\...\includes\Wiki.php(241): MediaWiki->performAction(Object(Article))
#22 D:\xampp\htdocs\...\includes\Wiki.php(626): MediaWiki->performRequest()
#23 D:\xampp\htdocs\...\includes\Wiki.php(533): MediaWiki->main()
#24 D:\xampp\htdocs\...\index.php(57): MediaWiki->run()
#25 {main}

MWJames (talk) 21:06, 7 April 2012 (UTC)

Thanks for reporting this bug. I'll take a closer look at this error on Tuesday when I'm back at work.
This error normally occurs when Solr tries to insert several values into a field that is not defined with multiValued="true".
You have to change line 448 in your Solr schema.xml to
<field name="category" type="text" indexed="true" stored="true" multiValued="true"/>
SBachenberg (talk) 21:24, 7 April 2012 (UTC)
Fixed in rev. 114820. Schuellersa (talk) 14:29, 10 April 2012 (UTC)

Distorted Vector skin while using Special:SolrSearch

A distorted Vector skin appeared while testing SolrStore 0.6 Beta (r114795) with Special:SolrSearch. After making a selection under "SolrSearch: SearchSet select", the whole Vector skin / sidebar was repositioned and distorted. The reason might lie in some Special:SolrSearch fieldsets or divs that are not closed and are therefore responsible for the repositioning. MWJames (talk) 21:17, 7 April 2012 (UTC)

In the file SolrSpecialSearch.php, change line 562 to:
$out .= '</table>';
SBachenberg (talk) 21:30, 7 April 2012 (UTC)
Fixed in rev. 114822. Schuellersa (talk) 14:36, 10 April 2012 (UTC)
Not sure why, but with r114866 <table> was introduced again; I had to change it back to </table>. MWJames (talk) 06:10, 13 April 2012 (UTC)
Hi James,
my colleague Schuellersa sometimes gets confused by the different versions of our extension.
This is now the third time we have had to re-fix this error :-( SBachenberg (talk) 06:43, 13 April 2012 (UTC)

Error on attribute value & #13; and & lt;

Thanks for the quick response. Sorry, but this time we ran into a problem involving & #13; and & lt;.

The request sent by the client was syntactically incorrect (Unexpected '<'  in attribute value
 at [row,col {unknown-source}]: [4,212]).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\job\RefreshLinksJob.php(119): LinksUpdate->__construct(Object(Title), Object(ParserOutput), false)
#12 D:\xampp\htdocs\...\maintenance\runJobs.php(78): RefreshLinksJob2->run()
#13 D:\xampp\htdocs\...\maintenance\doMaintenance.php(105): RunJobs->execute()
#14 D:\xampp\htdocs\...\maintenance\runJobs.php(108): require_once('D:\xampp\htdocs...')
#15 {main}

MWJames (talk) 22:42, 7 April 2012 (UTC)

OK,
this looks like a bigger problem.
We send the property values to Solr as XML, so '<' and '>' break the XML syntax.
Next week we will try to fix that error. SBachenberg (talk) 22:50, 7 April 2012 (UTC)
Fixed in rev. 114821. Schuellersa (talk) 14:32, 10 April 2012 (UTC)
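Editorial sketch: the problem described here is that raw '<', '>' and '&' in a property value end up inside the update XML. One common way to avoid that is to escape the field name and value instead of deleting characters. The helper below is a minimal illustration under that assumption; `buildSolrField` and the field name `Abstract_t` are hypothetical and are not taken from the SolrStore code, and this is not necessarily what rev. 114821 does.

```php
<?php
// Hypothetical helper (not SolrStore code): build one <field> element for a
// Solr <add><doc> update message with both the name and the value escaped.
function buildSolrField( string $name, string $value ): string {
	// ENT_XML1 selects XML escaping rules; ENT_QUOTES also escapes
	// single and double quotes, which matters inside the name attribute.
	$flags = ENT_QUOTES | ENT_XML1;
	return '<field name="' . htmlspecialchars( $name, $flags, 'UTF-8' ) . '">'
		. htmlspecialchars( $value, $flags, 'UTF-8' )
		. '</field>';
}

// The value contains markup that would otherwise be read as XML tags.
$update = '<add><doc>'
	. buildSolrField( 'Abstract_t', 'a < b & <p>text</p>' )
	. '</doc></add>';
```

Escaping keeps the original text searchable; stripping the characters is simpler but changes the stored value.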

Error adding field

Sorry to drag this out, but we just hit another issue. An article had been stored earlier without a particular property (in the case below File size, which is a special property). When the article is saved again with this property annotated, SolrStore causes the following error.

 
The request sent by the client was syntactically incorrect 
(ERROR: [doc=827826dc-2fc4-4615-ba7f-b89ba1f14480.pdf] 
Error adding field 'File size_i'='4.699').
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\aris\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)

MWJames (talk) 23:40, 7 April 2012 (UTC)

The property File size is maintained as Has type :: Quantity. MWJames (talk) 08:40, 8 April 2012 (UTC)
Hi,
this is another schema.xml error.
We have defined all semantic fields of the type Number as integer, and you have a float value.
You have to change line 513 in your schema.xml from int to float.
<dynamicField name="*_i" type="float" indexed="true" stored="true" multiValued="true"/>
SBachenberg (talk) 08:50, 8 April 2012 (UTC)
Thanks, that did the trick. To complete the settings, one should change the lines below as well.
<dynamicField name="*_imin"  type="float"    indexed="true"  />
<dynamicField name="*_imax"  type="float"    indexed="true"  />
MWJames (talk) 09:25, 8 April 2012 (UTC)
Fixed in rev. 114820. Schuellersa (talk) 14:30, 10 April 2012 (UTC)

Unexpected XML tag doc/p

During a runJobs run another error occurred. The backtrace says neither which document caused the error nor which XML tag was involved; anyway, please find the backtrace below.

The request sent by the client was syntactically incorrect (unexpected XML tag doc/p).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(209): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(279): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(439): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))

MWJames (talk) 07:05, 8 April 2012 (UTC)

This looks like the same error as "Error on attribute value & #13; and & lt;".
There seems to be a <p> tag or something similar in your value. I'll have to fix that on Tuesday, when I'm back at work. SBachenberg (talk) 08:54, 8 April 2012 (UTC)
If you want to fix it yourself, have a look at SolrDoc.php.
The function addField( $name, $value ) is at line 26. You have to add some string replacements for the field value that remove '<' and '>'; this should probably fix the errors.
But I have to build a better solution for cleaning the values. SBachenberg (talk) 09:10, 8 April 2012 (UTC)
For testing purposes, I just did a quick hack so that at least runJobs doesn't break any more.
$value = preg_replace('/<|>/msu', '',$value);
MWJames (talk) 09:51, 8 April 2012 (UTC)
Fixed in rev. 114821. Schuellersa (talk) 14:34, 10 April 2012 (UTC)
Instead of using preg_replace, I now use MW's own XML sanitizer (Sanitizer::normalizeCharReferences), which should make any name/value XML-conformant.
$this->output .= '<field name="' .  Sanitizer::normalizeCharReferences ( $name ) . '">' . Sanitizer::normalizeCharReferences ( $value ) . '</field>';
MWJames (talk) 15:08, 10 April 2012 (UTC)
This looks really cool. I didn't know that MW has its own Sanitizer, thank you!
We will add this. SBachenberg (talk) 16:11, 10 April 2012 (UTC)
Having said this, everything should be covered, but somehow Solr still comes back with an error, which means there must be another area where stray XML tags cause a crash.
My hypothesis is that when a property, for example Abstract (has type::text), contains not only text but also a template call ({{value| ...}}), a crash dump is created while trying to save the article. When I change {{value| ...}} to [[value:: ...]] in the property value text, the same article saves without any trouble. MWJames (talk) 22:46, 10 April 2012 (UTC)
Re-fixed in rev. 114841. Schuellersa (talk) 07:53, 11 April 2012 (UTC)
The best thing would be to remove all HTML tags completely before sending the values to Solr. I think nobody wants to query HTML tags, so you don't need them in your Solr index.
Could you try this piece of code for me?
26 	public function addField( $name, $value ) {
27 	$this->output .= '<field name="' . strip_tags ( $name ) . '">' . strip_tags ( $value ) . '</field>';
28 	}
SBachenberg (talk) 10:12, 12 April 2012 (UTC)
For the case of {{ }} within property values cited above, the change didn't bring any success; it still ends in a backtrace. Could there be another area where XML fragments are created? MWJames (talk) 06:06, 13 April 2012 (UTC)
Hi James,
this is the only place where we create XML before we send it to Solr.
Maybe you could send me another backtrace?
Our normal way to parse the SMW data is:
  1. Read the attributes and values
  2. Add them to a SolrDoc
  3. Send the SolrDoc to Solr with the SolrTalker
  4. Done SBachenberg (talk) 06:48, 13 April 2012 (UTC)
As I said before, the backtrace is really ambiguous, so one can't really tell where, when, and how things are happening.
I also tried to log ($wgDebugLogFile) any other possible messages, but the log file does not show any information related to the above problem.
Maybe some wfDebugLog( 'SolrStore', __METHOD__,... log messages could help shed light on where the problems occur. Using this message type would allow filtering all Solr-related messages using $wgDebugLogGroups = ...
Anyway, here is the latest backtrace, based on SVN r114866.
The request sent by the client was syntactically incorrect (unexpected XML tag doc/span).
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(211): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')
#1 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(281): SolrTalker->solrAdd('<add><doc><fiel...')
#2 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(441): SolrTalker->addDoc(Object(SolrDoc))
#3 D:\xampp\htdocs\...\extensions\SolrStore\SolrConnectorStore.php(139): SolrTalker->parseSemanticData(Object(SMWSemanticData))
#4 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\storage\SMW_Store.php(303): SolrConnectorStore->doDataUpdate(Object(SMWSemanticData))
#5 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(316): SMWStore->updateData(Object(SMWSemanticData))
#6 D:\xampp\htdocs\...\extensions\SemanticMediaWiki\includes\SMW_ParseData.php(445): SMWParseData::storeData(Object(ParserOutput), Object(Title), true)
#7 [internal function]: SMWParseData::onLinksUpdateConstructed(Object(LinksUpdate))
#8 D:\xampp\htdocs\...\includes\Hooks.php(216): call_user_func_array('SMWParseData::o...', Array)
#9 D:\xampp\htdocs\...\includes\GlobalFunctions.php(3631): Hooks::run('LinksUpdateCons...', Array)
#10 D:\xampp\htdocs\...\includes\LinksUpdate.php(98): wfRunHooks('LinksUpdateCons...', Array)
#11 D:\xampp\htdocs\...\includes\job\RefreshLinksJob.php(49): LinksUpdate->__construct(Object(Title), Object(ParserOutput), false)
#12 D:\xampp\htdocs\...\maintenance\runJobs.php(78): RefreshLinksJob->run()
#13 D:\xampp\htdocs\...\maintenance\doMaintenance.php(105): RunJobs->execute()
#14 D:\xampp\htdocs\...\maintenance\runJobs.php(108): require_once('D:\xampp\htdocs...')
#15 {main}
MWJames (talk) 08:44, 13 April 2012 (UTC)
Hi James,
could you send me the source code of one of your pages that causes problems? My mail is simon.bachenberg(at)gmail.com
I have now set up a new MediaWiki to reproduce this error and I need some good data for it ;-)
For testing purposes you can add $wgSolrDebug = true; to your LocalSettings.php to see everything that gets sent to Solr. SBachenberg (talk) 07:40, 20 April 2012 (UTC)
Hi James,
we should have fixed this error now in the newest SVN version; could you please test it? SBachenberg (talk) 09:48, 20 April 2012 (UTC)

Article full text vs. property attribution text

It might seem like a dull question, but I'll try to elaborate on it anyway. While SolrStore is quick to index all property-related fields (numeric, blob, etc.), I was wondering how to customize SolrStore so that it indexes the article text as well. When I looked at the Solr index files, I could see that the article full text had not been indexed at all. Glancing at http://www.gesis.org/sofiswiki/ I could see that larger textual information was stored in a property called Inhalt de. How would one set up SolrStore so that it also indexes the article full text, since not all text information can or should be stored in a text property?

Cheers MWJames (talk) 09:44, 8 April 2012 (UTC)

We extend an SMWStore with the class SolrConnectorStore.
It does nothing other than use the default SMWStore and send each attribute to Solr. To get all the text to Solr you need it in the SMWStore, and I don't know how :-(
The easiest way would be to add a semantic property for it. In our wiki we use so many templates that we didn't want to get this stuff indexed too, so we built this extension to avoid the problem. All attributes of a page get merged by Solr into a single field called text; this field is used for the "everywhere search".
If you want to add some data, like the non-indexed text, to an existing Solr page, you have to create a new SolrDoc with the page name as ID and just add the fields you want to add to that ID. Solr should merge the data for the ID, if I remember it right. SBachenberg (talk) 10:07, 8 April 2012 (UTC)

Missing Has subobject

While using the Solr admin interface to inspect the result XML output, I noticed that subobjects haven't been indexed. The result XML shows an empty entry (see below) for all results, even though Has subobject has been automatically assigned values.

<str name="subobjectname"/>

MWJames (talk) 10:10, 8 April 2012 (UTC)

We have mapped all the property types we know of, but not every type that exists.
You can map them easily in SolrTalker.php from line 340 onwards: there is a switch case for each property type, but not all of them are filled with code.
The code would be something like:
$solritem->addField( $propertyName . '_i', $di->getNumber() );
The "_i" is the dynamic base, which defines the data type in the schema.xml.
Have a look at the upper part of the schema.xml, where all kinds of data types are defined. The mapping of the dynamic base to the data type is in the lower part (where you changed int to float).
You don't need to add sort fields; we don't actually use them at the moment. SBachenberg (talk) 10:55, 8 April 2012 (UTC)
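Editorial sketch: the idea above is that a suffix on the field name selects a dynamicField rule in schema.xml. A minimal illustration of such a lookup follows. The function `solrFieldName` and the type keys are hypothetical; only the suffixes `_i` and `_dt` are mentioned on this page, while `_t` and `_s` are assumptions borrowed from the stock Solr example schema and must match whatever your own schema.xml defines.

```php
<?php
// Hypothetical suffix table (not SolrStore code). '_i' and '_dt' appear in
// this discussion; '_t' and '_s' are assumed and depend on your schema.xml.
const SOLR_SUFFIXES = [
	'number' => '_i',
	'date'   => '_dt',
	'text'   => '_t',
	'string' => '_s',
];

// Append the dynamic-field suffix for a property type to the property name.
function solrFieldName( string $propertyName, string $type ): string {
	if ( !array_key_exists( $type, SOLR_SUFFIXES ) ) {
		// Unmapped types fail loudly instead of being indexed under a wrong type.
		throw new InvalidArgumentException( "No Solr suffix mapped for type '$type'" );
	}
	return $propertyName . SOLR_SUFFIXES[$type];
}

$field = solrFieldName( 'File size', 'number' );
```

Throwing on an unmapped type is one possible design choice; silently skipping the property, as an empty switch case would, is the other.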

Probe connectivity to the Solr host

When the Solr host suddenly becomes unavailable, all article saving goes south. The interface should somehow check whether it is able to connect to the Solr host and otherwise bail out.

couldn't connect to host
Backtrace:
#0 D:\xampp\htdocs\...\extensions\SolrStore\SolrTalker.php(211): SolrTalker->solrSend('http://192.168....', '<add><doc><fiel...')

MWJames (talk) 07:54, 12 April 2012 (UTC)

I love you for testing our extension; we are going to fix this somehow. I could think of retrying the send to Solr up to 5 times, after which an error will be thrown.
The bigger problem is that the SMW indexer has to stop until Solr is ready again. I have no idea how to tell it to stop.
You can always re-index your wiki by using the "Repair" button under Spezial:SMW-Administration, but that's no solution for the problem. SBachenberg (talk) 10:25, 12 April 2012 (UTC)
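Editorial sketch: the "retry up to 5 times, then throw" idea could look roughly like the wrapper below. `sendWithRetry`, its parameters and the stand-in sender are hypothetical and not part of SolrStore; the real send call and the exception type it raises would have to be substituted.

```php
<?php
// Hypothetical wrapper (not SolrStore code): call $send up to $maxAttempts
// times and throw once every attempt has failed.
function sendWithRetry( callable $send, int $maxAttempts = 5, int $delayMs = 200 ) {
	$lastError = null;
	for ( $attempt = 1; $attempt <= $maxAttempts; $attempt++ ) {
		try {
			return $send();
		} catch ( RuntimeException $e ) {
			$lastError = $e;
			if ( $attempt < $maxAttempts ) {
				usleep( $delayMs * 1000 ); // short pause before the next try
			}
		}
	}
	// Keep the last failure as the previous exception for debugging.
	throw new RuntimeException( "Solr unreachable after $maxAttempts attempts", 0, $lastError );
}

// Usage with a stand-in sender that fails twice and then succeeds.
$calls = 0;
$result = sendWithRetry( function () use ( &$calls ) {
	$calls++;
	if ( $calls < 3 ) {
		throw new RuntimeException( "couldn't connect to host" );
	}
	return 'ok';
}, 5, 0 );
```

A retry loop only covers short outages; it does not address the point raised here that the indexer needs to pause while Solr is down.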
Actually, in the case above Solr was not available because the server was restarted. I'm not sure about the inner workings of Solr, but there must certainly be a method to check whether Solr is ready to receive index values and, if it is not, return true for the hook and mark the document as non-indexed.
Normally, any indexing service would have a status table in which one can track the current status of those documents. I'm sure you don't want to introduce any special handling or create an additional status table. Instead, you could trace the status by creating a meta-subobject (with a special property) that is annotated to an entity (page) whenever the status returns anything other than successful. Then one can either run an #ask query to find those subobjects, or a special status page can pick them up, display them, and allow a mass re-index, because running Special:SMWAdmin is not always the best option (in our case we have around 1.1M triples, which makes every Special:SMWAdmin run very costly). MWJames (talk) 06:34, 13 April 2012 (UTC)
I'll have to think about it; I'll find a nice solution.
We have the same problem with re-indexing: a full rebuild takes us 1-2 days. This is why we restart Solr only if we have changed our schema, because after most schema changes you have to re-index to have all properties indexed the right way.
A tip on the side: create your own Solr schema for your wiki for better query results. You can add stemmers, tokenizers and much more for your data types, or copyFields, where you can merge two fields into one. Most of these things are only interesting if you use the field-based search. SBachenberg (talk) 07:06, 13 April 2012 (UTC)

a few problems

Great extension! Hope to see it at 100% soon. Running the 'trunk' version I encountered these:

  • fieldset name won't accept spaces
  • search form and results are messed up in Vector skin (fixed with /table from below)
  • Prompted to 'Create the page [fieldset name] on this wiki!'
  • Trying to run refreshdata from semantic mediawiki/maintenance:

Warning: DOMDocument::loadHTML(): Unexpected end tag : p in Entity, line: 8 in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 258

Catchable fatal error: Argument 1 passed to DOMDocument::saveXML() must be an instance of DOMNode, null given, called in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 209 and defined in /home/vid/webs/atip/docs/mediawiki-1.18.1/extensions/SolrStore/SolrTalker.php on line 261

David Mason (talk) 18:54, 16 April 2012 (UTC)

Hi David,
thanks for reporting your problems.
But I'm a bit confused about some of them.
====fieldset name won't accept spaces====
This should accept spaces; have a look at http://sofis.gesis.org/sofiswiki/Spezial:SolrSearch/Projekte
This is the fieldset definition from our wiki; maybe there is an error in your definition?
$wgSolrFields = array(
    new SolrSearchFieldSet('Projekte', 'Titel; Personen; id', 'Titel; Personen und Authoren; SOFIS-Nr. (Erfassungs-Nr.)', ' AND category:Projekte', 'AND'),
    new SolrSearchFieldSet('Institutionen', 'name; Inst-ID;ort', 'Name; Institutions-Nr.;Ort', ' AND category:Institution', 'AND')
);
====search form and results are messed up in Vector skin (fixed with /table from below)====
This should be fixed in SVN now. Sometimes my friend User:Schuellersa gets a bit confused by his SolrStore versions and commits the wrong one (this is the third time we have fixed it now :)
====Prompted to 'Create the page [fieldset name] on this wiki!'====
This is really new to me; could you please post your $wgSolrFields definition?
====trying to run refreshdata from semantic mediawiki/maintenance====
We are working on that; could you try the newest SVN version?
This seems to be the same error as Extension_talk:SolrStore#Unexpected_XML_tag_doc SBachenberg (talk) 07:30, 20 April 2012 (UTC)
Hi David,
the refreshdata error should now be fixed in the newest SVN version; please test it. SBachenberg (talk) 09:49, 20 April 2012 (UTC)

Error with undefined method SolrConnectorStore::getConceptCacheStatus

Sorry, I haven't had much time to look at it, but the following keeps turning up while using concepts. SMW_SQLStore2 defines a method called getConceptCacheStatus, but somehow this method is not present in the SolrConnectorStore class that extends SMWStore.

Fatal error: Call to undefined method SolrConnectorStore::getConceptCacheStatus() in

MWJames (talk) 16:17, 19 April 2012 (UTC)

Thanks for reporting the bug. To fix it, you have to add the following code to your SolrConnectorStore.php at line 39:
	/**
	 * Return status of the concept cache for the given concept as an array
	 * with key 'status' ('empty': not cached, 'full': cached, 'no': not
	 * cachable). If status is not 'no', the array also contains keys 'size'
	 * (query size), 'depth' (query depth), 'features' (query features). If
	 * status is 'full', the array also contains keys 'date' (timestamp of
	 * cache), 'count' (number of results in cache).
	 *
	 * @param $concept Title or SMWWikiPageValue
	 */
	public function getConceptCacheStatus( $concept ) {
		return self::getBaseStore()->getConceptCacheStatus( $concept );
	}
SBachenberg (talk) 07:10, 20 April 2012 (UTC)
Fixed in SVN now. SBachenberg (talk) 09:50, 20 April 2012 (UTC)

Why Tomcat dependency?

Why does SolrStore state a dependency on Tomcat? That's only one of various servlet containers supported by Solr itself, including Glassfish, JBoss, Jetty (the default, included in the Solr package), Resin, Weblogic and WebSphere. Thanks, Hypergrove (talk) 18:54, 20 February 2013 (UTC)

Hi Hypergrove,
you are absolutely right, you can use whatever you want.
Cheers, SBachenberg (talk) 21:02, 20 February 2013 (UTC)
I guess Tomcat is part of the setup you use and cater for. [[kgh]] (talk) 23:03, 21 February 2013 (UTC)

Multicore support

What do you think of specifying the name of a Solr core that is to be updated or queried? Hypergrove (talk) 18:59, 20 February 2013 (UTC)

What do you actually mean?
Defining one URL for updating and another for querying?
Or do you just want to specify the Solr core that should be used for both?
Cheers, SBachenberg (talk) 21:04, 20 February 2013 (UTC)
I realize it would be an extension of SMW, but the thought is to accommodate multiple Solr cores. For instance {{#core-ask: |core=name}} and {{#core-set:name|prop=val}}.
Just a thought! - john Hypergrove (talk) 16:40, 26 February 2013 (UTC)
When we started with the extension, we tried to do ask queries with Solr, but we had too much trouble re-implementing the result printers. SolrStore is currently a better version of Extension:MWSearch. If you have good knowledge of the SMW code, you could implement this feature. I will help anybody who is interested in developing new features for the extension; just PM me. SBachenberg (talk) 16:55, 26 February 2013 (UTC)

summary line wrong

hi,

We're now trying SolrStore with MW 1.19/Solr 4.1/SMW 1.8. It all seems to work, but the summary line is wrong:

Relevance: 27.0% - 2 KB (19 words) - 22:28, 16 May 2013

The search word is in the title, so relevance should be higher? The article is more than 19 words (not sure about KB size), and the date is incorrect since the article was last modified on the 15th. Is there a known fix?

Thanks! David Mason (talk) 22:30, 16 May 2013 (UTC)

Hi David,
nice to hear that it's working with Solr 4.1; we haven't tested it yet. The relevance is a bit tricky, because Solr generates a score for each result based on TF-IDF. Normally you cannot convert a TF-IDF score cleanly into a percentage, but the default MediaWiki search form wants a relevance in percent. We often have relevance values far over 100%, so please do not take it as accurate.
For the last modified date you have to do 2 things:
  1. Look at your Solr search result XML and find the actual field name of your modification date. The problem here is that it depends on the language you are using. In an English wiki it should be "Modification date_dt", in German it's "Zuletzt geändert_dt".
  2. Go to SolrStore/templates/SolrSearchTemplate_Standart.php line 81 and change it from: if ( $docd[ 'name' ] == 'Zuletzt geändert_dt'){ to your language. If it's English: if ( $docd[ 'name' ] == 'Modification date_dt'){
EDIT: I just uploaded a fix for the English language to SourceForge: http://sourceforge.net/projects/smwsolrstore/files/SolrStore_0.8.1.zip/download
If you have any other problems etc., just ask. SBachenberg (talk) 05:36, 17 May 2013 (UTC)
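Editorial sketch: since a raw score has no fixed upper bound, one simple way to obtain a value that never exceeds 100% is to scale every score against the best hit in the same result list. This is an assumed alternative for illustration, not what SolrStore does; `scoresToPercent` is a hypothetical name, and the resulting percentages are only comparable within one result list.

```php
<?php
// Hypothetical helper (not SolrStore code): scale raw scores to 0..100
// relative to the highest score in the same result list.
function scoresToPercent( array $scores ): array {
	if ( $scores === [] ) {
		return [];
	}
	$max = max( $scores );
	if ( $max <= 0 ) {
		// No positive score: report 0% for every hit, keeping the keys.
		return array_fill_keys( array_keys( $scores ), 0.0 );
	}
	return array_map( function ( $score ) use ( $max ) {
		return round( 100 * $score / $max, 1 );
	}, $scores );
}

$percent = scoresToPercent( [ 4.0, 2.0, 1.0 ] );
```

With this scaling the top hit always shows 100%, so the number expresses rank within the list rather than absolute match quality.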
Heiya Simon,
it would be nice to have the commit for the new version in Gerrit as well. That way all the translation updates would move into this version, too.
Cheers [[kgh]] (talk) 07:37, 17 May 2013 (UTC)
Hi,
Thanks for the translation fix, the date is correct now. However, the file size is still incorrect. For example, it shows "212 B (16 words)" for a page that is 752 words, 4534 bytes. As it's different for each entry, I presume it's not a translation problem.
How should the "relevance" score be interpreted? Is the sorting correct? I don't want to put something in front of the users that's confusing. David Mason (talk) 17:22, 17 May 2013 (UTC)
Hi,
the score is a correct TF-IDF score; the higher the score the better, and the sorting is also correct.
I'll look into this bytes/words problem. I currently have no idea where the problem is, but I'll answer you as soon as possible.
One thing you should know about the extension is that we currently don't support searching in selected namespaces. You can only search in all namespaces, but you can disable some namespaces in your LocalSettings.php with the parameter $wgSolrOmitNS.
The default is: $wgSolrOmitNS = array('102' );
You should hide your advanced search options so nobody gets confused. The CSS for that is:
.mw-search-formheader div.search-types, #mw-searchoptions{
display: none;
}
SBachenberg (talk) 17:49, 17 May 2013 (UTC)
Thanks very much for your diligence! Let me know if I can help. David Mason (talk) 21:00, 17 May 2013 (UTC)
Sorry that it took so long, but I was a bit busy the last few days.
Could you please change the code in the file /SolrStore/Templates/SolrSearchTemplate_Standart.php at line 33 to this:
// get Size, Namespace, Wordcound, Date from XML:		
foreach ( $xml->arr as $doc ) {
	switch ( $doc[ 'name' ] ) {
		case 'text':
			$textsnip = '';
			$textsnipvar = 0;
			foreach ( $doc->str as $inner ) {
				$textsnipvar++;
				if ( $textsnipvar >= 4 && $textsnipvar <= $snipmax ) {
					$textsnip .= ' ' . $inner;
				}
				$textsnip = substr( $textsnip, 0, $textlenght );
			}
			$this->mDate = $doc->date;
			break;
		case 'wikitext':
			$this->mSize = strlen( $doc->str );
			$this->mWordCount = count( $doc->str );
			$textsnipy = "";
			$textsnipy = $doc->str;
			$textsnipy = str_replace( '{', '', $textsnipy );
			$textsnipy = str_replace( '}', '', $textsnipy );
			$textsnipy = str_replace( '|', '', $textsnipy );
			$textsnipy = str_replace( '=', ' ', $textsnipy );
			$textsnipy = substr( $textsnipy, 0, $textlenght );
			break;
	}
}
I will upload a fix to SourceForge later. SBachenberg (talk) 07:13, 27 May 2013 (UTC)
Hi again,
I tried this fix. It changes the output, but it's still not correct, unless there is something unusual about how it handles text in SMW templates, where most of our text is located. I also had to comment out code references to $nsText. David Mason (talk) 12:19, 28 May 2013 (UTC)
Hi, I thought that would solve the problem.
Let me tell you a bit about how we handle the wikitext. We store the wikitext in the Solr field "wikitext" and each SMW attribute in its own field. We also have a field called "text", in which we save all fields combined. Before the patch we used "text" for the calculation; now we have changed it to "wikitext", which should be the right field for that purpose.
All of these fields can be customized through Solr itself, and that's where the problem must be. Could you please have a look at your Solr schema.xml? At line 953 there should be something like this:
<field name="wikitext" type="text_general" indexed="true" stored="true" multiValued="true"/>
This defines "wikitext" with the Solr field type "text_general", which I thought would be the right one, but I never considered counting words and bytes. Could you please change it to "string"? "text_general" uses analyzers, a tokenizer and a handful of filters. All of these manipulate the original text, which leads to the miscalculation.
The only big problem is that you have to restart Solr after altering schema.xml and also re-index your wiki, so that the new field definition takes effect.
Please tell me if it works, because re-indexing our SofisWiki takes up to 3 days and you will probably be faster :-) SBachenberg (talk) 13:49, 28 May 2013 (UTC)
Hi again,
I've changed that line and restarted Solr. It's not clear how to refresh the data, so I ran SemanticMediaWiki/maintenance/SMW_refreshData.php and also maintenance/runJobs.php. During the former I saw lots of this:
PHP Notice: Array to string conversion in /var/www/mw/extensions/SemanticMediaWiki/includes/storage/SQLStore/SMW_SQLStore3_Writers.php on line 383
Unfortunately the result now looks the same as it did previously, and I see results like "2 KB (1 word)". That's some word!
If it helps, I could set up an isolated instance for you to connect to directly? David Mason (talk) 16:57, 28 May 2013 (UTC)
Hi David,
that sounds nice, but I think it would be enough if you could send me an XML result from your Solr. Then I can find out why the stored data is not counted correctly.
The way you re-indexed was absolutely right, but the error you get is not from SolrStore. That's a known SMW error: https://bugzilla.wikimedia.org/show_bug.cgi?id=42321 SBachenberg (talk) 07:54, 29 May 2013 (UTC)
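For readers who want to produce such an XML result themselves, here is a minimal sketch. It assumes a Solr instance reachable at `http://localhost:8983/solr` with the standard `/select` handler; the helper name `buildSolrXmlQueryUrl` is hypothetical and not part of SolrStore. It only builds the request URL, which can then be opened in a browser or fetched with any HTTP client.

```php
<?php
// Hypothetical helper (not part of SolrStore): builds a Solr /select URL
// that returns the stored "wikitext" field of a single document as XML.
function buildSolrXmlQueryUrl(string $solrBase, int $rows = 1): string {
    $params = [
        'q'    => '*:*',       // match all documents
        'rows' => $rows,       // limit the number of returned documents
        'fl'   => 'wikitext',  // only return the stored wikitext field
        'wt'   => 'xml',       // ask Solr for an XML response
    ];
    // RFC 3986 encoding keeps the query string unambiguous.
    return rtrim($solrBase, '/') . '/select?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

// Assumed base URL; adjust host, port and core path to your installation.
$url = buildSolrXmlQueryUrl('http://localhost:8983/solr');
echo $url, "\n";
```

The base URL above is an assumption; a multi-core setup would typically need the core name in the path as well.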
Hi David,
I may have found the error. Could you please change
$this->mWordCount = count( $doc->str );
to
$this->mWordCount = str_word_count( $doc->str );
Sorry that fixing this is taking so long; I didn't write this "Template" part. It is all the work of my colleague Sascha Schüller, but he currently has no ambition to fix it. SBachenberg (talk) 08:34, 29 May 2013 (UTC)
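As a rough illustration of why this change matters: `count()` counts array elements, not words. On a plain string, older PHP versions return 1 (which would match the "1 word" seen above), and PHP 8 throws a TypeError. `str_word_count()` counts runs of letters, so it can also differ from a whitespace-based count such as `wc -w`. The sketch below assumes `$doc->str` is a plain string; the sample wikitext and the helper `countWhitespaceTokens` are made up for the example.

```php
<?php
// Hypothetical helper: whitespace-delimited count, roughly what `wc -w` reports.
function countWhitespaceTokens(string $text): int {
    $tokens = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return count($tokens);
}

// Made-up sample wikitext with a template call and a category link.
$wikitext = "{{Book|author=Porter|year=1986}} [[Category:Book]]";

$bytes    = strlen($wikitext);                // size in bytes
$wcWords  = countWhitespaceTokens($wikitext); // 2 whitespace-separated tokens
$phpWords = str_word_count($wikitext);        // 6 runs of letters; "1986" is not counted

printf("%d bytes, %d tokens (wc style), %d words (str_word_count)\n",
    $bytes, $wcWords, $phpWords);
```

Markup characters such as `|`, `=` and `:` split one whitespace-delimited token into several letter runs, which is one plausible reason for `str_word_count()` reporting a higher number than `wc` on template-heavy pages.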
Hi,
I think that is better. It is higher than the "wc" word count, but that may be down to what it considers "words." I will run this by the users with some caveats.
Thanks again!
I'd like to talk about ways to extend this project, for example to support 'classes' without a php-coded template.
And it could also support uploaded documents (Word, PDF) since it's based on Solr.
Are these being considered? David Mason (talk) 16:07, 29 May 2013 (UTC)
Hi David,
your ideas sound really good, but I'm not a good "extension developer", because I have almost no idea how MediaWiki works internally.
But maybe you have the knowledge that I lack. I would also like to change the way the field-based search templates are defined. Writing them into LocalSettings.php is so uncool. It would be much nicer if I could define them with Semantic Forms.
So if you want to extend this project, you can do it on your own or we can do it together. Feel free to ask me anything about this extension. SBachenberg (talk) 08:12, 31 May 2013 (UTC)
Yes, I'm absolutely interested in working on this, though I'm swamped for the next week. Can we meet online next week to talk about it? David Mason (talk) 19:48, 5 June 2013 (UTC)

Updates?

Will there be a stable version for newer Solr servers (like 5.5 or 6.0), SMW 2.3 and MediaWiki 1.25 or 1.26? I am having a lot of problems configuring this extension on my server.

Thanks! M art in (talk) 15:20, 12 April 2016 (UTC)

No Specialpage

After installation there is a new link on Special:Specialpage to "SolrSeach", but when I follow the link, no special page is found. Thanks for your help! M art in (talk) 13:19, 14 April 2016 (UTC)

Using 'Range' input type for a field

I have implemented Solr search on my wiki using the SolrStore extension, and I used the filter-based search feature of Solr. One of the filter inputs in the search is the year. For now, the year input is a dropdown (the user can select a particular year and search for results in that year). I want to change it to a range, where the user can select a span of years, say from a to b, and search for results in those years.

How can I achieve this?

Thanks in advance. 182.73.117.198 (talk) 15:00, 4 May 2016 (UTC)
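On the Solr side, a year range is usually expressed as a range query of the form `field:[from TO to]`, typically passed as a filter query (`fq`). The sketch below only shows how such a filter string could be built in PHP. The field name `year` and the helper `buildYearRangeFilter` are assumptions for illustration, and wiring this into SolrStore's field-based search templates is not covered here.

```php
<?php
// Hypothetical helper: builds an inclusive Solr range filter for a numeric field.
function buildYearRangeFilter(string $field, int $from, int $to): string {
    // Swap the bounds if the user entered them in reverse order.
    if ($from > $to) {
        [$from, $to] = [$to, $from];
    }
    return sprintf('%s:[%d TO %d]', $field, $from, $to);
}

// Assumed field name "year"; use the field name from your own schema.
$fq = buildYearRangeFilter('year', 2000, 2010);

// Query string for a /select request that applies the range as a filter.
$query = http_build_query(['q' => '*:*', 'fq' => $fq], '', '&', PHP_QUERY_RFC3986);
echo $fq, "\n", $query, "\n";
```

For the range to be evaluated numerically rather than as text, the field would need a numeric type in schema.xml.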

Please also link to other places when cross-posting. I believe that the post to the mailing-list was yours. [[kgh]] (talk) 15:16, 4 May 2016 (UTC)