Extension talk:External Data/Archive 2010

Conflict with NTLM authentication

Hi Yaron

I discovered something you may want to add to 'Common problems'.

#get_external_data fails if you are using a remote wiki protected by Windows NTLM authentication scheme.

I had to exclude '/wiki/Special:GetData' from authentication in order to get it to work correctly.

This is probably due to the way the NTLM handshake works (it first sends two HTTP requests with status 401 before getting to the actual page).

- Laurent Alquier

Sorry, is this for any #get_external_data calls, or just ones to Special:GetData? Yaron Koren 04:03, 11 January 2010 (UTC)


Loading data from a JSON api

Hello Yaron!

I'm trying to load data from a json response but don't know how to manage an array. Here is an example of response:

{"responseData": {"results":[{"url":"http://www.koiora.net/wp-content/uploads/2009/04/perro.gif","visibleUrl":"www.koiora.net"},{"url":"http://zaragozaciudad.net/tejeros/upload/20080414122439-perro25.jpg","visibleUrl":"zaragozaciudad.net"}],"estimatedResultCount":"890000"}}, "responseDetails": null, "responseStatus": 200}

1) How can I get the second url field?

2) With the following code I get one url field:

{{#get_external_data:http://ajax.googleapis.com/ajax/services/search/images?v=1.0&q=dog|json|imagen=url}} {{#external_value:imagen}}

but it ends showing this way:

<img src="http://www.koiora.net/wp-content/uploads/2009/04/perro.gif" alt="perro.gif">

... When are the html tags added???

Thanks!!! - Jaime

Hi - for 1), you should use #for_external_table instead of #external_value. For 2), do you mean that that HTML is literally what you see on the screen? If so, that's bad... Yaron Koren 00:57, 22 January 2010 (UTC)
1) perfect!!! 2) I've just realized that the html tag is there because mediawiki wraps the image url to show it; I'll have to avoid this behavior in order to load the data into a semantic form field.... Thank you very much! :) - Jaime

Not working? CSV #for_external_table

Hello - I've tried the following:

{{#get_external_data:<url to local file ending in .csv>|CSV|name='1'|etc}}

and then

{| class="wikitable"

! Name {{#for_external_table:<nowiki/>

{{!}}-

{{!}} {{{name}}} }}

|}

We've created the Template:! and still we see no data - just an empty table. We've tried no local variable and name='1'; name=1; name="1"; etc. Nothing works. Are we doing something silly? Installation is confirmed and our MW is 1.13.2 --Robinson Weijman 14:37, 28 January 2010 (UTC)

Hi - it should be "name=1", without quotation marks - I should clarify that in the documentation. I would try simplifying the call - get rid of all the formatting, and see if that works. If it doesn't, what happens if you just call "{{#external_value:name}}"? If that doesn't work either, my guess is that the URL somehow isn't getting accessed. Yaron Koren 15:50, 28 January 2010 (UTC)
Thanks for the quick reply. I removed the quotes - did not help. I tried just {{#external_value:name}} - again nothing. The URL did work with a previous implementation when we used iFrame - so it should be accessible via the wiki. I just tried $wgHTTPTimeout = 20; - did not help either. What formatting should I get rid of? --Robinson Weijman 16:13, 28 January 2010 (UTC)
Okay, my strong guess it that that URL can't be accessed. It's the PHP code, i.e. the server on which the wiki sits, that has to access that URL - and just because you can see the URL doesn't mean that the server can. My guess is that that's somehow the issue. Yaron Koren 21:02, 28 January 2010 (UTC)
Thanks again. It's a UNIX server - how can I test your guess? Strange that it worked fine via the iFrame extension. --Robinson Weijman 08:16, 29 January 2010 (UTC)

With the iFrame extension, the server isn't accessing the URL. You could try running a PHP script on the server like the ones on this page... Yaron Koren 16:55, 29 January 2010 (UTC)

Thanks again! I'll check it out. --Robinson Weijman 08:41, 1 February 2010 (UTC)

Retrieving Data via ssl - possible?

Hi,

I am trying to retrieve data from an external source using a non-trusted certificate via ssl (https://example.com/xyz.csv), thus generating no output. Are there any known issues or hints you can give me?

Thanks a lot for your efforts!

--82.113.106.207 20:27, 23 February 2010 (UTC)

Sorry, I don't know enough about SSL to answer that question, though you're probably right that that's what's causing the problem. I would try creating a PHP script, on the server running the wiki, that just retrieves the text of that URL. If you can get that working, send me the code of that script and I'll see if I can fit it into the extension somehow. Yaron Koren 00:32, 24 February 2010 (UTC)
Hi, solution was quite easy: I set
$edgAllowSSL = false;
in the ExternalData.php.
Yet another problem arose afterwards: the script needs to log into the wiki, as I have a restrictive rights management. I am trying to realise that using curl hardcoded into your extension, as soon as I got it working, I will post the solution here. --80.153.59.17 09:54, 24 February 2010 (UTC)
Ok, to access a mediawiki with SSL and restrictetd readaccess only for registered users, I successfully did the following:
  • Created a user 'ScriptDummy' in the wiki
  • Created a file 'mwcookies.txt' and gave the apache (using Linux) rights to read and write the file
  • Replaced in ED-Utils.php (ED Version 0.9.2)
return Http::get( $url, 'default', array(CURLOPT_SSL_VERIFYPEER => false) );
with
Http::post( "https://{{WIKIURL}}/index.php?title=Special:Userlogin&action=submitlogin&type=login", 'default', array(CURLOPT_SSL_VERIFYPEER => false, CURLOPT_POST => 1, CURLOPT_POSTFIELDS => "wpName={{SkriptDummyUserName}}&wpPassword={{ScriptDummyUserPassword}}&wpLoginattempt=Login", CURLOPT_COOKIEJAR => "/{{filesystem path to}}/mwcookie.txt", CURLOPT_COOKIEFILE => "/{{filesystem path to}}/mwcookie.txt", CURLOPT_FOLLOWLOCATION => true, CURLOPT_RETURNTRANSFER => 1));
return Http::get( $url, 'default', array(CURLOPT_SSL_VERIFYPEER => false, CURLOPT_COOKIEFILE => "/{{filesystem path to}}/mwcookie.txt") );
Working fine - thanks Yaron! --80.153.59.17 11:05, 24 February 2010 (UTC)
Wow, okay. That's great that you got it working. I'll try to come up with some generic solution that allows for approaches like yours. Yaron Koren 17:56, 24 February 2010 (UTC)

{{#for_external_table:}}

Could this also be used to return multiple likes from an #ask query into different rows of a table? I can for the life of me figure out how to template a #ask query through a template into a real tabe. Only thing availabe seems to be the default. Having more options for a ResultFormat for custom tables would be nice.Dmoorevtedu 01:08, 18 March 2010 (UTC)

Sure, there's no reason why a wiki couldn't retrieve its own data. Yaron Koren 13:14, 18 March 2010 (UTC)

error "" is not a valid magic thingie for "get_external_data", "get_ldap_data", ...

Running the latest SVN version (r64385) on MW 1.16.0beta1, first I noticed that the assignment $edgIP = $IP . makes the assumption that ExternalData is installed as a subdirectory of $IP (=normally mw/extensions/). This assumption is not made in other extensions. Also, the error_log is full of "" is not a valid magic thingie for "get_external_data" etc. errors. Probably something with the parser or internationalization messages? I cannot track it down, any help very much appreciated. Thanks, Wolfgang Spraul 88.198.75.226 12:33, 30 March 2010 (UTC)

Update: I think the 'magic thingie' error messages are a consequence of the $edgIP setting. I had a symlink from extensions/ExternalData but probably symlinks aren't followed elsewhere. I think $edgIP should not be based off of $IP, it should be determined from the path of the current file (ExternalData.php). Hard-wiring $edgIP to my ExternalData folder fixes this for me for now... Wolfgang Spraul 88.198.75.226 13:14, 30 March 2010 (UTC)

Thanks for this bug report - this is now fixed (I hope) in SVN. Yaron Koren 15:57, 30 March 2010 (UTC)
Great! I can confirm that svn r64415 fixes the hardcoded path issue and the magic thingie errors that may result from it. Wolfgang Spraul 88.198.75.226 01:37, 31 March 2010 (UTC)
Excellent. Yaron Koren 16:43, 31 March 2010 (UTC)

str_getcsv() can only parse one line, not an entire file (multi-line)

This might be another bug (r64415 now). In ED_Utils.php, if str_getcsv() is present (>=PHP 5.3.0), it's called to parse the entire multi-line csv. However, according to the documentation and in reality, it can only parse one line. See http://php.net/manual/en/function.str-getcsv.php . For now I have just commented out that function and the code goes back through the old write-to-file loop. str_getcsv() can be used but some sort of loop needs to go around it I think. There are many examples at the bottom of the PHP documentation page. Wolfgang Spraul 88.198.75.226 09:25, 1 April 2010 (UTC)

Thanks for letting me know about that - this has now been fixed, in the new version (1.0). Yaron Koren 20:56, 3 May 2010 (UTC)

PHP warning when Caching is turned on.

Hi Yaron

I am using the latest version (1.0) and I am getting these errors from PHP when trying to retrieve cached external data queries :

Notice: Undefined variable: res in $IP\extensions\ExternalData\ED_Utils.php on line 334
Notice: Trying to get property of non-object in $IP\extensions\ExternalData\ED_Utils.php on line 335

I am looking at the code but so far, I am not sure what $res refers to.

- Laurent Alquier

That is odd - apparently that bug has been there for a while. What happens if you just comment out line 334? Yaron Koren 19:46, 5 May 2010 (UTC)
The cache seems to be working without warning when that line is commented out :) I guess not a lot of people are using cached queries (they should).
Another point worth mentioning is that the parameter for cached table name is the name of the table *before* adding any database prefix defined in LocalSettings.php - Laurent
Alright, cool. Yes, that's a good point as well. Yaron Koren 22:12, 5 May 2010 (UTC)

Nested XML Content

Really like the extension - so much potential. I looked every where I could think of, but have not found a more detailed discussion of using external XML data. Perhaps somebody has figured out how to do this?

<?xml version="1.0" encoding="UTF-8"?>
<fruits>
 <box color="brown">
       <fruit type="Apple"><color>red</color></fruit>
       <fruit type="Plum"><color>purple</color></fruit>
 </box>
 <box color="green">
       <fruit type="Kiwi"><color>brown</color></fruit>
 </box>
</fruits> 

Any way to specify local variable mappings for box color vs. fruit color? --12.167.75.11 21:16, 3 June 2010 (UTC)

Hi - unfortunately, no - that's one of the big limitations of External Data at the moment. The plan is to eventually allow for passing in an Xpath for each parameter. That might have to be a different format, like 'xml with xpath' or something. Yaron Koren 02:06, 4 June 2010 (UTC)

Unsuccessful install with no diagnostic hints

We have a MW 1.11.0 installation to which we have just added External Data 1.0. The following is observed:

  • No new extension functions, no new parser extension tags, and no new parser function hooks have appeared.
  • The hook "LanguageGetMagic" is subscribed by "edgLanguageGetMagic" and "ParserFirstCallInit" by "edgRegisterParser".
  • The Special Page "Get data" is blank. (Really, totally, completely blank. No source is returned by the server although properly formatted headers are exchanged and they indicate that a successful request is being served.)
  • Uses of {{#get_ldap_data: ... }} and {{#external_value: ... }} are unparsed and presented verbatim in page output.

We have

require_once( "$IP/extensions/ExternalData/ExternalData.php" );

in LocalSettings.php (and an LDAP server stanza).

It's not clear what's not working. The rest of the wiki seems to be working normally. -- 205.162.232.254 21:58, 17 June 2010 (UTC)

'GetData' is blank by default (maybe it shouldn't be), but that's not a problem. I'm almost sure that the issue is your MediaWiki version - evidently External Data doesn't support 1.11 (it's always hard to know what exactly the cutoff is). Sorry about that; I'll change the documenation. Yaron Koren 22:28, 17 June 2010 (UTC)

Error codes?

Hi, I've been trying to get the extension to access a mysql database on the same server as my wiki, but so far all the extension has given me is a bunch of error codes like I noticed someone else had earlier. After calling external data, I get these all over the page.

?UNIQ7ab4befc473fac2c-pre-00000014-QINU? 

Any idea whats going on? My wikitext looks like this.

{{#get_db_data:
|server=db_server
|from=qrydispreasons
|where=DispId='1'
|data=id=DispId,code=DispIdCode,reason=DispReason
}}

{{#external_value:code}}
{{#external_value:reason}}

I might just be calling the extension improperly, but I'm a bit new to sql queries. Any help is appreciated. --198.151.12.10 15:33, 6 July 2010 (UTC)

Sorry for the long delay - if you're still reading this, what version of MediaWiki are you using? Yaron Koren 15:31, 27 August 2010 (UTC)

Documentation flaw: Updating semantics by pressing purge doesn't work in SMW

The documentation on Extension:External Data says that pressing the refresh button (= action=purge) will update the attribute data of the article. In fact this is not the case in SMW. Purging a page will only update the article text but not semantic data. See Bug 22751 for details. I wrote the extension Extension:SemanticUpdateOnPurge to enable updates of semantic data on purging a page. If you don't have something like that in Extension:External Data which is doing the semantic update on purge, you probably should update the documentation on this issue. --Danwe 05:39, 8 July 2010 (UTC)

The documentation just says that purging the page will get it to "display the new data" - not update the semantic storage of the data. But I suppose it wouldn't hurt to clarify that, as you note, the latter won't happen. In any case, I think your extension has a lot of promise, especially for External Data, and once you've gotten the bugs out (assuming my hunch is correct that it has problems right now), it would make sense to either recommend it for use with ED, or maybe even integrate some of the code directly into ED. Yaron Koren 14:33, 8 July 2010 (UTC)
It looks like the bug we are talking about is a SMW bug. I have the issues without my extension after updating the datas and recreating the SMW-DB tables as well. So I think you can recommend the extension or even integrate it into ED or SMW as optional feature. By the way, I created a bug report for the bug above see bug 24348. --Danwe 16:41, 12 July 2010 (UTC)

Any plans to support oracle

I've notice that oracle db type is not supported yet, any plans on this direction?

Oh, I thought it was supported - I've personally never tested it on Oracle, so I wouldn't know. Have you tried it? And if so, did you get any interesting error message? Yaron Koren 14:35, 8 July 2010 (UTC)
First I had to change ExternalData/ED_Utils.php to accept oracle db type, as following
                if ($db_type == "mysql") {
                        $db = new Database($db_server, $db_username, $db_password, $db_name);
                } elseif ($db_type == "postgres") {
                        $db = new DatabasePostgres($db_server, $db_username, $db_password, $db_name);
                } elseif ($db_type == "mssql") {
                        $db = new DatabaseMssql($db_server, $db_username, $db_password, $db_name);
+                } elseif ($db_type == "oracle") {
+                        $db = new DatabaseOracle($db_server, $db_username, $db_password, $db_name);
                } else {
                (...)
                }
Then I tried a simple #get_db_data and got Fatal error: Call to undefined method ResultWrapper::fetchAssoc() on includes/db/DatabaseOracle.php. It seems problem is not with the extension, but with the oracle support in MediaWiki. I would appreciate to help making this extension work, feel freee to ask any other tests.
These tests were made on MW 1.15.1, I'll try to upgrade to 1.16, which has some improvements on oracle support, and post results here.
Hi, Yaron! It worked with MW 1.17alpha, I'll update the extension page to report that. But please note that it is still needed to apply the changes in ED_Utils.php I posted above to '#get_db_data' accept oracle db_type. Matheus Garcia 11:28, 9 July 2010 (UTC)
Hi Matheus - that's good news; it means the problem probably came from MediaWiki, not ED. Thanks for the research, and thanks also for the new code - it'll go into the next version. Yaron Koren 14:32, 9 July 2010 (UTC)

how to define credentials to access a protected wiki

hi folks, I have a protected wiki, i.e. you have to be logged in to access articles. Now, how can I extract relevant data (for example csv via semantic search) with this extension? I haven't found any parameters to provide a user authentification. Thx for help. --Rolze 16:52, 16 July 2010 (UTC)

There's no way to do that; which I would think is good, because otherwise everyone on the 2nd wiki could see private data... Yaron Koren 17:18, 16 July 2010 (UTC)

Variable number of header lines

Please can we specify the number of header lines? I frequently deal with files that have 3 or 4 or more very complex header lines, and it's not always possible to work around them

Thanks!

Having a parameter like "start line=" sounds like a reasonable request, so that the CSV parser (I assume you're talking about CSV) can skip over more than one line... though actually understanding the headers, if it's one more than one line, is probably too complex for External Data to be able to do any time soon. Yaron Koren 16:15, 26 July 2010 (UTC)
On the off chance that you see this (or for anyone else who may be interested in such a feature) - I've thought more about it, and I fear that this is too specific a solution; especially since additional header lines aren't allowed in standard CSV. I've read that some CSV implementations allow for a character at the beginning of lines to indicate that they're to be ignored - ';', '#', etc. Would that work, or would it be better to just skip over some lines, regardless of their content? Yaron Koren 15:19, 13 October 2010 (UTC)

Trouble using qualified column names

This extension is really great! But I'm facing a problem now in my first usage with join in Oracle. The following sample code does not work:

{{#get_db_data:
|server=serverID
|from=vservidor_setor join vservidor on vservidor.anum_registro_servidor = vservidor_setor.anum_registro_servidor
|where=acod_setor=108 and acod_subsetor=1
|data=num_registro_servidor=vservidor.anum_registro_servidor
}}

{| class="wikitable"
! Codigo {{#for_external_table:<nowiki/>
{{!}}-
{{!}} {{{anum_registro_servidor}}} }}
|}

The problem reported is Notice: Undefined index: vservidor.anum_registro_servidor on ED_Utils.php, at about line 165, which is:

$values[$column][] = $row[$column];

a "print_r" just before this line shows that $columns array has the only index [vservidor.num_registro_servidor], as expected, while $row array indexes are [num_registro_servidor] and [0]. As $row does not have [vservidor.num_registro_servidor] index, I can´t get the values.

I think $row index should be [vservidor.num_registro_servidor] instead of just [num_registro_servidor], maybe it's another bug on oracle support in MW (I didn't tested on another DBMS), but I would appreciate if there is a way to solve that in this extension, maybe allowing us to create aliases to the returned columns to avoid that.

Special thanks, Matheus Garcia 16:39, 16 August 2010 (UTC)

Just to complement: that's not an Oracle issue, I got the same error in mysql.
So, the problem is: when I do a join, how can I get the value of a column which is in both tables?
If I don´t qualify I get a message saying the field is ambiguous, but if I qualify it with the table name then I get the error posted above.
Matheus Garcia 11:26, 17 August 2010 (UTC)
Hi - I finally looked into this now; that's a real bug, one that I had heard about before but had never really looked into. I just checked in a fix for this bug, in SVN; if you can't get the code through SVN, there will probably also be a new version in the next few days. Yaron Koren 01:58, 26 August 2010 (UTC)

Issue with parameters that contain '_'

I've created a template with parameters to query database, but I noticed I got no results when the specified parameters have the '_' character, just like the URL bellow:

MEDIAWIKIHOME/index.php5/Especial:Call/Tabela,nome=FOO_BAR

My template is simple:

{{#get_db_data:
|server=serverID
|from=ttabela
|where=anom_tabela like '{{{nome|}}}'
|data=num_interno_tabela=anum_interno_tabela
}}
{{#external_value:num_interno_tabela}}

I think the '_' is been encoded when passed to the SQL, so that it doesn´t match in the where clause. Do any of you have an idea of how can I solve that? Matheus Garcia 14:58, 23 August 2010 (UTC)

Hi - that doesn't sound like a problem with External Data - it sounds like a problem with (I guess) the "Call" extension, where an underscore in the URL doesn't become an underscore on the page. If that's true, I don't know how to solve it. Yaron Koren 02:06, 26 August 2010 (UTC)

Change cell color depending on its content.

Hi guys.

If i try to get data from my database everything looks fine.
My question is: Is it possible to change the color of a cell depending on its content?

An Example:

Servername connection
server1 disconnected
server2 connected

Best regards Jan

The extention is very very fine. We are useing it and it works good (fetching data from a mysql database) but our problem now is that the data wich comes back is not findable in wiki internal search. This is clear for me because in wikitext you se the externel-data and so on. and therefore there is no value on this variables in the wikibase bacause you can only see the values when the page it rendered. But which solution do you have for searching and finding any data in the wiki internal search?

Best regards Hirad

Hi, that's great to hear! I think the only real solution to that problem is to use a search engine other than the wiki's own in order to search the wiki. If this is an internal wiki, there are a large number of search engines you can install that will "crawl" through the wiki's pages and then let you search on their actual display. Yaron Koren 13:28, 29 September 2010 (UTC)
Hi many thanks for answer, could you please tell me some of these search engines? I thried htdigextention (it finds more but is not optimal) then i tried SphinxSearch which is searching only the database and not crawling through html.
See here, and also see the subcategory "Free web crawlers". Yaron Koren 12:41, 30 September 2010 (UTC)

External data flawed after SMW refreshData

Hi, for the second time I observed that after rebuilding the SMW database a number of user pages that fetch name/mail/phone data from LDAP are showing flawed data. There are groups of user pages that show the same data, even though they have different LDAP usernames (the username is the BASEPAGENAME, e.g. User:Ha78471 -> Ha78471) and I can only correct it by clicking "edit" and then save. The database tables show that the wrong values for name/mail/.. are stored in e.g. smw_atts2 together with the s_id of the flawed page. This happened with version 0.9 as well as 1.1. - LosWochos 11:57, 4 October 2010 (UTC)

I remember this problem happening, but I thought it was fixed. It's likely something to do with a global variable not being reset somewhere, as it only occurs because ED is called multiple times by refreshData.php. Unfortunately I don't use ED any more so I'm not able to investigate further. - Borofkin 04:34, 12 October 2010 (UTC)
LosWochos - out of curiosity, what version of SMW are you using? Yaron Koren 04:16, 13 October 2010 (UTC)

Comparison with Extension:DataTransclusion

I've just noticed the link to Extension:DataTransclusion. Does anyone have any thoughts on this extension? It seems to perform a similar function to ED. - Borofkin 00:51, 25 October 2010 (UTC)

First of all Extension:DataTransclusion is considered "beta" while this one is "stable".

External data into Chart ?

I have External Data 1.1 - works great to display table of values. I did install Gchart4mw and GoogleCharts extensions for Line Charts. I can make them work with the "hard codded" data points. All attempts to feed results of External Data into any of the chart extensions unsuccessful. If it is possible, please, provide example of displaying result of External Data in the graphical form.

The only way I know to display External Data results in charts is to first store the data using Semantic MediaWiki, and then use one of the SMW query formats (like these two) to display the data. That may not be the direction you want to go in, though. It might be worth talking to the developers of those chart extensions to see if they can add support for External Data. Yaron Koren 18:08, 21 December 2010 (UTC)

#get_web_data XML not working?

I'm having difficulty getting data from XML files. If I've understood correctly, this:

{{#get_web_data:
|url=http://www.w3schools.com/xml/note.xml
|format=XML
|data=test=from
}}

This is a test: {{#external_value:test}}

should return "Jani". Instead it's blank, and previewing the edit shows XML error: 2 at line Array. Am I doing something wrong? I'd really like to use this to display XML data on wiki pages! Adamcox82 19:55, 22 December 2010 (UTC)

You found a bug - the XML parsing was failing on comments, and that piece of XML contained one ("Edited by XMLSpy"). I just a checked in a fix into SVN, so if you're using SVN, if you just update your code it should work. Thanks for letting me know about it. Yaron Koren 14:30, 23 December 2010 (UTC)
Thanks for getting back to me so quickly with the fix! I was getting a similar error when I tried using it for the data I was actually interested in (from MediaWiki's own API). I'd tried passing it a URL like http://www.mediawiki.org/w/api.php?action=query&meta=siteinfo&siprop=statistics, but hadn't realised that the format wasn't actually XML (even though it says so at the top, d'oh!) I was able to get it working by adding &format=xml to the API query string. Maybe you could add a brief section explaining how to use the extension to retrieve data via the MediaWiki API? I think that's quite a valuable and widely applicable application of the extension. Thanks again for this useful tool! Adamcox82 12:14, 31 December 2010 (UTC)
Return to "External Data/Archive 2010" page.