Talk:Improving URL citations on Wikimedia
Using Zotero
Several years ago I tried to do the same thing using Zotero. I'm sure you are familiar with it, but Zotero is a browser plugin that has a huge collection of sub-plugins for transforming a web page into a citation. It has some very generic plugins (DOI, ISBN, RDFa) and specialised plugins for all major (and many minor) websites.
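To make the generic/specialised split concrete, here is a rough sketch of how such a dispatch could look. It is not Zotero's code (Zotero's real translators are JavaScript and are not reproduced here); the `Translator` class, the `cite` function, the `news.example.org` site and the simplified DOI pattern are all made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Translator:
    """Hypothetical stand-in for one URL->citation plugin."""
    name: str
    # URL pattern for site-specific translators; None marks a generic one.
    target: Optional[re.Pattern]
    # Takes (url, html) and returns citation fields, or None if nothing was found.
    extract: Callable[[str, str], Optional[dict]]


# Simplified DOI pattern, an assumption for this sketch only.
DOI_RE = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def extract_doi(url: str, html: str) -> Optional[dict]:
    """Generic translator: look for anything DOI-shaped in the page source."""
    match = DOI_RE.search(html)
    if match is None:
        return None
    return {"type": "journalArticle", "DOI": match.group(0), "url": url}


def extract_example_news(url: str, html: str) -> Optional[dict]:
    """Site-specific translator for an imaginary news site: use the first <h1>."""
    match = re.search(r"<h1>(.*?)</h1>", html, re.S)
    if match is None:
        return None
    return {"type": "newspaperArticle", "title": match.group(1).strip(), "url": url}


TRANSLATORS = [
    Translator("DOI", None, extract_doi),
    Translator("Example News", re.compile(r"^https?://news\.example\.org/"),
               extract_example_news),
]


def cite(url: str, html: str) -> Optional[dict]:
    """Try site-specific translators whose pattern matches, then generic ones."""
    # sorted() is stable, so translators with a target pattern come first.
    for translator in sorted(TRANSLATORS, key=lambda t: t.target is None):
        if translator.target is not None and not translator.target.search(url):
            continue
        item = translator.extract(url, html)
        if item is not None:
            item["translator"] = translator.name
            return item
    return None
```

Calling `cite(url, html)` with a page from the imaginary news site would go through its specialised translator; any other page containing a DOI would fall through to the generic one, and a page with neither yields `None`.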
I had to abandon the project eventually (mostly due to lack of time) but my impression was that this is the direction that should be pursued for URL->citation transformation. I got it working very easily to the point where the generic plugins worked - I put the code on GitHub just to give a feeling of the (lack of) complexity - all the zotero_* stuff is copied from Zotero source with minor changes to get it working, the rest is the actual code of the project.
At the time Zotero was Firefox-only, so I used Jaxer to run it in a headless Firefox instance on a server. That didn't work well (but as said above, it still worked well enough to handle all DOI and RDFa references, which is the vast majority of scientific citations), since Zotero used some Firefox internals, and Jaxer did not properly implement them. Also, Jaxer proved unsuccessful as a commercial open-source project and was abandoned by its developers (Aptana) soon after that.
These days things should be much easier - Zotero has its own server-side version (using XULRunner), and it is now cross-browser, so if that doesn't work well, it could probably be used via PhantomJS or some other node.js environment.
IMO there are huge advantages to using it over developing a citation parser solution of our own. Zotero is mature, widely used, it has several hundred different citation parser plugins (list), parsers written for additional websites could be shared by the academic and wikimedian community, it could probably be used to upload your citation library (Zotero has parsers for all widely used formats) and use that to source articles (might be a big convenience for experts who want to write Wikipedia articles)... The disadvantage is that scaling might be challenging, since it uses a headless browser (although these days some node.js-based solutions for that are pretty cheap). Even so, I think it would be worth trying. --Tgr (WMF) (talk) 16:43, 6 May 2014 (UTC)
Eh, I really should learn to read first and write after that... I see from Extension:CiteURLEngine that this was the plan anyway. Glad to see that - MediaWiki historically had a strong NIH syndrome which IMO can be pretty damaging.
Anyway, great to see that someone is working on this! This has been a hobby project of mine (and by "hobby" I mean "never got around to actually working on it") - if I can help somehow, feel free to ping me. --Tgr (WMF) (talk) 16:51, 6 May 2014 (UTC)