Talk:Alternative parsers/Archive 1
Any user names refer to users of that site, who are not necessarily users of MediaWiki.org (even if they share the same username).
Standalone library for converting Mediawiki-style WikiText into (eventually) xhtml
I'm currently building a web application which allows users to publish structured text. I'd like Mediawiki-style WikiText (exactly as used on wikipedia.org) to be one of the accepted input formats.
Also, I love the way WikiText is rendered to html here on Wikipedia, so I'd like to emulate that as much as possible. I've looked over the Alternative Parsers page and my impression is that WikiText parsing and html rendering is still performed by some PHP modules in the MediaWiki software.
Are there any ongoing efforts with regards to separating these out into a general-purpose C/C++ library? I am aware of the problems associated with having binary extensions, but perhaps MediaWiki could fall back to the (hopefully 'deprecated') php-based parser/renderer if it is not able to load the binary parser/renderer.
So far the most interesting code I've found is flexbisonparser and wiki2xml in the mediawiki svn trunk, but neither seems to do exactly what I want.
I understand that there are ongoing efforts to standardise Mediawiki-style WikiText. A standard library like I've described briefly here would go a long way in ensuring that such a standard spreads.
Hope to hear from you if you have any solutions.. and sorry for posting on this page if it is wrong. :-) 195.159.10.101 14:14, 19 July 2006 (UTC)
- Converting the text to C++ likely isn't going to happen any time soon. We have certain performance-critical portions of MediaWiki (such as diffing) ported to these languages, but the parsing and rendering of wikitext is an extremely complicated process with lots of legacy cruft and a hideous Parser.php. — Ambush Commander(Talk) 23:42, 19 July 2006 (UTC)
Wiki to HTML (and back) APIs
I came across this page looking for a Java API/library for converting wikitext to and from HTML or also XML. The ones listed in the table are not what I was looking for as I want to use an API for wikitext rendering, whilst those listed are Java Wiki engines. Obviously in some of the listed software they must be able to do that, but the code isn't exposed through an API. Does anybody know of anything that can do this? If not, I'll just make one myself. --82.15.247.14 14:11, 3 September 2006 (UTC)
- I'm currently reworking my parser so that it is easier to use through a Java API. I need this in order to create plugins for Java blog (Roller), forum (JForum) and wiki (JAMWiki) software. It would be nice if you would test it and give feedback (please mail axelclk gmail com if you are interested). A first beta: http://plog4u.org/index.php/Wikipedia_API
Note that the above conversation may have been edited or added to since the transfer. If in doubt, check the edit history.
Here is a dump of an early version of some documentation User:Djbclark wrote while surveying options for offline MediaWiki use (Sat Jul 12, 2008). I don't have time to incorporate it into a nice separate page at the moment, but I thought it might be useful to some people as is, and I also need to reference it from a mailing list thread :-)
Feel free to edit it / move it to a more appropriate place - the Alternative parsers page seemed like the closest thing to a Using Mediawiki Offline page at the moment. --Djbclark July 2008
mvs - A command line Mediawiki client
It would be really nice if this supported some form of recursion... All these tools are way too "you are only going to use this with wikipedia, so we can't possibly provide features that would be useful for smaller wikis" oriented...
Basic Use
Install:
sudo aptitude install libwww-mediawiki-client-perl
Initial Setup:
mvs login -d your.mediawiki.server.hostname -u USERNAME -p 'PASSWORD' -w '/wiki'
Where USERNAME is your username (note that mediawiki autocapitalizes this, so for example this would be Dclark, not dclark) and PASSWORD is your mediawiki password (note that this is a very insecure way to pass a password to a program, and should only be used on systems where you are the only user or you trust all other users).
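One way to at least keep the password out of your shell history is to prompt for it instead of typing it on the command line. This is only a minimal sketch, assuming a POSIX shell: mvs_login_prompt is a made-up helper name, and the mvs flags are just the ones from the login command above. The password is still handed to mvs as a command-line argument, so other users on the machine can still see it in the process list while mvs runs.

```shell
#!/bin/sh
# Hypothetical helper (not part of mvs): ask for the password interactively,
# then run the same "mvs login" command shown above.
# Variables are global because plain POSIX sh has no "local".
mvs_login_prompt() {
    host=$1
    user=$2
    printf 'MediaWiki password for %s: ' "$user" >&2
    # Turn off terminal echo while typing; skipped when stdin is not a terminal.
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r password
    if [ -t 0 ]; then stty echo; printf '\n' >&2; fi
    # NOTE: the password is still visible in the argument list of the mvs process.
    mvs login -d "$host" -u "$user" -p "$password" -w '/wiki'
}
```

Usage would be something like: mvs_login_prompt your.mediawiki.server.hostname Dclark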
Example Checkout:
mvs update User:Dclark.wiki
See Also
- User:Mark/WWW-Mediawiki-Client - mvs author's page on mvs
- CPAN > Mark Jaroski > WWW-Mediawiki-Client > mvs
- wiki page batch update by mvs perl module - has useful looking mvs makefile
Flat Mirror of Entire Wiki
Google Gears
If you have Google Gears (BSD Licensed) installed, you will see a "gears localserver" box on the lower left-hand side of the cluestock mediawiki screen, under the "navigation", "search", and "toolbox" boxes. This is done with the Mediawiki LocalServer: Offline with Google Gears extension. The original version provides slightly clearer install documentation. In general, put the .js files with the other .js files, in the common skins directory.
After creating the local store and waiting for it to finish downloading, you will be able to go offline and browse the wiki - however search and "Special:" pages will not work in Google Gears offline mode, and you will not be able to edit pages in offline mode.
Local Django-based server
The directions at Building a (fast) Wikipedia offline reader produce an environment that takes more time to set up than Google Gears, but is arguably a bit nicer (including local search of page titles - and shouldn't be that hard to extend that to full text).
# THESE ARE NOT STEP-BY-STEP INSTRUCTIONS... interpretation is required.
sudo aptitude install apt-xapian-index xapian-tools libxapian-dev php5-cli
wget http://users.softlab.ece.ntua.gr/~ttsiod/mediawiki_sa.tar.bz2
wget http://users.softlab.ece.ntua.gr/~ttsiod/offline.wikipedia.tar.bz2
# populate wiki-splits with raw .xml.bz2 dump
mv mediawiki_sa offline.wikipedia
# Edit Makefile to have line "XMLBZ2 = PICKNAME-articles.xml.bz2"
# Edit mywiki/gui/view.py 4th line to: return article(request, "Main Page")
make wikipedia
# (Then follow directions it spews)
TODO: Set up cron job to produce rsync-able PICKNAME-articles.xml.bz2 on a regular basis. Package this up.
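The dump half of that TODO could be sketched roughly as below. It reuses the dumpBackup.php command from the wikipediaDumpReader section further down; make_dump, the temporary file names and all paths are placeholders I made up, not anything that ships with MediaWiki. The dump is written to temporary files first and only renamed into place on success, so an rsync client should not pick up a half-written file.

```shell
#!/bin/sh
# Hypothetical helper: regenerate the compressed articles dump.
#   $1 = MediaWiki maintenance directory (e.g. /usr/share/mediawiki/maintenance)
#   $2 = ABSOLUTE path of the output file (e.g. /srv/dumps/PICKNAME-articles.xml.bz2)
make_dump() {
    maintenance_dir=$1
    out_file=$2
    tmp_xml="$out_file.xml.tmp"
    tmp_bz2="$out_file.tmp"
    status=0
    # Run each step only if the previous one succeeded; remember the first failure.
    ( cd "$maintenance_dir" && php dumpBackup.php --current > "$tmp_xml" ) &&
        bzip2 -c "$tmp_xml" > "$tmp_bz2" &&
        mv "$tmp_bz2" "$out_file" || status=$?
    # Clean up temporary files whether or not the dump succeeded.
    rm -f "$tmp_xml" "$tmp_bz2"
    return "$status"
}
```

Saved as a script that ends by calling make_dump with your own paths (say /usr/local/bin/make-dump.sh, again a made-up name), it could be run nightly from a crontab line such as "15 3 * * * /usr/local/bin/make-dump.sh".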
PDF Book
You can download entire parts of the wiki as PDF books.
This can be done with Extension:Pdf_Book plus the Whole Namespace Export patch.
Tried / Don't work or no doc
These either didn't work or had no documentation, but the notes below are still useful as documentation on how to turn Perl and Python modules into Debian packages.
libmediawiki-spider-perl
CPAN > Emma Tonkin > Mediawiki-Spider-0.31 > Mediawiki::Spider
sudo aptitude install dh-make-perl fakeroot dpkg-dev build-essential
sudo aptitude install libwww-perl libhtml-tree-perl
sudo apt-file update
wget http://search.cpan.org/CPAN/authors/id/C/CS/CSELT/HTML-Extract-0.25.tar.gz
tar -pzxvf HTML-Extract-0.25.tar.gz
dh-make-perl HTML-Extract-0.25
cd HTML-Extract-0.25
fakeroot dpkg-buildpackage -uc -us
cd ..
sudo dpkg -i libhtml-extract-perl_0.25-1_all.deb
wget http://search.cpan.org/CPAN/authors/id/C/CS/CSELT/Mediawiki-Spider-0.31.tar.gz
tar -pzxvf Mediawiki-Spider-0.31.tar.gz
dh-make-perl Mediawiki-Spider-0.31
cd Mediawiki-Spider-0.31
fakeroot dpkg-buildpackage -uc -us
cd ..
sudo dpkg -i libmediawiki-spider-perl_0.31-1_all.deb
You need a script like this to use it:
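#!/usr/bin/env perl
use strict;
use warnings;
use Mediawiki::Spider;
use Data::Dumper;

# Output directory for the flat pages. "flatwiki" is a placeholder;
# the script as originally posted never set $destinationdir.
my $destinationdir = "flatwiki";

my $spider2 = Mediawiki::Spider->new;
print "Now getting wikiwords\n";
my @wikiwords2 = $spider2->getwikiwords("http://standards-catalogue.ukoln.ac.uk/");
$spider2->extension("html");
print "Got wikiwords: proceeding with d/l\n";
$spider2->makeflatpages("./$destinationdir/", 1);
$spider2->buildmenu();
# Pass the list fetched above (the original referenced an undeclared @wikiwords).
$spider2->printmenu("./$destinationdir/index.html", "aword", @wikiwords2);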
However it only seems to work with older versions of mediawiki (or our mediawiki instance is "weird" in some way it doesn't expect).
fuse-mediawiki
Mediawiki FUSE filesystem: git clone git://repo.or.cz/fuse-mediawiki.git
sudo aptitude install git-core gvfs-fuse fuse-utils fuse-module python-fuse
git clone git://repo.or.cz/fuse-mediawiki.git
cd fuse-mediawiki # git clone drops the .git suffix from the directory name
mkdir yourwiki-fuse
python fuse-mediawiki.py -u Dclark http://your.wiki.hostname/wiki yourwiki-fuse
This works, but brings up a nonsense file system that you can't cd into beyond one level or ls in. It seems to be under active development, so probably good to check back in a few months.
Note that upstream (one developer) has moved on to a somewhat similar but non-FUSE project. [1]
wikipediafs
WikipediaFS - View and edit Wikipedia articles as if they were real files
sudo aptitude install gvfs-fuse fuse-utils fuse-module python-fuse python-all-dev
sudo easy_install stdeb
wget http://internap.dl.sourceforge.net/sourceforge/wikipediafs/
tar xvfz wikipediafs-0.3.tar.gz
cd wikipediafs-0.3
vi setup.py # Edit so version is correct
stdeb_run_setup
cd deb_dist/wikipediafs-0.3/
dpkg-buildpackage -rfakeroot -uc -us
sudo dpkg -i ../python-wikipediafs_0.3-1_all.deb
man mount.wikipediafs
This is sort of useless for the purpose of this section, as it requires the user to get a specific set of pages before going offline. Didn't spend enough time with it to see if it worked as advertised.
wikipediaDumpReader
Wikipedia Dump Reader - KDE App - Reads output of dumpBackup.php
cd /usr/share/mediawiki/maintenance
php dumpBackup.php --current | bzip2 > PICKNAME-articles.xml.bz2
Too wikipedia-specific. Didn't work with our internal dump at all.
Kiwix
Kiwix (https://kiwix.org) is an offline reader designed primarily to make Wikipedia available offline. It does this by reading the content of the project from a file in the ZIM format (see https://openzim.org), a highly compressed open format with additional metadata.
- Pure ZIM reader
- Case- and diacritics-insensitive full text search engine
- Bookmarks & Notes
- kiwix-serve: ZIM HTTP server
- PDF/HTML export
- Multilingual
- Search suggestions
- Zim index capacity
- Support for Linux / Windows
- Solutions to build DVD with Windows Installer and DVD launcher (autorun)
- Tabs
More:
- Web site: https://www.kiwix.org
- Contact: #kiwix on freenode
- Ubuntu PPA: sudo add-apt-repository ppa:kiwixteam/ppa ; sudo apt-get update ; sudo apt-get install kiwix
- svn co https://kiwix.svn.sourceforge.net/svnroot/kiwix kiwix
- Compilation documentation: http://kiwix.svn.sourceforge.net/viewvc/kiwix/moulinkiwix/COMPILE
See Also
- Wikipedia Command_line_tools page
- The Wikireader Archives - Mailing list for discussing wiki readers.
- WikiBrowse: a self-contained wiki server (OLPC XO / Sugar specific)
- Wikislices: collections of articles pulled from a MediaWiki for WikiBrowse (OLPC XO / Sugar specific)
- mwlib – Python library for parsing MediaWiki articles
- Offline web apps emerging standard and demo as implemented with Firefox 3 Offline resources.
- Mediawiki Wiki on a stick
- [Wikireader] Because I'm to impatient to wait to see SJ at home...
- Mediawiki Alternative parsers
- http://wikitext.rubyforge.org/ The wikitext extension is a fast wikitext-to-HTML translator written in C and packaged as a Ruby extension.