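Publishing from MediaWiki
MediaWiki is a good tool for writing documentation collaboratively, but it does not necessarily give you the finished document in a format suitable for use outside the wiki context.
This page explores the best ways to extract MediaWiki content in a form suitable for publishing in other media.
These days, non-digital formats are usually created from digital source material, so the question largely comes down to "which formats can I extract the data in?"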
Types of content you may want to extract
There are generally four types of data that you may wish to publish from MediaWiki:
- Individual pages
- Collections of pages
- Individual media files (e.g. images)
- Collections of media files
Media files of the latter two types are not normally created collaboratively on the wiki, although the wiki may have been used to collate them from various sources. Either way, manipulating the files outside of MediaWiki is likely to give you the best results, whatever other medium you plan to publish in. Where an individual image or file is required, simply go to the file's description page and download the original from there. To download multiple files, follow the instructions on exporting all the files of a wiki, but filter the file list so that it contains only the files you want.
The rest of this page therefore focuses on the first two items: individual pages and collections of pages.
Built-in methods of exporting data via the interface
- You can export the HTML content of a page by appending ?action=render to its URL. This outputs just the rendered HTML content of the page, without any of the MediaWiki skin elements. Note that the result is not a valid HTML page but a page fragment, and it does not include any CSS styling.
- You can export one or more pages using Special:Export. This gives you the raw wikitext wrapped in an XML structure. You will need to process this output further for it to be useful.
- You can also extract page content using the MediaWiki API (api.php); for example, action=parse returns a page's rendered HTML.
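The interface routes listed above can be sketched in Python. This is a minimal sketch under stated assumptions, not an official client: the base URL https://wiki.example.org/w and all helper names are hypothetical, and the export XML is matched by local tag name because the schema namespace differs between MediaWiki releases.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical script path of the wiki; replace with your own.
BASE = "https://wiki.example.org/w"


def render_url(title, base=BASE):
    """URL of the skin-less HTML fragment of a page (action=render)."""
    return f"{base}/index.php?" + urlencode({"title": title, "action": "render"})


def export_url(title, base=BASE):
    """URL that asks Special:Export for a single page as XML."""
    return f"{base}/index.php?" + urlencode({"title": f"Special:Export/{title}"})


def api_parse_url(title, base=BASE):
    """URL that asks the API for the rendered HTML of a page as JSON."""
    params = {"action": "parse", "page": title, "prop": "text", "format": "json"}
    return f"{base}/api.php?" + urlencode(params)


def wikitext_from_export(xml_text):
    """Map each page title in a Special:Export dump to its wikitext.

    If a page lists several revisions, the last one in the dump wins.
    """
    def local(el):
        # Drop the "{namespace}" prefix that ElementTree puts on tags.
        return el.tag.rsplit("}", 1)[-1]

    pages = {}
    for page in ET.fromstring(xml_text).iter():
        if local(page) != "page":
            continue
        title, text = None, None
        for el in page.iter():
            if local(el) == "title":
                title = el.text
            elif local(el) == "text":
                text = el.text or ""
        if title is not None:
            pages[title] = text
    return pages
```

To fetch a page, pass one of these URLs to urllib.request.urlopen (or any HTTP client) and, for the export route, hand the response body to wikitext_from_export. Remember that the action=render output is a fragment, so you would still need to wrap it in your own HTML and CSS before publishing.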
Built-in methods of exporting data via the command-line
- /maintenance/getText.php allows you to get the wikitext of a specific page.
- As a hack, the following command outputs a page's HTML. Run it from your maintenance directory and replace Main_Page with the page you want:
echo '$a = new ApiMain( new FauxRequest( array( "action" => "parse", "page" => "Main_Page", "prop" => "text" ))); $a->execute(); $d = $a->getResultData(); echo $d["parse"]["text"]["*"];'|php eval.php
- The above could be replaced by a proper maintenance script if there is demand (similar to getText.php for page text).
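If you want to call getText.php from a larger publishing script, a thin wrapper is enough. The following is a minimal sketch: the function names and the default maintenance directory are assumptions, it expects php to be on your PATH, and newer MediaWiki releases may prefer a different way of invoking maintenance scripts.

```python
import subprocess


def get_text_command(title, maintenance_dir="maintenance"):
    """Build the argument list that runs getText.php for one page title."""
    return ["php", f"{maintenance_dir}/getText.php", title]


def get_wikitext(title, maintenance_dir="maintenance"):
    """Run getText.php and return the wikitext it prints to stdout.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    result = subprocess.run(
        get_text_command(title, maintenance_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the title as a separate list element, rather than building a shell string, avoids quoting problems with titles that contain spaces or punctuation.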
Extensions to help with exporting data
This list is not by any means exhaustive, nor should it be considered a recommendation to use any of these extensions. It is more a pointer to some extensions that may be worth investigating further.
- There are various extensions that you can install to export individual pages as PDF files.
- Extension:Collection allows you to publish individual pages or collections of pages in a number of formats.
- Extension:OpenDocument Export exports in ODF format.
- Category:Data extraction extensions is currently a bit of a mixed bag, but contains some useful items not already covered by the above.