Mediawiki-utilities/List
Datasource
mwxml -- XML dump processing
This library contains a collection of utilities for efficiently processing MediaWiki's XML database dumps. It addresses two concerns with streaming XML parsing: complexity and performance.
- Complexity
- Streaming XML parsing is gross. XML dumps consist of (1) some site metadata and (2) a collection of pages that contain (3) collections of revisions. The module allows you to think about dump files in this way and ignore the fact that you're streaming XML. A mwxml.Dump contains a mwxml.SiteInfo and an iterator of mwxml.Page objects. A mwxml.Page contains page metadata and an iterator of mwxml.Revision objects. A mwxml.Revision contains revision metadata and text.
- Performance
- Performance is a serious concern when processing large XML database dumps. Unfortunately, Python's Global Interpreter Lock prevents threads from running Python code on multiple CPUs in parallel. This library provides mwxml.map(), a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work across multiple CPUs.
See also dumps.wikimedia.org, Special:Export, and Manual:DumpBackup.php.
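To make the "complexity" point concrete, here is a minimal standard-library sketch of the kind of streaming parse that mwxml hides. It is not mwxml's implementation: the sample dump is made up (real dumps declare an XML namespace and carry far more metadata), and `iter_pages` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# A tiny, made-up dump fragment (no XML namespace, unlike real dumps).
SAMPLE_DUMP = b"""<mediawiki>
  <siteinfo><sitename>ExampleWiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><text>first</text></revision>
    <revision><id>11</id><text>second</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>2</id>
    <revision><id>20</id><text>only</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, [(rev_id, text), ...]) one page at a time.

    Hypothetical helper: it buffers one page's revisions in memory,
    which is simpler than streaming the revisions themselves.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text"))
                for rev in elem.iter("revision")
            ]
            yield title, revisions
            elem.clear()  # drop the finished page's subtree to bound memory


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
for title, revisions in pages:
    print(title, [rev_id for rev_id, _ in revisions])
```

Even this toy version has to track element boundaries and free memory by hand, which is the bookkeeping the Dump/Page/Revision abstraction is meant to remove.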
mwapi -- API querying and session management
This library provides a set of basic utilities for interacting with MediaWiki's "action" API, usually available at /w/api.php. The most salient feature of this library is the mwapi.Session class, which provides a connection session that sustains a logged-in user status and offers convenience functions for calling the MediaWiki API. See get() and post().
- Authentication
- mwapi.Session provides convenient login() and logout() methods
mwdb -- Database connection and querying
pip install mwdb
• source
This library provides a set of utilities for connecting to and querying a MediaWiki database.
Authentication & authorization
mwoauth -- OAuth connection handler for MediaWiki
This library provides a simple means of performing an OAuth handshake with a MediaWiki installation that has the OAuth extension installed.
Data processing
mwdiffs -- Revision diff processing
This library provides a set of utilities for generating information about the difference between revisions.
mwreverts -- Revert detection
This library provides a set of utilities for detecting reverts (see mwreverts.Detector and mwreverts.detect()) and identifying the reverted status of edits to a MediaWiki wiki.
See also m:R:Revert detection.
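One common way to detect reverts is to look for a revision whose content is identical to a recent earlier revision; everything in between is then treated as reverted. The sketch below illustrates that idea with standard-library checksums. It is an assumption-laden illustration, not mwreverts itself: `detect_identity_reverts` and its `radius` parameter are hypothetical names, and the library's own interface may differ.

```python
import hashlib
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) tuples.

    `revisions` is an iterable of (rev_id, text) in chronological order.
    Only the last `radius` revisions are searched for identical content.
    """
    recent = deque(maxlen=radius)  # (checksum, rev_id) pairs, oldest first
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        history = list(recent)
        # Search from newest to oldest for an identical earlier state.
        for i in range(len(history) - 1, -1, -1):
            if history[i][0] == checksum:
                reverted = [rid for _, rid in history[i + 1:]]
                if reverted:  # identical consecutive revisions revert nothing
                    yield rev_id, reverted, history[i][1]
                break
        recent.append((checksum, rev_id))


# Made-up page history: revision 4 restores the text of revision 2.
history = [(1, "a"), (2, "a b"), (3, "vandalism"), (4, "a b")]
reverts = list(detect_identity_reverts(history))
print(reverts)
```

Here revision 4 is reported as reverting revision 3 back to the state of revision 2.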
mwsessions -- Edit session processing
This library provides a set of utilities for grouping MediaWiki user actions into sessions. mwsessions.Sessionizer and mwsessions.sessionize() can be used by Python scripts to group activities into sessions, or the command-line utilities can be used to operate directly on data files. Such methods have been used to measure editor labor hours[1].
See m:R:Activity session.
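The grouping idea can be sketched in a few lines of standard-library Python: a user's action joins their current session unless too much time has passed since their previous action. This is an illustration only, not the mwsessions implementation; `group_into_sessions` is a hypothetical name and the one-hour `cutoff` is an assumed default.

```python
from collections import defaultdict


def group_into_sessions(events, cutoff=3600):
    """Group (user, timestamp) events into per-user sessions.

    `events` must be sorted by timestamp (seconds). A gap longer than
    `cutoff` seconds between a user's consecutive events starts a new
    session. Returns {user: [[timestamp, ...], ...]}.
    """
    sessions = defaultdict(list)
    for user, timestamp in events:
        user_sessions = sessions[user]
        if user_sessions and timestamp - user_sessions[-1][-1] <= cutoff:
            user_sessions[-1].append(timestamp)  # continue current session
        else:
            user_sessions.append([timestamp])  # start a new session
    return dict(sessions)


# Made-up activity log: (user, seconds since some reference time).
events = [
    ("alice", 0),
    ("bob", 100),
    ("bob", 200),
    ("alice", 1800),
    ("alice", 9000),
]
sessions = group_into_sessions(events)
print(sessions)
```

In this sample, alice's third action comes two hours after her second, so it opens a second session, while bob's two actions fall into one.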
mwpersistence -- Content persistence processing
This library provides a set of utilities for measuring content persistence and tracking authorship in MediaWiki revisions.
See also m:R:Content persistence.
mwparserfromhell -- Easy-to-use parser for wikitext
This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.
Basic utilities
mwtypes -- A basic type system for MediaWiki data
This library provides a set of standardized types to be used when processing MediaWiki data. All of the types in this package make use of jsonable and can therefore be trivially serialized as JSON documents.
mwcli -- Utilities for unix command-line data processing
pip install mwcli
• source
Incubator
These libraries are experimental and may change dramatically or be discontinued.
mwmetrics -- A collection of statistics and measurements for MediaWiki
mwevents -- A generalized event extraction and processing framework
pip install mwevents
• source
1. R. Stuart Geiger & Aaron Halfaker (2013). Using Edit Sessions to Measure Participation in Wikipedia. CSCW, pp. 861-870. DOI:10.1145/2441776.2441873.