Mediawiki-utilities/List

Datasource

mwxml -- XML dump processing

pip install mwxml • docs • source

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two concerns with streaming XML parsing: complexity and performance.

Complexity
Streaming XML parsing is gross. XML dumps consist of (1) some site metadata, (2) a collection of pages that contain (3) collections of revisions. The module allows you to think about dump files in this way and ignore the fact that you’re streaming XML. An mwxml.Dump contains an mwxml.SiteInfo and an iterator of mwxml.Page objects. An mwxml.Page contains page metadata and an iterator of mwxml.Revision objects. An mwxml.Revision contains revision metadata and text.
Performance
Performance is a serious concern when processing large XML database dumps. Unfortunately, Python’s Global Interpreter Lock prevents us from running threads on multiple CPUs. This library provides mwxml.map(), a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work over multiple CPUs.
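
The dump, page, revision nesting described under Complexity can be pictured with a small sketch that uses only Python's standard library. This is not mwxml's implementation or API: the function name iter_pages, the sample XML, and the namespace-free element names are assumptions made for illustration (real dumps declare an XML namespace and carry far more metadata).

```python
import io
import xml.etree.ElementTree as ET

# Tiny, namespace-free stand-in for a dump file (an assumption for this sketch).
SAMPLE_DUMP = b"""<mediawiki>
  <siteinfo><sitename>Example Wiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><text>first</text></revision>
    <revision><id>11</id><text>second draft</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>2</id>
    <revision><id>20</id><text>hello</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, [(rev_id, text), ...]) one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text") or "")
                for rev in elem.iter("revision")
            ]
            yield title, revisions
            elem.clear()  # drop the finished page's children and text


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
for title, revisions in pages:
    print(title, [(rev_id, len(text)) for rev_id, text in revisions])
```

Note that this sketch buffers all revisions of a page before yielding it, which would not scale to pages with long histories; it only illustrates the bookkeeping that a streaming parser has to hide.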

See also dumps.wikimedia.org, Special:Export, and Manual:DumpBackup.php.

mwapi -- API querying and session management

pip install mwapi • docs • source

This library provides a set of basic utilities for interacting with MediaWiki’s “action” API – usually available at /w/api.php. The most salient feature of this library is the mwapi.Session class that provides a connection session that sustains a logged-in user status and provides convenience functions for calling the MediaWiki API. See get() and post().

Authentication
mwapi.Session provides convenient login() and logout() methods.
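
For readers unfamiliar with the “action” API, the following standard-library sketch shows the shape of a request to /w/api.php. It does not use mwapi and sends nothing over the network; the helper name build_api_url and the example host are assumptions for illustration. A session object such as mwapi.Session would add cookie handling and login state on top of requests like this.

```python
from urllib.parse import urlencode


def build_api_url(host, **params):
    """Return a GET URL for the action API at <host>/w/api.php.

    Hypothetical helper: parameters are sorted so the URL is deterministic,
    and format=json is added unless the caller overrides it.
    """
    query = {"format": "json", **params}
    return host.rstrip("/") + "/w/api.php?" + urlencode(sorted(query.items()))


url = build_api_url("https://en.wikipedia.org", action="query", meta="siteinfo")
print(url)
```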

See also API and w/api.php.

mwdb -- Database connection and querying

pip install mwdb • source

This library provides a set of utilities for connecting to and querying a MediaWiki database.

Authentication & authorization

mwoauth -- OAuth connection handler for MediaWiki

pip install mwoauth • docs • source

This library provides a simple means of performing an OAuth handshake with a MediaWiki installation that has the OAuth Extension installed.

Data processing

mwdiffs -- Revision diff processing

pip install mwdiffs • docs • source

This library provides a set of utilities for generating information about the difference between revisions.
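
As a rough picture of what “information about the difference between revisions” can look like, here is a sketch that compares two revision texts token by token using difflib from the standard library. It is not mwdiffs' algorithm or API; the function name token_diff and the whitespace tokenization are assumptions for illustration.

```python
import difflib


def token_diff(old_text, new_text):
    """Return (operation, old_tokens, new_tokens) for each changed span."""
    a, b = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans that changed
    ]


ops = token_diff("the quick brown fox", "the slow brown fox jumps")
for op in ops:
    print(op)
```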

mwreverts -- Revert detection

pip install mwreverts • docs • source

This library provides a set of utilities for detecting reverts (see mwreverts.Detector and mwreverts.detect()) and identifying the reverted status of edits to a MediaWiki wiki.
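
One common way to detect reverts is to look for a revision whose content is identical to a recent earlier revision; everything in between is then treated as reverted. The sketch below illustrates that idea with checksums from the standard library. It is not mwreverts' implementation: the function name find_identity_reverts, the tuple layout, and the meaning given to radius here (the maximum number of revisions a single revert may undo) are assumptions for illustration.

```python
import hashlib
from collections import deque


def find_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id) tuples.

    `revisions` is an iterable of (rev_id, text) pairs in chronological order.
    """
    # (checksum, rev_id) of the most recent radius + 1 revisions.
    recent = deque(maxlen=radius + 1)
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        # Search backwards for the most recent revision with identical content.
        for offset in range(len(recent) - 1, -1, -1):
            if recent[offset][0] == checksum:
                reverted = [rid for _, rid in list(recent)[offset + 1:]]
                if reverted:  # ignore edits that merely repeat the previous text
                    yield rev_id, reverted, recent[offset][1]
                break
        recent.append((checksum, rev_id))


history = [(1, "a"), (2, "b"), (3, "vandalism"), (4, "b"), (5, "c")]
reverts = list(find_identity_reverts(history))
print(reverts)
```

In this example, revision 4 restores the content of revision 2, so revision 3 is reported as reverted.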

See also m:R:Revert detection.

mwsessions -- Edit session processing

pip install mwsessions • docs • source

This library provides a set of utilities for grouping MediaWiki user actions into sessions. mwsessions.Sessionizer and mwsessions.sessionize() can be used by Python scripts to group activities into sessions, or the command-line utilities can be used to operate directly on data files. Such methods have been used to measure editor labor hours[1].
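
The grouping idea can be sketched with the standard library alone: a user's consecutive actions belong to the same session until the gap between them exceeds an inactivity cutoff. This is not the mwsessions implementation or its API; the function name group_into_sessions, the (user, timestamp) event layout, and the one-hour default cutoff are assumptions for illustration.

```python
from collections import defaultdict


def group_into_sessions(events, cutoff=3600):
    """Group (user, timestamp) events into per-user sessions.

    A gap of more than `cutoff` seconds between a user's consecutive
    events starts a new session. Returns a list of (user, [timestamps]).
    """
    by_user = defaultdict(list)
    for user, timestamp in events:
        by_user[user].append(timestamp)

    sessions = []
    for user, timestamps in by_user.items():
        timestamps.sort()
        current = [timestamps[0]]
        for ts in timestamps[1:]:
            if ts - current[-1] > cutoff:
                sessions.append((user, current))  # close the finished session
                current = [ts]
            else:
                current.append(ts)
        sessions.append((user, current))
    return sessions


events = [("Alice", 0), ("Bob", 100), ("Alice", 1200), ("Alice", 9000), ("Bob", 200)]
sessions = group_into_sessions(events)
for user, timestamps in sessions:
    print(user, timestamps)
```

Unlike this sketch, which collects all events before grouping, a streaming sessionizer would emit each session as soon as the cutoff has passed.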

See also m:R:Activity session.

mwpersistence -- Content persistence processing

pip install mwpersistence • docs • source

This library provides a set of utilities for measuring content persistence and tracking authorship in MediaWiki revisions.

See also m:R:Content persistence.

mwparserfromhell -- Easy-to-use parser for wikitext

pip install mwparserfromhell • docs • source

This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.


Basic utilities

mwtypes -- A basic type system for MediaWiki data

pip install mwtypes • docs • source

This library provides a set of standardized types to be used when processing MediaWiki data. All of the types in this package make use of jsonable and therefore can be trivially serialized as JSON documents.

mwcli -- Utilities for unix command-line data processing

pip install mwcli • source

Incubator

These libraries are experimental and may change dramatically or be discontinued.

mwmetrics -- A collection of statistics and measurements for MediaWiki

source

mwevents -- A generalized event extraction and processing framework

pip install mwevents • source

1. R. Stuart Geiger & Aaron Halfaker (2013). Using Edit Sessions to Measure Participation in Wikipedia. CSCW (pp. 861-870). DOI:10.1145/2441776.2441873.