HtmlFormatter

HtmlFormatter is a PHP library for transforming or filtering MediaWiki HTML output.

It originated from the MobileFrontend extensions and is designed to transform HTML produced by MediaWiki. It is not intended for general use on arbitrary HTML5 input, but may work well for other use cases.

Wikimedia uses this library in the following MediaWiki extensions:

Usage

edit
use HtmlFormatter\HtmlFormatter;
// Load HTML that already has doctype and stuff
$formatter = new HtmlFormatter( $html );

// ...or one that doesn't have it
$formatter = new HtmlFormatter( HtmlFormatter::wrapHTML( $html ) );

// Add rules to remove some stuff
$formatter->remove( 'img' );
$formatter->remove( [ '.some_css_class', '#some_id', 'div.some_other_class' ] );
// Only the above syntax is supported, not full CSS/jQuery selectors

// These tags get replaced with their inner HTML,
// e.g. <tag>foo</tag> --> foo
// Only tag names are supported here
$formatter->flatten( 'span' );
$formatter->flatten( [ 'code', 'pre' ] );

// Actually perform the removals
$formatter->filterContent();

// Direct DomDocument manipulations are possible
$formatter->getDoc()->createElement( 'p', 'Appended paragraph' );

// Get resulting HTML
$processedHtml = $formatter->getText();
edit