Html5Depurate/zh
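This page is obsolete. It is kept for historical reference only. It may document extensions or features that are deprecated and/or no longer supported. Do not assume that the information here is current. In 2017/2018, Html5Depurate was proposed as a potential replacement for Tidy in MediaWiki, but Wikimedia and MediaWiki moved from Tidy to RemexHtml instead of Html5Depurate.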
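Html5Depurate is a web service that takes potentially invalid HTML as input, parses it using the HTML5 parsing algorithm, and outputs the resulting document using an XHTML serialization. It is written in Java so that it can use the excellent validator.nu parser by Henri Sivonen and the Mozilla Foundation.
Third-party users should consider RemexHtml, a PHP-only HTML5 parsing library, which has been used as the basis for a Tidy replacement instead of Html5Depurate.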
Package installation
Packages for Ubuntu Trusty and Debian Jessie are available from apt.wikimedia.org. These can be installed as follows. For Jessie:
apt-get install apt-transport-https
echo deb https://apt.wikimedia.org/wikimedia/ jessie-wikimedia main >> /etc/apt/sources.list.d/wikimedia.list
apt-get update
apt-get install html5depurate
For Trusty:
apt-get install apt-transport-https
echo deb https://apt.wikimedia.org/wikimedia/ trusty-wikimedia main >> /etc/apt/sources.list.d/wikimedia.list
apt-get update
apt-get install html5depurate
The service will automatically start on localhost:4339. The package is reasonably secure, since it sets up a new unprivileged user for the daemon, and uses a very restrictive Java security policy.
Note that the package uses Maven Central during its build process, so the source package does not contain all the relevant source files.
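The two install sequences above differ only in the release codename. As a minimal sketch, the apt source line can be derived from the codename. The helper name `wikimedia_apt_line` is hypothetical and not part of the package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with html5depurate): print the apt source
# line for one of the two releases documented above.
wikimedia_apt_line() {
    codename="$1"
    case "$codename" in
        jessie|trusty)
            printf 'deb https://apt.wikimedia.org/wikimedia/ %s-wikimedia main\n' "$codename"
            ;;
        *)
            # Only the releases named on this page are accepted.
            echo "unsupported release: $codename" >&2
            return 1
            ;;
    esac
}

# Usage (as root), mirroring the steps above:
#   apt-get install apt-transport-https
#   wikimedia_apt_line jessie >> /etc/apt/sources.list.d/wikimedia.list
#   apt-get update
#   apt-get install html5depurate
```

Rejecting unknown codenames avoids writing a source line for a release that this page does not list.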
Compilation
The source can be obtained with:
git clone https://gerrit.wikimedia.org/r/mediawiki/services/html5depurate
Install Maven, JDK 7 and jsvc. Compile using:
mvn package
This will download all dependencies from Maven Central, compile, test, and generate a single .jar file which bundles all dependencies. The jar file will appear in the target directory, with a filename that depends on the current version. For testing as a foreground process, you can use something like:
java -cp target/html5depurate-1.1-SNAPSHOT.jar \
org.wikimedia.html5depurate.DepurateDaemon
To run it in the background, you can use jsvc, for example:
/usr/bin/jsvc \
-cp $PWD/target/html5depurate-1.1-SNAPSHOT.jar \
-pidfile /tmp/html5depurate.pid \
-errfile /tmp/html5depurate.err \
-outfile /tmp/html5depurate.out \
-procname html5depurate \
org.wikimedia.html5depurate.DepurateDaemon
Or check out the debian branch for fully baked SysV init scripts.
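When the daemon is started with jsvc as in the example above, the pid file gives a simple way to check whether it is still alive. The following is a sketch only. The function name `depurate_running` is hypothetical, and the default path assumes the `-pidfile` value used in that example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the process recorded in the pid file exists.
# Note: kill -0 also fails if you lack permission to signal the process, so
# run this as root or as the user that owns the daemon.
depurate_running() {
    pidfile="${1:-/tmp/html5depurate.pid}"
    [ -r "$pidfile" ] || return 1      # no readable pid file: treat as not running
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1          # empty pid file
    kill -0 "$pid" 2>/dev/null         # signal 0 tests existence without sending a signal
}

if depurate_running; then
    echo "html5depurate is running"
else
    echo "html5depurate is not running"
fi
```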
Configuration
Configuration options may be specified in /etc/html5depurate/html5depurate.conf. Possible configuration options and their default values are documented below:
# Max POST size, in bytes.
maxPostSize = 100000000
# Host or IP and port on which Html5depurate will listen.
host = localhost
port = 4339
It's advisable to also configure Java's logging service. For example, the Debian package uses the following logging.properties file:
handlers = java.util.logging.FileHandler
.level = INFO
java.util.logging.FileHandler.pattern = /var/log/html5depurate/html5depurate.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.append = true
java.util.logging.SimpleFormatter.format = %1$tF %1$tT %4$s: %5$s %6$s%n
Then run Java with -Djava.util.logging.config.file=/path/to/logging.properties
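For a foreground test run, the logging flag can be combined with the class path and main class from the Compilation section. A minimal sketch follows. The function name `depurate_foreground` and the `DRY_RUN` switch are hypothetical conveniences, and both paths are placeholders.

```shell
#!/bin/sh
# Hypothetical wrapper: start the daemon in the foreground with a custom
# java.util.logging configuration.
#   $1 = path to the bundled jar, $2 = path to logging.properties
depurate_foreground() {
    jar="$1"
    props="$2"
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty,
    # so the command is printed instead of executed.
    ${DRY_RUN:+echo} java "-Djava.util.logging.config.file=$props" \
        -cp "$jar" org.wikimedia.html5depurate.DepurateDaemon
}

# Example (placeholder paths):
#   depurate_foreground target/html5depurate-1.1-SNAPSHOT.jar /path/to/logging.properties
```

Setting `DRY_RUN=1` before the call prints the command line, which is useful for checking the paths before starting the JVM.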
Client configuration
MediaWiki can be configured to use this service by putting the following in LocalSettings.php:
$wgUseTidy = false;
$wgTidyConfig = array(
'driver' => 'Html5Depurate'
);
To instruct Html5Depurate to provide backwards compatibility with Tidy as far as is possible, use the compat/document API endpoint:
$wgTidyConfig['url'] = 'http://localhost:4339/compat/document';
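To check the endpoint without going through MediaWiki, a request can be sent by hand. The sketch below rests on assumptions that this page does not state: that the service accepts a multipart/form-data POST and that the HTML is passed in a form field named `text`. Verify both against the service's own documentation. The function name `depurate_post` and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumed (not confirmed by this page): multipart/form-data POST
# with the input HTML in a field named "text".
#   $1 = endpoint URL, $2 = HTML fragment or document
depurate_post() {
    url="$1"
    html="$2"
    # --form-string sends the value literally, so a leading "<" or "@" in the
    # HTML is not interpreted by curl as a file reference.
    ${DRY_RUN:+echo} curl --silent --show-error \
        --form-string "text=$html" "$url"
}

# Example against the compat endpoint configured above:
#   depurate_post http://localhost:4339/compat/document '<p>unclosed <b>bold</p>'
```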