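MediaWiki extensions manual
TextExtracts
Release status: stable
Implementation: API
Description: Provides an API for retrieving plain-text or limited-HTML extracts of pages
Author(s): Max Semenik (MaxSem)
MediaWiki: 1.23+
PHP: 5.4+
Database changes
License: GNU General Public License 2.0 or later
Download
  • $wgExtractsRemoveClasses
  • $wgExtractsExtendOpenSearchXml
Translate the TextExtracts extension if it is available at translatewiki.net
Check usage and version matrix.
Issues: Open tasks · Report a bug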

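The TextExtracts extension provides an API for retrieving extracts of page content as plain text or limited HTML (HTML from which content belonging to certain CSS classes has been removed).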

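Download

The extension can be retrieved directly from Git:

  • Browse code
  • Some extensions have tags for stable releases.
  • Each branch is associated with a past MediaWiki release. There is also a "master" branch containing the latest alpha version (which might require an alpha version of MediaWiki).

Extract the snapshot and place it in the extensions/TextExtracts/ directory of your MediaWiki installation.

If you are familiar with git and have shell access to your server, you can also obtain the extension as follows: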

cd extensions/
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/TextExtracts.git

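Installation

  • Download the files and place them in a directory called TextExtracts in your extensions/ folder.
  • Add the following code at the bottom of your LocalSettings.php:
    wfLoadExtension( 'TextExtracts' );

  • Done: navigate to Special:Version on your wiki to verify that the extension was installed successfully.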

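To users running MediaWiki 1.26 or earlier:

The instructions above describe the new way of installing this extension, using wfLoadExtension(). If you need to install this extension on an earlier version (MediaWiki 1.26 or earlier), then instead of wfLoadExtension( 'TextExtracts' ); you need to use: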

require_once "$IP/extensions/TextExtracts/TextExtracts.php";


Configuration settings

  • $wgExtractsRemoveClasses is an array of selectors of the form <tag>, <tag>.<class>, .<class>, and #<id>. Matching elements are excluded from extraction.
    For example, $wgExtractsRemoveClasses[] = 'dl'; removes indented text, which is often used for non-templated hatnotes that are not wanted in summaries.
    TextExtracts.php defines the defaults, one of which is the class "noexcerpt". This class may be added to any template to exclude it from extracts.
  • $wgExtractsExtendOpenSearchXml defines whether TextExtracts should provide its extracts to the opensearch API module. The default is false.
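
The four selector forms accepted by $wgExtractsRemoveClasses can be illustrated with a small matching routine. This is only a sketch of how such selectors might be interpreted, not the extension's own code: the helper names `classify_selector`, `element_matches` and `is_excluded` are hypothetical, and the selectors `table.infobox` and `#coordinates` are made-up examples.

```python
from typing import Iterable, Optional, Tuple


def classify_selector(selector: str) -> Tuple[str, str]:
    """Classify a selector as one of the four documented forms."""
    if selector.startswith("#"):
        return ("id", selector[1:])          # #<id>
    if selector.startswith("."):
        return ("class", selector[1:])       # .<class>
    if "." in selector:
        return ("tag_class", selector)       # <tag>.<class>
    return ("tag", selector)                 # <tag>


def element_matches(selector: str, tag: str,
                    classes: Iterable[str] = (),
                    element_id: Optional[str] = None) -> bool:
    """Return True if an element described by tag/classes/id matches."""
    kind, value = classify_selector(selector)
    class_set = set(classes)
    if kind == "id":
        return element_id == value
    if kind == "class":
        return value in class_set
    if kind == "tag_class":
        wanted_tag, wanted_class = value.split(".", 1)
        return tag == wanted_tag and wanted_class in class_set
    return tag == value


def is_excluded(selectors: Iterable[str], tag: str,
                classes: Iterable[str] = (),
                element_id: Optional[str] = None) -> bool:
    """Return True if any configured selector matches the element."""
    return any(element_matches(s, tag, classes, element_id) for s in selectors)


# Hypothetical configuration mixing all four forms.
selectors = ["dl", ".noexcerpt", "table.infobox", "#coordinates"]
print(is_excluded(selectors, "div", classes=["noexcerpt"]))
print(is_excluded(selectors, "table", classes=["wikitable"]))
```

Under these assumptions the first call matches through the `.noexcerpt` class selector, while the second matches nothing because `table.infobox` requires both the tag and the class.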

API


prop=extracts (ex)

(main | query | extracts)

Returns plain-text or limited HTML extracts of the given pages.

Parameters:
exchars

How many characters to return. Actual text returned might be slightly longer.

The value must be between 1 and 1,200.
Type: integer
exsentences

How many sentences to return.

The value must be between 1 and 10.
Type: integer
exlimit

How many extracts to return. (Multiple extracts can only be returned if exintro is set to true.)

No more than 20 (20 for bots) allowed.
Type: integer or max
Default: 20
exintro

Return only content before the first section.

Type: boolean
explaintext

Return extracts as plain text instead of limited HTML.

Type: boolean
exsectionformat

How to format sections in plaintext mode:

plain
No formatting.
wiki
Wikitext-style formatting (== like this ==).
raw
This module's internal representation (section titles prefixed with <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>).
One of the following values: plain, wiki, raw
Default: wiki
excontinue

When more results are available, use this to continue.

Type: integer
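
A request using the parameters above might be assembled as in the following sketch. It assumes the standard MediaWiki `action=query` entry point with `format=json` and `titles`; the endpoint URL is a placeholder for any wiki that has TextExtracts installed, and `build_extract_url` and `extracts_from_response` are hypothetical helper names. The range checks mirror the limits documented above.

```python
from urllib.parse import urlencode

# Assumption: replace with the api.php URL of a wiki running TextExtracts.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, chars=None, sentences=None, limit=None,
                      intro=False, plaintext=False, section_format=None,
                      endpoint=API_ENDPOINT):
    """Build a prop=extracts query URL, validating the documented ranges."""
    if chars is not None and not 1 <= chars <= 1200:
        raise ValueError("exchars must be between 1 and 1,200")
    if sentences is not None and not 1 <= sentences <= 10:
        raise ValueError("exsentences must be between 1 and 10")
    if limit is not None and not 1 <= limit <= 20:
        raise ValueError("exlimit must be no more than 20")
    if section_format not in (None, "plain", "wiki", "raw"):
        raise ValueError("exsectionformat must be plain, wiki or raw")

    params = {"action": "query", "prop": "extracts",
              "format": "json", "titles": title}
    if chars is not None:
        params["exchars"] = chars
    if sentences is not None:
        params["exsentences"] = sentences
    if limit is not None:
        params["exlimit"] = limit
    if intro:
        params["exintro"] = "1"        # boolean flags are set by presence
    if plaintext:
        params["explaintext"] = "1"
    if section_format is not None:
        params["exsectionformat"] = section_format
    return endpoint + "?" + urlencode(params)


def extracts_from_response(data):
    """Map page title to extract for a decoded JSON response."""
    pages = data.get("query", {}).get("pages", {})
    return {page["title"]: page.get("extract", "") for page in pages.values()}


print(build_extract_url("Earth", chars=300, intro=True, plaintext=True))

# Hand-written sample in the default JSON layout (pages keyed by page ID).
sample = {"query": {"pages": {"1": {"pageid": 1, "title": "Earth",
                                     "extract": "Earth is a planet."}}}}
print(extracts_from_response(sample))
```

The sample response is illustrative only. A real response would be obtained by fetching the generated URL and decoding the JSON body.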

Another example
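
As a further illustration, the `raw` value of exsectionformat marks each section title with the control-character prefix described above. The sketch below shows one way a client might detect those markers and convert them to wikitext-style headings. It assumes that the title runs to the end of the line and that the number of `=` signs equals the section level; `list_sections` and `raw_to_wiki` are hypothetical helpers, and the sample text is made up.

```python
import re

# <ASCII 1><ASCII 2><section level><ASCII 2><ASCII 1>, followed by the title.
# Assumption: the title extends to the end of the line.
SECTION_MARKER = re.compile(r"\x01\x02(\d)\x02\x01(.*)")


def list_sections(raw_extract):
    """Return (level, title) pairs for every section marker found."""
    return [(int(m.group(1)), m.group(2).strip())
            for m in SECTION_MARKER.finditer(raw_extract)]


def raw_to_wiki(raw_extract):
    """Rewrite raw section markers as wikitext-style headings."""
    def replace(match):
        bars = "=" * int(match.group(1))   # assumption: level N gives N '=' signs
        return f"{bars} {match.group(2).strip()} {bars}"
    return SECTION_MARKER.sub(replace, raw_extract)


sample = "Intro paragraph.\n\x01\x022\x02\x01History\nFounded long ago."
print(list_sections(sample))
print(raw_to_wiki(sample))
```

Working from the raw form lets a client choose its own heading rendering instead of relying on the `plain` or `wiki` output.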

Caveats

There are several things to be aware of when using the API:

  • We do not recommend using `exsentences`. It does not work for HTML extracts, and there are many edge cases that it does not handle. For example, "Arm. gen. Ing. John Smith was a soldier." will be treated as 4 sentences. We do not plan to fix this.
  • Inline images are stripped from the response (even in HTML mode). This means that if you use the Math extension and have formulae in your lead section, they may not appear in the summary output.
  • In HTML mode we cannot guarantee well-formed HTML. The resulting HTML may be invalid or malformed.
  • In plaintext mode:
    • citations may not be stripped (see phab:T197266)
    • if a paragraph ends with an HTML tag, e.g. a ref tag, new lines may be dropped (see phab:T201946)
    • new lines may be dropped after lists (see phab:T208132)
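
The sentence-counting caveat can be reproduced with a deliberately naive splitter. The sketch below is not the extension's actual algorithm. It simply assumes that a sentence ends at any '.', '!' or '?' followed by whitespace, which is enough to show why abbreviations inflate the count; `naive_sentences` and `first_sentences` are hypothetical helpers.

```python
import re


def naive_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def first_sentences(text, count):
    """Return the first `count` naive sentences joined by spaces."""
    return " ".join(naive_sentences(text)[:count])


example = "Arm. gen. Ing. John Smith was a soldier."
print(naive_sentences(example))       # four pieces: each abbreviation counts
print(first_sentences(example, 1))    # only the first abbreviation survives
```

With a splitter of this kind, asking for one sentence returns only "Arm.", which is why a character limit (`exchars`) tends to be the safer choice.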

FAQ

How can I remove content from a page preview/extract?

TextExtracts strips any element that is marked with the class noexcerpt. This class is one of the defaults in the global $wgExtractsRemoveClasses.


See also