Talk:Requests for comment/Text extraction

About this board

Previous page history was archived for backup purposes at Talk:Requests for comment/Text extraction/LQT Archive 1 on 2015-07-10.

Some notes

3 comments • 13:13, 18 November 2014

Dantman (talkcontribs)

If extracts will be integrated into core, custom extraction classes could go to a separate extension (e.g. WikimediaTextExtraction); otherwise they could be part of the main extraction extension.

Personally, even if this were implemented in a TextExtraction extension instead of core (though I think it should be implemented in core), I wouldn't want Wikimedia-specific stuff in the generic MediaWiki extension. I.e., I'd prefer that in both situations WMF would have a WikimediaTextExtraction extension.

Such timing is less than optimal, I propose to extract text during LinksUpdate and store it in page_props.

page_props is for storage of indexed and queryable data that results from the canonical parse run. I.e., something should only ever be stored there when there is also an equivalent parser cache entry.

page_props is for data you want to be able to query for, not for general storage. Since you're not going to be making SQL queries trying to match extraction results, the extraction data should be stored in the parser cache instead, using either ParserOutput::setExtensionData or a new property plus methods on ParserOutput.

Alternatively, if you want to do this completely separately from the parser cache, the proposed DataStore would probably be the best method of storage.

00:46, 10 November 2013

MaxSem (talkcontribs)

We don't need text extracts in parser output:

I want to make extract retrieval a batch operation - it would never be like that if it only came with ParserOutput.
You need to generate an extract once per revision, not on every parse.

21:56, 14 November 2013
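
The two points above (batch retrieval, and one extract per revision rather than per parse) can be illustrated with a minimal sketch. It is written in Python only for brevity and is not MediaWiki code; `ExtractStore`, `get_many`, and the `generate` callback are hypothetical names that appear nowhere in the RfC.

```python
from typing import Callable, Dict, Iterable


class ExtractStore:
    """Hypothetical store holding one extract per revision ID."""

    def __init__(self, generate: Callable[[int], str]) -> None:
        self._generate = generate            # hypothetical extractor callback
        self._by_revision: Dict[int, str] = {}
        self.generated = 0                   # extractor runs, kept for illustration

    def get_many(self, rev_ids: Iterable[int]) -> Dict[int, str]:
        """Return extracts for a batch of revisions, generating each at most once."""
        result: Dict[int, str] = {}
        for rev_id in rev_ids:
            if rev_id not in self._by_revision:
                self._by_revision[rev_id] = self._generate(rev_id)
                self.generated += 1
            result[rev_id] = self._by_revision[rev_id]
        return result


store = ExtractStore(lambda rev_id: f"extract for r{rev_id}")
first = store.get_many([1, 2, 3])
second = store.get_many([2, 3])   # served from the store, no regeneration
```

Keying on the revision ID alone is the simplest reading of "once per revision"; it deliberately ignores template changes and extractor upgrades.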

MZMcBride (talkcontribs)

Some wikis, such as Wiktionaries, rely heavily on templates. I'm not sure you can only generate an extract once... if templates change and the resulting page output changes, you'll need to re-generate an extract, right? Plus there will be incremental improvements to the extractor itself, which people will want to benefit from without needing to make dummy edits to pages.

13:13, 18 November 2014
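
One way to picture the invalidation concern raised here is a cache key that changes whenever the rendered output or the extractor changes, so no dummy edit is needed. This is a sketch under assumptions, in Python for brevity: `EXTRACTOR_VERSION` and `extract_cache_key` are hypothetical names, and hashing the rendered HTML is just one possible way to detect template-driven changes.

```python
import hashlib

# Hypothetical: bumped whenever the extractor's behaviour changes.
EXTRACTOR_VERSION = 2


def extract_cache_key(rendered_html: str,
                      extractor_version: int = EXTRACTOR_VERSION) -> str:
    """Build a key that changes if the page output or the extractor changes."""
    digest = hashlib.sha256(rendered_html.encode("utf-8")).hexdigest()
    return f"extract:v{extractor_version}:{digest}"


before = extract_cache_key("<p>old template output</p>")
after_template_edit = extract_cache_key("<p>new template output</p>")
after_upgrade = extract_cache_key("<p>old template output</p>", extractor_version=3)
```

With such a key, an extract is regenerated lazily the first time the new key is looked up, at the cost of hashing the rendered output on each lookup.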

Inherit ALL the things

One comment • 11:31, 18 November 2014

Jeroen De Dauw (talkcontribs)

class ExtractFormatter extends HtmlFormatter
class WiktionaryExtractFormatter extends ExtractFormatter

Code reuse via inheritance much? What happened to favoring composition over inheritance?

11:31, 18 November 2014
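
For readers unfamiliar with the composition-over-inheritance argument, here is a minimal sketch of what a composed formatter might look like. It is in Python for brevity and is not the RfC's design: the class name `ExtractFormatter` is borrowed from the comment above, but the step functions, including the Wiktionary-specific `drop_bracketed_notes`, are hypothetical.

```python
import re
from typing import Callable, List

Step = Callable[[str], str]


def strip_tags(html: str) -> str:
    """Naive tag removal, good enough for a sketch only."""
    return re.sub(r"<[^>]+>", "", html)


def collapse_whitespace(text: str) -> str:
    return re.sub(r"\s+", " ", text).strip()


def drop_bracketed_notes(text: str) -> str:
    """Hypothetical site-specific step: remove [bracketed] annotations."""
    return re.sub(r"\[[^\]]*\]", "", text)


class ExtractFormatter:
    """Applies a configurable list of steps instead of being subclassed."""

    def __init__(self, steps: List[Step]) -> None:
        self._steps = list(steps)

    def format(self, html: str) -> str:
        for step in self._steps:
            html = step(html)
        return html


generic = ExtractFormatter([strip_tags, collapse_whitespace])
wiktionary = ExtractFormatter([strip_tags, drop_bracketed_notes, collapse_whitespace])
```

Site-specific behaviour then lives in the list of steps passed in, rather than in a subclass per wiki family.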

Mobile and Wikidata

One comment • 08:08, 18 November 2014

Nemo bis (talkcontribs)

Mobile shows search snippets from Wikidata instead. See mailarchive:wikidata-l/2014-November/005001.html. Maybe this RfC will answer my question on how that feature interacts with TextExtracts? :)

08:08, 18 November 2014

What's this waiting on?

One comment • 00:15, 18 April 2014

Sumanah (talkcontribs)

The DataStore RfC has been approved, but not implemented. Is the text extraction RfC awaiting DataStore's implementation?

00:15, 18 April 2014

Related endeavours

One comment • 23:52, 10 March 2014

Quiddity (talkcontribs)

[Nutshell context: There is a recurring ("perennial") proposal at Enwiki and at Meta, to create a "synopsis version" of Wikipedia articles. The most recent is from late 2012, at m:Concise Wikipedia]

I recently took a swing at summarizing everything, from NavPopups to Google Knowledge Graph, as briefly as possible, at m:Concise Wikipedia#A summary of existing short-options, using an example. That info might be relevant to this proposal, or just interesting to some of the folks who are following this.

23:52, 10 March 2014
