Extension talk:TextExtracts/Flow

About this board

Previous page history was archived for backup purposes at Extension talk:TextExtracts/LQT Archive 1 on 2015-06-25.

Adjust the way <br> and variants are handled

One comment • 09:11, 10 October 2024 2 months ago

Spiros71 (talkcontribs)

Currently, if there is a <br> tag in the wikitext, it is removed completely, so the last word before the tag and the first word after it end up joined together. Is there a way to change this behaviour, e.g. by replacing <br> tags with a space?
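To make the request concrete, here is a minimal sketch of the desired behaviour, written in Python purely for illustration. TextExtracts itself is PHP and this is not how the extension is implemented. The function name `strip_tags_keep_breaks` and both regular expressions are assumptions made up for this example: every <br> variant becomes a space first, and only then are the remaining tags stripped.

```python
import re

# Matches <br>, <br/>, <br /> and <br clear="all"> in any letter case.
BR_TAG = re.compile(r"<br\b[^>]*>", re.IGNORECASE)
# Deliberately naive "any tag" pattern; good enough for a sketch.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_tags_keep_breaks(html):
    """Strip tags, but turn line breaks into spaces so words stay separated."""
    text = BR_TAG.sub(" ", html)   # step 1: <br> variants become a space
    text = ANY_TAG.sub("", text)   # step 2: all other tags are dropped
    return re.sub(r"\s+", " ", text).strip()  # step 3: tidy up whitespace


print(strip_tags_keep_breaks("<p>first line<br />second line</p>"))
```

The order matters: if all tags were stripped in one pass, the words on either side of the break would be joined, which is the behaviour described above.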

Reply Edited 09:11, 10 October 2024 2 months ago

Shorten MathML content in explaintext mode

3 comments • 07:26, 26 April 2024 7 months ago

143.179.2.139 (talkcontribs)

Example: https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Digital%20biquad%20filter

The original page's HTML tags have been stripped but replaced with whitespace. The resulting content is difficult to read and hard to shorten. Ideally, MathML content would be reduced to a single expression, with all tags and the whitespace between them removed, leaving only the raw expression.

Reply 18:35, 25 April 2024 7 months ago

143.179.2.139 (talkcontribs)

Or rather, the alttext attribute could be used for any HTML tags that provide it.
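A rough sketch of the alttext idea, in Python for illustration only. It is not TextExtracts code. It assumes the formula is wrapped in a `<math>` element that carries an `alttext` attribute, and the names `AltTextExtractor` and `plain_text_with_alttext` are made up for this example.

```python
from html.parser import HTMLParser


class AltTextExtractor(HTMLParser):
    """Collect plain text, replacing each <math> element by its alttext."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.math_depth = 0  # greater than 0 while inside a <math> element

    def handle_starttag(self, tag, attrs):
        if tag == "math":
            if self.math_depth == 0:
                alt = dict(attrs).get("alttext")
                if alt:
                    self.parts.append(alt)
            self.math_depth += 1

    def handle_endtag(self, tag):
        if tag == "math" and self.math_depth > 0:
            self.math_depth -= 1

    def handle_data(self, data):
        # Text inside <math> is skipped; the alttext stands in for it.
        if self.math_depth == 0:
            self.parts.append(data)


def plain_text_with_alttext(html):
    parser = AltTextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join("".join(parser.parts).split())


sample = 'The gain is <math alttext="H(z)"><mi>H</mi> <mo>(</mo> <mi>z</mi> <mo>)</mo></math> here.'
print(plain_text_with_alttext(sample))
```

A `<math>` element without an alttext attribute is simply dropped in this sketch; a real implementation would need a fallback.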

Reply 18:36, 25 April 2024 7 months ago

Thiemo Kreuz (WMDE) (talkcontribs)

As of now the TextExtracts extension has no knowledge about the Math extension. This is a recurring problem with many extensions, I'm afraid.

May I ask what the use case even is? Where does this text appear? I know the team around @Physikerwelt worked on Popups support for Math. But that skips TextExtracts entirely, as far as I know.

Reply 07:26, 26 April 2024 7 months ago

Popups/TextExtracts 1.39

2 comments • 11:10, 23 April 2024 7 months ago

2003:C2:3F21:FD00:A1BE:8BA5:2092:924F (talkcontribs)

MediaWiki 1.39.6, PHP 7.4.3, MySQL 8.0.36

Prior to 1.39.6, the PagePreview/Popup/TextExtracts preview either showed some text from the target article or it showed "...". The ellipsis always occurred when there was _no text before the first heading_. But if there was an associated image, the preview showed "..." on the left-hand side and the image on the right-hand side.

Now, after the upgrade, the preview shows "Es gab ein Problem bei der Anzeige dieser Vorschau" ("There was a problem displaying this preview"). No image is displayed.

How can we regain the previous behaviour?

09:15, 18 April 2024 8 months ago

2003:C2:3F21:FD00:134:BD68:4409:6542 (talkcontribs)

Topic can be closed, wrong place. See Topic:Y3eq158cl5otcgdt instead.

10:06, 23 April 2024 7 months ago

prop=extracts not working and send back Error 500

5 comments • 06:33, 6 April 2024 8 months ago

DAVY2018 (talkcontribs)

As the title says, when I try to use Popups with TextExtracts, Popups often shows "There was an issue displaying this preview".

When I inspect the page with Chrome's developer tools and console, it shows a "500 Internal Server Error".

I have used the API Sandbox to test every part of the API request and discovered that as soon as prop=extracts is included, the API sends back error code 500. When it is removed, no error is returned and the output is normal.

Is there a reason why this happens, and is there any way to solve it?

P.S. I have set up short URLs with apache2 following the tutorial on MediaWiki, and api.php is accessible without any problem.

Reply Edited 14:45, 5 April 2024 8 months ago

Thiemo Kreuz (WMDE) (talkcontribs)

Is this question about a self-hosted wiki? An error 500 could be anything. You would need to find the responsible error message in your server's log files. Manual:How to debug might help.

Reply 16:03, 5 April 2024 8 months ago

DAVY2018 (talkcontribs)

Yes, the wiki is a self-hosted wiki.

Thank you for your advice. I will try to figure it out from the log files.

Reply 04:57, 6 April 2024 8 months ago

DAVY2018 (talkcontribs)

After debugging, it shows the following lines:

Fatal error: Declaration of TextExtracts\ExtractFormatter::onHtmlReady(string $html): string must be compatible with HtmlFormatter\HtmlFormatter::onHtmlReady($html) in /var/www/<my wiki name>/w/extensions/TextExtracts/includes/ExtractFormatter.php on line 66

Does this mean that the extension's PHP code has an error?

Reply 06:01, 6 April 2024 8 months ago

DAVY2018 (talkcontribs)

Problem solved after running "composer require wikimedia/html-formatter".

Reply 06:33, 6 April 2024 8 months ago

No text extraction in SMW 4.02 + MW 1.39

11 comments • 00:00, 6 April 2024 8 months ago

Lotusccong (talkcontribs)

When I call this API URL: https://www.tbpedia.org/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

it returns the following response:

{ "batchcomplete": "", "warnings": { "extracts": { "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats." } }, "query": { "pages": { "1": { "pageid": 1, "ns": 0, "title": "\u9996\u9801", "extract": "\n" } } } }

It seems that no text has been extracted.

When I use the Popups extension, it shows "There was an issue displaying this preview".

Reply 10:41, 29 December 2022 1 year ago

Joe Beaudoin Jr. Redux (talkcontribs)

Did you ever solve this?
The above link only shows the NewPP limit report commented-out text as an extract, which would explain the "Issues displaying this preview" error:

{
    "batchcomplete": "",
    "warnings": {
        "extracts": {
            "*": "HTML may be malformed and/or unbalanced and may omit inline images. Use at your own risk. Known problems are listed at https://www.mediawiki.org/wiki/Special:MyLanguage/Extension:TextExtracts#Caveats."
        }
    },
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "ns": 0,
                "title": "\u9996\u9801",
                "extract": "\n"
            }
        }
    }
}

Reply Edited 14:58, 24 March 2023 1 year ago

Lotusccong (talkcontribs)

Hi Joe,

The issue is not resolved. What does "NewPP limit report commented-out text as an extract" mean?

Reply 15:44, 9 April 2023 1 year ago

Lotusccong (talkcontribs)

Hi Joe,

If you open this link: https://www.tbpedia.org/w/api.php?action=query&prop=extracts&exchars=1000&titles=%E9%A6%96%E9%A0%81

you will notice that the extract shows only the commented-out NewPP limit report. This causes the Popups extension to say "There was an issue displaying this preview". See here: https://www.tbpedia.org/wiki/%E7%9B%A7%E5%8B%9D%E5%BD%A5%E6%96%87%E9%9B%86%E7%BF%BB%E8%AD%AF%E7%B6%AD%E5%9F%BA%E9%A4%A8

I reinstalled MW with 1.39.3, PHP 8.0.28, SMW 4.1.1, TextExtracts – (74baaa7) 17:23, 20 March 2023, Previews – (010237d) 15:23, 21 March 2023, PageImages – (78537e6) 15:23, 21 March 2023.

I am using short URLs as well.

Initially, the previews were working fine. But after I installed more extensions, one of them (I can't figure out which one) caused this error. I removed the installed extensions, but the error still persists.

I suspect that one of the extensions I installed with Composer may have messed up a library. Or is there a JavaScript conflict?

I thought that if the issue was caused by a conflict between extensions, I could remove the installed extensions one by one, but the error remains even after removing them (i.e. no longer loading them from LocalSettings.php).

If the preview issue were caused by the Popups extension, the TextExtracts API should still be working.

I have enabled the debug toolbar to make troubleshooting easier.

I would really appreciate it if anyone could help troubleshoot this issue.

Thanks in advance.

Reply Edited 14:38, 11 April 2023 1 year ago

Lotusccong (talkcontribs)

Today, when I checked the page 盧勝彥文集翻譯維基館 - 真佛百科 True Buddha Pedia (tbpedia.org), items 3 and 5 showed the preview, but items 2 and 4 did not.

This really puzzles me: a day ago none of the 4 links could show the preview, and now two out of four can?

Any clue what went wrong? Is it due to the cache? Or due to the page content?

5 minutes later, none of the links could show the preview. This confuses me even more. What is the root cause of the preview not being displayed? I didn't make any changes to the configuration.

Reply Edited 04:05, 12 April 2023 1 year ago

DAVY2018 (talkcontribs)

Sorry to reopen this old thread, but I would like to ask whether there is any solution to this problem.

Reply 07:06, 5 April 2024 8 months ago

Lotusccong (talkcontribs)

Based on my case, it seems that there is nothing wrong with TextExtracts or Popups. I noticed that TextExtracts will not extract anything from an article that begins with a heading. You need to have some text before the heading.
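A small sketch for spotting affected pages from the API output, in Python for illustration. It assumes the default JSON format quoted earlier in this thread, where "pages" is an object keyed by page ID. The function name `pages_without_extract` is made up for this example. An extract counts as empty if nothing is left after removing HTML comments (such as the NewPP limit report) and whitespace.

```python
import json
import re

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def pages_without_extract(response):
    """Return the titles of pages whose extract contains no visible text."""
    pages = response.get("query", {}).get("pages", {})
    empty = []
    for page in pages.values():
        extract = page.get("extract", "")
        visible = HTML_COMMENT.sub("", extract).strip()
        if not visible:
            empty.append(page.get("title"))
    return empty


# Response quoted in this thread (warnings omitted for brevity).
SAMPLE = r'''{"batchcomplete": "", "query": {"pages": {"1":
  {"pageid": 1, "ns": 0, "title": "\u9996\u9801", "extract": "\n"}}}}'''

print(pages_without_extract(json.loads(SAMPLE)))
```

For the response above, the only page is reported as empty, which matches the observation that a page starting with a heading yields no extract.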

Reply 12:23, 5 April 2024 8 months ago

DAVY2018 (talkcontribs)

Yes, TextExtracts did not extract anything from articles that begin with a heading. But in my case, despite having text before the headings, TextExtracts still cannot output anything, and Popups keeps showing "There was an issue displaying this preview."

That's why I would like to draw on your past experience and see whether it helps.

Reply 13:22, 5 April 2024 8 months ago

Lotusccong (talkcontribs)

If you don't mind, please share the link and I can test it on my wiki site.

Reply 14:26, 5 April 2024 8 months ago

DAVY2018 (talkcontribs)

May I ask which link you would like me to share?

Reply Edited 14:38, 5 April 2024 8 months ago

Lotusccong (talkcontribs)

The page where Popups showed "There was an issue displaying this preview."

Reply 00:00, 6 April 2024 8 months ago

How to remove thumb caption from extracts?

2 comments • 16:04, 5 April 2024 8 months ago

Summary by Thiemo Kreuz (WMDE)

Unclear question.

Wess (talkcontribs)

We saw that thumb captions are shown in the extract if an image is in the first paragraph. Is there a way to remove them? I tried adding "figure" and "figcaption" (MW 1.40) to $wgExtractsRemoveClasses, with no success. Manually adding the "noexcerpt" class to the image did work.

08:28, 4 August 2023 1 year ago

Thiemo Kreuz (WMDE) (talkcontribs)

On which wiki does this happen? What version of MediaWiki are you using?

<figure> is already part of the list of elements to remove. Since <figcaption> is inside of <figure> it will be removed as well. Maybe your wiki's configuration modifies $wgExtractsRemoveClasses in an unexpected way? Maybe your $wgParserEnableLegacyMediaDOM configuration changed, but TextExtracts wasn't updated?

12:47, 21 August 2023 1 year ago

How to remove one of default values in $ExtractsRemoveClasses?

2 comments • 15:06, 24 March 2023 1 year ago

Radouch (talkcontribs)

I use a layout for the pages of my wiki, which means that almost every page begins with a div tag.

Unfortunately, div is among the default items in the ExtractsRemoveClasses array (defined in this extension's extension.json). So Extension:Popups displays no text for those pages, because TextExtracts ignores the content inside the div element.

I would like to remove the div item from $wgExtractsRemoveClasses in my LocalSettings.php, but I cannot find the right way to do it. Any ideas, please?

As a workaround, I removed div from extension.json, but I am sure that is bad practice.

Reply Edited 13:39, 8 February 2022 2 years ago

Joe Beaudoin Jr. Redux (talkcontribs)

Unfortunately, this is the only way to do this at this point... and you need to do it if you use the Citizen skin, as of this writing anyway.

Reply 15:06, 24 March 2023 1 year ago

badtoken error

One comment • 11:14, 28 November 2022 2 years ago

Nardog (talkcontribs)

I used to use this API to get excerpts from Wiktionary on Wikipedia in JavaScript, but now (since a few weeks ago perhaps) it returns a "badtoken" error. I can use other APIs on Wiktionary from Wikipedia alright, including Parse, so this is odd.

Reply 11:14, 28 November 2022 2 years ago

$wgExtractsIncludeClasses needed

One comment • 09:31, 20 September 2021 3 years ago

Krabina (talkcontribs)

It would be great if there was a parameter $wgExtractsIncludeClasses where classes could be defined that should be included in the text extracts. I often use some kind of div with styling information for the first paragraph as well, and that div is not included in the extracts. To make this work, I always have to start with some plain text, which is quite inflexible.

Reply 09:31, 20 September 2021 3 years ago

Return value of template parameter as summary

6 comments • 20:16, 1 June 2021 3 years ago

Summary by Jonathan3

No need - it uses parsed wikitext (i.e. HTML of page) - just needed to fix $ExtractsRemoveClasses to get it all to work for my pages.

Jonathan3 (talkcontribs)

How would I go about this? Most pages on my site are created from template calls (using Extension:Cargo) without any other text or headings. So no summary is extracted.

I see that it's possible to create a new API but to be able to do that I'd mostly need to copy an existing one :-)

Edited 14:10, 31 May 2021 3 years ago

Thiemo Kreuz (WMDE) (talkcontribs)

Is this about Popups, or about other usages of the TextExtracts API? It might be possible to customize the existing TextExtracts code so it supports your use-case better. Unfortunately, staff (like me) is probably not able to give a lot of support for customizations like this. If you are able to submit patches that make the TextExtracts extension work better with other extensions like Cargo, making it better for everyone, we can have a look at these patches.

06:50, 1 June 2021 3 years ago

Jonathan3 (talkcontribs)

Thanks again. It's about TextExtracts, which Popups on my site would use (I understand that WMF sites use something else). Cargo is only part of the background information, as my reason for having template-only pages (though it may be that most Cargo websites use it for infoboxes after introductory text, so mine may be a minority interest). If I can work it out I'll submit patches!

07:21, 1 June 2021 3 years ago

Jonathan3 (talkcontribs)

It turned out to be fairly easy. It works fine for me now after I got rid of "div" within "ExtractsRemoveClasses" in TextExtracts's extension.json file. Is there any way of making that change in LocalSettings.php instead?

(Initially I had wrongly assumed the extension looked at the raw wikitext but once I saw it used api.php?action=parse it became clearer...)

Edited 15:32, 1 June 2021 3 years ago

Thiemo Kreuz (WMDE) (talkcontribs)

No easy way, but it's possible:

$wgHooks['MediaWikiServices'][] = function () {
   global $wgExtractsRemoveClasses;
   // Drop 'div' from the list so content inside <div> elements is no longer stripped.
   $wgExtractsRemoveClasses = array_diff( $wgExtractsRemoveClasses, [ 'div' ] );
};

16:02, 1 June 2021 3 years ago

Jonathan3 (talkcontribs)

That seems to work - thanks!

18:54, 1 June 2021 3 years ago