Topic on Talk:Parsing/Replacing Tidy

Categories need to be fully populated before Tidy can be discontinued

Summary by SSastry (WMF)

This has been done.

Jonesey95

I know I keep harping on this, but now that Tech News has announced that Tidy will be going away in 2017, the "Pages using invalid self-closed HTML tags" categories really do need to be fully populated on all wikis. The category was added to MediaWiki in July 2016, and it still has not been fully populated. For example, on te.wikisource.org, there is only one page, "మూస:Transform-rotate", in the error category at this writing, but that page doesn't even have a problem; the problem is in a page that is transcluded on that page. That transcluded page is not yet in the error category (after seven months). This is task T132467.

How can we get an accurate list of all pages that need to be fixed?

See also task T106685 (insource searches don't work right), which is another bug that makes it more difficult to migrate away from Tidy. Let me know how I can help.

IKhitron

You can do what I did: take a month (for enwiki it will take a decade) and run a null-edit bot on all pages.
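As a rough illustration of what such a run involves, here is a minimal sketch that prepares batched requests for the MediaWiki Action API. It swaps in `action=purge` with `forcelinkupdate` in place of true null edits, on the assumption that refreshing the links tables is enough to update category membership. The function name `build_purge_requests` and the batch size of 50 are assumptions made for this example, not anything described in this thread.

```python
from urllib.parse import urlencode

# Assumed per-request title limit for a non-bot account (illustrative).
API_BATCH_SIZE = 50


def build_purge_requests(titles, batch_size=API_BATCH_SIZE):
    """Return one URL-encoded POST body per batch of page titles.

    Each body asks the API to purge the pages and to update their
    links tables, which is where category membership is recorded.
    """
    bodies = []
    for start in range(0, len(titles), batch_size):
        batch = titles[start:start + batch_size]
        params = {
            "action": "purge",
            "forcelinkupdate": "1",
            # Multiple titles are separated by a pipe character.
            "titles": "|".join(batch),
            "format": "json",
        }
        bodies.append(urlencode(params))
    return bodies
```

Each returned body would then be sent as a POST request to the wiki's `api.php` endpoint, with pauses between batches to avoid overloading the site. The sketch only builds the requests; it does not send anything.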

Jonesey95

Get a null-edit bot approved and running on 868 different wikis? That's outside of my scope.

If WMF wants to retire Tidy, WMF needs to null-edit all pages on all wikis or otherwise fix the conditions that prevent the categories from being fully populated. If that does not happen, it seems to me that Tidy cannot be retired.

IKhitron

Yep. And it's 902 wikis.

SSastry (WMF)

@Jonesey95, we discussed this recently, and @Legoktm updated T132467#3004685. We'll track this. But note that the parser has backward-compatibility fixes for self-closed tags, so it is not catastrophic if not all of those self-closing tags are fixed before Tidy is removed. We'll eventually remove the compatibility fix once we are sure the problem has been tackled. Meanwhile, we are also looking at how to ensure that pages are refreshed within a fixed time frame (as discussed in the Phabricator task linked above).