Topic on Talk:Wikimedia Platform Engineering/MediaWiki Platform team

External auditing of Mediawiki

197.218.82.37 (talkcontribs)

A good approach before even starting would be to hire an external firm to audit MediaWiki as a whole and report on its strengths and weaknesses. While the team members all seem to be experts in their respective fields, it is always a good thing to get a fresh perspective to avoid tunnel vision.

Malyacko (talkcontribs)

"Strengths and weaknesses" in which area? Is "MediaWiki does not have a kitchen sink" a strength or a weakness? It's unclear which problem your proposal should solve, so you may start by defining a potential problem.

197.218.89.81 (talkcontribs)

>"Strengths and weaknesses" in which area?

The subject page already mentions this: MediaWiki has had 15 years of development, and many software decisions were made out of desperation or out of a current need, and some were probably made just because they could be.

>It's unclear which problem your proposal should solve, so you may start by defining a potential problem.

Problem statement

As a user of MediaWiki sites, one often needs to navigate many pages to do basic maintenance or even tasks that should be trivial.

As a user of MediaWiki sites, one often finds that the user interface doesn't protect the ignorant from themselves (e.g. interface messages). An admin of a non-Wikimedia site once blocked themselves by saving JavaScript they didn't understand.

Specific issues:

1. A good number of core maintenance special pages are redundant

For any wiki those two pages are redundant: all one needs to do to see the oldest pages is read the list from bottom to top, or vice versa. It would be easy to kill one of the two and use a URL parameter instead.

A good number of maintenance reports are also spread across many pages instead of sharing a common navigation. Many just dump a list; they could show all of these using AJAX. This causes multiple problems: while the UI standardization is ongoing there is a messy mix of experiences, there are performance issues, and from a developer's perspective there is extra code for no good reason.

2. Massive wikitext feature creep

Magic links, the pipe trick and all those things are truly relics. Nobody fully understands the {{#tag}} parser function and all its edge cases.

3. Parser functions

There is a well-known, massive problem with the proliferation of parser functions: MediaWiki developers inadvertently helped create the world's worst programming language, which has bitten them quite a few times. Some of these functions even greatly affect site performance and can make a page incredibly slow for readers and editors.

4. Content model issues

There is a long-standing "feature" that non-links in all namespaces go into maintenance reports. People innocently adding square brackets to a JavaScript page or Module will later find out that text like "Category:..my variable .. " and things like "indextable + string .." may show up in maintenance reports. It was completely puzzling to see this on a non-Wikimedia wiki for the first time. The result is wasted database space for no good reason.

5. Categories, Behavioural switches, and page metadata are incredibly complex to manage

Due to an early architectural decision, categories are stored as part of the MediaWiki page, and they are stored with their labels rather than their IDs. The result is that, first, there are two ways to define a category (an increased maintenance burden when cleaning up), and second, something as trivial as renaming or removing a category requires changes to hundreds of pages, which makes cleanup a mess after massive category vandalism. It also has performance implications, e.g. cleaning up all categories in 1000000 pages may slow down the site for everyone. Categories also

6. Import and Export validation

The import feature has no proper validation, which means that edits can be credited to any random username by messing around with the XML, and this influences reports and logs.
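As a rough illustration of the kind of check meant here, this is a minimal sketch in Python, not MediaWiki's actual import code. The XML is a simplified stand-in for a real dump (the real export format has an XML namespace and many more fields), and `unverified_contributors` and `KNOWN_USERS` are made-up names for this example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an export dump; NOT the full MediaWiki export schema.
DUMP = """
<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <contributor><username>Alice</username></contributor>
      <text>First version</text>
    </revision>
    <revision>
      <contributor><username>SomeoneWhoNeverEdited</username></contributor>
      <text>Forged revision</text>
    </revision>
  </page>
</mediawiki>
""".strip()


def unverified_contributors(dump_xml, known_users):
    """Return (page title, username) pairs whose username is not in known_users."""
    root = ET.fromstring(dump_xml)
    suspicious = []
    for page in root.iter("page"):
        title = page.findtext("title")
        for username in page.iter("username"):
            if username.text not in known_users:
                suspicious.append((title, username.text))
    return suspicious


# Hypothetical set of accounts the importing wiki can vouch for.
KNOWN_USERS = {"Alice", "Bob"}
print(unverified_contributors(DUMP, KNOWN_USERS))
```

This only flags names the importing wiki does not know about. It would not catch a revision forged under an existing name, which would need some stronger form of verification.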

7. Interface messages

There are so many problems with interface messages that they are a complete usability nightmare. They lack any reasonable limits on what users can add; it seems that as long as they don't reach the 2 MB limit, users can stash any amount of content there. They would be bad enough for trained software developers, but this was put in the hands of ignorant users who have no knowledge of software design. For example, users regularly abuse interface messages to add huge warnings in the wrong places. They also abuse the mechanism to create their own interface with lots of hacks. Lastly, interface messages contain JavaScript, CSS, and a whole lot of other things that regularly expose security issues. Wikia, for example, was bitten so many times by this that they locked it down and changed it into a whitelist of explicitly approved messages instead of a free-for-all.

This is just an analysis from an end user's perspective. Someone digging into the code and architecture will find even more flaws.

> Is "MediaWiki does not have a kitchen sink" a strength or a weakness?

A weakness. A jack of all trades is master of none; despite a popular misconception, a Swiss Army knife has many more flaws than it has strengths.

http://usabilityhell.com/post/898301158/swiss-army-knife-crap-at-100-things

https://sourcemaking.com/antipatterns/swiss-army-knife

Malyacko (talkcontribs)

Your list looks like a mix of many different areas that are not necessarily connected to each other. Some are bugs, some are feature requests, some are personal opinions which might not be shared by other MediaWiki users. If you have specific and well-defined bugs or functionality requests, specific tasks in Wikimedia Phabricator are welcome (after checking that they don't exist yet).

197.218.80.185 (talkcontribs)

True, most of these might be personal opinion, and tasks do exist for them, perhaps worded differently.

However, numbers 5 & 7 are definitely architectural flaws, no two ways about it. They are also well known and filed in Phabricator. Number 5 in particular is one of the reasons for the whole MCR thing, so it is not just my opinion; it is also the opinion of some MediaWiki developers, and it took developers 15 years to prioritize and deal with it.
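To make number 5 concrete, here is a toy sketch in Python. It is not MediaWiki's actual schema or code: the dictionaries and the function names (`rename_by_label`, `rename_by_id`) are made up purely to contrast storing category labels on every page with storing IDs that point to a single label.

```python
# Model A: every page carries the category *labels* itself.
pages_by_label = {
    "Page 1": {"Cats", "Animals"},
    "Page 2": {"Cats"},
    "Page 3": {"Dogs"},
}


def rename_by_label(pages, old, new):
    """Rename a category by rewriting every page that carries the label.

    Returns the number of pages that had to be edited.
    """
    touched = 0
    for labels in pages.values():
        if old in labels:
            labels.remove(old)
            labels.add(new)
            touched += 1
    return touched


# Model B: pages reference category IDs; each label is stored once.
category_names = {1: "Cats", 2: "Animals", 3: "Dogs"}
pages_by_id = {"Page 1": {1, 2}, "Page 2": {1}, "Page 3": {3}}


def rename_by_id(names, cat_id, new):
    """Rename a category by updating one lookup entry. Returns entries changed."""
    names[cat_id] = new
    return 1


print(rename_by_label(pages_by_label, "Cats", "Felines"))  # one edit per member page
print(rename_by_id(category_names, 1, "Felines"))          # one entry, pages untouched
```

In the label model the cost of a rename grows with the number of pages in the category, while in the ID model it stays constant. That is the gist of the complaint, under these simplified assumptions.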

Number 4 is T14019, https://phabricator.wikimedia.org/T61616.

<blockquote> I would prefer if there was a Mediawiki Core/ORM maintainer, and that all unmaintained pieces of code didn't fall in the operations/performance guys side. I fail to see the maintainer of basic mediawiki functionality like this. As much as I love Database engineering, I cannot wear even another hat.

</blockquote>

Emphasis mine.

https://phabricator.wikimedia.org/T6715#1732831

There is now a MediaWiki "core" platform team, but presumably they won't take on the role of an ORM maintainer, assuming the WMF staff member meant https://en.wikipedia.org/wiki/Object-role_modeling.

Tim Starling (talkcontribs)

Hi anonymous user. Please create an account and explain your expertise in this area so that we can have a more nuanced conversation about your needs.

MZMcBride (talkcontribs)

I wonder what the point of going through all the trouble of supporting non-logged in users is, in Flow and in MediaWiki generally, if people are just going to request that you log in.

I also fail to see how it's relevant who the author is when we're discussing, for example, esoteric wikitext markup or potentially redundant special pages.

Tim Starling (talkcontribs)

The point of anonymous editing is for casual contributions, like fixing spelling errors. I've always asked more engaged participants to log in. It's not very practical to work with someone when you can't even send them messages, or recognise them when they use a different ISP or another communication medium.
