Topic on Project:Support desk

Critical bug in 1.17, 1.18, 1.19, 1.20, 1.21: page names are not converted to a suitable encoding, so the update scripts break the whole site (not updated)

Chameleon Red One

I have found a simple bug with page title encoding that prevents me from migrating MediaWiki, and there is no workaround for it. I have no idea how to do the migration.

  1. 'Strona główna' (Polish for 'Main Page') from 1.16 is not converted into the form 'Strona główna' that 1.17 probably requires.
  2. As a result, every page whose title contains native (non-ASCII) characters is broken.
  3. The same issue occurs in 1.17, 1.18, 1.19, 1.20 and 1.21. I have no idea why this critical bug has not been fixed.
  4. I have not found any workaround for it. I tested every workaround except 'iconv', which will not work because it can also alter binary data.
    1. (does not work)
    2. (no information)
    3. (no information)
    4. (risky conversion)

Why are characters broken after migrating from 1.16 to 1.17-1.21? Where, if anywhere, is this explained in the upgrade instructions or release notes? Please help with this issue.

My versions before migration:

  - MediaWiki 1.16.5
  - PHP 5.2.17 (isapi)
  - MySQL 5.0.83-community-nt

Ciencia Al Poder

You should first check the collation of your tables in the database. Look at the Collation column in the output of the `SHOW TABLE STATUS;` command, run at the mysql command prompt on the wiki database (or use phpMyAdmin to view this information).

The recommended collation is "binary", although utf8 should work. If you're using any other collation a conversion may be required.
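As a rough illustration of the check described above, the Python sketch below classifies Collation values the way this reply suggests: `binary` is recommended, `utf8_*` should work, and anything else may need conversion. It assumes the (Name, Collation) pairs have already been read from `SHOW TABLE STATUS;` by some client; the function names and sample rows are hypothetical. As the rest of this thread shows, a utf8 collation alone does not prove that the stored text is not garbled.

```python
from typing import Iterable, Optional

def collation_advice(collation: Optional[str]) -> str:
    """Classify one Collation value from SHOW TABLE STATUS (hypothetical helper)."""
    if collation is None:
        # Some rows (for example, views) report no collation at all.
        return "no collation reported"
    if collation == "binary":
        return "recommended"
    if collation.split("_", 1)[0] == "utf8":
        # utf8_general_ci, utf8_bin, ... all share the "utf8" prefix.
        # (utf8mb4 did not exist in MySQL 5.0 and is not considered here.)
        return "should work"
    return "conversion may be required"

def tables_needing_attention(rows: Iterable[tuple[str, Optional[str]]]) -> list[str]:
    """Return the names of tables whose collation is neither binary nor utf8."""
    return [name for name, collation in rows
            if collation_advice(collation) == "conversion may be required"]

# Made-up sample rows, not taken from the poster's database.
sample = [("page", "utf8_bin"), ("revision", "binary"), ("text", "latin1_swedish_ci")]
print(tables_needing_attention(sample))  # ['text']
```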

Did you upgrade from a very old MediaWiki version before 1.17?

Chameleon Red One

My current MediaWiki version is 1.16.5.

The collation of my 'page' table is utf8 (default collation), and 'page_title' is utf8-.

As far as I know, in a database the collation only affects sorting and comparison; how the data is stored is determined by the character set (encoding).

However, 'Strona główna' is stored as 'Strona główna' (whatever encoding that is), and it is still decoded correctly because MediaWiki somehow transforms 'Strona główna' into 'Strona główna' (which is valid).

How can I convert 'Strona główna' to 'Strona główna' inside the database?
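The garbled titles in this thread are consistent with double-encoded UTF-8 (an assumption; the thread does not confirm the exact cause): the UTF-8 bytes of a title are read as if they were latin1, and the resulting characters are stored again as UTF-8. The Python sketch below reproduces that and shows the inverse, which is the kind of conversion being asked about here. It assumes MySQL's latin1 behaves like Windows-1252 (cp1252), except that the five bytes cp1252 leaves undefined map to the C1 control characters with the same code point; `garble` and `repair` are hypothetical names.

```python
# Bytes that Python's cp1252 codec leaves undefined. MySQL's latin1 is documented
# as mapping them to U+0081, U+008D, U+008F, U+0090 and U+009D (verify for your version).
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

def latin1_decode(data: bytes) -> str:
    """Decode bytes the way MySQL's latin1 is assumed to: cp1252 plus five C1 controls."""
    return "".join(chr(b) if b in UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252")
                   for b in data)

def latin1_encode(text: str) -> bytes:
    """Inverse of latin1_decode; raises UnicodeEncodeError for characters outside latin1."""
    return b"".join(bytes([ord(c)]) if ord(c) in UNDEFINED_IN_CP1252 else c.encode("cp1252")
                    for c in text)

def garble(title: str) -> str:
    """UTF-8 bytes misread as latin1: the double-encoded form that ends up stored."""
    return latin1_decode(title.encode("utf-8"))

def repair(stored: str) -> str:
    """Undo one level of double encoding."""
    return latin1_encode(stored).decode("utf-8")

print(garble("Strona główna"))  # Strona gÅ‚Ã³wna
print(repair(garble("Łódź")))   # Łódź
```

Handling the five undefined bytes matters for Polish text: the UTF-8 encoding of "Ł" contains 0x81, which Python's plain cp1252 codec would reject.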

AKlapper (WMF)
Chameleon Red One

I think this is a different problem. All fields are set to utf8, but 'Strona główna' is not UTF-8 (in UTF-8 it is 'Strona główna'); however, 1.16 shows it correctly.

What is this strange encoding, 'Strona główna'? How is it converted into "Strona główna"?

1.17 has a problem reading this strange encoding.

In 1.16, dumpBackup.php works fine, but in 1.17 importDump.php does not import the main page. That is another problem.

All fields are of this type; I checked (65 matches in the export):

`page_title` varchar(255) character set utf8 collate utf8_bin NOT NULL
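The column definition above shows why checking the declared type is not enough: a `utf8_bin` column can hold double-encoded text just as easily as correct text. A heuristic check, sketched here in Python and assuming latin1 behaves like cp1252, is to attempt the inverse conversion and see whether it succeeds and changes the string. `looks_double_encoded` is a hypothetical helper. It can report false positives for text that legitimately contains sequences such as "Ã©", and because it uses Python's plain cp1252 codec it misses titles whose garbled form contains one of the five bytes cp1252 leaves undefined (for example a garbled "Ł").

```python
def looks_double_encoded(text: str) -> bool:
    """Heuristic: True if reading `text` back as cp1252 bytes yields different, valid UTF-8."""
    try:
        candidate = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or the bytes are not valid UTF-8:
        # most likely the text is already correct.
        return False
    # Pure ASCII round-trips unchanged, so it is never flagged.
    return candidate != text

titles = ["Main_Page", "Strona_główna", "Strona_g\u00c5\u201a\u00c3\u00b3wna"]
for t in titles:
    print(t, looks_double_encoded(t))
```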
Chameleon Red One

I found that I am probably dealing with garbled text. It looks like 1.17 does not support converting "garbled" text, or I do not know how to trigger it.

I still have no idea what to do to migrate.

(talkcontribs)

MediaWiki never officially supported charset conversions for garbled text. Garbled characters are caused by a configuration problem, and MediaWiki only worked correctly with them by chance; that it did was not a feature. That Drupal page looks like a good resource.

Chameleon Red One

Strange; up to 1.16, MediaWiki handled garbled text.

(talkcontribs)

But it never officially supported it. There was no feature "MediaWiki will work with broken setups". If it did work, that was not intentional; you were just lucky. If you are interested, you can look up the change in SVN that broke it and see whether you can fix it again. However, I would not spend my time fixing symptoms when I can cure the root cause instead.

Chameleon Red One

I did the following magic:

mysqldump --user=XYZ -p --default-character-set=latin1 --skip-set-charset wikidb > wikidb-latin1.sql

Then (this assumes the import runs with a utf8 connection; force it with --default-character-set=utf8):

mysql -u root -p  testwikidb --force < wikidb-latin1.sql

It looks like it works; I still need to test it.
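What the two commands above do to each title can be modelled, roughly, as a byte-level round trip: the dump converts the stored utf8 text to latin1 bytes, and the import reads those same bytes as UTF-8 (assuming the utf8 connection noted above). The Python sketch below simulates this under stated assumptions: cp1252 stands in for MySQL's latin1, characters that latin1 cannot represent become '?', and invalid UTF-8 is marked with U+FFFD rather than reproducing whatever MySQL does with it. It also shows why the trick is only safe if every row is garbled in the same way: a title that was already stored correctly gets damaged. `simulate_dump_reload` is a hypothetical name.

```python
def simulate_dump_reload(stored: str) -> str:
    """Rough model of: mysqldump --default-character-set=latin1, then an import read as utf8."""
    # Dump step (assumption): the server converts the stored text to latin1,
    # modelled here with cp1252; unconvertible characters become '?'.
    dumped = stored.encode("cp1252", errors="replace")
    # Import step: the same bytes are interpreted as UTF-8. Invalid sequences are
    # shown as U+FFFD here; MySQL's own handling of them is not modelled.
    return dumped.decode("utf-8", errors="replace")

garbled = "Strona g\u00c5\u201a\u00c3\u00b3wna"  # double-encoded 'Strona główna'
print(simulate_dump_reload(garbled))          # Strona główna
print(simulate_dump_reload("Strona główna"))  # 'ł' becomes '?', 'ó' becomes U+FFFD
```

Running a check such as the double-encoding heuristic over the titles before the dump is one way to confirm that no correctly stored rows would be hit.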
