Manual:Restoring wiki code from cached HTML


If, like us, you failed to keep a working backup of your wiki, a server failure may leave you with no option but to recreate your lost content from the various cached copies of pages from your site.

Where to get cached HTML for your site

  • The first place to look for cached HTML pages from your lost wiki is your browser's page cache. Access about:cache on either Google Chrome/Chromium or Firefox to view these cached pages, but switch to 'Work Offline' mode first so that you don't overwrite your cache with new pages from your server.
  • Search engines keep cached copies of pages, at least for the most popular sites: try Google, Bing, and Yahoo.
  • The Internet Archive, www.archive.org, may also have some of your pages, if you are lucky.
  • You may find that other caches exist if you are at a large company or university that runs a caching proxy server.
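
For archive.org, the list of captured pages can be requested in bulk instead of browsed by hand. The following is a minimal sketch, assuming the Wayback Machine's public CDX endpoint at web.archive.org/cdx/search/cdx with its url, output, filter, collapse, and limit query parameters, and assuming the id_ replay modifier returns the page without the archive toolbar. The host mywiki.example.com is a placeholder, and the function names are made up for this illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed public CDX endpoint of the Wayback Machine.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(host, limit=None):
    """Build a CDX query URL listing successful captures under `host`."""
    params = [
        ("url", host + "/*"),            # every path below the host
        ("output", "json"),              # JSON rows instead of plain text
        ("filter", "statuscode:200"),    # skip redirects and error pages
        ("collapse", "urlkey"),          # one row per distinct URL
    ]
    if limit is not None:
        params.append(("limit", str(limit)))
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def snapshot_urls(rows):
    """Turn decoded CDX JSON rows into replay URLs.

    The first row is assumed to be a header naming the columns.
    """
    if not rows:
        return []
    header, *records = rows
    ts = header.index("timestamp")
    orig = header.index("original")
    # "id_" is assumed to request the capture without the archive toolbar.
    return [
        "http://web.archive.org/web/%sid_/%s" % (r[ts], r[orig])
        for r in records
    ]


def fetch_snapshot_list(host):
    """Query the archive over the network and return replay URLs."""
    with urllib.request.urlopen(build_cdx_query(host)) as resp:
        body = resp.read().decode("utf-8")
    return snapshot_urls(json.loads(body) if body.strip() else [])
```

Calling fetch_snapshot_list("mywiki.example.com") would return one replay URL per archived page, which can then be saved to disk with any download tool. Because the collapse parameter reduces the result to one row per URL, you would need to drop it to see every capture date for a page.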

On Google, searching for site:mywiki.example.com will get you a list of most of the cached pages for your site, but you can sometimes reach more pages by searching for specific page titles. Saving the cached content is a slow, manual process, and it should be done as quickly as possible after the disaster occurs: once you restore your wiki, the caches will start refreshing from your new server, and any content you have not yet saved may be lost.

Using HTML to reconstruct your wiki

If you've managed to retrieve most of your wiki content, you can then process it with a set of scripts. Some code from 2010 that is useful for this purpose is available at: http://code.ascend4.org/ascend/trunk/tools/mediawiki/html2mediawiki/

The above code does the basic job of reconstructing headings, lists, tables, links, math, and source code listings. It also correctly handles category tags, and some specific templates. The core parts of this code use BeautifulSoup and Python's regular expressions module to search for recognized patterns.
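
To give a feel for this pattern-matching approach, here is a minimal sketch of the idea; it is not the ASCEND code. It uses Python's standard-library html.parser in place of BeautifulSoup so that it has no dependencies, assumes that internal links live under a /wiki/ prefix, and handles only headings, lists, links, bold, and italics. The names WikitextBuilder and html_to_wikitext are made up for this illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class WikitextBuilder(HTMLParser):
    """Convert a small subset of rendered HTML back into wikitext."""

    def __init__(self, article_prefix="/wiki/"):
        super().__init__(convert_charrefs=True)
        self.article_prefix = article_prefix  # assumed URL layout
        self.out = []            # finished wikitext fragments
        self.list_stack = []     # '*' per open <ul>, '#' per open <ol>
        self.link_target = None  # href of the <a> being read, if any
        self.link_text = []

    def _emit(self, text):
        # Text inside a link is buffered until the closing </a>.
        if self.link_target is not None:
            self.link_text.append(text)
        else:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._emit("\n" + "=" * int(tag[1]) + " ")
        elif tag in ("ul", "ol"):
            self.list_stack.append("*" if tag == "ul" else "#")
        elif tag == "li":
            self._emit("\n" + "".join(self.list_stack) + " ")
        elif tag == "a":
            self.link_target = dict(attrs).get("href")
            self.link_text = []
        elif tag in ("b", "strong"):
            self._emit("'''")
        elif tag in ("i", "em"):
            self._emit("''")
        elif tag == "p":
            self._emit("\n\n")

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._emit(" " + "=" * int(tag[1]) + "\n")
        elif tag in ("ul", "ol") and self.list_stack:
            self.list_stack.pop()
            if not self.list_stack:
                self._emit("\n")
        elif tag == "a" and self.link_target is not None:
            href, text = self.link_target, "".join(self.link_text).strip()
            self.link_target = None
            self.out.append(self._format_link(href, text))
        elif tag in ("b", "strong"):
            self._emit("'''")
        elif tag in ("i", "em"):
            self._emit("''")
        elif tag == "p":
            self._emit("\n")

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would when rendering.
        self._emit(re.sub(r"\s+", " ", data))

    def _format_link(self, href, text):
        if href.startswith(self.article_prefix):
            title = unquote(href[len(self.article_prefix):]).replace("_", " ")
            if not text or text == title:
                return "[[%s]]" % title
            return "[[%s|%s]]" % (title, text)
        if re.match(r"https?://", href):
            return "[%s %s]" % (href, text) if text else href
        return text  # anchors and other links: keep only the label


def html_to_wikitext(html):
    """Return wikitext for an HTML fragment (article body only)."""
    builder = WikitextBuilder()
    builder.feed(html)
    builder.close()
    text = "".join(builder.out)
    text = re.sub(r"[ \t]+\n", "\n", text)  # drop trailing spaces
    text = re.sub(r"\n[ \t]+", "\n", text)  # leading spaces would mean <pre>
    text = re.sub(r"\n{3,}", "\n\n", text)  # at most one blank line
    return text.strip()
```

A real recovery script would first need to isolate the article body from the surrounding skin markup and strip section-edit links, and would need extra rules for tables, math, source listings, categories, and templates, none of which this sketch attempts.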

Every MediaWiki instance is different, though: different installed extensions and different templates mean that the above scripts will probably need careful editing before you use them to process your particular site. The code probably also contains some hard-wired references to the ASCEND wiki, which you will need to find and change.

Other HTML2wiki scripts have been published, but these have a slightly different aim: translating HTML snippets for inclusion in a wiki, rather than reconstructing a wiki from its rendered HTML.