Convert Socialtext wiki to MediaWiki
This page describes how to convert a Socialtext wiki to MediaWiki using Linux. It is based on a single conversion and is by no means exhaustive; it was tested with a wiki comprising only a few hundred pages and files, and it could be improved a lot.
Socialtext wiki is similar to Kwiki.
The procedure described below can:
- convert pages, retaining the essential syntax
- convert files
- convert histories of pages and files
and cannot:
- convert tables (re-edit them manually; mostly you only need to add the table start and end syntax)
- convert most other Socialtext features
- convert user-association of edits
- and much more
Introduction
Our Socialtext wiki is stored as files located in a directory named
data
The tree contains one directory per page (below data/{WORKSPACE}), with one index.txt containing the current version of the page and several {date}.txt files containing older revisions. (Workspaces are separate branches of a Socialtext wiki.)
The attached files are located within a directory named
plugin
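For orientation, the tree looks roughly like this; the workspace, page, and attachment names below are made up, and the attachment layout is deduced from the migration scripts further down:
data/
  myworkspace/
    Some_Page/
      index.txt             (current revision)
      20091015120000.txt    (older revision)
      20090914083000.txt
plugin/
  myworkspace/
    attachments/
      20091015-1-2345.txt   (attachment metadata)
      20091015-1-2345/      (directory containing the attached file itself)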
Put all the following files and dirs (except for the new wiki) into one working directory and proceed as follows.
Install MediaWiki
- install a current MediaWiki
- allow upload of all files
$wgEnableUploads = true;
$wgStrictFileExtensions = false;
$wgCheckFileExtensions = false;
- modify php.ini and reload apache2 (to be able to upload bigger files)
post_max_size = 32M
upload_max_filesize = 32M
Copy the original files to the new host
Copy these directories (use scp, not rsync, since we don't want symlinks; the index.txt files are symlinks):
- data
- plugin
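For example, assuming the old Socialtext installation lives under /var/www/socialtext (user, host, and source paths are placeholders; adjust them to your setup):
scp -r user@old.socialtext.host:/var/www/socialtext/data .
scp -r user@old.socialtext.host:/var/www/socialtext/plugin .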
Script to convert a single page
Create a script conv.py to convert a single page. It takes the file name of a page revision as its first argument.
#!/usr/bin/python
import re
import sys

filename = sys.argv[1]
f = open(filename, "r")
text = f.read()
(header, content) = text.split('\n\n', 1)

# trim content lines
lines = content.split('\n')
lines2 = [line.strip() for line in lines]
content = '\n'.join(lines2)

# headings
p = re.compile('^\^\^\^\^(.*)$', re.M)
content = p.sub('====\\1 ====', content)
p = re.compile('^\^\^\^(.*)$', re.M)
content = p.sub('===\\1 ===', content)
p = re.compile('^\^\^(.*)$', re.M)
content = p.sub('==\\1 ==', content)
p = re.compile('^\^(.*)$', re.M)
content = p.sub('=\\1 =', content)

# bold
p = re.compile('([^\*]+)\*([^\*]+)\*', re.M)
content = p.sub('\\1\'\'\'\\2\'\'\'', content)

# link
p = re.compile('\[([^\]]+)\]', re.M)
content = p.sub('[[\\1]]', content)

# file
p = re.compile('{file: ([^}]+)}', re.M)
content = p.sub('[[Media:\\1]]', content)

# image
p = re.compile('{image: ([^}]+)}', re.M)
content = p.sub('[[Bild:\\1]]', content)

# item level 1
p = re.compile('\342\200\242\011', re.M)
content = p.sub('* ', content)

# table, only partially, do the rest manually!
# you have to add {|... , |} , and check for errors due to empty cells
p = re.compile('[^\n]\|', re.M)
content = p.sub('\n|', content)
p = re.compile('\|\s*\|', re.M)
content = p.sub('|-\n|', content)

# lines with many / * + symbols were used as separator lines...
p = re.compile('[\/]{15,200}', re.M)
content = p.sub('----', content)
p = re.compile('[\*]{15,200}', re.M)
content = p.sub('----', content)
p = re.compile('[\+]{15,200}', re.M)
content = p.sub('----', content)

# external links
p = re.compile('\"([^\"]+)\"<http(.*)>\s*\n', re.M)
content = p.sub('[http\\2 \\1]\n\n', content)
p = re.compile('\"([^\"]+)\"<http(.*)>', re.M)
content = p.sub('[http\\2 \\1]', content)

# add categories
content += '\n'
header_lines = header.split('\n')
for line in header_lines:
    if re.match('^[Cc]ategory: ', line):
        category = re.sub('^[Cc]ategory: (.*)$', '\\1', line)
        content += '[[Category:' + category + ']]\n'

# departments / workspaces
if re.match('data/zsi-fe', filename):
    content += '[[Category:FE]]\n'
if re.match('data/zsi-ac', filename):
    content += '[[Category:AC]]\n'
if re.match('data/zsi-tw', filename):
    content += '[[Category:TW]]\n'

print content
Test it like this:
./conv.py data/{WORKSPACE}/{PAGENAME}/{REVISION}
Just copy the resulting wiki text into a page of the new MediaWiki and use the preview function.
Adapt the Python script to your needs until most pages are translated correctly.
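To give an idea of what the script handles, a made-up page revision file like this one:
Subject: Project status
Date: 2009-10-15 12:00:00 GMT
Category: Projects

^^ Current state

The milestone is *done*, see {file: report.pdf} and
"our homepage"<http://example.org>
is turned into roughly the following MediaWiki text (plus a workspace category if the path matches one of the data/zsi-* prefixes):
== Current state ==

The milestone is '''done''', see [[Media:report.pdf]] and
[http://example.org our homepage]

[[Category:Projects]]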
Script to upload a single file
The MediaWiki API does not yet have action=upload. Get upload.pl.
The script has to be modified to use our new server instead of mediawiki.blender.org. Also edit the username and password. Create a directory called 'upload', put some content there, and test uploading.
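A quick test could look like this; the files.txt format shown here simply mirrors what the migration script further down writes (four lines: the file name prefixed with '>', the file name again, the version/date, and a comment) and may need adjusting to the upload.pl version you downloaded:
mkdir -p upload
cp /tmp/test.png upload/test.png
cat > upload/files.txt <<'EOF'
>test.png
test.png
2009-10-15 12:00:00 GMT
(test upload)
EOF
./upload.pl upload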
Script to migrate pages
Use this script (which calls ./conv.py) to migrate pages. The pages will be uploaded in chronological order:
#!/bin/sh
wikiurl="http://NAME.OF.NEW.SERVER/mediawiki/api.php"
lgname="WikiSysop"
lgpassword="*************"

# login
login=$(wget -q -O - --no-check-certificate --save-cookies=/tmp/converter-cookies.txt \
    --post-data "action=login&lgname=$lgname&lgpassword=$lgpassword&format=json" \
    $wikiurl)
#echo $login

# get edittoken
edittoken=$(wget -q -O - --no-check-certificate --save-cookies=/tmp/converter-cookies.txt \
    --post-data "action=query&prop=info|revisions&intoken=edit&titles=Main%20Page&format=json" \
    $wikiurl)
#echo $edittoken
token=$(echo $edittoken | sed -e 's/.*edittoken.:.\([^\"]*\)...\".*/\1/')
token="$token""%2B%5C"
#echo $token

# test editing with a test page
#cmd="action=edit&title=test1&summary=autoconverted&format=json&text=test1&token=$token&recreate=1&notminor=1&bot=1"
#editpage=$(wget -q -O - --no-check-certificate --load-cookies=/tmp/converter-cookies.txt --post-data $cmd $wikiurl)
#echo $editpage
#exit

# loop over all pages except for dirs in the list of excludes
find data -not -path "data/help*" -type f -and -not -name ".*" | sort | while read n; do
    pagedir=$(echo $n | sed -e 's/.*\/\(.*\)\/index.txt/\1/')
    if [[ "`grep -q $pagedir excludes; echo $?`" == "0" ]]; then
        echo "omitting $pagedir"
    else
        echo "parsing $pagedir"
        workspace=$(echo $n | sed -e 's/.*\/\(.*\)\/[^\/]\+\/index.txt/\1/')
        pagename=$(egrep '^Subject:' $n | head -n 1 | sed -e 's/^Subject: \(.*\)/\1/')
        pagedate=$(egrep '^Date:' $n | head -n 1 | sed -e 's/^Date: \(.*\)/\1/')
        echo "$workspace $pagedir -------------- $pagename"
        text=$(./conv.py $n)
        text1=$(php -r 'print urlencode($argv[1]);' "$text")
        pagename1=$(php -r 'print urlencode($argv[1]);' "$pagename")
        pagedate1=$(php -r 'print urlencode($argv[1]);' "$pagedate")
        cmd="action=edit&title=$pagename1&summary=$pagedate1+autoconverted+from+socialtextwiki&format=json&text=$text1&token=$token&recreate=1&notminor=1&bot=1"
        editpage=$(wget -q -O - --no-check-certificate --load-cookies=/tmp/converter-cookies.txt --post-data $cmd $wikiurl)
        #echo $editpage
    fi
done
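The loop skips every page whose directory name is listed in a file named excludes in the working directory (one name per line). A minimal excludes file could look like this; the page names are made up:
old_test_page
sandbox
meeting_scratchpad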
Script to migrate files
Use this script (which calls ./upload.pl) to migrate files. The files will be uploaded in chronological order:
#!/bin/sh
find plugin -path 'plugin/zsi*/attachments/*.txt' | sort | while read f; do
    if [[ "`grep -q 'Control: Deleted' $f; echo $?`" != "0" ]]; then
        d=${f/.txt}
        filenameNew=$(egrep '^Subject:' $f | sed -e 's/Subject: \(.*\)/\1/')
        filenameOrig=$(ls -1 $d | head -n 1)
        version=$(egrep '^Date: ' $f | sed -e 's/Date: \(.*\)/\1/')
        #echo "---------------------------"
        #echo $filenameOrig
        #echo "$filenameNew"
        rm upload/*
        cp $d/$filenameOrig "upload/$filenameNew"
        # prepare upload
        echo -e ">$filenameNew\n$filenameNew\n$version\n(autoconverted from socialtext wiki)" > upload/files.txt
        # upload
        ./upload.pl upload
    fi
done
Notes
- Socialtext wiki REST API (unused)