Exporting all the files of a wiki

Exporting all the files of a wiki can be done in a few different ways:

  1. If you have FTP access to the wiki, then you can move the files by following the procedure at Manual:Moving a wiki.
  2. If you lack such access, as can happen for instance if a wiki is abandoned by its site owner, then you will probably need to use workarounds.
    • This procedure can semi-automate the task of downloading all the files, but you will still have to figure out a way to upload them to your wiki.
    • An alternative is to use Manual:Grabbers.

Step 1

  • Follow the procedure at Help:Export#Get_the_names_of_pages_to_export to use a Python script to get the names of all the files on the wiki. When you go to Special:AllPages, you will be selecting the File namespace.
  • Select the names that the Python script outputs, and copy and paste them into Column A of your favorite spreadsheet application (e.g. Microsoft Excel, LibreOffice, OpenOffice.org Spreadsheet). You should now have a column of cells that each contain a page name, e.g. File:Example.png
  • In Cell B1, put this formula: ="*[[:"&A1&"]]"
  • Copy that formula and paste it into the rest of Column B. (Do not paste cell B1 onto itself, or you will get an error like "You are pasting data into cells that already contain data".) Each cell in Column B should now say something like *[[:File:Example.png]]
  • Go to the wiki you got the filenames from and create a new page, e.g. User:JoeSchmoe/All files. Copy and paste column B into that page, and save.
  • The page will now load; it may take a while, since it links to every file on the wiki. You should see a listing that looks like this:
... (etc.)...
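
The spreadsheet step only wraps each page name in list-and-link markup, so it can also be scripted. The following is a minimal sketch in Python, the language of the script used earlier in this step; the function name to_wikitext_list and the sample names are made up for illustration and are not part of the original procedure.

```python
def to_wikitext_list(names):
    """Turn page names such as 'File:Example.png' into wikitext list items.

    The leading colon inside the link makes MediaWiki link to the file
    page instead of embedding the image.
    """
    lines = []
    for name in names:
        name = name.strip()
        if name:  # skip blank lines in the copied output
            lines.append("*[[:" + name + "]]")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample input; use the names printed by the Python script.
    sample = ["File:Example.png", "File:Another example.jpg"]
    print(to_wikitext_list(sample))
```

The printed text can be pasted into the new wiki page in place of Column B.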

Step 2

  • Now use a Perl program to generate a script that will give you the URLs:
use strict;
use warnings;
use LWP::UserAgent;

# The page created in Step 1 that lists every file on the wiki.
my $url="http://libertarianwiki.org/User:Joe_Schmoe/All_files_2";
my $agentName="User:Tisane (http://www.mediawiki.org/wiki/User:Tisane) grabbing some data using FileNameExtract.pl";
my $browser = LWP::UserAgent->new();
$browser->agent($agentName);
$browser->timeout(500);
my $response = $browser->get($url);
die $response->status_line."\n" if $response->is_error();
my $contents = $response->content();
my $delimiter="\n";

# Each file link in the rendered page carries title="File:<name>".
my $string='title="File:';
my $endString='"';
my $position=0;
my $fileNumber=0;

# index() returns -1 when there is no further match, so test its result
# before adding the length of the search string.
while (($position=index($contents,$string,$position))!=-1){
    $position+=length($string);
    my $endPosition=index($contents,$endString,$position);
    last if $endPosition==-1;
    my $fileName=substr($contents,$position,$endPosition-$position);
    print '$myFileName['.$fileNumber.']="'.$fileName.'";'.$delimiter;
    $fileNumber++;
    $position=$endPosition;
}
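
File names that contain characters such as & appear HTML-escaped in the page source, which the substring search above passes through unchanged. As a hedged illustration of the same extraction with unescaping and duplicate removal, here is a small Python sketch. It assumes, as the Perl script does, that file links carry a title="File:..." attribute; the function name extract_file_names and the sample HTML are invented for the example.

```python
import html
import re

# Matches the title attribute that the rendered page puts on file links.
TITLE_RE = re.compile(r'title="File:([^"]*)"')


def extract_file_names(page_html):
    """Return the distinct file names linked from the page, in page order."""
    seen = set()
    names = []
    for raw in TITLE_RE.findall(page_html):
        name = html.unescape(raw)  # e.g. '&amp;' becomes '&'
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


if __name__ == "__main__":
    # Invented sample; a real page would be fetched from the wiki.
    sample = (
        '<li><a href="/File:A.png" title="File:A.png">File:A.png</a></li>'
        '<li><a href="/File:B%26C.jpg" title="File:B&amp;C.jpg">File:B&amp;C.jpg</a></li>'
    )
    for name in extract_file_names(sample):
        print(name)
```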

Step 3

  • This should generate a list that you can incorporate into another script:
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;

# Files to export from the Wiki.
my @exportFiles = (
    "01-gold-bar.jpg",
    "100px-Massachusetts state flag.png",
    "100px-New York state flag.png",
    "128px-Padlock-red.svg.png",
    # ...and so on...
);

# Configuration variables
my $string      = 'images/';
my $endString   = '"';
my $delimiter   = "\n";
my $reject1     = 'LibertarianWiki.gif);';
my $reject2     = 'icons/fileicon-pdf.png';

# Initialize the browser
my $browser = LWP::UserAgent->new();
$browser->timeout(500);

# Loop over array indexes, not values, so that $idx can be used as a subscript.
for my $idx (0 .. $#exportFiles){
    my $exportFile = $exportFiles[$idx];
    
    my $url = "http://libertarianwiki.org/File:$exportFile";
    my $request = HTTP::Request->new(GET => $url);
    my $response = $browser->request($request);
    if (!$response->is_success) {
        printf STDERR "%s\n", $response->status_line;
        next;   # nothing to parse if the request failed
    }
    
    my $contents = $response->content();

    # index() returns -1 when 'images/' is absent, so check before adding the offset.
    my $position = index($contents, $string, 0);
    next if $position == -1;
    $position += length($string);
    my $endPosition = index($contents, $endString, $position);
    next if $endPosition == -1;
    my $filename    = substr($contents, $position, $endPosition-$position);
    if ($filename ne $reject1 && $filename ne $reject2){
        print qq{\$exportFiles[$idx] = '$filename';$delimiter};
    }
}
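
The paths this script prints (for example 7/78/01-gold-bar.jpg in the next step) have the shape of MediaWiki's hashed upload directories, where the two directory levels are taken from the MD5 hash of the file name. If the source wiki uses that default layout, the paths could in principle be computed rather than scraped. The Python sketch below rests on that assumption; the function name hashed_path is hypothetical, and the result should be compared against a few scraped paths before it is trusted.

```python
import hashlib


def hashed_path(file_name):
    """Guess the path of a file below the wiki's images/ directory.

    Assumes MediaWiki's default hashed upload layout: the MD5 of the
    name (spaces replaced by underscores, first letter capitalised)
    supplies the two directory levels.
    """
    name = file_name.strip().replace(" ", "_")
    name = name[:1].upper() + name[1:]
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return digest[0] + "/" + digest[:2] + "/" + name


if __name__ == "__main__":
    print(hashed_path("100px-New York state flag.png"))
```

Wikis with a non-default upload configuration, or files served from a shared repository, will not match this layout, in which case the scraping approach above remains the fallback.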

Step 4

This in turn will generate a list that you can load into yet another script, e.g.:

use strict;
use warnings;
use LWP::Simple;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

my @myFileName=('');
$myFileName[0]="7/78/01-gold-bar.jpg";
$myFileName[1]="5/53/100px-New_York_state_flag.png";
$myFileName[2]="8/81/128px-Padlock-red.svg.png";
...
...
...
$myFileName[349]="a/a6/WilliamGodwin.jpg";
$myFileName[350]="b/b1/Wirtland_Coat_of_Arms.png";
$myFileName[351]="f/f5/Wirtland_crane.png";
my $agentName="User:Tisane (http://www.mediawiki.org/wiki/User:Tisane) grabbing some data using DownloadImages.pl";
my $browser = LWP::UserAgent->new();
$browser->agent($agentName);
$browser->timeout(500);
my $delimiter="\n";

for my $count (0 .. $#myFileName){
    my $url="http://libertarianwiki.org/wiki/images/".$myFileName[$count];
    my $response = $browser->get($url);
    if ($response->is_error()) {
        # Report the failure and move on to the next file.
        printf STDERR "%s: %s\n", $url, $response->status_line;
        next;
    }
    my $contents = $response->content();

    # Strip the hashed directory prefix (e.g. "7/78/") to get the bare file name.
    my $newFileName=substr($myFileName[$count],5);
    print $url.$delimiter;
    print $newFileName.$delimiter;
    # Write in binary mode so that the image data is not altered.
    open(my $fh, '>:raw', $newFileName) or die "Cannot write $newFileName: $!\n";
    print $fh $contents;
    close $fh;
}
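
The download loop boils down to two operations: deriving the local file name from the hashed path, and writing the response to disk byte for byte. A rough Python equivalent is sketched below for comparison. The base URL is copied from the script above; the function names local_name and download are illustrative only.

```python
import posixpath
import urllib.parse
import urllib.request

# Base URL taken from the Perl script above.
BASE_URL = "http://libertarianwiki.org/wiki/images/"


def local_name(hashed):
    """Drop the directory prefix: '7/78/01-gold-bar.jpg' -> '01-gold-bar.jpg'."""
    return posixpath.basename(hashed)


def download(hashed, timeout=500):
    """Fetch one file and save it unchanged in the current directory."""
    # quote() keeps '/' intact and escapes characters such as spaces.
    url = BASE_URL + urllib.parse.quote(hashed)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()
    target = local_name(hashed)
    with open(target, "wb") as handle:  # binary mode preserves image data
        handle.write(data)
    return target
```

Calling download("7/78/01-gold-bar.jpg") for each entry of the list would save the files one by one. Unlike the fixed offset of 5 used in the Perl script, local_name does not depend on the length of the directory prefix.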

Step 5

See also