User:OrenBochman/Installation

I have been trying to do some development on MediaWiki search. It turns out that setting up an environment is not so simple. I decided to document the process in case I need to do it again, and for other people's benefit.

MediaWiki Development Environment Setup On Labs

Docs Suggestions

  1. To start off, there is a bunch of technology being used in Labs that is a little intimidating.
    • So the tutorial (geared at developers) should start by letting them know what is what, for example:
    • what is PUPPET
    • what is BASTION
    • what is an INSTANCE (an instance of)
      • next I tried to access both existing and non-existing instances - but it was not successful :-(...
  2. Instance Creation - what are the options during instance creation? (even my helpers seemed confused about which cache to use for my use case)
    • block diagrams would reduce my panic level almost as much as a panic button :-)
    • for me a page with
      • a block diagram (Apache, PHP, MediaWiki, cache, search extension) on one machine + a script of how to realise it would be great.
      • also a diagram of the real search (sub) cluster's setup and how to set it up as an instance would be interesting in a week or two.

Some issues which should be covered in the Labs documentation:


  • where and how to get extensions?
  • where to find a dump, how to get it into the instance, how to import it, how to track the import?
    • wget it to where
    • run what command
    • how to check its progress (the wiki's stats page vs. a console)
  • how to back up an instance.
  • how to set up Java, Ant, Maven ...
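
For the "wget it to where" question, here is a minimal sketch. It assumes the public dump mirror at dumps.wikimedia.org and its <wiki>/latest/<wiki>-latest-<table>.sql.gz naming, which matches the file names used further down this page. The helper name dump_url and the /tmp target are illustrative choices, not something Labs prescribes.

```shell
# Print the URL of the latest SQL dump of one table for a given wiki.
# dump_url is a hypothetical helper; the mirror host name is an assumption.
dump_url() {
  local wiki="$1" table="$2"
  printf 'https://dumps.wikimedia.org/%s/latest/%s-latest-%s.sql.gz\n' "$wiki" "$wiki" "$table"
}

# Typical use (not run here): download into /tmp, then unpack
#   wget -P /tmp "$(dump_url simplewiki page_props)"
#   gzip -d /tmp/simplewiki-latest-page_props.sql.gz
```

Keeping the URL construction in one place makes it easy to loop over several tables (category, interwiki, iwlinks, ...) without retyping the path.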


  • another thing is that even though I used to work in security startups for 4 years, SSH tunneling is now vague to me, and getting into the instance is really difficult to get right.

  • what is a security group - is it like port forwarding on my router? It could use an introduction like
    1. "if you don't set up a security group, none of the ssh tunnels you set up will work since (... the port will be blocked - or the real reason)".
    2. to set up the security group go to ... "Manage Security group list" and add rules like ...
    3. also the Manage Security group list itself is bare and could give/refer to some sample setups.
  • I found the Instance console really helpful, but even after a couple of tips on testing from within the instance I got nowhere. So diagnostic tips which are obvious to an op are great for noobs, e.g.
 use the Instance>Console Output if you cannot access port XXX. If it says ... refused, you need to set up a security group ...
  • I'm also worried to no end that setting up and working with an instance of several machines, like the real search cluster (which is Mount Improbable for me at this time), would be mission impossible once virtualization is added.
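
As an illustration of the kind of SSH tunnel meant above, here is a hedged sketch that only builds and prints the command rather than running it. The helper name tunnel_cmd and the host names are hypothetical. The real bastion and instance names come from Labs, and the security group still has to allow the port.

```shell
# Build (but do not run) an ssh command that forwards a local port to a port
# on an instance by way of the bastion host.
# tunnel_cmd and all host names used with it are hypothetical placeholders.
tunnel_cmd() {
  local bastion="$1" instance="$2" local_port="$3" remote_port="$4"
  # -N: run no remote command, only forward; -L: local port forwarding
  printf 'ssh -N -L %s:%s:%s %s\n' "$local_port" "$instance" "$remote_port" "$bastion"
}

# Print the command that would expose the instance's port 80 on localhost:8080
tunnel_cmd bastion.example.org my-instance 8080 80
```

Once such a tunnel is up, browsing to http://localhost:8080/ would reach the instance's web server, provided the bastion can reach that port.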

MediaWiki Development Environment Setup On Ubuntu

commands

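  • ps -ef - list all running processes
  • tail - show the last lines of a file
  • top - live view of processes and system load
  • df - show free space on each file system
  • nohup command & - keep the command running after the terminal disconnects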

install utilities

sudo apt-get install 7zip mc

sudo aptitude
mysql -p=puppet
mysql -ppuppet
nohup mysql -ppuppet data < simplewiki-latest-page_props.sql &
tail nohup.out

Misc

  • mysql -p data < simplewiki-latest-protected_titles.sql
  • mysql -p data < simplewiki-latest-redirect.sql
  • mysql -p data < simplewiki-latest-page.sql
  • rm *sql
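
The per-file imports above can be wrapped in a loop. This is a minimal sketch, not the procedure actually used: import_sql_dumps is a hypothetical helper, and it assumes the MySQL credentials are stored in ~/.my.cnf so that mysql does not prompt for a password on every file.

```shell
# Import every .sql dump found in a directory into one database,
# stopping at the first failure.
# import_sql_dumps is a hypothetical helper; credentials are assumed
# to come from ~/.my.cnf rather than from -p on the command line.
import_sql_dumps() {
  local db="$1" dir="$2" f
  for f in "$dir"/*.sql; do
    [ -e "$f" ] || continue          # no matching files: skip the literal glob
    echo "importing $f"
    mysql "$db" < "$f" || return 1   # abort on the first failed import
  done
}

# Typical use (not run here):
#   import_sql_dumps data /tmp && rm /tmp/*.sql
```

Chaining the rm with && means the dump files are only deleted if every import succeeded.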


  • cd /var/www/w
  • nohup php maintenance/rebuildall.php &
  • tail nohup.out
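
The nohup-and-tail pattern above can be made a little more convenient by sending output to a named log and remembering the process id. A minimal sketch, where run_detached and the log path are hypothetical:

```shell
# Start a long-running command detached from the terminal, send all of its
# output to a log file, and print its process id.
# run_detached is a hypothetical helper, not part of MediaWiki.
run_detached() {
  local log="$1"
  shift
  nohup "$@" > "$log" 2>&1 &
  echo "$!"
}

# Typical use (not run here):
#   cd /var/www/w
#   pid=$(run_detached /tmp/rebuildall.log php maintenance/rebuildall.php)
#   tail -f /tmp/rebuildall.log      # follow the output
#   ps -p "$pid"                     # check whether it is still running
```

Using a separate log per job avoids several jobs appending to the same nohup.out.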


w
df
ls
cd /tmp
ls
rm simplewiki-latest-pages-meta-current.xml
df
rm simplewiki-latest-pages-articles.xml
df
7z x simplewiki-latest-pages-meta-history.xml.7z
ls
rm *xml
df
7z
ls
cp simplewiki-latest-pages-meta-history.xml.7z /mnt
cd /mnt
df
ls
mkdir petrb
mkdir extract
mv simplewiki-latest-pages-meta-history.xml.7z extract/
nohup php w/maintenance/update.php &
df
top
df
ls
df
top
df
cd w
vi LocalSettings.php
ls extensions/
vi LocalSettings.php
cd ..
nohup php w/maintenance/update.php &
php w/maintenance/update.php
vi w/LocalSettings.php
php w/maintenance/update.php
vi w/LocalSettings.php
php w/maintenance/update.php
df
vi w/LocalSettings.php
cd /tmp
wget download:simplewiki/latest/simplewiki-latest-category.sql.gz
wget download:simplewiki/latest/simplewiki-latest-page_props.sql.gz
ls
wget download:simplewiki/latest/simplewiki-latest-interwiki.sql.gz
gzip -d *
wget download:simplewiki/latest/simplewiki-latest-iwlinks.sql.gz
ls
mysql --password=puppet data < simplewiki-latest-interwiki.sql
vi /var/www/w/LocalSettings.php
top
df
whereis tomcat6
exit
ls
aptitude
df
ls /home
ls
df

MediaWiki Development Environment Setup On Windows

XAMPP Application Stack Installation

Installed latest XAMPP [1]. That means:

XAMPP 1.7.4
Apache 2.2.17
MySQL 5.5.8
PHP 5.3.5
phpMyAdmin 3.3.9
FileZilla FTP Server 0.9.37
Tomcat 7.0.3 (with mod_proxy_ajp as connector)

Speeding Things Up

MediaWiki is one of the oldest and slowest web application frameworks / content management systems. Users are rarely aware of this because of the extensive use of hardware and a sophisticated caching strategy.

The good news is that it is possible to speed things up.

PHP accelerator - eaccelerator (skip to APC)

To enable eAccelerator, edit php\php.ini and uncomment (remove the leading ";" from) the line
;zend_extension = "\xampp\php\ext\php_eaccelerator.dll"

However, the binary is not included in the XAMPP distribution and needs to be downloaded separately. This is no easy task: you first need to check what type of PHP installation you have.

  • What is the PHP version?
  • Is your version built as thread safe?
  • Which version of Visual Studio was it built with?

The answers are available from the PHP info page on the XAMPP main page.

PHP accelerator - APC

However, MediaWiki is best configured with APC rather than eAccelerator. As before, XAMPP does not bundle php_apc.dll. I searched the forums and came up with http://downloads.php.net/pierre/ . Of the various distributions there, I was able to use php_apc-20110109-5.3-vc9-x86.zip .

  • To enable APC, copy php_apc.dll to \xampp\php\ext, then edit php\php.ini and add (APC loads as a regular extension, not as a zend_extension)
   extension = php_apc.dll
  • Next update MediaWiki LocalSettings.php to use APC by adding
   $wgMainCacheType = CACHE_ACCEL;

Problem: Apache won't start

  1. Skype blocked port 80 (resolved)
  2. User Account Control (UAC) blocked the XAMPP control panel from starting Apache; I had to change the UAC setting via the Control Panel security settings (resolved).

MediaWiki installation

Production

  • For a production MediaWiki the fastest way to install MW is to:
    • decompress the MediaWiki software archive (version 1.17) to D:\xampp\htdocs\mediawiki

Development

  • For a development MediaWiki installation it is necessary to (periodically) get the latest version of MW and extensions from Subversion. Since my project is a Java-based extension I used the following setup.
    1. set up an Eclipse workspace
    2. add one PHP project for MediaWikiTrunk (from Subversion)
    3. add one PHP project for MediaWikiExtentions (from Subversion)
    4. check out, using svn+ssh, a Java project for dev.
    5. check out, using svn+ssh, a Java project for making patches.
    6. add to Apache's httpd.conf the location (Directory) and the Alias (URL mapping) for the MediaWiki checkout.
  <Directory "d:/ws/MediaWikiTrunk">
  Order allow,deny
  Allow from all
  </Directory>
  Alias /mwt "d:/ws/MediaWikiTrunk"
or
  • open D:\xampp\apache\conf\extra\httpd-vhosts.conf
  • Un-comment line 19 (NameVirtualHost *:80).
<VirtualHost *:80>
   DocumentRoot d:/ws/MediaWiki/core
   ServerName transitcalculator.localhost
   <Directory d:/ws/MediaWiki/core>
       Order allow,deny
       Allow from all
   </Directory>
</VirtualHost>
  • Open your hosts file (C:\Windows\System32\drivers\etc\hosts).
  • Add 127.0.0.1 MW #MediaWiki to the end of the file
  1. browse to http://localhost/mediawiki/ and follow the instructions
  2. however, it is necessary to choose the binary representation and not UTF-8 for the db, otherwise mwdumper will fail

Main & Status Pages

Changing Capitalization Settings

  1. edit LocalSettings.php and add at the end
    $wgCapitalLinks=false;
  2. then run cleanupCaps.php from the command line (this took about 12 hours for 3 million entries)

Extensions

  1. Since my main purpose is development related (bots and indexing), I wanted to introduce sufficient extensions to allow decent dumping
    1. consult ...
    2. download
    3. decompress to
    4. edit LocalSettings.php, adding require( "extensions/ParserFunctions/ParserFunctions.php"); to the end
require( "extensions/ParserFunctions/ParserFunctions.php");
$wgUseAjax = true;
require_once("{$IP}/extensions/CategoryTree/CategoryTree.php");
#char insert
require_once("$IP/extensions/CharInsert/CharInsert.php");
#image map
require_once("$IP/extensions/ImageMap/ImageMap.php");
#labeled section transclusion
require_once ("$IP/extensions/LabeledSectionTransclusion/lsth.php");
#syntax highlighting
require_once ("$IP/extensions/SyntaxHighlight_GeSHi/SyntaxHighlight_GeSHi.php");
#oren code end

Diff

Get the diff utility [2] and edit LocalSettings.php, adding

# Paths to the GNU diff and diff3 utilities. diff3 is used for conflict resolution.
$wgDiff = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff.exe';
$wgDiff3 = 'C:/Server/xampp/htdocs/MW/bin/GnuWin32/bin/diff3.exe';

The alternative is to do nothing. Diffs would still be available, but slower.

Extensions and Dumping

It turns out that dumping a MediaWiki via the dumphtml command-line extension is not compatible with some of the other extensions. It is prudent to turn such extensions off for making static dumps. The worst culprit is the syntax highlighting extension, which is not important in the main Wiktionary pages, but useful when developing scripts in user namespace.

Importing Contents

Small Wiki - Simple English

Importing via importDump.php:

  • get the dump file and note the article count
  • run the import
php.exe maintenance\importDump.php simplewiki-20120104-pages-meta-current.xml.bz2
  • then rebuild recent changes using
php.exe maintenance\rebuildrecentchanges.php
  • to check the status of the import
php.exe maintenance\showStats.php
  • if importDump.php is interrupted - a second run will run through the already imported entries quickly.
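
Noting the article count up front makes it possible to turn the page count reported by showStats.php into a rough progress figure. A minimal sketch, assuming a bash-like shell is at hand (on Windows, for example, Cygwin). import_progress is a hypothetical helper and both numbers are typed in by hand; the output of showStats.php is not parsed here.

```shell
# Print how far an import has progressed, as a whole-number percentage.
# import_progress is a hypothetical helper; both counts are supplied by hand:
# the number of pages imported so far and the total noted from the dump.
import_progress() {
  local imported="$1" total="$2"
  if [ "$total" -le 0 ]; then
    echo "total must be positive" >&2
    return 1
  fi
  # Integer arithmetic: the result is rounded down
  echo "$(( imported * 100 / total ))%"
}

import_progress 250 1000   # prints 25%
```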

Large Wiki

  1. importing via maintenance/importDump.php will take forever and has problems with the term Wiktionary in the inter-wiki namespace table
  2. importing via mwdumper with some db modification took about 12 hours.
  3. unfortunately it crashed every time, requiring one to drop the db and reinstall.
  4. to speed things up I removed the indexes from the text, page and revision tables and re-introduced them later

phpMyAdmin can be used to import SQL dumps, but it cannot import really big ones; for these another application is required [3].

Fixing Capitalization

add to LocalSettings.php:

$wgCapitalLinks=false;

Next run maintenance/cleanupCaps.php (12 hours)

DB time outs

Once imported there appeared to be a crash. This was due to database time outs. Some were caused by simultaneous db imports of required SQL tables. However even when these were done the time outs persisted.

The solution involved five days of diagnostics and various attempts at patching things up. The actual solution came from:

  • Turning on traces and diagnostics.
  • Trying to dump pages via the dumpHtml extension.
  • Enabling Zend-Eaccelerator seems to have reduced the problem. Once done time outs no longer occurred on most pages.

Resolution: installed eAccelerator from [4], and the time outs have stopped.

  • Increasing the database time out from 30 seconds to 60.

Importing SQL Dumps

Running the rebuild script should restore the DB to fully functional status. However this script runs three tasks each taking an order of magnitude longer than its predecessor. The recreation of links seems to be impractical in a large project. Also there is no indication of progress.

I wanted to have a fully functioning version of Wiktionary at this point (perhaps without the pictures) to allow a static dump which could be used to make an offline version.

Diagnostics

  1. run rebuild:
  2. Then the wiki decided to crash. It gives the error: "Fatal error: Maximum execution time of 30 seconds exceeded in D:\xampp\htdocs\mediawiki\includes\db\DatabaseMysql.php on line 23"
  3. It turns out that the problem is not a crash but a slow response on many pages
    1. random link works
    2. pages generated by random link work too
  4. Switching to a second empty db with different table suffix works fine too


http://devzone.zend.com/1147/debugging-php-applications-with-xdebug/

More Problems

Pages were often littered with malformed extraneous tags (which, if properly written, would never be shown by a browser). Resolution: install Tidy.

Resources

MySQL & Optimizing