Talk:Toolserver:User-store
Update
It's not clear whether one can create new directories here. I thought not, and that you needed to ask permission (and document it here), but I see that the page is outdated. This is the current situation; some things seem unneeded or misplaced. Nemobis 08:31, 5 February 2012 (UTC)
ls

$ ls -l
total 53980336
drwxr-sr-x 3 dschwen users 96 Feb 28 2009 800px-thumbs-commons
drwxr-xr-x 2 alebot users 96 Sep 20 21:29 Alebot
drwxr-sr-x 3 multichill users 96 Mar 7 2009 ars_usda_gov
-rw-r--r-- 1 enwikt enwikt 38292936 Sep 23 2010 arwiktionary-20100915-pages-articles.xml
drwxr-xr-x 13 aude users 1024 Dec 12 04:48 aude
drwxr-xr-x 8 seb35 users 1024 Aug 28 2010 bnf
drwxr-xr-x 4 bryan users 96 Oct 28 2010 bryan
drwxr-xr-x 6 kaldari users 1024 Jun 3 2011 bulkuploader
drwxr-xr-x 8 cbm users 9216 Jul 28 2011 cbm
-rw-r--r-- 1 vyznev users 282934189 Apr 15 2010 commonswiki-20100407-page.sql.gz
drwxr-xr-x 14 contests contests 1024 Nov 12 07:53 contests
-rw-r--r-- 1 hippietrail users 7636 Apr 19 2011 cs20100831-all-idx.raw
-rw-r--r-- 1 hippietrail users 7636 Apr 19 2011 cs20100831-all-off.raw
-rw-r--r-- 1 hippietrail users 76187 Apr 19 2011 cs20100831-all.txt
-rw-r--r-- 1 hippietrail users 22908 Apr 19 2011 cs20100831-off.raw
lrwxrwxrwx 1 hippietrail users 59 Apr 19 2011 cswikinews-20100831-pages-articles.xml -> /mnt/user-store/dump/cswikinews-20100831-pages-articles.xml
-rw-r--r-- 1 mzmcbride users 405 Nov 10 2010 curltest.bz2
drwx--x--- 5 daniel users 1024 Jan 12 2011 daniel
-rw-r--r-- 1 jsonntag users 9559946161 Jul 4 2011 dewiki-20110621-pages-meta-history.xml.7z
drwxr-xr-x 7 harddisk users 1024 Oct 31 2010 dewiki_static
-rw-r--r-- 1 balu users 1489819 Jan 17 2011 dewiktionary-latest-all-titles-in-ns0
-rw-r--r-- 1 balu users 214945515 Jan 14 2010 dewiktionary-latest-pages-articles.xml
drwxrwxr-x 2 dispenser users 3072 Dec 23 16:01 dispenser
drwx--S--- 3 drh08 users 96 Aug 18 2009 drh08
drwxrwxrwx 3 sk dumps 40960 Feb 5 08:13 dump
drwxrwxrwx 30 mzmcbride dumps 1024 Feb 4 21:14 dumps
drwxr-xr-x 2 emijrp users 24576 Feb 3 14:23 emijrp
lrwxrwxrwx 1 hippietrail users 34 Feb 25 2011 enlatest-all.txt -> /mnt/user-store/en20110205-all.txt
lrwxrwxrwx 1 hippietrail users 18 Oct 4 2009 enprevious-all.txt -> en20091003-all.txt
-rw-r--r-- 1 emijrp users 693939457 Oct 28 2010 enwiki-latest-categorylinks.sql.gz
-rw-r--r-- 1 emijrp users 244646437 Oct 28 2010 enwiki-latest-imagelinks.sql.gz
-rw-r--r-- 1 enwikt enwikt 2085194006 Oct 9 05:02 enwiktionary-20111008-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2090210148 Oct 17 05:04 enwiktionary-20111016-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2100781115 Oct 25 05:08 enwiktionary-20111024-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2105183139 Nov 3 05:02 enwiktionary-20111102-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2123503348 Nov 13 05:02 enwiktionary-20111112-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2118245759 Nov 22 05:02 enwiktionary-20111121-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2139398338 Nov 30 05:02 enwiktionary-20111129-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2153787519 Dec 9 05:02 enwiktionary-20111208-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2164077241 Dec 17 05:02 enwiktionary-20111216-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2173188530 Dec 25 05:02 enwiktionary-20111224-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2185108737 Jan 2 05:02 enwiktionary-20120101-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2196287047 Jan 10 05:02 enwiktionary-20120109-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2207838421 Jan 18 06:53 enwiktionary-20120117-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2214930203 Jan 26 05:02 enwiktionary-20120125-pages-articles.xml
-rw-r--r-- 1 enwikt enwikt 2220726357 Feb 4 05:02 enwiktionary-20120203-pages-articles.xml
-rw-r--r-- 1 balu users 25300829 Jan 17 2011 enwiktionary-latest-all-titles-in-ns0
drwxr-xr-x 4 enwp10 enwp10 1024 Feb 1 2011 enwp10
drwxr-sr-x 3 multichill users 96 Apr 21 2009 eol-jsc-nasa
-rw-r--r-- 1 emijrp users 256896618 Jul 4 2010 eswiki-latest-pagelinks.sql.gz
drwxr-xr-x 4 multichill users 96 Aug 15 2010 geograph
drwxr-xr-x 3 multichill users 96 Nov 22 21:56 geograph_new
drwxr-xr-x 2 giovanni users 1024 Dec 8 23:46 giovanni
drwxr-xr-x 2 magnus users 49152 Feb 5 00:57 glam_page_views
drwxr-xr-x 5 grphack users 1024 Jan 16 18:23 grphack
drwxr-xr-x 2 grphack users 1024 Jan 16 18:23 halfak
drwxr-xr-x 4 hoo users 1024 Dec 5 18:17 hoo
drwxr-xr-x 3 hydriz users 1024 Feb 5 07:45 hydriz
drwxrwxr-x 2 dschwen users 966656 Feb 5 00:13 iip-cache-dschwen
-rwxrwxrwx 1 harddisk users 838 Oct 19 18:22 INDEX
drwxr-xr-x 3 johang users 1024 Jan 29 23:13 johang
drwxr-xr-x 3 kolossos users 96 Dec 4 11:47 kolossos
drwxr-xr-x 8 magnus users 14336 Feb 4 20:43 magnus
drwxr-xr-x 3 master users 96 Feb 5 04:36 master
drwxr-xr-x 2 mazder users 1024 Dec 5 2010 mazder
-rw-r--r-- 1 cbm users 9671 Dec 15 2009 md5.c
drwxr-xr-x 45 saper users 2048 Dec 27 13:50 mediawiki
drwxr-xr-x 3 saper users 96 May 10 2010 MediaWiki-import
drwxrwxr-x 3 merl merliwbot 12288 Feb 5 01:19 MerlIwBot
drwxr-xr-x 2 messedrocker users 2048 Dec 30 2010 messedrocker
drwxr-xr-x 2 mzmcbride users 1024 Jun 26 2011 mzmcbride
drwxrwxr-x 3 grphack users 96 Apr 21 2011 newbies
drwxr-xr-x 16 h4ck3rm1k3 users 1024 Jan 23 2011 osmgit
drwxr-xr-x 86 cmarqu users 4096 Nov 28 2009 osm_hillshading
drwxr-xr-x 3 cmarqu users 96 Aug 18 2009 osm_overlays
drwxr-xr-x 2 apmon users 96 Dec 1 22:19 osm_planet
drwxr-xr-x 9 multichill users 9216 Jun 9 2010 OS_OpenData
drwxrwxr-x 2 dschwen users 65536 Jan 29 17:58 pano-cache-dschwen
drwxr-xr-x 18 bryan users 2048 Nov 9 2009 phase3
drwxr-xr-x 3 prolineserver users 96 Sep 5 14:30 prolineserver
drwxr-xr-x 4 valhallasw users 96 Nov 10 17:51 pywiki
-rwxrwxrwx 1 mzmcbride users 269 Aug 9 2010 README
drwxr-xr-x 2 whym users 1024 Oct 11 09:19 revision_diffs_20110405
-rw-r--r-- 1 schutz users 89 Aug 27 2006 robots.txt
drwxr-xr-x 2 russell users 96 Sep 20 13:52 russell
drwxr-xr-x 3 sk users 96 Feb 4 09:52 sk
drwxr-sr-x 41 schutz users 1007616 Feb 5 08:05 stats
drwxr-xr-x 2 harddisk users 96 Sep 26 2010 tar_test
drwxr-xr-x 5 tim1357 users 1024 Dec 9 23:49 Tim1357
-rw-r--r-- 1 overlordq users 3345234 Mar 31 2009 torServerDirectory
-rw-r--r-- 1 root root 116 Dec 19 20:39 UcliEvt.log
drwxr-xr-x 3 multichill users 96 Nov 8 2009 wdl
drwxr-xr-x 2 whym users 96 Jul 13 2011 whym
drwxr-xr-x 3 erfgoed erfgoed 1024 Oct 30 22:21 Wiki Loves Monuments
drwxr-xr-x 4 dschwen users 96 Jan 23 23:09 wikiminiatlas
drwxr-xr-x 3 darkdadaah users 96 Oct 28 15:10 Wiktionnaire
The biggest public dirs currently are:
du

3.2G aude
39G contests
2.2G dispenser
1.3G dpl
2.0T dumps
5.0G emijrp
7.8G enwp10
4.9G git
8.9G h4ck3rm1k3
2.0G hippietrail
1.4T iip-cache-dschwen
14G jkroll
50G johang
23G kolossos
3.3G krinkle
13G liangent
7.8G magnus
2.3G marco74
1.7G mauro742
6.7G mediawiki
1.5G osmgit
226G osm_hillshading
49G OS_OpenData
6.5G pano-cache-dschwen
17G phe
3.6G prolineserver
43G render
70G sk
197G stats
4.7G svick
208K Tim1357
2.1G whym
21G Wiki Loves Monuments
261G wikiminiatlas
2.9G Wiktionnaire
5.1G wiwosm
4.5T total
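For anyone who wants to re-rank a listing like the one above, here is a minimal Python sketch. It is not the command that produced the listing; it assumes input lines in the human-readable du style (a size with a K/M/G/T suffix, then the name) and that the suffixes are powers of 1024. The helper names parse_size and rank are hypothetical, and SAMPLE holds only a few entries copied from the listing.

```python
# Binary multipliers for human-readable du suffixes (assumption: powers of 1024).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '2.0T' or '208K' to a number of bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)  # no suffix: treat as a plain byte count


def rank(lines):
    """Return (name, bytes) pairs sorted from largest to smallest."""
    entries = []
    for line in lines:
        # Split only once so names containing spaces stay intact.
        size, name = line.split(None, 1)
        entries.append((name.strip(), parse_size(size)))
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


# A few entries copied from the listing above, in their original order.
SAMPLE = [
    "39G contests",
    "2.0T dumps",
    "1.4T iip-cache-dschwen",
    "208K Tim1357",
    "21G Wiki Loves Monuments",
    "261G wikiminiatlas",
]

if __name__ == "__main__":
    for name, size in rank(SAMPLE):
        print(f"{size / 1024**3:10.1f} GiB  {name}")
```

Splitting each line only once is what keeps a name like Wiki Loves Monuments in one piece.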
Stats compression
A huge portion of the space is taken by visitor stats, although they now have two mirrors (WMF and IA). The oldest ones are compressed in LZMA (xz). Compressing files that are already gz or xz is useless and can only increase the size. I ran some compression tests on a whole uncompressed month, 2011-03-pagecounts (184G):
- 7z a -t7z -m0=BZip2 -mmt=6 -mx9 takes ~27h (6 cores, less than 100M memory) and gives 41G
- 7z a -t7z -m0=LZMA -mmt=on -mx9 -md=64m -mfb=64 takes ~56h (2 cores, 800M memory) and gives 37G
- 7z a -t7z -m0=LZMA -mmt=on -mx9 -md=256m -mfb=64 -ms=on takes about 3 days (2 cores, 2700M memory) and gives 35G
- tar with xz uses LZMA with standard settings and can only give worse results (I tried it but it got killed by mistake, wasn't going anywhere though)
- individual gz are 51.2G
- individual xz of this month are not yet available for comparison
--Nemo 10:23, 22 March 2012 (UTC)
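To make the figures in the comment above easier to compare, here is a small Python sketch that expresses each reported result as a percentage of the 184G original and as space saved relative to the individual gz files. The sizes are copied from the comment; the labels are shorthand for the commands rather than exact command lines, and the function names are hypothetical.

```python
ORIGINAL_G = 184.0  # uncompressed 2011-03 pagecounts, as reported above

# Output sizes in G as reported above; the labels are shorthand only.
RESULTS = {
    "7z BZip2 -mx9": 41.0,
    "7z LZMA -md=64m": 37.0,
    "7z LZMA -md=256m -ms=on": 35.0,
    "individual gz": 51.2,
}


def percent_of(size, reference):
    """Size expressed as a percentage of the reference size."""
    return 100.0 * size / reference


def summarize(results, original, baseline_key="individual gz"):
    """Return rows of (label, size, % of original, G saved vs. baseline)."""
    baseline = results[baseline_key]
    return [
        (label, size, percent_of(size, original), baseline - size)
        for label, size in results.items()
    ]


if __name__ == "__main__":
    for label, size, pct, saved in summarize(RESULTS, ORIGINAL_G):
        print(f"{label:26s} {size:6.1f}G  {pct:5.1f}% of original  "
              f"{saved:5.1f}G less than gz")
```

On these numbers the strongest LZMA setting ends up at roughly a fifth of the original size, at the cost of the run times quoted in the comment.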
- Maybe it would be worth trying xz with higher compression, but I can't locate the stats files on user-store right now. --Nemo 14:02, 20 October 2013 (UTC)
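A rough way to try the idea of xz with higher compression before committing days of CPU time is Python's standard lzma module, which wraps liblzma, the library behind xz; its presets 0 to 9 and the PRESET_EXTREME flag correspond to xz -0 to -9 and -e. The sketch below is only an illustration: make_sample generates synthetic lines loosely shaped like page-count records (not real stats data), and make_sample, xz_size and compare are hypothetical names. Results on synthetic data say little about the real files, and the highest presets need considerably more memory.

```python
import lzma


def make_sample(n_lines=20000):
    """Build synthetic, repetitive text; NOT real pagecounts data."""
    lines = []
    for i in range(n_lines):
        lines.append(f"en Page_{i % 500} {i % 97} {(i * 7919) % 100000}\n")
    return "".join(lines).encode("utf-8")


def xz_size(data, preset):
    """Size in bytes of data compressed as an .xz stream with this preset."""
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset))


def compare(data, presets=(lzma.PRESET_DEFAULT, 9 | lzma.PRESET_EXTREME)):
    """Map each preset to the compressed size it yields for data."""
    return {preset: xz_size(data, preset) for preset in presets}
```

Calling compare(make_sample()) returns one compressed size per preset; running the same comparison on a single real hourly file would be the more meaningful test.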