User:Erik Zachte/progress

New updates will be logged on my new staff account: User:Erik_Zachte_(WMF)/Progress

week 34

(16 hrs)

week 33

(41 3/4 hrs)

  • new report for Medicine project
week 32

(26 3/4 hrs)

  • Wikimania
week 31

(17 1/4 hrs)

  • analysis of migration between projects and languages
  • prep presentation Wikimania
  • report card
week 30

(13 3/4 hrs)

  • analysis of migration between projects and languages
  • analysis of retention stats
week 29

(22 1/4 hrs)

  • analysis of migration between projects and languages
  • analysis of retention stats
week 28

(15 3/4 hrs)

  • publish monthly squid based traffic reports
  • prepped more data on high near-in-time edit activity by more than one author, for Erik Moeller
week 27

(17.5 hrs)

  • fixed bug 67314: Explain why some tables show monthly data other quarterly
  • fixed bug 67315: Always show all months for wikistats tables with data for all wikis/projects
  • fixed bug 67472: Page content metrics missing for hewiki
week 26

(19 hrs)

  • closed bug 66004 Monthly aggregation of page views fails
  • prepped data on high near-in-time edit activity by more than one author, for Erik Moeller
  • new sections in Wikistats portal
  • expanded and updated documentation on squid reports
week 25

(20 hrs)

week 24

(20.5 hrs)

  • file with edits per user/project/wiki/namespace now also contains userid (follow-up may be needed: userid differs per wiki, and dumps may contain userid 0 for the oldest months)
week 23

(24.5 hrs)

week 22

(15.5 hrs)

  • worked on editor migration patterns
week 21

(23 3/4 hrs)

  • worked on editor migration patterns
week 20

(15 hrs)

  • fixed bug 65345 lookup list in comments for Windows and OS/X releases outdated
  • working on Wiki Loves Monuments retention and geo-distribution stats
  • working on page views definitions
week 19

(33 3/4 hrs)

  • published monthly squid reports
  • fixed bug 64523 mail address update in squid reports
  • fixed bug 65152 wikidata dump parsing was not tracked in status report
  • analytics offsite
week 18

(14.5 hrs)

  • finishing touches to 'collect edits per editor per wiki per month per namespace for 800+ wikis'
  • published monthly report card (with some bug fixing)
  • extensive discussions around how to represent time periods in a database
week 17

(24 1/4 hrs)

  • collect edits per editor per wiki per month per namespace for 800+ wikis
  • prototype detection of editor migration patterns
  • with extensive help from Andrew Otto, fixed connection problems that resulted from a server switch
week 16

(20 3/4 hrs)

  • finalized portal search and blogged about it
  • analyzed how many usernames containing 'bot' are still deliberately not treated as bots by Wikistats
  • with great help from Christian got git working, reordered structure, committed many changes (more to do)
week 15

(7 3/4 hrs)

  • fixed bug 63879: Incomplete monthly aggregated page view files
  • fixed bug 62230: Total edits on wikidata seems too low
week 14

(20 1/4 hrs)

  • generated data files for monthly Report Card
  • generated new squid based reports (some URLs in the portal need to be updated to quarterly versions)
  • last review of UN report for David Souter unearthed serious data errors, to be discussed
week 13

(19 hrs)

  • further consultation with David Souter for report commissioned by UN
  • finalized: bug 60826 Enable parallel processing of stub dump and full archive dump for same wiki.
    • file StatisticsMonthly.csv is copied hourly from stat1 to stat1001 under new name
    • WikiReportsOutputTables.pm now reads both StatisticsMonthly.csv and StatisticsMonthlyFullArchive.csv and uses different columns from each file
  • maintenance of wikistats portal to enhance upcoming search, also added several sections
  • submission of presentation proposal for Wikimania
week 12

(19 1/4 hrs)

  • prioritized 46 bugs
  • in progress: bug 60826 Enable parallel processing of stub dump and full archive dump for same wiki.
    • new argument -F to force processing full archive dumps (regardless of dump size)
    • Wikistats can now handle segmented dumps (which BTW differ in file name for wp:de and wp:en); see the first 100 or so lines in [2] about meta-history dump files
    • Wikistats can detect error messages in index.html (where a msg about phase completion contains 'failed')
  • in progress: bringing all scripts up to date for changed config (wikistats portal from dataset2 -> dataset1001)
  • kick-off follow-up study deduplicated editors
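
The index.html check mentioned under bug 60826 above can be illustrated with a minimal sketch. Wikistats itself is written in Perl; the Python below is only an illustration, and the function name and the sample HTML are hypothetical, not taken from the actual scripts or from a real dump status page.

```python
import re


def failed_phase_messages(index_html):
    """Return the text of every line in a dump status page that mentions 'failed'.

    Assumption: each phase-completion message sits on its own line.
    """
    messages = []
    for line in index_html.splitlines():
        # Strip HTML tags so only the visible message text is inspected.
        text = re.sub(r"<[^>]+>", "", line).strip()
        if "failed" in text.lower():
            messages.append(text)
    return messages


# Hypothetical status page fragment, for illustration only.
sample = """<li class="done">Articles, templates: done</li>
<li class="failed">All pages with complete edit history: failed</li>"""

print(failed_phase_messages(sample))
```

A non-empty result would be the signal to skip or postpone processing of that dump.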
week 11

(23 3/4 hrs)

  • finalizing search facility
  • added comments (some boilerplate), impact assessment and proposed priorities for 46 wikistats bugs; not in the list are three marked as resolved
  • published squid log reports for February
  • fixed bug 61420: Missing stats for zh.wikivoyage
  • processed mail backlog
week 10

(18 1/4 hrs)

  • generated deduplicated editor counts for several sets of wikis
  • built search facility for wikistats portal
week 9

(11 3/4 hrs)

week 8

(7.5 hrs)

  • consulted David Souter on how to use wikistats for report commissioned by UN (on developments in Internet content and language issues since the World Summit on the Information Society (WSIS) in 2003)
  • worked with Magnus Manske to assess usability of monthly aggregated page view files for his scripts
week 7

(1 3/4 hrs)

week 6

(14 hrs)

  • produced trend charts for google traffic by country
  • produced reports for smallest Wikipedias on request (normally not generated when articles and monthly edits are below threshold of 10)
  • final analysis of page view trends
  • provided data for NYT on page view trends
week 5

(18 3/4 hrs)

  • produced trend charts for crawler patterns
  • ongoing analysis of page view trends
week 4

(18 3/4 hrs)

  • produced long term browser trend charts (mobile/non mobile as well as absolute/relative) from squid log based csv files
  • reran squid log reports with bogus traffic filtered out (Jul-Dec 2013)
  • looking into doing the same for crawler patterns
  • generated input for monthly report card (incl minor bug fixing)
week 3

(23 3/4 hrs)

  • continued to analyze low page views counts, also from squid logs
    • produced breakdowns of article traffic by directly analyzing squid logs with grep
    • see last pages of [3]
week 2

(24.5 hrs)

  • prepared files for Limn
    • incl. workaround for a Limn bug, where Limn does not know how to handle empty values for Wikidata
    • incl. fix to accept new standardized file names for comScore csv files
  • fixed missing wikis from dump reports (complaint by language team)
    • there was a design flaw, present since API querying was added in July 2013: a circular dependency prevented new codes added to dblist files from being incorporated; after the fix, two new wikis finally got coverage:
      • Vietnamese Wikivoyage, e.g. [4]
      • Minangkabau Wikipedia, e.g. [5]
  • updated monthly merged page view files + prepped top views reports, e.g. wp:en
  • instructed scripts to ignore input for Jan 5/6 2014 (totals will be extrapolated from remainder)
    • done WikiCountsSummarizeProjectCounts.pl, collects counts for page view reports, reran reports
    • done SquidCollectBrowserStatsExcel.pl
    • other scripts (daily/monthly merge of dammit.lt files) are automatically doing that with hourly precision
    • any other scripts to do? hmm, pondering
  • fixed page view counts shown in Summary reports for Sara Lasner, e.g. Greek Wikipedia; the report now shows pv count for same month as other data in the report
  • added trend line for mobile page views and combined mobile+non-mobile to Summary reports, e.g. Japanese Wikipedia
  • fixed publication of patched projectcount files
  • started to analyze low page views counts, also from squid logs
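
The extrapolation mentioned above for Jan 5/6 2014 can be sketched as follows. This is only an illustration in Python (the actual scripts are Perl), it assumes simple scaling of the mean of the remaining days, and the function name and sample numbers are hypothetical.

```python
def extrapolate_month_total(daily_counts, days_in_month, skipped_days):
    """Estimate a monthly total while ignoring input for some days.

    daily_counts: dict mapping day of month -> count
    skipped_days: days whose input is ignored (e.g. known bad data)
    Assumption: the skipped days behave like the average of the other days.
    """
    usable = [count for day, count in daily_counts.items()
              if day not in skipped_days]
    if not usable:
        raise ValueError("no usable days left to extrapolate from")
    daily_mean = sum(usable) / len(usable)
    return round(daily_mean * days_in_month)


# Made-up numbers: 100 views on every day of a 31-day month,
# except for two days with unusable input.
counts = {day: 100 for day in range(1, 32)}
counts[5] = counts[6] = 500

print(extrapolate_month_total(counts, 31, skipped_days={5, 6}))
```

Scripts that merge with hourly precision could apply the same idea per hour instead of per day.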
week 1

(3 1/ hrs)

  • mostly vacation
  • published squid based page view/edit reports
  • published monthly wikistats dump based reports
week 52

(5 hrs)

  • mostly vacation
  • transforming yearly page view/edit reports with yearly averages into monthly reports, last month only
week 51

(12 1/4 hrs)

  • solved bug [6]: Italian Wikivoyage page count in Wikistats seems too low
    • caused by wikistats skipping pages where checksum is missing in dumps -> rerunning all dumps

week 50

(13 1/2 hrs)

  • finalized patch (see week 49)
  • published squid reports
  • contributed to new metric definitions
week 49

(24 1/4 hrs)

  • built script to patch project files from pagecount files (per wiki, since June 1 2013) to subtract counts for bogus page views
  • quick charts on total (very) active editors and how those metrics drop on Wikipedia faster than on other projects
  • patched project files
  • assessment of download size for full Wikipedia, for a journalist
  • in-depth analysis of impact of patch
week 48

(18 1/4 hrs)

  • investigated with Christian the issue of inflated page views by webstatscollector bug [7]
  • prep comScore files for RC
  • file name normalization of hundreds of inconsistently named historic comScore files
week 47

(12 1/4 hrs)

  • published monthly Wikistats reports
  • prepped data for Limn except comScore data (subscription stalled again)
  • ongoing: discussions on metrics definitions
  • marked bug 46289 as resolved (see wk 46)
  • deactivated squid based report Devices and removed links to it
  • several minor fixes on squid reports (layout, update time)
week 46

(15 hrs)

  • published new geo breakdown reports based on unsampled squid logs
  • urgent: updated chart for Sue on UV trends for news sites vs Wikimedia (based on a patchwork of yearly comScore data)
    • also created new charts on top reference sites
  • got mailing list stats running again (stalled since Feb 13)
    • two open issues: look at gap in summer 13, apply Nemo's patch after gerrit sync issue has been fixed
week 45

(14.5 hrs)

  • prepared input for Monthly Report Card (minus comScore data, subscription renewal is ongoing)
  • updated input for Monthly Report Card (after comScore subscription renewal)
  • minor: Wikistats Overview diagram is now public (linked from Wikistats portal About page)
  • analyzed drop in mobile page views in recent months on English Wikipedia (and others) vs steep rise in non-mobile page views (it turns out the rise in non-mobile is far too large for any possible underreporting on mobile)
  • ongoing: analyze effects of world wide switch to https on 28 August on squid log stats
  • published squid based reports
week 44

(28 1/4 hrs)

  • publish monthly wikistats reports
  • helped analyze drop in total active editors for Sep 2013 (probably seasonal (=within normal range) after all)
  • ongoing: analyze effects of world wide switch to https on 28 August on squid log stats
  • made WLM data more visible in Commons report
  • fixed bugzilla bug 55558: new Wikivoyage logo on wikistats portal
  • ongoing: publish input for report card
week 43

(11.5 hrs)

  • input for cohort analysis to Daimee
  • adapted squid based edit reports: explanatory texts and final counts are now based on a new argument, sample rate
  • reran squid based edit reports from 1:1 unsampled edit log
  • reran edit(or) counts for Sarikas, with updated title list
week 42

(11 hrs)

  • collect data from German dump for external researcher (Dr Sarikas)
  • new script to build new filtered full archive dump based on discrete list of article titles
  • new script to collect edits/editors count (registered,anon,bot) from full archive dump
week 41

(21 hrs)

  • updated (overdue) monthly squid based reports
  • large cluster of reports on page views/edits per country are back after 6 months (more to do, see week 43)
  • squid based data collection now based on 1:1 instead of 1:1000 log files for page edits
  • fixed https://bugzilla.wikimedia.org/show_bug.cgi?id=55528
week 40

(11 3/4 hrs)

  • worked on squid reports (ongoing)