Talk:Flow/Analytics
Talk with Dan Andreescu 2014-11-05
There's one dashboard server in labs, limn1; it's easy to add another site to it.
Mobile Web team's report card is more automated than the editor-engagement hacks described on Flow/Analytics. Their repo analytics-limn-mobile-data defines both the data generation and the report card web site presentation. We'll clone this for Flow. Some files here:
- config.yaml names the graphs and their SQL file
- If we just point to a CSV it's easy, if we want to tweak we have to point to a datasource.
- edits-monthly-new-active.sql uses Jinja templating so the query is parameterized
- generate.py is run by a cron job on stat1003 to actually generate stats.
- the SQL queries run against
  - databases hosted on analytics-store.eqiad.wmnet (replicated DB stuff, not just EventLogging but e.g. enwiki revision tables).
  - databases hosted on x1-analytics-slave
  - We need to make sure that Flow's special DB cluster extension1 with flowdb on it is also accessible to this.
- operations/puppet has limn config in modules/limn and manifests/misc/limn.pp
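The parameterized-query idea from the file list above can be sketched roughly as follows. This is only an illustration: it uses Python's standard-library string.Template as a stand-in for Jinja, and the query text, the wiki_db parameter and the render_query helper are all hypothetical, not copied from the mobile repo.

```python
import re
from string import Template

# Hypothetical query text; the real edits-monthly-new-active.sql is not reproduced here.
# string.Template stands in for Jinja so the sketch needs only the standard library.
QUERY_TEMPLATE = Template("""
SELECT LEFT(rev_timestamp, 6) AS month,
       COUNT(*) AS edits
  FROM ${wiki_db}.revision
 GROUP BY month
""")


def render_query(wiki_db):
    """Fill in the database name, rejecting anything that is not a plain identifier."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", wiki_db):
        raise ValueError("unexpected database name: %r" % wiki_db)
    return QUERY_TEMPLATE.substitute(wiki_db=wiki_db).strip()


if __name__ == "__main__":
    print(render_query("enwiki"))
```

The point is only that one SQL file can serve several wikis; how generate.py actually passes parameters to Jinja should be checked in the mobile repo.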
In development
There's no way to replicate the whole limn setup locally. Annoying; just run the SQL on stat1003, and if it works, commit it to the flow-analytics repo and hope.
working on stat1003
Set up access to stat1003 (through bast1001.wikimedia.org).
For mysql, borrow ~milimetric/.my.cnf.one-box:
$ ssh stat1003.wikimedia.org
$ mysql --defaults-file=/home/milimetric/.my.cnf.one-box
mysql:research@analytics-store.eqiad.wmnet [(none)]> show databases
This has replication of all the wiki databases like enwiki (but not flowdb yet).
Also, the log database has all the event logging tables, corresponding to SchemaName_revision on metawiki.
- we'll need to add flowdb here, see To do.
mysql:research@analytics-store.eqiad.wmnet [(none)]> use log
Database changed
mysql:research@analytics-store.eqiad.wmnet [log]> show tables like 'echo%';
+-------------------------+
| Tables_in_log (echo%)   |
+-------------------------+
| EchoInteraction_5539940 |
| EchoInteraction_5782287 |
| EchoMail_5467650        |
| EchoPrefUpdate_5488876  |
| Echo_5285750            |
| Echo_5364744            |
| Echo_5423520            |
| Echo_6081131            |
| Echo_7572295            |
| Echo_7731316            |
+-------------------------+
10 rows in set (0.00 sec)
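Since the log tables are named SchemaName_revision, a listing like the one above can be grouped by schema to see which revisions exist for each. A minimal sketch, assuming the table names have already been fetched as plain strings; group_by_schema is a hypothetical helper, not part of any existing tool.

```python
from collections import defaultdict


def group_by_schema(table_names):
    """Map each schema name to the sorted list of its revision ids."""
    revisions = defaultdict(list)
    for name in table_names:
        # Split on the last underscore: "EchoInteraction_5539940" -> ("EchoInteraction", "5539940").
        schema, sep, revision = name.rpartition("_")
        if not sep or not revision.isdigit():
            continue  # Skip anything that does not follow SchemaName_revision.
        revisions[schema].append(int(revision))
    return {schema: sorted(revs) for schema, revs in revisions.items()}


if __name__ == "__main__":
    tables = [
        "EchoInteraction_5539940", "EchoInteraction_5782287",
        "EchoMail_5467650", "EchoPrefUpdate_5488876",
        "Echo_5285750", "Echo_5364744", "Echo_5423520",
        "Echo_6081131", "Echo_7572295", "Echo_7731316",
    ]
    for schema, revs in sorted(group_by_schema(tables).items()):
        print(schema, revs)
```

A query that should cover a schema's whole history would presumably need to read from every revision table listed for it.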
In production
- make changes to our repo
- +2 them
- puppet runs, updates stat1003
- next cron job should pick up the changes
Limn data generation
Limn data generation also runs on stat1003.
- The repo code is checked out to /a/limn-mobile-data
- /a/limn-mobile-data/generate.py creates stuff in /a/limn-public-data/mobile/datafiles
- the Limn log is /var/log/limn-mobile-data.log (we aren't in the stats group so we can't see it)
  - anything goes wrong, bother Dan.
- whatever's in /a/limn-public-data/ gets rsync'd to http://datasets.wikimedia.org/limn-public-data
So we see the change a few hours later.
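The "query result becomes a datafile" step can be pictured with a small sketch. This is not the real generate.py: write_datafile, the graph name and the sample row are hypothetical placeholders, and the output directory is a temporary one rather than /a/limn-public-data.

```python
import csv
import os
import tempfile


def write_datafile(out_dir, graph_name, header, rows):
    """Write one CSV per graph into out_dir and return the file path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, graph_name + ".csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


if __name__ == "__main__":
    # Placeholder values only; real rows would come from the SQL queries.
    demo_dir = os.path.join(tempfile.mkdtemp(), "flow", "datafiles")
    print(write_datafile(demo_dir, "example-graph", ["month", "edits"], [("2014-10", 12)]))
```

Anything written under the public data directory would then be picked up by the rsync described above, so a bad file also becomes public a few hours later.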
Help
milimetric on #wikimedia-analytics, also mforns and nuria
To do
- Dan create a new repo cloned from mobile analytics, but with a flow folder in place of mobile.
  - So we think we would put our query definitions and stuff in here.
- Dan set up a separate repository for our Flow reportcard that points to our virtual host and runs Flow's generate.py from a cron job
- Dan set up a puppet change to set up the new flow-reportcard.wmflabs.org (a reportcard can have multiple dashboards). Limn1 has a
- ErikB will give Dan the DB details for Flow: flowdb on extension1
- everyone make RT request for stat1003
- Dan Andreescu will give us access to limn1.
- Mattflaschen
- spage
- Add your labs login here (the thing in wikitech instance)