Flow/Analytics

Goals:

determine engagement with Flow boards (a Trello card)
- We'll do this by running queries against the Flow DB
- Probably want to compare with regular talk pages.
measure how people use the UI (Trello card plus others)
- We'll do this using EventLogging to log m:Schema:FlowReplies and action events.
- This also involves qualitative User Research.

Determining engagement

Flow can compute metrics such as the number of new topics and the average number of replies per topic, because each of these is a separate DB update.
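As a rough illustration of why these metrics are cheap for Flow: if every new topic and every reply is its own record, the metrics reduce to a simple aggregation. The sketch below assumes a hypothetical list of (topic_id, kind) tuples; it is not the real Flow schema, and the names `topic_metrics`, "new-topic" and "reply" are made up for the example.

```python
from collections import Counter


def topic_metrics(records):
    """Return (new_topics, average_replies_per_topic).

    `records` is a hypothetical iterable of (topic_id, kind) tuples,
    where kind is "new-topic" or "reply". This is NOT the Flow schema.
    """
    topics = set()
    replies = Counter()
    for topic_id, kind in records:
        if kind == "new-topic":
            topics.add(topic_id)
        elif kind == "reply":
            replies[topic_id] += 1
    if not topics:
        return 0, 0.0
    # Only count replies to topics that were created within these records.
    total_replies = sum(replies[topic_id] for topic_id in topics)
    return len(topics), total_replies / len(topics)


sample = [
    ("t1", "new-topic"), ("t1", "reply"), ("t1", "reply"),
    ("t2", "new-topic"), ("t2", "reply"),
    ("t3", "new-topic"),
]
new_topics, avg_replies = topic_metrics(sample)
```

Doing the same for regular talk pages would first require recovering topics and replies from revision text, which is the hard part.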

We'll probably want to compare with regular talk pages. Wikimetrics can show edit metrics for regular talk pages, but only for a cohort (a defined group of users).

Wikimetrics page edits aren't currently talk-aware. Determining similar metrics (new topics, reply counts, etc.) for regular user talk pages is harder. Echo has a DiscussionParser that can help, but it is expensive because it parses each revision.

Cohort

Typical Wikimetrics use involves identifying a cohort ("People who signed up at our Editathon") and then tracking their page edit success.

Flow doesn't have obvious cohorts to compare, so we could just pick a bunch of newly-registered users. Danny has manually counted regular talk page edits vs. Flow board edits.

Implementation

http://flow-reportcard.wmflabs.org/ runs on the front-end web server limn1.eqiad.wmflabs

Dan Andreescu set up the analytics/limn-flow-data repository (see its gerrit patches), based on mobile's repo.

This commit deploys the Flow metadata to limn1
reportcard.json defines our default dashboard.
Gerrit change 171465 sets up a cron job to generate Flow statistics

The Flow analytics repository is regularly checked out on the stats back-end machine stat1003 to /a/limn-flow-data. Log output from the generate.py cron job (not much) appears in /var/log/limn-data/limn-flow-data.log

To generate new data the Flow team "only" needs to

commit Python query scripts, based on mobile's, to the flow directory of our limn-flow-data repo
and update reportcard.json to reference their output.
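A query script of this kind could look roughly like the sketch below. It is only an illustration: the table name `new_topics`, the column names and the CSV layout are hypothetical, and an in-memory SQLite database stands in for the real DB so the example runs anywhere. The actual conventions should be copied from mobile's repo.

```python
import csv
import io
import sqlite3


def write_daily_counts(conn, out):
    """Write one CSV row per day with the number of new topics."""
    rows = conn.execute(
        "SELECT day, COUNT(*) FROM new_topics GROUP BY day ORDER BY day"
    )
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "new_topics"])  # hypothetical header
    writer.writerows(rows)


# Stand-in database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_topics (day TEXT, topic_id TEXT)")
conn.executemany(
    "INSERT INTO new_topics VALUES (?, ?)",
    [("2014-11-01", "a"), ("2014-11-01", "b"), ("2014-11-02", "c")],
)

buf = io.StringIO()
write_daily_counts(conn, buf)
csv_text = buf.getvalue()
```

The resulting file is what a datasource referenced from reportcard.json would point at.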

Deploying new front-end code

Deploying new code on the front-end is a separate process. You need to check out https://github.com/wikimedia/limn-deploy locally. limn-deploy uses Fabric (www.fabfile.org) to execute commands remotely on limn1 via ssh, so you need to be able to ssh to limn1.eqiad.wmflabs. It has "stages" for deployment, and flow is one of them, thus:

$ cd your/git/analytics
$ git clone https://github.com/wikimedia/limn-deploy
$ cd limn-deploy
$ sudo pip install -e .
$ fab -l  # lists available stages and commands

Then to push changes to the Flow analytics front-end:

$ fab flow deploy.only_data

How to get info to a dashboard?

Limn for now

The mobile and multimedia teams have automated this: each has a labs server (http://mobile-reportcard.wmflabs.org and http://multimedia-metrics.wmflabs.org) that runs cron jobs and generates Limn graphs.

The multimedia team also has server-side graphing in Ganglia.

Dan Andreescu will tell us where the code is, how these teams do it, etc.

Example: Echo dashboard

http://ee-dashboard.wmflabs.org/dashboards/enwiki-features has Echo, AFT, Page curation, and WikiLove stats. (An interesting one is Echo views by category.) All dashboards are actually puppetized web hosts on a limn1 server.

wikitech:EE Dashboard has some info about setting this up
enwiki-features dashboard definition has multiple graph_ids including "enwiki_echo_all"
- enwiki_echo_all datasource definition points to URL
  - http://datasets.wikimedia.org/public-datasets/enwiki/echo/echo_all.csv
    - which is on wikitech:Datasets.wikimedia.org
      - which is stat1001 where I think we can run cron jobs to create datasets, or possibly stat1003.

(Note: "ee-dashboard" sounds like a labs machine for the Editor engagement team, which is what the Flow team used to be called, but it is actually for editor engagement research (User:DarTar).)

Privacy: not too much data, not too long, not too personal

don't store data for long periods.
don't store personally identifiable information.
don't log for every single user

Note that Echo does all this for logged-in users who click on the Echo [NN] red badge.
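Two of these rules (no personal data, not every user) can be sketched in a few lines. This is a minimal illustration, not how Echo or EventLogging implement it: the sample rate, the field whitelist and the function names are all assumptions made for the example, and retention (not storing data for long) is not covered.

```python
import random

SAMPLE_RATE = 0.1                         # hypothetical: log roughly 1 in 10 sessions
ALLOWED_FIELDS = {"action", "timestamp"}  # hypothetical whitelist, nothing personal


def should_log(rng=random):
    """Decide once per session whether to log it at all."""
    return rng.random() < SAMPLE_RATE


def scrub(event):
    """Keep only whitelisted fields so names, IPs, etc. are never stored."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


raw_event = {"action": "reply", "timestamp": 1415000000, "user_name": "Example"}
clean_event = scrub(raw_event)
```

Whitelisting fields, rather than blacklisting known-sensitive ones, means a newly added field is dropped by default.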

Next steps

Talk to Dan Andreescu

Make sure we define what success is

For comparison, Analytics has developed a well-defined funnel for "editor success": user registers, user edits successfully, and user sticks around.

Possible model

how many people visit a talk page
- and never try again
- or try to edit
- or add a new topic/section
  - and "get their answer"
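The model above can be expressed as a simple funnel count. The sketch assumes a hypothetical mapping from visitor to the set of actions they performed; the action names are invented, and "got-reply" is only a stand-in proxy for "get their answer", which is much harder to measure.

```python
FUNNEL_STEPS = ["visit", "edit-attempt", "new-topic", "got-reply"]  # hypothetical names


def funnel(visits):
    """Count how many visitors reached each step.

    `visits` maps a visitor id to the set of actions that visitor performed.
    """
    return {
        step: sum(1 for actions in visits.values() if step in actions)
        for step in FUNNEL_STEPS
    }


sample_visits = {
    "u1": {"visit"},                            # visited and never tried again
    "u2": {"visit", "edit-attempt"},            # tried to edit
    "u3": {"visit", "new-topic", "got-reply"},  # added a topic and got a reply
    "u4": {"visit", "new-topic"},               # added a topic, no reply
}
counts = funnel(sample_visits)
```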

UI event logging

We understand this pretty well.

Can Extension:Flow simply require EventLogging, or can it be decoupled through a "track" interface? (See how VisualEditor decouples.)
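The decoupling idea is a small publish/subscribe layer: feature code only calls `track`, and a logging backend subscribes if it is installed. The sketch below shows the pattern in Python for consistency with the query scripts; the real interface would live in client-side code, and the function and topic names here are hypothetical, not VisualEditor's or EventLogging's API.

```python
_subscribers = []  # list of (topic_prefix, handler) pairs


def track_subscribe(prefix, handler):
    """Register a handler for every topic that starts with `prefix`."""
    _subscribers.append((prefix, handler))


def track(topic, data):
    """Publish an event; a no-op if nothing has subscribed."""
    for prefix, handler in _subscribers:
        if topic.startswith(prefix):
            handler(topic, data)


# Feature code can call track() whether or not a logger exists.
track("flow.reply", {"action": "cancel"})  # nobody is listening yet

# A logging backend, if present, opts in separately.
received = []
track_subscribe("flow.", lambda topic, data: received.append((topic, data)))
track("flow.reply", {"action": "save"})
track("other.event", {"action": "ignored"})
```

With this shape the extension has no hard dependency on the logging backend: without a subscriber, events are simply dropped.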