Analytics/Archive/Editor Engagement Vital Signs/Dashboard

Goals: What is the Vital Signs Dashboard about

This document describes our technology choices for the Editor Engagement Vital Signs Dashboard. The dashboard is the tool by which we will surface the new metrics. Our goal when developing the dashboard is twofold:

1. Give users an easy UI to find and visualize the new editor metrics from data created by Analytics/Wikimetrics

2. Explore technologies (visualization and otherwise) for building a replacement for our current dashboarding tool.

And our third "bonus" goal: provide a dashboard stack that other developers at the foundation can use to 'easily' showcase their data. It should be a (very) friendly stack that is fast to develop and test with. We are calling the technology that powers our dashboard Analytics/Dashiki.

What this project is not

The dashboard does not solve several problems we know we have at Wikimedia when it comes to dashboard discovery and dashboard creation. It is just a first step towards solving dashboard and metric discovery for the Vital Signs metrics.

The hardest problem of all when it comes to visualization of data is how to get to the data itself. The dashboard does not solve this issue. We assume all data we want to display is public and available via HTTP with an appropriate set of cache headers. The tool that harvests data for the dashboard is Analytics/Wikimetrics. The nontrivial process of how that data comes to be has documentation of its own, and you can find Wikimetrics' code on GitHub.

Dashboard Stack

Components

The dashboard looks like the following:

A prototype of the dashboard can be found here: http://pauginer.github.io/prototypes/analytics-dashboard/index.html

The dashboard has two main components: Browsing and Visualization. Besides choosing a visualization framework, we need a basic UI stack to create the browsing components. We also need some kind of storage to configure our UI so that it can, for example, display a set of projects that can be browsed by language. We have detailed our technology choices below. We have also included an addendum of other things we looked at that did not make our final cut.

Technical Criteria

For any solution we also value the community behind it, its recent updates, its documentation and prompt resolution of issues. We would choose a less strong technical solution with a solid community over a more sophisticated one with little use and few maintainers.

Also, we want to keep things small so that metrics can be visualized on mobile, so size and performance play an important part when choosing any library or technology.

Here is the list of features a solution should provide to be the foundation of a solid application.

Infrastructure

Easy testing
Easy building
Developer productivity: fast updates in development when running from source, but easy to publish a distribution
Dynamic loading of JavaScript
Management of UI dependencies
Packaging of application and distribution

Framework

Binding of templates and data. Two-way binding if possible.
Routing
Flexibility
Partial Views
DOM manipulation
Event handling
Ajax abstractions
Nice to have: scaffolding to bootstrap the project
Developer productivity: fast to develop once you know the basics

Visualization

Be able to visualize on high-end mobile devices (Android 4 and up, iPhone 4 and up)
Nice to have: extensibility of visualizations
Nice to have: server side rendering

Storage

Light solution. We want to avoid running a database to deploy a dashboard
Testable
Real-time updates: configuration changes are reflected in the product

Technologies

Infrastructure

To manage dependencies we have decided to go for requirejs and Bower. Bower is a package manager à la npm, but for front-end dependencies. It does only one thing (downloading dependencies and keeping track of them), but it does it well. Having a package manager has a clear value proposition: it is much easier to let the package manager manage versions and let requirejs just manage loading than to keep track of versions indirectly via the requirejs configuration.
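As a rough illustration of that split, the sketch below shows a requirejs configuration that only maps module names to files Bower has downloaded, while the version numbers live in Bower's own manifest. The module names and paths are assumptions for illustration (Bower's default `bower_components` directory is assumed), not the project's actual configuration.

```javascript
// Hypothetical requirejs configuration: it says WHERE each module lives,
// not WHICH version it is. Versions are pinned in bower.json instead.
const requireConfig = {
  baseUrl: 'src',
  paths: {
    // Assumed install locations under Bower's default directory.
    jquery: '../bower_components/jquery/dist/jquery',
    knockout: '../bower_components/knockout/dist/knockout',
    vega: '../bower_components/vega/vega'
  }
};

// Apply the configuration only when the requirejs loader is present
// (for example in the browser), so the snippet also runs standalone.
if (typeof requirejs !== 'undefined' && typeof requirejs.config === 'function') {
  requirejs.config(requireConfig);
}
```

Upgrading a library then means changing its version in the Bower manifest and reinstalling; the paths above stay the same.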

Framework

There is no lack of frameworks when it comes to JavaScript-heavy applications. One of our goals was to try to have a completely server-less application, given that the dashboard just fetches data via HTTP and is itself served as static files over HTTP. This server-less architecture would also enable the dashboarding code to run in any environment, including MediaWiki, without heavy porting of server code to MediaWiki extensions.

Web components were another important concept we considered. They would force a team to build modular, reusable pieces. Since plain web components are not really supported except by the most alpha of browser versions, we looked at Knockout components, Polymer, and a few other similar approaches. Knockout components are built on technology we're familiar with and seem to fit our requirements well. They're also not opinionated and therefore don't have a steep learning curve. The Yeoman setup of Knockout components also works pretty well, build and all, giving you a head start. For more information, take a look at this video, well worth an hour: An architecture for a large scale application that works

For basic DOM management, events and Ajax we use jQuery 2.0.
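To give a feel for what a Knockout component looks like, here is a minimal sketch of a browsing component. The component name `project-selector`, its `projects` parameter and the function `registerProjectSelector` are hypothetical and not taken from the dashboard code. The Knockout object is passed in as an argument, the way an AMD loader such as requirejs would supply it.

```javascript
// Registers a hypothetical <project-selector> component.
// `ko` is the Knockout library object, injected by the caller.
function registerProjectSelector(ko) {
  function ProjectSelectorViewModel(params) {
    const self = this;
    // Hypothetical parameter: the list of project names to browse.
    self.projects = params.projects;
    // The currently selected project, or null when nothing is selected.
    self.selected = ko.observable(null);
    self.hasSelection = ko.pureComputed(function () {
      return self.selected() !== null;
    });
    // Click handler: Knockout passes the clicked item as first argument.
    self.select = function (project) {
      self.selected(project);
    };
  }

  ko.components.register('project-selector', {
    viewModel: ProjectSelectorViewModel,
    template:
      '<ul data-bind="foreach: projects">' +
      '<li data-bind="text: $data, click: $parent.select"></li>' +
      '</ul>'
  });

  return ProjectSelectorViewModel;
}
```

A page could then use the component as `<project-selector params="projects: allProjects"></project-selector>`, where `allProjects` is again an illustrative name on the page's own view model.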

Visualization

In the visualization world there are awesome charting libraries like Highcharts, dygraphs, Rickshaw, etc. There are lots of those and they all have a ton of features and benefits. Then there is Vega, which takes a different approach. It doesn't explicitly have any chart "types"; it just defines a grammar for making visual marks out of data. In the short term, one of the charting libraries would be easier to use and integrate into a dashboarding solution, so it would satisfy our immediate need better. However, when we want to add things like annotations, some charting libraries lack that feature. In Vega, annotations are just textual marks bound to data. It's this flexibility that means Vega will grow with this project and not restrict future use cases.
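The idea that an annotation is just another mark can be sketched with a small Vega specification. The data values and the annotation text below are made up, and the syntax follows the `encode` style of recent Vega releases (v5 is assumed here); older Vega versions express the same idea with slightly different property names.

```javascript
// A line of metric values plus one text mark per annotation.
// Both marks share the same scales; only the bound data set differs.
const spec = {
  $schema: 'https://vega.github.io/schema/vega/v5.json',
  width: 400,
  height: 200,
  data: [
    { name: 'metric', values: [
      { date: '2014-01', value: 120 },
      { date: '2014-02', value: 150 },
      { date: '2014-03', value: 135 }
    ] },
    // Hypothetical annotation attached to one data point.
    { name: 'annotations', values: [
      { date: '2014-02', value: 150, note: 'Example event' }
    ] }
  ],
  scales: [
    { name: 'x', type: 'point', range: 'width',
      domain: { data: 'metric', field: 'date' } },
    { name: 'y', type: 'linear', range: 'height', nice: true, zero: true,
      domain: { data: 'metric', field: 'value' } }
  ],
  axes: [
    { orient: 'bottom', scale: 'x' },
    { orient: 'left', scale: 'y' }
  ],
  marks: [
    { type: 'line', from: { data: 'metric' },
      encode: { enter: {
        x: { scale: 'x', field: 'date' },
        y: { scale: 'y', field: 'value' },
        stroke: { value: 'steelblue' }
      } } },
    { type: 'text', from: { data: 'annotations' },
      encode: { enter: {
        x: { scale: 'x', field: 'date' },
        y: { scale: 'y', field: 'value' },
        dy: { value: -6 },
        text: { field: 'note' }
      } } }
  ]
};
```

Nothing in the spec names a "line chart"; the chart is whatever the marks add up to, which is why adding annotations needs no special library support.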

The main concern was whether Vega would be a lot harder to use than a basic charting library. We did a proof of concept and feel pretty happy with the results. In about 300 lines of JavaScript we were able to glue together everything we'd need in a production system.

There is one other forward-looking feature that Vega has. From a Vega graph definition, a server-side executable can render a static image. This is useful in a variety of cases, and has been palpably absent from our current visualization solution.

Storage

The file-based approach caused problems in Limn because it's deceptively easy to start with and hard to extend with things like ownership, persistence, versioning, etc. As things stand, we'd have to create web hooks that call git commands on the server to manipulate the file changes, and doing that for a multi-user system doesn't sound like fun.

The SQL database approach would satisfy all the requirements, but it means we have a distinct additional tier to manage.

Simplicity involves re-using existing solutions as long as they don't have to be hacked too much. Since EventLogging stores schemas in the Schema namespace, we feel it's natural to store dashboard metadata in a separate namespace as well. We still have to check with the core team to make sure it's OK and that we could get our domain on the CORS whitelist (or solve the CORS problem in some other way). But purely from our point of view, this solution is best.
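Reading configuration from a wiki page could look roughly like the sketch below, which uses the standard MediaWiki action API with its `origin` parameter for cross-origin requests. The wiki endpoint, the page title `Config:VitalSigns`, the dashboard origin and both function names are assumptions for illustration; no namespace has been agreed on.

```javascript
// Builds the API URL that returns the latest content of one wiki page.
// `origin` must match the requesting site and be on the wiki's CORS whitelist.
function buildConfigUrl(apiBase, pageTitle, origin) {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'revisions',
    rvprop: 'content',
    format: 'json',
    titles: pageTitle,
    origin: origin
  });
  return apiBase + '?' + params.toString();
}

// Pulls the page text out of the (legacy-format) JSON response, where
// pages are keyed by page id and the content sits under the '*' key.
// Returns null when the page has no revisions (for example, it is missing).
function extractPageContent(apiResponse) {
  const pages = apiResponse.query.pages;
  const firstId = Object.keys(pages)[0];
  const page = pages[firstId];
  if (!page || !page.revisions) {
    return null;
  }
  return page.revisions[0]['*'];
}

// Hypothetical usage: all three values are placeholders.
const configUrl = buildConfigUrl(
  'https://meta.wikimedia.org/w/api.php',
  'Config:VitalSigns',
  'https://dashboard.example.org'
);
```

The dashboard would fetch `configUrl`, pass the parsed response to `extractPageContent`, and parse the result as JSON. Because the page is edited on the wiki, configuration changes show up without redeploying the dashboard.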

Diagram

Other things we have looked at

Frameworks: Angular, Polymer, Ember, Durandal, Ractive, jQuery, plain web components, and a cursory look at a lot of others

Visualization: Rickshaw, Epoch, Tessera, Highcharts, dygraphs, Limn, and a lot of others

Storage: file-based, SQL databases in general, services