Extension:EventLogging/Todos

If you're interested in diving in, get in touch with Ori Livneh.

Schemas

  •   Done Make sure all properties have helpful "description" fields.

Server-side schema handling

  •   Done Write Python abstraction for grabbing schemas from metawiki.
  •   Done Validate incoming events against declared schema.
  •   Done Generate SQL schema from JSON Schema (WIP: see 'glass' project in Gerrit).
  •   Done Automatically CREATE TABLE when a new schema is encountered. (But carefully consider security and scalability implications.)
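
The SQL-generation and table-creation items above could be sketched roughly as follows. This is a minimal illustration, not the code in the 'glass' project: the type mapping, the function names, and the per-property "required" flag are all assumptions, and only flat schemas with primitive types are handled. Identifiers are checked against a whitelist pattern because schema and property names originate on a wiki, which is one of the security implications worth considering.

```python
import re

# Assumed mapping from JSON Schema primitive types to MySQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Only plain alphanumeric/underscore names may be used as SQL identifiers.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Return a backtick-quoted identifier, rejecting anything unusual."""
    if not IDENTIFIER.fullmatch(name):
        raise ValueError("unsafe identifier: %r" % (name,))
    return "`%s`" % name


def create_table_sql(table_name, schema):
    """Build a CREATE TABLE statement from a flat JSON Schema object.

    Hypothetical helper: assumes each property carries a primitive "type"
    and an optional boolean "required" flag.
    """
    columns = []
    for name, prop in sorted(schema["properties"].items()):
        json_type = prop.get("type")
        if not isinstance(json_type, str) or json_type not in SQL_TYPES:
            raise ValueError("unsupported type for %s: %r" % (name, json_type))
        not_null = " NOT NULL" if prop.get("required") else ""
        columns.append(
            "%s %s%s" % (quote_identifier(name), SQL_TYPES[json_type], not_null)
        )
    return "CREATE TABLE IF NOT EXISTS %s (%s)" % (
        quote_identifier(table_name),
        ", ".join(columns),
    )
```

Using IF NOT EXISTS keeps the statement safe to issue each time an unfamiliar schema is seen, at the cost of not detecting a changed schema under the same table name.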

Monitoring

  •   Done Watch for truncated events (tell-tale sign: missing trailing ';' in query string).
  •   Done Keep sequence ID counters (one per host) and watch for gaps, which indicate packet loss.
  • Keep tabs on rate of incoming invalid events and emit alerts as appropriate.
  • Emit alerts as bona fide, subscribable events.
  • Write gmond plugin to send stats to Ganglia.
  • Create a new $wgDebugLogGroups entry that writes to vanadium; use it to log EventLogging alerts from the Apaches.
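
The first two monitoring items (truncation and sequence gaps) can be illustrated with a short sketch. The class and function names are hypothetical, and the sketch assumes sequence IDs are integers that increase by one per event on each host; late or duplicate packets are ignored rather than counted.

```python
def is_truncated(query_string):
    """A complete query string ends with ';'; anything else was cut short."""
    return not query_string.endswith(";")


class SequenceMonitor:
    """Track the highest sequence ID seen per host and report gaps."""

    def __init__(self):
        self.last_seen = {}  # host name -> highest sequence ID observed

    def observe(self, host, seq_id):
        """Record an event and return how many IDs were skipped before it.

        The first event from a host and any late or duplicate event
        (seq_id not above the highest seen) report zero.
        """
        previous = self.last_seen.get(host)
        if previous is None:
            self.last_seen[host] = seq_id
            return 0
        if seq_id <= previous:
            return 0
        self.last_seen[host] = seq_id
        return seq_id - previous - 1
```

A non-zero return value from observe() suggests packet loss for that host; summing these values over a time window would give a rate that an alert could be based on.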

Storage / archiving

  •   Done Set up automatic archiving and log rotation of raw event log data dump.
  • Figure out a sane MySQL permissions scheme.
  • Make sure Hadoop is getting all events, not just esams.
  •   Done Make sure MySQL insert failures are handled gracefully.
  • Failover & replication plans.
  • If required: write up specs for an additional machine.

Client-side

  • Migrate remaining ClickTracking clients (see Trello card for list).
  • Reliably generate the anonymous user cookie & token (currently done by E3Experiment's openTask.js with generateId() function copy-pasted from mediawiki.user.js).
      • Always supply this as _token, like _rv and _id?
  • Provide default implementations for common fields.
  • If we continue with a userbuckets cookie to determine client-side behavior, then take over code from ClickTracking's ext.UserBuckets.js (and mediawiki.user.js) and fix bugs.
  • Handle excessively long query strings: Varnish logs only the first 255 characters of the query string, so anything longer is cut off.

  •   Done As the Mobile team begins to use EventLogging features, deploy the extension to wikis beyond enwiki.
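
For the query-string length item above, a pre-flight length check might look like the sketch below. It is written in Python for illustration only (a real client-side check would live in JavaScript). The encoding shown, percent-encoded compact JSON after '?' with a trailing ';', and the decision to count the leading '?' toward the limit are assumptions, as are the function names.

```python
import json
from urllib.parse import quote

MAX_QUERY_LENGTH = 255  # the Varnish logging limit mentioned above


def encode_event(event):
    """Serialize an event as compact, percent-encoded JSON ending in ';'."""
    payload = json.dumps(event, separators=(",", ":"))
    return "?" + quote(payload, safe="") + ";"


def fits_in_log(event):
    """Return True if the encoded event would be logged without truncation."""
    return len(encode_event(event)) <= MAX_QUERY_LENGTH
```

An event that fails this check could be dropped, shortened, or reported as an error before it is sent, rather than arriving truncated and failing validation on the server.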

PHP-side

  • Assuming we continue to log events on the server (currently account_create events), reimplement an appropriate subset of client-side logging in PHP.

Misc

  •   Done Puppetize.
  • More unit tests.
  • Documentation.
  •   Done DevServer.php should validate schema (WIP, staged in Ori's repo)
  • Improve dev tooling on Metawiki. Write a small JavaScript module for Schema: pages that:
      • generates the $wgResourceLoaderModules declaration, so one can simply copy and paste the schema module setup code.
      • provides a textarea for pasting a JSON object and checking whether it validates against the schema.
  •   Done Test varnish patch referenced in RT 4094. Let Mark know how it goes.
  • Deploy CodeEditor to Meta (see Gerrit change 36343).
  • Override JSON validation error messages (see includes/JsonSchema.i18n.php) on Meta with nicer template.
  • Read the JSON Schema spec in full and do a "conceptual lint": figure out what we're doing wrong or not utilizing.