Analytics/Reportcard/2.0/Database design

Request for comment (RFC)
Database design
Component General
Creation date
Author(s) Erik Zachte, Diederik van Liere
Document status draft
withdrawn

Database design Report Card 2.0 edit

This page documents the database design for the new reportcard ([http://reportcard.wmflabs.org/reportCard/ reportcard.wmflabs.org/reportCard/). The document consists of two sections:

  • the current database design consists of 10 tables. Most of these tables have the same structure.
  • a new simplified proposed design. Please chime in on the Talk page with your thoughts.

The high level goals are documented here Analytics/Reportcard/Requirements.

Current Table Designs edit

CREATE TABLE `binaries` (
  `date` date NOT NULL,
  `project_code` char(2) NOT NULL DEFAULT '',
  `language_code` char(15) NOT NULL DEFAULT '',
  `extension` varchar(10) NOT NULL DEFAULT '',
  `binaries` bigint(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`,`extension`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `page_views` (
  `date` date NOT NULL,
  `project_code` char(2) NOT NULL DEFAULT '',
  `language_code` char(15) NOT NULL DEFAULT '',
  `views_non_mobile_raw` bigint(15) DEFAULT NULL,
  `views_mobile_raw` bigint(15) DEFAULT NULL,
  `views_non_mobile_normalized` bigint(15) DEFAULT NULL,
  `views_mobile_normalized` bigint(15) DEFAULT NULL,
  `views_raw` bigint(15) DEFAULT NULL,
  `views_normalized` bigint(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `offline` (
  `date` date NOT NULL,
  `source` varchar(255) NOT NULL DEFAULT '',
  `readers` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`source`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `mobilepageviews` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editortargets` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `editorsbygeography` (
  `date` date NOT NULL,
  `language_code` char(15) NOT NULL DEFAULT '',
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `country_code` varchar(3) NOT NULL DEFAULT '',
  `value` bigint(12) NOT NULL DEFAULT '0',
  PRIMARY KEY (`date`,`language_code`,`project_code`,`country_code`,`value`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `wikistats` (
  `date` date NOT NULL,
  `project_code` varchar(10) NOT NULL DEFAULT '',
  `language_code` varchar(15) NOT NULL DEFAULT '',
  `editors_all_time` int(10) DEFAULT NULL,
  `editors_new` int(7) DEFAULT NULL,
  `editors_ge_5` int(7) DEFAULT NULL,
  `editors_ge_25` int(7) DEFAULT NULL,
  `editors_ge_100` int(7) DEFAULT NULL,
  `articles` int(12) DEFAULT NULL,
  `articles_new_per_day` int(9) DEFAULT NULL,
  `articles_over_bytes_500` int(12) DEFAULT NULL,
  `articles_over_bytes_2000` int(12) DEFAULT NULL,
  `edits_per_article` decimal(9,1) DEFAULT NULL,
  `bytes_per_article` decimal(9,1) DEFAULT NULL,
  `edits` int(12) DEFAULT NULL,
  `size_in_bytes` int(15) DEFAULT NULL,
  `size_in_words` int(15) DEFAULT NULL,
  `links_internal` int(15) DEFAULT NULL,
  `links_interwiki` int(15) DEFAULT NULL,
  `links_image` int(15) DEFAULT NULL,
  `links_external` int(15) DEFAULT NULL,
  `redirects` int(15) DEFAULT NULL,
  PRIMARY KEY (`date`,`project_code`,`language_code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

Suggested Table Design edit

Summary of changes:

  • Merge different tables in one single table
  • change CHARSET to UTF8
  • change column type date to timestamp
  • set engine to Innodb instead of MyISAM.
  • add view column which is of type ENUM
  • change size of extension field to 4
  • change size of language_code field to 4
  • change bigint to bigint unsigned as 'value' will always be 0 or large
  • rename binaries to value
  • rename table name to aggregated_data
CREATE TABLE `analytics_aggregation` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `hashed_key` integer NOT NULL,
  `trait_instance_key` varchar(512) DEFAULT NULL COMMENT 'hashed_key is generated from this more descriptive key',
  `count` bigint(9) unsigned DEFAULT NULL,
  `project` char(32) NOT NULL COMMENT 'Wikimedia project code',
  `language_code` char(4) NOT NULL COMMENT 'Wikimedia language code, usually derived from project URL',
  `aggregate_timestamp` timestamp NULL DEFAULT NULL COMMENT 'timestamp for which this aggregate counts data',
  `granularity` integer NULL DEFAULT NULL COMMENT 'time granularity in seconds for which this aggregate counts data ',
  `theme` enum('readers','editors','devices','articles','diversity','ecosystem','files','context') DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `hashed_key` (`hashed_key`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Background Links edit