Wikimedia Product/Data dictionary/pageviews_daily
This page describes the pageviews_daily dataset, which is stored as a Druid datasource and can be accessed via Superset or Turnilo. pageviews_daily on Druid is generated by aggregating wmf.pageview_hourly on Hive by day; wmf.pageview_hourly is in turn extracted from wmf.pageview_actor.
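The hourly-to-daily rollup described above can be sketched in a few lines of Python. This is only an illustration of the aggregation idea, not the actual ingestion job: the `year`, `month`, `day` and `hour` keys on the hourly rows, the choice of three dimensions, and the helper name `aggregate_daily` are assumptions made for the example.

```python
from collections import Counter
from datetime import date

def aggregate_daily(hourly_rows):
    """Sum hourly view counts into one count per (day, dimensions) key.

    Hypothetical helper: the real pipeline aggregates Hive data into Druid.
    """
    daily = Counter()
    for row in hourly_rows:
        # Dropping the hour collapses up to 24 hourly rows into one daily row.
        day = date(row["year"], row["month"], row["day"]).isoformat()
        key = (day, row["project"], row["agent_type"], row["access_method"])
        daily[key] += row["view_count"]
    return dict(daily)

# Made-up sample rows, shaped like a small subset of the hourly data.
hourly_rows = [
    {"year": 2023, "month": 1, "day": 1, "hour": 0, "project": "aa.wikibooks",
     "agent_type": "user", "access_method": "desktop", "view_count": 1},
    {"year": 2023, "month": 1, "day": 1, "hour": 5, "project": "aa.wikibooks",
     "agent_type": "user", "access_method": "desktop", "view_count": 2},
    {"year": 2023, "month": 1, "day": 2, "hour": 3, "project": "aa.wikibooks",
     "agent_type": "spider", "access_method": "desktop", "view_count": 4},
]

daily = aggregate_daily(hourly_rows)
```

In the real dataset every dimension listed in the schema takes part in the grouping key, so the daily `view_count` is the sum over all hours for each distinct combination of dimension values.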
Schema
Field name | data type | description | data example | source schema | source field |
---|---|---|---|---|---|
project | string | Project name, taken from the request's hostname | aa.wikibooks | wmf.pageview_actor | pageview_info['project'] |
agent_type | string | Agent accessing the pages, can be spider, user or automated (see BotDetection) | user | wmf.pageview_actor | agent_type |
ua_browser_family | string | Name of web browser (if not using an official Wikipedia mobile app), extracted from the client device's User-Agent | Opera Mini | wmf.pageview_actor | user_agent_map['browser_family'] |
ua_wmf_app_version | string | Version of official Wikipedia mobile app (for iOS, Android, and KaiOS), extracted from the client device's User-Agent | - | wmf.pageview_actor | user_agent_map['wmf_app_version'] |
country | string | Country name (text) of the accessing agent (computed using the MaxMind GeoIP database) | Iran | wmf.pageview_actor | geocoded_data['country'] |
country_code | string | ISO country code of the accessing agent (computed using the MaxMind GeoIP database) | IR | wmf.pageview_actor | geocoded_data['country_code'] |
ua_os_major | string | Major version of the Operating System used by the client device, extracted from the User-Agent | - | wmf.pageview_actor | user_agent_map['os_major'] |
continent | string | Continent of the accessing agent (computed using the MaxMind GeoIP database) | Europe | wmf.pageview_actor | geocoded_data['continent'] |
ua_os_family | string | Operating System family used by the client device, extracted from the User-Agent | Other | wmf.pageview_actor | user_agent_map['os_family'] |
language_variant | string | Language variant, taken from the request's path (not set if present in the project name) | default | wmf.pageview_actor | pageview_info['language_variant'] |
ua_os_minor | string | Minor version of the Operating System used by the client device, extracted from the User-Agent | - | wmf.pageview_actor | user_agent_map['os_minor'] |
referer_class | string | Class of the request's referer: none (null, empty or '-'), unknown (domain extraction failed), internal (domain is a Wikimedia project), external (search engine) (domain is one of google, yahoo, bing, yandex, baidu, duckduckgo), or external (any other) | internal | wmf.pageview_actor | referer_class |
zero_carrier | string | Always NULL, because the Wikipedia Zero program has ended | NULL | wmf.pageview_actor | NULL |
access_method | string | Method used to access the pages, can be desktop, mobile web, or mobile app | desktop | wmf.pageview_actor | access_method |
ua_browser_major | string | Major version of the client browser, extracted from the client device's User-Agent | 4 | wmf.pageview_actor | user_agent_map['browser_major'] |
project_family | string | Project family | wikipedia | canonical_data.wikis | database_group |
view_count | bigint | Number of views | 1 | wmf.pageview_actor | count(1) then aggregated by day |
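
The referer_class rules in the table can be read as a small decision procedure. The sketch below is one possible reading of that description, not the code that actually populates the field: the function name `classify_referer`, the short `WIKIMEDIA_SUFFIXES` list, the matching of search engines by hostname label, and the bare `external` return value for "any other" are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Search engines named in the referer_class description.
SEARCH_ENGINES = ("google", "yahoo", "bing", "yandex", "baidu", "duckduckgo")

# Illustrative, deliberately incomplete list of Wikimedia project domains (assumption).
WIKIMEDIA_SUFFIXES = ("wikipedia.org", "wikibooks.org", "wikimedia.org")

def classify_referer(referer):
    """Return a referer class following the rules described in the schema table."""
    if referer is None or referer in ("", "-"):
        return "none"
    host = urlparse(referer).hostname
    if not host:
        # No hostname could be extracted from the referer string.
        return "unknown"
    if any(host == s or host.endswith("." + s) for s in WIKIMEDIA_SUFFIXES):
        return "internal"
    # Assumption: a search engine is recognised by one of its hostname labels.
    if any(name in host.split(".") for name in SEARCH_ENGINES):
        return "external (search engine)"
    return "external"
```

For example, `classify_referer("https://en.wikipedia.org/wiki/Main_Page")` falls into the internal class under these assumptions, while a referer of `-` is classed as none.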