Wikimedia Product/Data dictionary/content pv

The cchen.content_pv table (available on Hive) contains daily pageview data broken down by content topic. It is generated by aggregating wmf.pageview_hourly and joining the result with isaacj.article_topics_outlinks on Hive. It is stored in the Parquet columnar file format and partitioned by year, month, and day.
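The aggregate-then-join step described above can be illustrated with a small, self-contained sketch. It is not the actual Hive job: the row layout, the `page_id` join key, the `views` column, and the sample values are all hypothetical, and it assumes that an article with several topics contributes its pageviews to each of them.

```python
from collections import defaultdict
from datetime import date

# Hypothetical hourly rows: (year, month, day, hour, project, page_id, views).
# The real wmf.pageview_hourly layout is not reproduced here.
hourly_rows = [
    (2021, 5, 29, 0, "hu.wikipedia", 101, 40),
    (2021, 5, 29, 13, "hu.wikipedia", 101, 60),
    (2021, 5, 29, 13, "hu.wikipedia", 202, 5),
    (2021, 5, 30, 1, "hu.wikipedia", 101, 7),
]

# Hypothetical stand-in for isaacj.article_topics_outlinks: page_id -> topics.
article_topics = {
    101: ["Geography.Geographical"],
    202: ["Geography.Geographical", "Culture.Sports"],
}


def daily_topic_pageviews(rows, topics_by_page):
    """Sum hourly views per (date, project, topic)."""
    totals = defaultdict(int)
    for year, month, day, _hour, project, page_id, views in rows:
        # Behaves like an inner join: pages without a topic are dropped.
        for topic in topics_by_page.get(page_id, []):
            totals[(date(year, month, day), project, topic)] += views
    return dict(totals)


daily = daily_topic_pageviews(hourly_rows, article_topics)
for key in sorted(daily):
    print(key, daily[key])
```

Dropping the hour from the grouping key is what turns the hourly input into one row per day, project, and topic.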

This page describes the content_pv data set, which is loaded from cchen.content_pv on Hive through Presto and can be accessed via Superset.
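As a rough illustration of how the data set might be queried, the sketch below builds a SQL string that ranks top-level topics by pageviews for a single day. It assumes the partition columns are exposed as `year`, `month`, and `day` and that the table is reachable under the name cchen.content_pv; the helper name `top_topics_query` is hypothetical, and the query would still need to be run through whatever Presto client or Superset SQL editor is available.

```python
def top_topics_query(year: int, month: int, day: int, limit: int = 10) -> str:
    """Build a query for the most-viewed main topics on one day."""
    # int() guards against non-numeric values being spliced into the SQL text.
    return (
        "SELECT main_topic, SUM(pageviews) AS total_pageviews\n"
        "FROM cchen.content_pv\n"
        f"WHERE year = {int(year)} AND month = {int(month)} AND day = {int(day)}\n"
        "GROUP BY main_topic\n"
        "ORDER BY total_pageviews DESC\n"
        f"LIMIT {int(limit)}"
    )


print(top_topics_query(2021, 5, 29))
```

Filtering on the partition columns keeps the scan limited to a single day of data rather than the whole table.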

Schema

| Field name | Data type | Description | Data example | Source schema | Source field |
|---|---|---|---|---|---|
| date | timestamp | The date of pageviews | 2021-05-29 00:00:00.0 | wmf.pageview_hourly | event_timestamp |
| project | string | Project name from hostname | hu.wikipedia. | wmf.pageview_hourly | project |
| market | string | Global markets (see definition) | Global North | canonical_data.countries | economic_region |
| country | string | Country | Albania | canonical_data.countries | country |
| country_code | string | ISO code for country | AL | canonical_data.countries | country_code |
| topics | string | Topics assigned to articles by the outlink-based model (refer to the taxonomy for detailed article topics) | Geography.Geographical | isaacj.article_topics_outlinks | topic |
| main_topic | string | Top level of the topic | Geography | cchen.topic_component | main_topic |
| sub_topic | string | Second level of the topic | Geographical | cchen.topic_component | sub_topic |
| pageviews | bigint | Number of pageviews | 10000 | wmf.pageview_hourly | count(1), then aggregated by year, month, and day |
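The relationship between the topics, main_topic, and sub_topic fields can be sketched as follows. This assumes the levels of a topic are separated by dots, as in the example value Geography.Geographical; how cchen.topic_component actually derives the two fields is not documented here, and the helper name `split_topic` is hypothetical.

```python
from typing import Optional, Tuple


def split_topic(topic: str) -> Tuple[str, Optional[str]]:
    """Return (main_topic, sub_topic) from a dot-separated topic label."""
    parts = topic.split(".")
    main_topic = parts[0]
    # Labels with only one level have no second level to report.
    sub_topic = parts[1] if len(parts) > 1 else None
    return main_topic, sub_topic


print(split_topic("Geography.Geographical"))
```

Levels beyond the second are ignored in this sketch, since the schema only lists a top level and a second level.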

Dashboards which use this table

Pageview_Topics_Dashboard

Known issues and changes