Wikimedia Product/Data dictionary/repo active editors

The cchen.repo_active_editors table (available on Hive) contains active editors data, generated by aggregating wmf.editors_daily and neilpquinn.editor_month on Hive by month. It is stored in the Parquet columnar file format and partitioned by month.
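The monthly roll-up can be pictured as grouping daily per-editor activity by calendar month. This is a minimal sketch, not the actual Hive job. It assumes hypothetical daily input rows shaped as (user, project, day, edits), which are not the real columns of wmf.editors_daily. It also assumes an activity threshold of 5 edits per month. That threshold is only an assumption here; the authoritative value is in the linked active-editor definition.

```python
from collections import defaultdict
from datetime import date

# ASSUMPTION: edits per month needed to count as "active". The authoritative
# value is in the linked active-editor definition, not on this page.
ACTIVE_EDIT_THRESHOLD = 5


def monthly_active_editors(daily_rows, threshold=ACTIVE_EDIT_THRESHOLD):
    """Roll hypothetical (user, project, day, edits) rows up to months.

    Returns the set of (user, project, "YYYY-MM") keys whose monthly edit
    total reaches `threshold`.
    """
    totals = defaultdict(int)
    for user, project, day, edits in daily_rows:
        # Bucket each day into its calendar month and add up the edits.
        totals[(user, project, day.strftime("%Y-%m"))] += edits
    return {key for key, total in totals.items() if total >= threshold}


# Illustrative rows: alice reaches 5 edits only by combining two days.
example_rows = [
    ("alice", "acewiki", date(2020, 1, 3), 3),
    ("alice", "acewiki", date(2020, 1, 20), 2),
    ("bob", "acewiki", date(2020, 1, 5), 4),
    ("bob", "acewiki", date(2020, 2, 1), 6),
]
active = monthly_active_editors(example_rows)
```

Here bob misses the threshold in January (4 edits) but counts in February (6 edits). Alice counts in January because her two days are summed before the threshold is applied.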

This page describes the repo_active_editors data set, which is loaded from cchen.repo_active_editors on Hive through Presto and can be accessed via Superset.

Schema

Field name | Data type | Description | Example | Source schema | Source field
project | string | Project name from hostname | acewiki | wmf.editors_daily, neilpquinn.editor_month | project
project_family | string | Project family name | wikipedia | wmf.editors_daily, neilpquinn.editor_month | database_group
market | string | Global markets (see definition) | Global North | canonical_data.countries | economic_region
active_editors | bigint | Number of active editors (see definition) | 10000 | wmf.editors_daily, neilpquinn.editor_month | count(*), then aggregated by month
new_active_editors | bigint | Number of new active editors (see definition) | 5 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month = month as int)), then aggregated by month
returning_active_editors | bigint | Number of returning active editors (see definition) | 49 | wmf.editors_daily, neilpquinn.editor_month | sum(cast(registration_month != month as int)), then aggregated by month
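The three count fields follow the source-field expressions: count(*) for active editors, and a sum of a 0/1 comparison between registration_month and month for new and returning editors. The sketch below mirrors those expressions in Python for illustration only. The input shape (one hypothetical (user, project, month, registration_month) tuple per active editor per project per month) is an assumption, not the real layout of neilpquinn.editor_month. The None branch mirrors standard SQL semantics, where sum() ignores NULLs. Whether the source data actually contains missing registration months is not stated on this page.

```python
from collections import Counter


def editor_counts(editor_months):
    """Compute the three count columns per (project, month).

    `editor_months` holds hypothetical (user, project, month,
    registration_month) tuples, one per active editor per project per month.
    Returns {(project, month): (active, new, returning)}.
    """
    active, new, returning = Counter(), Counter(), Counter()
    for _user, project, month, registration_month in editor_months:
        key = (project, month)
        active[key] += 1  # count(*)
        # SQL's sum() skips NULLs, so an unknown registration month adds to
        # neither new nor returning.
        if registration_month is not None:
            new[key] += int(registration_month == month)
            returning[key] += int(registration_month != month)
    return {key: (active[key], new[key], returning[key]) for key in active}


example = [
    ("alice", "acewiki", "2020-01", "2020-01"),  # registered this month: new
    ("bob", "acewiki", "2020-01", "2019-06"),    # registered earlier: returning
    ("carol", "acewiki", "2020-01", None),       # unknown: only in active
]
month_counts = editor_counts(example)
```

With complete registration data, new plus returning equals active for every row. The gap only appears when a comparison evaluates to NULL.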

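Note: To provide a count of unique editors at each level of aggregation, the project, market, and project_family columns include rows with the value "All". These rows give the total number of editors within the corresponding group, with each editor counted once even if they were active in several projects or markets within it.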

  • To view active editors data by project, add a filter with market = "All".
  • To view active editors data by project family, add filters with market = "All" and project = "All".
  • To view active editors data by diversity markets, add filters with project_family = "All" and project = "All".
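The "All" rows matter because distinct counts cannot simply be added: an editor active on two projects appears in both project rows but should count once in the total. The following is a minimal sketch of that idea for a single month. It assumes hypothetical (user, project, market) input tuples and leaves out project_family for brevity. The exact combinations of "All" values present in the real table are not documented here, so the full project × market cube below is an illustration, not the table's definition.

```python
from collections import defaultdict


def with_all_rollup(active_editors):
    """Count distinct editors per (project, market), adding "All" rows.

    `active_editors` is a hypothetical set of (user, project, market) tuples
    for a single month. Each user is added to a set per output row, so a
    user active on two projects counts once in each project row but only
    once in the project = "All" row.
    """
    editors = defaultdict(set)
    for user, project, market in active_editors:
        for p in (project, "All"):
            for m in (market, "All"):
                editors[(p, m)].add(user)
    return {key: len(users) for key, users in editors.items()}


example = {
    ("alice", "acewiki", "Global North"),
    ("alice", "enwiki", "Global North"),
    ("bob", "enwiki", "Global South"),
}
rollup_counts = with_all_rollup(example)

# Equivalent of the first bullet: keep only rows where market = "All".
by_project = {p: n for (p, m), n in rollup_counts.items() if m == "All"}
```

In this example the per-project rows add up to 3 (acewiki 1, enwiki 2), while the project = "All" row reports 2, because alice is counted only once.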

Dashboards which use this table

Editors Dashboard

Known issues and changes