Product Analytics/Dashboarding Guidelines

Publishing/sharing

Before publishing and/or sharing your Superset dashboard, please double check that you have:

  • added contact information
  • set the correct access and permissions for your data and your audience

Refer to the sections below for details.

Contact Info

Add the following template as Markdown at the top or bottom of the dashboard:

This dashboard is maintained by {NAME}, [Product Analytics](https://www.mediawiki.org/wiki/Product_Analytics). If you have questions or feedback please email {name}@wikimedia.org or product-analytics@wikimedia.org
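
If you maintain several dashboards, filling in the template by hand gets repetitive. The following is a minimal sketch of one way to generate the text. The helper name `contact_markdown` and the maintainer details are hypothetical and only serve as illustration.

```python
def contact_markdown(full_name: str, username: str) -> str:
    """Fill in the contact template for a dashboard Markdown element."""
    return (
        f"This dashboard is maintained by {full_name}, "
        "[Product Analytics](https://www.mediawiki.org/wiki/Product_Analytics). "
        f"If you have questions or feedback please email {username}@wikimedia.org "
        "or product-analytics@wikimedia.org"
    )


# Hypothetical maintainer, for illustration only.
print(contact_markdown("Jane Doe", "jdoe"))
```

Paste the printed text into a Markdown element on the dashboard.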

Permissions

Virtual datasets

For Presto-based charts that rely on virtual datasets derived from event data, make sure the stakeholder has been added to the analytics-privatedata-access group.

If they have not, ask them to request access through Phabricator. Refer to T286746 as an example.

Physical datasets

This applies only to Hive tables whose files are stored in Hadoop. Data ingested into Druid will automatically have appropriate permissions.

For charts that rely on Hive tables added as physical datasets, make sure that users outside of your group have read access to the files in Hadoop:

hdfs dfs -chmod -R o+r <path to your table>

If you have a recurring job that updates the dataset, you need to update the permissions with this command every time the job completes.

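
For a recurring job written in Python, one option is to reapply the permissions as the last step of the job itself. The following is a minimal sketch that assumes the `hdfs` command is available on the host where the job runs. The function names and the `job` callable are hypothetical.

```python
import subprocess
from typing import Callable, List


def chmod_command(table_path: str) -> List[str]:
    """Build the hdfs permissions command for a given table path."""
    return ["hdfs", "dfs", "-chmod", "-R", "o+r", table_path]


def make_readable(table_path: str, run=subprocess.run) -> None:
    """Grant read access to other users on everything under table_path."""
    # check=True raises CalledProcessError if hdfs exits with a non-zero status.
    run(chmod_command(table_path), check=True)


def run_job_then_fix_permissions(
    job: Callable[[], None], table_path: str, run=subprocess.run
) -> None:
    """Run a dataset-updating job, then reapply read permissions."""
    job()  # hypothetical callable that refreshes the table
    make_readable(table_path, run=run)
```

The `run` parameter exists only so the command can be checked without touching Hadoop. In normal use, leave it at its default.
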
Example

Suppose you did your ETL and created a countries.csv that you then make available in Hive via:

import wmfdata as wmf

wmf.hive.load_csv(
    "countries.csv",
    field_spec="name string, iso_code string, economic_region string, maxmind_continent string",
    db_name="canonical_data",
    table_name="countries"
)

You add it as a physical dataset within Superset and create a chart that relies on it. To make sure that everyone can view that chart (and dashboard) you would update permissions with:

hdfs dfs -chmod -R o+r /user/hive/warehouse/canonical_data.db/countries

If you loaded the data into Hive manually and its files are stored elsewhere, change the path accordingly.
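
The path in the example above combines the database name and the table name. As a minimal sketch, assuming the table sits under the same warehouse directory as in that example, the path could be derived as follows. The helper name `default_table_path` is hypothetical, and the result does not apply to tables whose files are stored elsewhere.

```python
def default_table_path(db_name: str, table_name: str) -> str:
    """Return the Hadoop path of a table, following the layout in the example."""
    # Assumption: the table lives under /user/hive/warehouse, as in the example.
    return f"/user/hive/warehouse/{db_name}.db/{table_name}"


print(default_table_path("canonical_data", "countries"))
# → /user/hive/warehouse/canonical_data.db/countries
```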