Product Analytics/Reporting Guidelines
The Product Analytics team produces several types of reports:
- One-time substantial reports after the completion of a project and/or analysis. These are typically published as either a wiki page or a PDF.
- One-off analyses of specific questions from product teams. These are often published as comments on a Phabricator ticket.
- Recurring reports such as weekly or monthly statistics about a project. These are typically published internally, or available externally through a shared analytics resource.
Common guidelines
- All types of reports should describe the purpose of the project and the analysis.
- Metrics should be defined and should reference relevant standardized definitions where applicable (e.g. the standardized retention metric).
- There's no need to create a standalone report if simply reporting the results in a relevant Phabricator ticket will suffice.
Types of reports
Phabricator comments
Sometimes it is enough to post results of your analysis as a comment on the relevant Phabricator task.
- Provide sufficient detail about the methods used to gather data (e.g. provide a SQL query, define date ranges).
- Define any relevant assumptions.
- Uploading graphs directly to Phabricator is fine. If community members ask, the graphs might also be uploaded to Commons.
Generating tables
Python:
# df is a pandas DataFrame; to_markdown() requires the tabulate package
print(df.to_markdown(index=False))
R:
library(knitr)
# kable() returns one string per table line, so print them newline-separated
cat(kable(data_frame, format = "markdown"), sep = "\n")
In either case, you will need to remove the alignment row. This is the second row of the table; its colons specify column alignment in most Markdown flavors, but it does not work in Phabricator's flavor.
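The manual clean-up described above can also be scripted. The following is a minimal sketch, not an established team utility: the function name `drop_alignment_row` and the sample table are made up for illustration, and it assumes the table is a plain string in the pipe-table layout that `to_markdown()` and `kable()` produce.

```python
import re


def drop_alignment_row(markdown_table: str) -> str:
    """Remove the second row of a Markdown pipe table if it is an alignment row."""
    lines = markdown_table.strip().splitlines()
    # An alignment row contains only pipes, colons, dashes and whitespace.
    is_alignment_row = (
        len(lines) > 1
        and re.fullmatch(r"[\s|:\-]+", lines[1]) is not None
        and "-" in lines[1]
    )
    if is_alignment_row:
        del lines[1]
    return "\n".join(lines)


# Hypothetical example table, written out by hand for illustration.
table = "\n".join(
    [
        "| wiki   |   edits |",
        "|:-------|--------:|",
        "| enwiki |     120 |",
        "| dewiki |      45 |",
    ]
)

cleaned = drop_alignment_row(table)
print(cleaned)
```

With a real DataFrame, the input would be `df.to_markdown(index=False)` instead of the hand-written string. Rows that contain any other characters (such as data values) are left alone, so running the function on an already cleaned table changes nothing.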
Recurring reports
- Recurring reports are expected to be lightweight data summaries; they do not need to provide deeper discussion or analysis of the data.
- If the report is generated from a Jupyter Notebook, include a button to show/hide the code.
  - The wmfdata package has a function for this:
    utils.insert_code_toggle()
  - To create a clean HTML version from the command line, use Quarto (which includes code folding).
- Make it clear when the report was last updated and what range of data it contains.
- Make it clear who authored the report and who maintains it and can answer questions.
- The report should be easily accessible to relevant stakeholders (e.g. by having it hosted publicly if possible, and adhering to data publication guidelines).
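One way to keep the "last updated", data range, and contact details consistent across runs of a recurring report is to generate them in a single place. The sketch below is only an illustration: the function name `report_metadata`, the wording of the lines, and the example names and dates are all hypothetical.

```python
from datetime import date


def report_metadata(author, maintainer, data_start, data_end, last_updated):
    """Build the lines a recurring report can show at the top of the page."""
    return [
        f"Last updated: {last_updated.isoformat()}",
        f"Data range: {data_start.isoformat()} to {data_end.isoformat()}",
        f"Author: {author}",
        f"Maintainer (contact for questions): {maintainer}",
    ]


# Placeholder values; a scheduled job would pass date.today() as last_updated.
lines = report_metadata(
    author="Example Analyst",
    maintainer="Example Maintainer",
    data_start=date(2024, 1, 1),
    data_end=date(2024, 1, 31),
    last_updated=date(2024, 2, 1),
)
print("\n".join(lines))
```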
Ad-hoc reports
- These reports should include an executive summary of the results and the recommendations that follow from the analysis.
- Make it clear who authored the report and who maintains it and can answer questions.
- If the definition of a given method/metric becomes substantial, consider moving it to an appendix and pointing interested readers to it.
- Publishing the reports on-wiki (e.g. as a sub-page of a team's pages on MediaWiki-wiki) enables translation into other languages through standardized translation practices on wikis.
Publishing reports
Important: Before publishing any report, please consult and follow the Data Publication Guidelines.
HTML
While you can convert a Jupyter notebook to HTML using the built-in feature (which uses nbconvert under the hood), the recommended way to create an HTML report from a Jupyter notebook is to use Quarto. Refer to these instructions for installing and using Quarto.
Template for Quarto
Whether you are converting a Jupyter notebook to an HTML document with Quarto or are working with a Quarto Markdown document (.qmd file), you can use this YAML header template:
---
title: "REPORT TITLE"
author: "YOUR NAME, YOUR TEAM"
date-modified: now
date-format: iso
date: "DATE IN YYYY-MM-DD FORMAT"
format:
  html:
    theme: default
    toc: true
    code-fold: true
    code-tools:
      source: "URI TO NOTEBOOK ON GITLAB"
    embed-resources: true
    html-math-method:
      method: mathjax
      url: "https://tools-static.wmflabs.org/cdnjs/ajax/libs/mathjax/3.2.2/es5/tex-mml-chtml.min.js"
---
One benefit of this template is that it links to a WMF-hosted MathJax library rather than an external CDN.
If you're converting a Jupyter notebook, this header needs to be the first cell of the notebook, as a raw cell.
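Adding the raw cell is normally done by hand in the Jupyter interface, but it can also be scripted. The following is a rough sketch that treats the notebook as the JSON document it is on disk, assuming the nbformat v4 layout; the function name `prepend_raw_cell`, the shortened header, and the tiny in-memory notebook are illustrative only.

```python
import json
import uuid


def prepend_raw_cell(notebook: dict, yaml_header: str) -> dict:
    """Return a copy of an nbformat v4 notebook dict with the header as the first raw cell."""
    cell = {
        "cell_type": "raw",
        "metadata": {},
        "source": yaml_header.splitlines(keepends=True),
    }
    # Notebooks in nbformat 4.5 or later are expected to carry an id on every cell.
    if notebook.get("nbformat_minor", 0) >= 5:
        cell["id"] = uuid.uuid4().hex[:8]
    updated = dict(notebook)
    updated["cells"] = [cell] + list(notebook.get("cells", []))
    return updated


# Shortened version of the template, for illustration only.
header = '---\ntitle: "REPORT TITLE"\nauthor: "YOUR NAME, YOUR TEAM"\n---\n'

# Minimal stand-in for a notebook loaded with json.load().
notebook = {
    "cells": [
        {
            "cell_type": "code",
            "execution_count": None,
            "id": "abc123",
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')"],
        }
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

with_header = prepend_raw_cell(notebook, header)
print(json.dumps(with_header["cells"][0], indent=2))
```

For a real notebook, the dict would come from `json.load()` on the .ipynb file and the result would be written back with `json.dump()`. The original dict is left unmodified, so the step can be re-run from a clean copy.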
analytics.wikimedia.org
When publishing from the analytics cluster (stat100X hosts), follow these instructions. This will make pages available on analytics.wikimedia.org, e.g. https://analytics.wikimedia.org/published/reports/wikipedia-android-app/suggested-edits-v2.html and https://analytics.wikimedia.org/published/reports/wikipedia-android-app/metrics/
NOTE: For security reasons, Jupyter restricts a user's write permissions to their home directory. Be sure to copy or move files into /srv/published from a terminal connected via SSH, not from the terminal inside Jupyter. If you plan to schedule a recurring job via crontab to re-run and publish a report, that also has to be set up via SSH (not the Jupyter terminal).
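The copy step of such a scheduled job could look roughly like the sketch below. It is an illustration under assumptions, not the team's actual publishing script: the function name `publish_report` and the sub-directory name are hypothetical, and the demo uses temporary directories in place of a home directory and /srv/published so that it can run anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def publish_report(html_file, publish_root, subdir):
    """Copy a rendered report into a sub-directory of the publishing root."""
    destination_dir = Path(publish_root) / subdir
    destination_dir.mkdir(parents=True, exist_ok=True)
    # copy2 keeps the file's modification time and returns the new file's path.
    return Path(shutil.copy2(html_file, destination_dir))


# Demo with temporary directories standing in for the real locations.
with tempfile.TemporaryDirectory() as tmp:
    rendered = Path(tmp) / "home" / "report.html"
    rendered.parent.mkdir(parents=True)
    rendered.write_text("<html><body>report</body></html>")

    published = publish_report(
        rendered, Path(tmp) / "published", "reports/example-report"
    )
    published_name = published.name
    published_ok = published.read_text() == rendered.read_text()
    print(published.relative_to(tmp))
```

On a stat host, `publish_root` would be /srv/published, and the script would need to run from an SSH session or a crontab entry rather than from Jupyter, as noted above.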
people.wikimedia.org
There's also the option of hosting the page in your personal directory on people.wikimedia.org (example).
You can either upload the files using an SFTP client like Transmit or use scp in Terminal. Another method is to put the files in a public git repository, clone the repo to ~/public_html on people.wikimedia.org, and schedule a cron job to run git pull periodically.
To restrict access to the files to users of the wmf and nda groups (similar to how Superset is restricted), follow these instructions (example at T290693#7343430).
nbviewer.org
You can also upload the Jupyter notebook to a publicly accessible repository on GitLab (preferred) or GitHub and use nbviewer.org to render it. nbviewer usually renders notebooks better than the built-in renderer in GitLab or GitHub.
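A shareable nbviewer link can be built from the notebook's raw file URL. The sketch below assumes nbviewer's address scheme for notebooks at arbitrary locations (a `/urls/` prefix for https and `/url/` for http, followed by the host and path); this is worth verifying by pasting the URL into the nbviewer.org front page. The function name `nbviewer_url` and the repository URL are hypothetical.

```python
from urllib.parse import urlsplit


def nbviewer_url(notebook_url: str) -> str:
    """Build an nbviewer.org link for a notebook file served over http(s)."""
    parts = urlsplit(notebook_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parts.scheme!r}")
    # Assumed scheme: /urls/ for https sources, /url/ for http sources.
    prefix = "urls" if parts.scheme == "https" else "url"
    return f"https://nbviewer.org/{prefix}/{parts.netloc}{parts.path}"


# Placeholder repository URL pointing at a raw .ipynb file.
raw_url = "https://gitlab.example.org/some-group/some-repo/-/raw/main/report.ipynb"
link = nbviewer_url(raw_url)
print(link)
```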
Quarto Pub
If you are uploading a report created with Quarto to Quarto Pub, make sure that it adheres to the Data Publication Guidelines. That is, it must be either Low Risk, or Medium Risk but sanitized, since Quarto Pub is a non-Wikimedia server.
PDFs
For PDF reports, the recommendation is to upload them to Wikimedia Commons. After uploading, edit the file page to include the following metadata:
=={{int:license-header}}==
{{WMF-staff-upload|license=cc-by-sa-4.0}}
{{Wikimedia trademark}}
See Impact of sitemaps on Italian Wikipedia search engine-referred traffic for an example.
Future work
- Using the Template:Wikimedia engineering project information to describe the report and defining the project's start and end dates will automatically categorize the report into the relevant date-based categories for WMF projects. [FIXME: have a Product Analytics-specific template for this]
- If the report covers a large range of data, consider adding the ability to filter/focus on parts of the data through dynamic graphs. [FIXME: we need to know how to do this]