Nurturing data-informed decision-making in Product since 2018-02-01.
We deliver quantitatively based user insights to inform decision-making in support of Wikimedia’s strategic direction toward service and equity.
We strive to provide guidance, insights, and data that are:
Ethical • Trusted • Impactful • Accessible • Inclusive • Inspired
What We Do
Product Analytics contributes to the Wikimedia Movement through our work with Product teams and departments across the Foundation.
Our responsibilities include:
- Empowering others to make data-informed decisions through education and self-service analytics tools
- Helping set and track goals that are achievable and measurable
- Ensuring that Wikimedia products collect useful, high-quality data without harming user privacy
- Extracting insights through ad-hoc analyses and machine learning projects
- Building dashboards and reports for tracking success and health metrics
- Designing and analyzing experiments (A/B tests)
- Developing tools and software for working with data, in collaboration with Analytics Engineering and Product teams
- Addressing data-related issues in collaboration with teams like Analytics Engineering, Security, and Legal
Product Team Support
Each analyst is a point person for a team, project, or program. Our goals are to maintain context and domain knowledge while also allowing for flexibility in analyst work assignments. For more information about how we work with Product teams, see Working with Product Analytics.
|Analyst||FY21-22 Point person for…|
|Connie||Search Platform & Structured Data|
|Irene||Campaigns & Data as a Service|
|Jennifer||Anti-Harassment Tools & Web|
|Maya||Metrics Platform & Data as a Service|
|Megan||Editing & Language|
|Morten||Growth & Newcomer Experience|
|Neil||Inuka & Trust & Safety Engineering|
|Shay||Android & iOS|
Teams that do not currently have an assigned point person are encouraged to submit requests through Phabricator. Depending on the team's capacity and organizational needs, we may also accept requests from others in the Wikimedia Foundation. The team reserves "10 percent time" to work on professional development.
Who is on the team
Listed alphabetically by first name within each section
- Kate Zimmerman, Director of Data Science
  - Ask me about: Collaborating with Product Analytics, using data to inform product and business decisions, experiment design, decision science, applied stats
- Mikhail Popov, Data Science Manager
  - Ask me about: Collaborating with Product Analytics, R, data visualization, search & traffic logs, querying product data, statistical models, Bayesian methodology, machine learning, Better Use of Data program, Event Platform, Metrics Platform
- Connie Chen, Sr. Data Scientist
  - Ask me about:
- Irene Florez, Data Scientist III
  - Ask me about:
- Jennifer Wang, Staff Data Scientist
  - Ask me about: AHT/Comm tech metrics
- Maya Kampurath, Sr. Analyst
  - Ask me about:
- Megan Neisler, Sr. Data Scientist
  - Ask me about: R, data visualization, reader metrics, technical writing
- Morten Warncke-Wang, Staff Data Scientist
  - Ask me about: R, machine learning, spatial (geographic) models, article quality, editor/editing/newcomer metrics, prior research on Wikipedia, and perhaps also time-series modeling (forecasting)
- Neil Shah-Quinn, Sr. Data Scientist
  - Ask me about: Python for data analysis, SWAP, editor metrics, new editor research
- Shay Nowick, Sr. Data Scientist
  - Ask me about: Mobile metrics, PyData and Jupyter Notebooks, cohort analysis
How to get help with data or analysis
If you'd like to request data, analysis, or advice, create a task in Phabricator or send an email to email@example.com.
Requests are reviewed by Product Analytics and inform the direction and priorities of data projects. A team member will follow up about whether we’ll be able to work on your request.
Some questions may be suited to consultation hours; see Product Analytics Consultation Hours for more information and a link to book appointments.
Provide the following information to help us prioritize and respond to your request appropriately:
- Name for main point of contact and contact preference
  - We use Phabricator to track our work and provide progress updates. Please let us know if you would like us to follow up by other methods (e.g. email).
- What teams or departments is this for?
  - This helps us understand who will be using the analysis.
- What are your goals? How will you use this data or analysis?
  - This helps us understand the context and priority. What decisions do you need data to inform? Will you take different actions depending on the direction of the data? Do you want to share data publicly? Do you want to include data in a narrative or message (e.g. for PR, audience engagement, or fundraising)?
- What are the details of your request? Include relevant timelines or deadlines
  - Is there a date after which the analysis will no longer be useful? Please provide any timeline/relevant deadlines, requested formats, examples, links to documentation, or other information that would help us understand your request.
- Is this request urgent or time sensitive?
  - We try to reply to “Urgent” requests immediately and “Time sensitive” requests by the end of the workday. All other requests will be prioritized during our weekly triage.
Note: We use Phabricator to track our work, and by default tickets are publicly visible. If any part of your request is sensitive and should be kept confidential, let us know.
How to contact us
- Contact information for team members is available on their user pages (linked above).
- Group mailing list: product-analytics wikimedia.org
Data references and reports
- Comparison datasets
- Data Dictionary (documents data sources, such as those available in Superset and Turnilo)
- Data Glossary (definitions for core metrics)
- A/B Testing
- Data Products (various deliverables such as reports, analyses, and datasets)
- Movement metrics
- Our repository of scheduled jobs
- ETL repository (e.g. Oozie workflows)
- Experiment Platform draft
Guidelines and best practices
- Data access guidelines
- Query style guide
- Reporting Guidelines
- Dashboarding Guidelines
- Tips and Tricks
- Querying JSON-containing data (notes from Mikhail Popov on how to query JSON data with Presto)
- Analysis gotchas (notes from Isaac Johnson on common gotchas when analyzing the MediaWiki landscape)
- Logistic regression, multilevel models, and t-tests (a simulation study inspired by experiments in improving Wikipedia editing experience, and demonstrating multiple methodologies for analyzing data)
- Simulation study of statistical methods for comparing groups (examples and informal evaluations of various statistical significance tests for comparing observations generated from different distributions and families)
- Using log transformations in linear regression models (notes from Mikhail Popov)
- Caching in R (notes & best practices from Mikhail Popov)
Documentation for tools we use
- Phabricator (managing requests and tracking work)
- Superset (WMF internal dashboards and reports)
- Obtaining access to Superset/Turnilo, with explanation of LDAP/Developer Account terminology
- Turnilo (WMF internal tool for pivoting and exploring data)
- Event Platform (various event stream distribution and processing systems we employ at WMF)
- Google Search Console access