Wikimedia Technical Documentation Team/Doc metrics/User guide

This page explains how to use the test doc metrics (v0) to identify tech docs pages and collections to improve, and how to improve them. To learn more about the process we used to define, design, and implement these metrics, see Doc_metrics/v0.

Access metrics data

Because this is an experiment, we don't have a fancy dashboard for you to use to explore the data. Your options are:

  • Best option: Prepared spreadsheets with collection-level scores, raw data per-page and per-collection, and a pre-made Pivot table that you can use to browse pages and scores within a collection and across metric categories.
  • Self-service option: Raw CSV, which you can download from GitLab. This doesn't include standardized metric scores, so it may be harder to interpret. However, the simpler format may make it easier to explore the data using your preferred analysis techniques or software (see the sketch after this list).
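
If you choose the self-service option, a minimal pandas starting point might look like the sketch below. The file name and column names are placeholders for illustration, not the actual ones; check the header of the CSV you download from GitLab.

  # Minimal sketch: load the raw CSV and see what's in it.
  # "doc-metrics-v0-raw.csv" and the column names used in later sketches
  # are placeholders; use the actual file and headers from GitLab.
  import pandas as pd

  df = pd.read_csv("doc-metrics-v0-raw.csv")
  print(df.columns.tolist())  # doc attributes and metric totals available
  print(df.head())            # first few pages in the dataset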

Understand key terms

Collection
A group of technical documentation pages. Different types of collections capture the varying ways in which pages can be related. For example: pages about the same product or software; pages of the same content type (like tutorials, reference, or landing pages); or pages that support a given developer task or user journey.
Page
A unit of technical documentation published on a wiki. Generally synonymous with "doc". Tech docs can also be published as HTML pages, but for the purpose of the v0 metrics test, "page" refers only to technical documentation pages on mediawiki.org.
Doc attribute
An aspect of a documentation page that can be measured or assigned a value. Page content, page metadata, or traffic and revisions data can all contain doc attributes. For the v0 metrics test, we assessed 30 doc attributes.
Metric
An aspect of technical documentation quality that we care about, but can't understand based on a single doc attribute or data point. Metrics represent the numerically-encoded relationship between doc attributes and their meaning. The v0 metrics proposal had 9 metrics, but we tested only 7 of them.
Metric category
The high-level technical documentation quality objective to which a metric corresponds. We don't necessarily care about metrics for their own sake: we care about them as tools for tracking progress towards these (even harder to measure) goals of making our tech docs Relevant, Accurate, Usable, and Findable.


Understand constraints

This data was generated manually by humans who reviewed pages and/or consulted existing data sources to gather the inputs.

Our test dataset included 140 pages of technical documentation on mediawiki.org. We gathered data manually because:

  • The majority of the doc attributes we wanted to assess are not currently available in existing data sources.
  • Our current goal is to assess whether having such data for metrics calculation would be useful.

The doc collections, and the pages within them, are manually curated.

  • User:TBurmeister_(WMF) used various techniques to identify sets of pages that are related by topic, by the developer workflow(s) they support, etc. The process of identifying pages in a collection is currently more of an art than a science.
  • We used the PagePile tool to define and store the list of pages within a collection. Some PagePiles include docs outside of mediawiki.org, but those docs weren't included in this metrics test.

We only assessed content on mediawiki.org.

There may be technical documentation that should be considered part of these collections but is stored with code or on other wikis, and is thus excluded from the test.

Interpret metrics scores

Don't compare raw scores across metrics

The only scores that you can safely compare across metrics are the standardized scores, which are only provided at the collection level, not for individual pages.

You can't compare the raw scores because each metric is calculated from a different set of doc attributes, with varying weights. The weights attempt to capture how strong an indicator a given doc attribute is for that specific metric. For example:

  • If a page uses a list or a table, those elements can help make the page more succinct, since the page is less likely to be prose-heavy. However, whether a list or table is a valid way to format page content varies significantly, and a page could still have walls of text around the lists or tables. So, this is a weak indicator for the "Succinct" metric.
  • Consequently, in the metrics calculations: if a list or table is present, the page gets a small score increase (10 points), but pages don't get penalized for not having a list or a table, and they get a larger score increase for other, stronger indicators, like page size.

For some attributes, the weights are based on value ranges ("bins"), which were created based on benchmarks from the test dataset. This was necessary to make sense of large ranges of real-valued inputs, where a given range of values may have variable meaning, both for how we interpret the doc attribute itself and for how that doc attribute's values influence the metrics that use it. The weighting for each metric and its constituent doc attributes is documented in the Reference page.
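
To make the weighting and binning concrete, here is a toy sketch of how a metric total could be assembled from weighted doc attributes. The 10 points for a list or table come from the example above; the page-size bins and their point values are invented for illustration and do not match the actual weights documented in the Reference page.

  # Toy sketch of a weighted, binned metric calculation (illustrative only).
  # The +10 for list/table presence is taken from the example above; the
  # page-size bins and point values are made up, not the real weights.
  def succinct_score(has_list_or_table: bool, page_size_bytes: int) -> int:
      score = 0
      if has_list_or_table:            # weak indicator: small reward, never a penalty
          score += 10
      if page_size_bytes < 2_000:      # hypothetical bin boundaries
          score += 0
      elif page_size_bytes < 30_000:
          score += 40                  # stronger indicator: bigger reward
      else:
          score += 20
      return score

  print(succinct_score(has_list_or_table=True, page_size_bytes=12_000))  # 50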

Scores favor reward over penalty

The best practices, formatting options, and ideal content for any given page of technical documentation are often ambiguous, subjective, and context-dependent. The metrics calculations account for this by rewarding pages with score increases if they have certain attributes, but not penalizing pages for lacking those attributes.

Example:

  • If a page has any sections differentiated by headings, the ConsistentStructure metric score increases by 50 points. However, pages aren't penalized for not having page sections.

In general, the metrics are fairly conservative about negative scores. The only doc attributes that may reduce a metric's score are:

  • See Also section contains more than 6 links
  • Page size in bytes: landing pages are penalized for being too long or too short; other doc types are only penalized for being so short that the content likely doesn't need its own page.
  • Number of page sections: only pages that are neither landing pages nor reference pages are penalized, and only if they have more than 20 sections.
  • Incoming links and redirects from same wiki: pages are only penalized if there are 0 incoming links, which means they're an "orphan page".
  • More than 50% of edits made by a single top editor, indicating the page's maintenance is at risk due to having a single point of failure (SPOF).

All of the above is documented in more detail in the Reference page.
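
As a rough illustration of how conservative the penalties are, the sketch below subtracts points only when one of the conditions listed above is met. The point values are invented, not the real weights from the Reference page, and the page-size penalties are omitted for brevity.

  # Illustrative sketch of the penalty conditions listed above.
  # The -10/-20 values are placeholders; see the Reference page for real weights.
  def penalty_adjustments(page: dict) -> int:
      adjustment = 0
      if page["see_also_links"] > 6:
          adjustment -= 10
      if page["incoming_links_and_redirects"] == 0:   # orphan page
          adjustment -= 20
      if page["top_editor_share"] > 0.5:              # maintenance SPOF
          adjustment -= 10
      if page["doc_type"] not in ("landing", "reference") and page["sections"] > 20:
          adjustment -= 10
      return adjustment

  example = {"see_also_links": 8, "incoming_links_and_redirects": 3,
             "top_editor_share": 0.7, "sections": 12, "doc_type": "tutorial"}
  print(penalty_adjustments(example))  # -20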

Zero isn't necessarily a bad score

Pages may score zero for a given doc attribute or metric total for several reasons. Because the scoring favors reward over penalty, a zero often just means the page lacks the attributes that earn points for that metric; it doesn't necessarily mean the page was penalized.

Option 1: Explore data by collection

  1. Choose a collection to explore. For this metrics test, we defined 5 collections of 3 different types. Six of the 140 pages are members of more than one collection.
  2. View the standardized score for your collection on the "Standardized scores by collection" tab. These scores are normalized across metrics, so you can use them to identify how your collection compares to others, and where your collection scored lower than average.
  3. To explore which pages within your collection influenced its score for a given metric, you have two options:
    • Use the "Raw scores by page and collection" tab. This tab contains all the data for all pages, nested by collection.
    • Use the tab labeled with the name of your collection. This tab contains the same data, but only for pages in a single collection. (For a raw-CSV approach, see the sketch after this list.)
  4. Follow the instructions below to identify improvements for pages.
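
If you're working from the raw CSV rather than the spreadsheet tabs, a rough pandas equivalent of steps 2-3 might look like this. The collection name and column names are assumptions; match them to the actual headers in the data.

  # Sketch: find the lowest-scoring pages in one collection for one metric.
  # The collection name and column names are placeholders.
  import pandas as pd

  df = pd.read_csv("doc-metrics-v0-raw.csv")
  collection = df[df["collection"] == "Example collection name"]
  worst = collection.sort_values("Total score for Succinct").head(10)
  print(worst[["page", "Total score for Succinct"]])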

Option 2: Explore data by metric

Background info: Where did these metrics come from?

  1. Choose a metric to explore.
  2. View the "Standardized scores by collection" tab to see which collections scored highest/lowest for the metric you care about. Higher scores are better. These scores are normalized across metrics, so you can use them to assess how the collections scored for your metric vs. the other metrics. Be aware that some of the metrics, like CodeConnection, have only a couple of inputs, while others have 5-7.
  3. View the "All data by page" tab. To see which pages scored highest or lowest, sort the data by the "Total" score column for the metric you care about (see the sketch after this list). Higher scores are better.
  4. Follow the instructions below to identify improvements for pages.
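
A raw-CSV version of step 3 might look like the following sketch; again, the column names are assumptions to verify against the actual headers.

  # Sketch: rank all pages by one metric's Total column (placeholder names).
  import pandas as pd

  df = pd.read_csv("doc-metrics-v0-raw.csv")
  ranked = df.sort_values("Total score for Freshness", ascending=False)
  print(ranked[["page", "Total score for Freshness"]].head(10))  # highest-scoring pages
  print(ranked[["page", "Total score for Freshness"]].tail(10))  # lowest-scoring pages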

Identify improvements for pages

Whether you're exploring the data by collection or by metric, the scores are meant to help you identify pages that could be improved and which types of changes would most improve them. The sections above explained how to drill down to the page level from a collection or metric. Follow this procedure to identify potential doc improvements when you're looking at page-level metrics scores:

If you're not specifically interested in one type of metric:

  1. Sort or filter the page-level data by "Sum of metric Totals". This shows you the pages that scored highest (best) or lowest (worst) based on the sum of their scores across all metrics. (For a raw-CSV version of this step, see the sketch after this list.)
  2. Review the Total scores the page received for each of the 7 metrics. In the output data spreadsheet, the metric scores are in columns labeled "Total score for [metric]":
    • Total score for Succinct: column K
    • Total score for CodeConnection: column N
    • Total score for ConsistentFormat: column Q
    • Total score for CollectionOrientation: column U
    • Total score for consistentStructure: column AA
    • Total score for developers: column AI
    • Total score for Freshness: column AO
  3. Choose a metric for which the page had a low score, and investigate which doc attributes impacted that metric's score. Use the Reference page to help you with this.
If you're having trouble understanding what the scores mean, see the "Interpret metrics scores" section above.
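
For the raw CSV, a hedged sketch of the same triage: rank pages by the sum of their metric totals, then look at which individual metric totals are low for the worst pages. Column names are assumptions to check against the real data.

  # Sketch: rank pages by overall score, then inspect per-metric totals.
  # Column names are placeholders; match them to the real spreadsheet/CSV headers.
  import pandas as pd

  df = pd.read_csv("doc-metrics-v0-raw.csv")
  metric_totals = [c for c in df.columns if c.startswith("Total score for")]
  df["Sum of metric Totals"] = df[metric_totals].sum(axis=1)

  worst_pages = df.sort_values("Sum of metric Totals").head(10)
  print(worst_pages[["page"] + metric_totals])  # which metric totals dragged each page down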

When you have a specific metric you want to investigate:

Use the columns in the page-level data to determine which doc attributes influenced the Total metric score. For each page, the data includes scores for each doc attribute, followed by the total score for the metric based on those attributes. Refer to the Reference page for details of how the doc attribute values impact the metrics calculations.

  • For example: if a page has a low "Developers" total metric score (in column AI), your next step would be to look at the columns preceding that: "developers score from codesamples" (column AB), "developers score from codemulti" (column AC), "developers score from maintainer" (column AD), etc. You don't have to rely only on the field names in the spreadsheet: use the Reference page to identify the names of the columns for each metric and get more info about their role in the metric calculation. (A raw-data sketch of this drill-down follows.)
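
A raw-data sketch of that drill-down, using the example column names above (verify them against the actual headers):

  # Sketch: inspect the attribute-level columns behind one metric total.
  # Column names follow the example above; verify them against the real data.
  import pandas as pd

  df = pd.read_csv("doc-metrics-v0-raw.csv")
  attribute_cols = [c for c in df.columns if c.startswith("developers score from")]
  low = df.sort_values("Total score for developers").head(5)
  print(low[["page", "Total score for developers"] + attribute_cols])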

See also
