Wikimedia Technical Documentation Team/Doc metrics/v0

This page summarizes the first set of metrics (v0) the Tech Docs team tested as part of the Doc metrics project.

Key terms

Collection
A group of technical documentation pages. Different types of collections capture the varying ways in which pages can be related. For example: pages about the same product or software; pages of the same content type (like tutorials, reference, or landing pages); or pages that support a given developer task or user journey.
Page
A unit of technical documentation published on a wiki. Generally synonymous with "doc". Tech docs can also be published as HTML pages, but for the purpose of the v0 metrics test, "page" refers only to technical documentation pages on mediawiki.org.
Doc attribute
An aspect of a documentation page that can be measured or assigned a value. Page content, page metadata, or traffic and revisions data can all contain doc attributes. For the v0 metrics test, we assessed 30 doc attributes.
Metric
An aspect of technical documentation quality that we care about, but can't understand based on a single doc attribute or data point. Metrics represent the numerically encoded relationship between doc attributes and what they indicate about quality. The v0 metrics proposal had 9 metrics, but we tested only 7.
Metric category
The high-level technical documentation quality objective to which a metric corresponds. We don't necessarily care about metrics for their own sake: we care about them as tools for tracking progress towards these (even harder to measure) goals of making our tech docs Relevant, Accurate, Usable, and Findable.


Metrics: proposed vs. tested


Based on the research and analysis completed during Q1 of FY24-25, TBurmeister proposed 9 metrics for a first round (v0) of technical documentation metrics. As we implemented the test and gathered data, we discovered that 2 of the proposed metrics wouldn't be feasible or necessary. So, our test dataset generated outputs for 7 metrics:

Doc metric | Included in test output? | Metric categories (Relevant, Accurate, Usable, Findable)
Docs are succinct: they avoid walls of text; use plain language; support skimming; limit the cognitive burden on the reader (#Succinct) | Included | ✅✅
Use consistent organization and structure (#ConsistentStructure) | Included | ✅✅
Are readable on mobile devices, with all information visible (#MobileDevice) | Excluded | ✅✅
Use consistent format and style (#ConsistentFormat) | Included | ✅✅
Orient the reader within a collection; are organized alongside related pages (#CollectionOrientation) | Included | ✅✅
Freshness (#Freshness) | Included | ✅✅
Are translatable / translated (#Translation) | Excluded |
Align with real developer tasks and problems; include information relevant for technical audiences (#Developers) | Included | ✅✅
Are connected to / findable from / automated by code (#CodeConnection) | Included |

The checkmarks in the table above reflect that a given doc metric can be relevant to more than one metric category, and that some metrics are stronger indicators for one category than for another. More checkmarks mean a stronger correlation between that metric and its categories.

Doc attributes: proposed vs. tested


Our design and research generated a long list of doc attributes that can serve as indicators for the metrics we care about. To generate and test metrics within a constrained timeline, we had to pick a subset of those doc attributes to measure. Our choices were influenced by factors such as the feasibility of capturing a given doc attribute, the uniqueness of an attribute, and the availability of data sources. For details about the implementation challenges for each doc attribute, click the links in the table below.

Doc attribute | Test status | Metric usage
Links to code repos from wiki pages | Done | CodeConnection
Links from code repos to wiki pages | Not done | CollectionOrientation
Heading depth | Done | Succinct
Heading length | Not done | Succinct, Translation, ConsistentFormat
Heading consistency | Done | ConsistentFormat
Heading quantity | Done | ConsistentStructure
Title length | Done | Succinct
Title namespace prefix | Not done | ConsistentStructure, ConsistentFormat
Page sections: Quantity | Done | Succinct
Page sections: See also | Done | CollectionOrientation
Page sections: See also length | Done | Succinct, ConsistentStructure
Page sections: Next steps | Done | CollectionOrientation, ConsistentStructure
Page sections: Intro | Done | ConsistentStructure, Succinct
Navigation: Layout grid | Done | ConsistentFormat, CollectionOrientation, ConsistentStructure (see: design flaws)
Navigation: menu | Done | CollectionOrientation
Navigation: menu length | Not done | Succinct
Navigation: coverage | Done | CollectionOrientation, ConsistentStructure
Table of contents | Not done | ConsistentStructure
Revisions: past month | Done | Developers, Freshness
Revisions: top editor | Done | Developers, Freshness
Status templates | Done | Freshness
Tables or lists | Done | Succinct
Page length | Done | Succinct
Translations | Not done | Usability
Code samples (any) | Done | Developers
Code sample languages | Done | Developers
Code sample automation | Done | CodeConnection, Freshness
Incoming links | Done | Developers
Pageviews by watchers | Done | Developers, Freshness
Maintainer contact info | Done | Developers

Even the list above doesn't capture the full number of doc attributes we considered using, or could use in a future implementation. You can peruse the even larger list in a Google spreadsheet.
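
To make the idea of a doc attribute more concrete, here is a minimal sketch of how a few of the attributes above (heading quantity, heading depth, page length) could be derived from a page's raw wikitext. It is illustrative only and is not the script used for the v0 test; the function and attribute names are hypothetical.

```python
import re

def extract_attributes(wikitext: str) -> dict:
    """Derive a few illustrative doc attributes from raw wikitext.

    Simplified sketch for illustration; not the v0 test implementation.
    """
    # MediaWiki headings look like "== Heading ==" (2-6 equals signs).
    headings = re.findall(r"^(={2,6})\s*(.+?)\s*\1\s*$", wikitext, flags=re.MULTILINE)
    heading_levels = [len(eq) for eq, _text in headings]
    return {
        "heading_quantity": len(headings),                # how many headings the page has
        "heading_depth": max(heading_levels, default=0),  # deepest heading level used
        "page_length": len(wikitext),                     # raw length in characters
    }
```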

Outcomes of metrics testing


Collection-level metrics


Our test dataset included 140 pages of technical documentation on mediawiki.org. We identified these pages by first defining documentation collections, then gathering together pages for each collection as PagePiles. Since we defined collections of varying types, some pages appear in more than one collection.
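
As an illustration, a collection's page titles can be pulled from its PagePile roughly like this (a sketch assuming the PagePile tool's get_data API; the function name and response handling are hypothetical and not part of the v0 test tooling):

```python
import requests

def get_pagepile_titles(pile_id: int) -> list[str]:
    """Fetch the page titles in a PagePile by its ID (illustrative sketch).

    Assumes the PagePile get_data action returns JSON with a "pages" list;
    check https://pagepile.toolforge.org for current API details.
    """
    resp = requests.get(
        "https://pagepile.toolforge.org/api.php",
        params={"id": pile_id, "action": "get_data", "format": "json", "doit": 1},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("pages", [])

# e.g. the ResourceLoader collection pile listed in the table below
titles = get_pagepile_titles(62140)
```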

The test dataset page counts differ from the full collection counts because the collection PagePiles include translation pages, redirects, or auto-generated content that we didn't have the capacity to include in the test. In the case of the ResourceLoader collection, this is a small design flaw in the test, because it means we excluded content that would have influenced the overall metric scores for the collection.

Collection (PagePile ID) | # pages in collection | # pages included in test | Collection type
ResourceLoader (62140) | 47 | 16 | Specific product/technology
REST API docs (61824) | 8 | 5 | Specific product/technology
Developing extensions (61822) | 47 | 47 | Developer task/workflow
MediaWiki developer tutorials (61823) | 55 | 44 | Doc type
Local development setup (61854) | 55 | 33 | Developer task/workflow

The metric scores for each collection are initially calculated as averages of the output scores for each page in the collection. Each metric uses different doc attributes, with varying weights, to calculate a page's output score.
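
A rough sketch of that two-step calculation, with hypothetical attribute names and weights (the actual attributes and weights are documented in the Doc metrics reference):

```python
# Hypothetical weights for one metric; see the Doc metrics reference
# for the real attribute names and weights used in the v0 test.
SUCCINCT_WEIGHTS = {
    "page_length": -0.5,
    "heading_quantity": 0.3,
    "tables_or_lists": 0.2,
}

def page_metric_score(attributes: dict, weights: dict) -> float:
    """Weighted combination of a page's doc attribute values."""
    return sum(weight * attributes.get(name, 0) for name, weight in weights.items())

def collection_metric_score(pages: list[dict], weights: dict) -> float:
    """Raw collection score: average of the per-page scores."""
    return sum(page_metric_score(page, weights) for page in pages) / len(pages)
```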

To enable analysis and testing of the metrics output, we then standardized the raw output scores to generate the final collection score for each metric:

Collection | CodeConnection | CollectionOrientation | Succinct | ConsistentFormat | Developers | ConsistentStructure | Freshness
Developing extensions (61822) | 1 | 0 | -1 | 0 | 0 | 0 | 1
Local dev setup (61854) | 0 | -2 | 0 | 0 | -2 | -1 | -1
MW tutorials (61823) | 0 | 0 | -1 | 1 | -1 | 0 | -1
ResourceLoader (62140) | 0 | 1 | -1 | -1 | 1 | -1 | 0
REST_API (61824) | -1 | 1 | 2 | 0 | 1 | 2 | 1
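
The standardization step is the conventional standard-score calculation: each collection's raw score is expressed relative to the mean and spread of that metric's raw scores across all collections. A minimal sketch, assuming the usual (x - mean) / standard deviation formula (the values in the table appear rounded; see the Doc metrics reference for the exact computation used in the test):

```python
from statistics import mean, stdev

def standardize(raw_scores: dict[str, float]) -> dict[str, float]:
    """Convert raw collection scores for one metric into standard scores.

    Sketch only; the v0 test's exact rounding is documented in the
    Doc metrics reference.
    """
    mu = mean(raw_scores.values())
    sigma = stdev(raw_scores.values())
    return {name: (score - mu) / sigma for name, score in raw_scores.items()}
```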

Page-level metrics


To drill down within each collection, or to see raw metrics scores at the collection and page level, follow the Doc metrics user guide.

User documentation

  • Doc metrics user guide - how to explore and understand the data from this test.
  • Doc metrics reference - explains how metrics are computed and how doc attributes are weighted in the scores; covers the fields in both the input and output datasets.

Outcomes: Metrics prototype and assessment
