Wikimedia Technical Documentation Team/Doc metrics/Reference
This page documents how the v0 doc metrics test uses individual doc attributes to calculate metrics scores. It also documents all the fields in the input and output datasets.
Metrics definitions
This section documents how the metrics score and weight individual doc attributes. This logic is implemented in the calculate_metricName.py files for each metric.
| Metrics --> doc attributes | Field name | Points | Max possible score | Min possible score |
|---|---|---|---|---|
| Succinct - Docs are easy to skim; avoid walls of text | Total score for Succinct | | 100 | -90 |
| Does the page have headings below level 3 (h3 or === in wikitext)? | Succinct score from headings | -20 | | |
| Intro section precedes the first heading or content element | Succinct score from intro section | 20 | | |
| Lists or tables on the page (less risk of prose-heavy) | Succinct score from lists andor tables | 10 | | |
| Page size in bytes | Succinct score from byte size | var | 30 | -30 |
| Page title 5 words or less | Succinct score from title length | 20 | | |
| Number of page sections per XTools | Succinct score from section count | var | 20 | -20 |
| See Also section contains more than 6 links | Succinct score from lengthy See Also | -20 | | |
| CodeConnection - Docs are connected to / findable from / automated by code | Total score for CodeConnection | | 30 | 0 |
| If landing page for product/technology: does it link to a code repository? | codeConnection score from landing page repo link | 30 | | |
| Non-landing pages only: are code samples in source control, or generated from code that is in source control? | codeConnection score from code samples in source | 30 | | |
| CollectionOrientation - Docs orient the reader within a collection; are organized alongside related pages | Total score for CollectionOrientation | | 130 | 0 |
| For landing pages: does the page use a layout grid? | Collection score from lp layout grid | 30 | | |
| If navigation template on page: is the page itself in the template? | Collection score from nav | 20 | | |
| Navigation menu on page | Collection score from nav | 50 | | |
| For non-landing pages: is there a "Next Steps" section on the page? | Collection score from section types | 20 | | |
| See Also section on the page | Collection score from section types | 10 | | |
| ConsistentFormat - Docs use consistent format and style | Total score for ConsistentFormat | | 50 | 0 |
| Headings use consistent style and format | Score from headings format | 30 | | |
| Landing page: uses layout grid | Score from lp layout grid | 20 | | |
| ConsistentStructure - Docs use consistent organization and structure | Total score for consistentStructure | | 140 | -10 |
| Page has sections differentiated by headings | consistentStructure score from headings format | 50 | | |
| Intro section before first heading or content element | consistentStructure score from sections | 40 | | |
| For landing pages: does the page use a layout grid? | consistentStructure score from lp layout grid | 20 | | |
| If navigation template on page: is the page itself in the template? | consistentStructure score from nav | 10 | | |
| For non-landing pages: is there a "Next Steps" section on the page? | consistentStructure score from sections | 20 | | |
| See Also section contains more than 6 links | consistentStructure score from seealso | -10 | | |
| Developers - Docs align with real developer tasks and problems | Total score for developers | | 260 | -30 |
| Any code samples present on the page? (not assessed for landing pages) | developers score from codesamples | 30 | | |
| Code samples provided in more than one programming language? (not assessed for landing pages) | developers score from codemulti | 20 | | |
| If landing page for product/technology: contact info for an owner/maintainer? | developers_score_from_maintainer | 50 | | |
| Incoming links and redirects from same wiki | developers score from links | var | 100 | -10 |
| More than one edit in the last 30 days? | developers score from edits | 20 | | |
| More than 50% of edits made by the top editor? (SPOF test) | developers score from spof | -20 | | |
| Percentage of page watchers who visited in the last 30 days | developers_score_from_visits | var | 40 | 0 |
| Freshness - Docs are up-to-date | Total score for Freshness | | 130 | -100 |
| More than one edit in the last 30 days? | freshness score from edits | 50 | | |
| Draft template on page | freshness score from templates | -40 | | |
| Percentage of page watchers who visited in the last 30 days | freshness_score_from_visits | var | 40 | 0 |
| Any of: Historical, Outdated, Archived template on page | freshness score from templates | -50 | | |
| Non-landing pages only: are code samples in source control, or generated from code that is in source control? | freshness score from codesource | 40 | | |
| More than 50% of edits made by the top editor? (SPOF test) | freshness score from spof | -10 | | |
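As an illustration of the pattern the calculate_metricName.py files follow, here is a minimal sketch of how a Succinct calculator could combine the per-attribute inputs above. All field and function names are hypothetical, not the actual dataset columns or implementation; the two "var" inputs are assumed to be computed separately (see the binning sketches later on this page).

```python
# Minimal sketch of how a calculate_<metricName>.py script might combine
# per-attribute inputs into a metric total. All field names here are
# hypothetical, not the actual dataset columns.

def calculate_succinct(page: dict, byte_size_score: int, section_count_score: int) -> int:
    """Sum the Succinct sub-scores for one page record."""
    score = byte_size_score + section_count_score           # the two "var" inputs (-30..30, -20..20)
    score += -20 if page["has_headings_below_h3"] else 0    # heading depth penalty
    score += 20 if page["has_intro_section"] else 0         # intro before first heading/content
    score += 10 if page["has_lists_or_tables"] else 0       # lists/tables bonus
    score += 20 if page["title_word_count"] <= 5 else 0     # short title bonus
    score += -20 if page["see_also_links"] > 6 else 0       # lengthy See Also penalty
    return score                                            # possible range: -90..100
```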
Doc attributes reference
This section includes information about each of the documentation attributes we sought to measure for the metrics test -- including those we ended up having to exclude. The inclusion/exclusion list is also available as a sortable table. For a mapping of input dataset fields to output dataset fields, see the README.
Links to code repos from wiki pages
Done
This doc attribute helps us assess:
- Whether docs are connected to / findable from / automated by code.
- Whether readers can navigate between docs on-wiki, and content in source code, as collections of related content.
Metrics usage:
- CodeConnection score increases if a landing page links to a code repository. Depends on the doc type being identified (by a human) as a product/technology landing page.
Related data sources:
Implementation notes for future work:
- Would need to differentiate between links to code repos in templates vs within paragraphs on wiki pages.
- Difficult to define the full list of possible url formulations, including shortlinks, for all the various places people might store their code.
- Difficult to differentiate general links to code repos from links to the code repo aligned with a specific wiki page. Pages may link to upstream repos or other random code references.
Links from code repos to wiki pages
Not done
This doc attribute helps us assess:
- Whether docs are connected to / findable from / automated by code (#CodeConnection)
- Whether readers can navigate between docs on-wiki and content in source code as collections of related content (#CollectionOrientation)
Notes:
- Could be within README files, other files in subdirectories, or in GitHub / GitLab / Gerrit repository summaries
- Unable to measure this due to effort required and no readily available data source: could improvements to code search perhaps help here?
- Infeasibility caused us to drop this attribute for the CollectionOrientation metric.
Available data sources for analyzing incoming links:
- Special:LinkSearch (includes all links in all namespaces)
- Problem: Doesn't capture short links like [[gitlab:__]].
- Example list of links to Gerrit.
- Template:Extension has a repo field (among many other fields that may be useful links to code).
- For doc.wikimedia.org, we can access referer data through the webrequest pipeline.
Unavailable data:
- Referer or clickstream data is not available for technical wikis unless the referer is another wiki page, so we can't use that to assess incoming clicks from the many places where people store source code.
- For privacy reasons, pageview tables and the tools that use them limit referer info to coarse buckets (referer_class can be: none (null, empty or '-'), unknown (domain extraction failed), internal (domain is a Wikimedia project), external (search engine) (domain is one of google, yahoo, bing, yandex, baidu, duckduckgo), or external (any other)).
Heading depth
Done
This doc attribute helps us assess:
- Whether docs are succinct and avoid walls of text.
Metrics usage:
- Succinct score decreases if the page has headings below L3.
- No impact on score if there are no headings below L3.
Implementation notes for future work:
- We made data processing for this easier by relying on human (manual) analysis and making it a yes/no question. An automated implementation would need to parse the wikitext and/or HTML to determine the deepest heading present on a page. This could be difficult or infeasible to do at scale.
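As a rough illustration of what such parsing could look like under simplifying assumptions (standard = heading syntax only; headings emitted by templates would be missed, which is part of why this is hard at scale):

```python
import re

# Matched runs of '=' mark wikitext headings: == H2 ==, === H3 ===, etc.
HEADING_RE = re.compile(r"^(={2,6})\s*.+?\s*\1\s*$", re.MULTILINE)

def deepest_heading_level(wikitext: str) -> int:
    """Return the deepest heading level on the page (0 if no headings)."""
    return max((len(m.group(1)) for m in HEADING_RE.finditer(wikitext)), default=0)

def succinct_headings_score(wikitext: str) -> int:
    """-20 if headings deeper than level 3 are present, else 0.

    Assumes "below level 3" means h4 and deeper; adjust the comparison
    if the intended reading includes h3 itself.
    """
    return -20 if deepest_heading_level(wikitext) > 3 else 0
```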
Heading length
Not done
This doc attribute could help us assess:
- Whether headings are succinct, and thus easier to read when skimming a page. (#Succinct)
- Whether long heading strings would make translation difficult and perhaps generate even longer translated headings (#Translation)
- Whether the length and complexity of headings is generally consistent within a page, and across pages (#ConsistentFormat)
Measuring the number of words or characters in headings on a page would require a data source that doesn't exist, or we would have to parse the wikitext at a level of complexity that was too much for this test. So, instead -- since we were already generating test data manually -- we just assessed the consistency of the headings as a new, separate doc attribute.
Heading consistency
Done
This doc attribute helps us assess:
- Whether the length and complexity of headings is generally consistent within a page, and across pages.
- Consistent formatting makes it easier for readers to skim pages and identify sections relevant to their information need.
Metrics usage:
- ConsistentFormat score increases if headings were assessed to be generally consistent within a page.
- If we weren't sure, but saw no glaring inconsistencies, we input TRUE.
Implementation notes for future work:
- We did this instead of using text analysis of the heading strings, i.e. Heading length. Actually assessing something like consistency of words and phrase structure across headings, across pages, and across languages would probably require some intense NLP. Probably not worth it.
- Assessing heading consistency across related pages would require a mechanism like collection metadata to indicate which pages should be compared to each other. For this test, we only looked at heading consistency within a single page.
Heading quantity
Done
This doc attribute helps us assess:
- Whether a page has sections differentiated by headings. Headings are the most commonly-used and easily-understood method of adding structure to content on a page. Consistent use of headings within and across pages makes information easier to skim and process.
Metrics usage:
- ConsistentStructure score increases if page has any headings.
Title length
Done
This doc attribute helps us assess:
- Whether titles are succinct, and thus easier to understand, and to translate.
Metrics usage:
- Succinct score increases if the title is five words or fewer.
- Assessment of title length excluded namespace prefixes from the word count.
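A minimal sketch of this check, assuming the namespace prefix is everything up to the first colon (a simplification; the test's manual assessment could handle edge cases this can't):

```python
def title_length_score(full_title: str) -> int:
    """+20 if the title, minus any namespace prefix, is five words or fewer."""
    title = full_title.split(":", 1)[-1]        # drop e.g. "Extension:" prefix
    return 20 if len(title.split()) <= 5 else 0

# Illustrative title, not from the test dataset:
title_length_score("Extension:Growth/Community configuration")  # -> 20
```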
Due to the overall lack of translations across the sample set, we didn't implement metrics calculations related to translation. In general, all succinctness measurements, and prose linting around plain language, will also be relevant for the translatability of page content.
Title prefix
Not done
This doc attribute could help us assess:
- Whether pages within the same collection(s) use consistent namespacing (organization), which can be an important indicator of their relatedness and enable easier browsing of content as a group. (#ConsistentStructure)
- Whether pages use consistent namespace format and style, and whether they combine those prefixes with page titles in consistent or standardized ways. This can make it easier to find, create, and navigate pages. (#ConsistentFormat)
We didn't implement this for our test because the sample size was too small: it didn't include enough examples of varied namespacing.
Page sections: quantity
Done
This doc attribute helps us assess:
- How long a page is, and whether the information on the page is structured. Using sections makes pages easier to read and information easier to find, but a page can also have too many sections if each one contains very little information.
Metrics usage:
- Succinct score may increase/decrease. Depends on doc type. Input ranges are grouped based on benchmarks from test dataset (details below).
- Benchmarks from test data: Avg: 17.25; Median: 9; Min: 1; Max: 202
- In the XTools computation, all pages have at least 1 section (see the docs).
Because the quality indicated by page length differs by doctype, we should score pages of different types differently:
If doctype = landing page:
Benchmarks from test data: Avg: 7.2; Min: 1; Max: 16
- More than 5 sections: score 0
- 3-5 sections: score 10
- 1-3 sections: score 20
Else if doctype != landing page AND doctype != reference #content pages
Benchmarks from test data: Avg: 29.9; Min: 1; Max: 151
- More than 30 sections: score -20 (too long)
- 21-30 sections: score -10 (likely too long)
- 11-20 sections: score 0 (may be too long)
- 5-10 sections: score 20
- 3-5 sections: score 10
- 1-3 sections: score 0 (may be too short)
Max possible score: 20; Min possible score: -20
- See Page length for similar logic.
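A minimal sketch of this binning logic. Note that the published bin edges overlap at 3 and 5, so the boundary handling below is an assumption, as are the doc_type strings:

```python
def section_count_score(sections: int, doc_type: str | None) -> int:
    """Succinct sub-score from the XTools section count (-20..20)."""
    if doc_type == "landing page":
        if sections <= 3:
            return 20
        return 10 if sections <= 5 else 0
    if doc_type == "reference":
        return 0                    # reference pages are not assessed
    if sections > 30:               # content pages, including doc_type = NULL
        return -20                  # too long
    if sections > 20:
        return -10                  # likely too long
    if sections > 10:
        return 0                    # may be too long
    if sections >= 5:
        return 20
    if sections >= 3:
        return 10
    return 0                        # may be too short
```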
Implementation notes for future work:
- Assessing the number of words or page size vs the number of paragraphs or sections would be a more accurate way to determine if the amount of text on a page is sufficiently broken up to avoid negatively impacting comprehension, "skimmability", etc.
Page sections: See also
Done
This doc attribute helps us assess:
- Whether a page links to related pages, which can help to orient the reader within a collection -- especially if the pages in the collection don't share a common location.
Metrics usage:
- If a "See also" section is present, the CollectionOrientation metric score increases. Output score also depends on Next steps section.
- Based on asking: "Does the page have a section that only links to related resources? (It could be called "See Also", "Further Reading", or similar.)"
Implementation notes for future work:
- The semantics of this attribute are extremely contextual: sometimes "See also" sections are only an indicator of content sharding, while other times they can serve as useful discovery tools. Proceed with caution if trying to draw conclusions about the presence of this attribute on a page.
Page sections: See also length
Done
This doc attribute helps us assess:
- Whether a "See also" section may have crossed the line from being helpful to having too many links.
- While a "See also" section can help orient readers within a collection by pointing to related pages, an overly long list of links or information can have an opposite, disorienting effect.
- As an often-uncontextualized list of links, a lengthy "See also" can make the page harder to parse overall (#Succinct). It can also be an indicator that information about the topic is too scattered and should be combined into a smaller number of pages (#ConsistentStructure).
Metrics usage:
- ConsistentStructure score decreases if "See also" has more than six links.
- Succinct score decreases if "See also" has more than six links.
Implementation notes for future work:
- Identifying a "See also" section is already somewhat challenging due to inconsistent terminology ("See also", "Additional resources", "Further reading"). Counting the number of links within such sections is an added level of complexity, and likely to be very noisy/inaccurate. Only try to do this if it's really worth it!
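To illustrate that complexity, here is a hypothetical sketch of an automated count over raw wikitext. It assumes a fixed set of section titles and counts only [[...]] wikilinks, so external links, templated links, and other title variants would slip through:

```python
import re

SEE_ALSO_TITLES = {"see also", "further reading", "additional resources"}

def see_also_link_count(wikitext: str) -> int:
    """Count [[...]] links in the first See also-like section, 0 if none."""
    # Split into [lead, '==..', title, body, '==..', title, body, ...]
    parts = re.split(r"^(={2,6})\s*(.+?)\s*\1\s*$", wikitext, flags=re.MULTILINE)
    for i in range(1, len(parts) - 2, 3):
        if parts[i + 1].strip().lower() in SEE_ALSO_TITLES:
            return len(re.findall(r"\[\[", parts[i + 2]))
    return 0

def has_lengthy_see_also(wikitext: str) -> bool:
    """True when the See also-like section holds more than six links."""
    return see_also_link_count(wikitext) > 6
```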
Page sections: Next steps
Done
This doc attribute helps us assess:
- Whether a page orients a reader within a collection of documentation by providing links for sequential navigation. While "Next steps" sections are most valuable in task-focused pages (like tutorials), they can improve the UX of any type of page. "Next steps" sections make it quicker and easier for readers to locate the next pieces of information they're likely to need. They are preferable to "See also" sections, because they provide more context: "These pages are likely useful to you next" rather than "Here is an undifferentiated list of pages that might be of interest".
Metrics usage:
- ConsistentStructure score increases if page has a "Next steps" section. Output for that score also depends on Intro section.
- CollectionOrientation score increases if page has a "Next steps" section. Output for that score also depends on See also section.
Implementation notes for future work:
- The best practice of including a "Next steps" section is not widely known nor used in the Wikimedia technical community. It would make sense to socialize and invest in making this a more common practice before trying to measure it at scale. In our test dataset, only 9 out of 140 pages (6%) had a "Next steps" section.
Page sections: Intro
Done
This doc attribute helps us assess:
- Whether a page has a lead section that enables readers to quickly assess whether its contents are relevant for their information need. Intro sections that specify an intended audience and summarize what a page covers can help readers build a mental model of the content they can expect to find on the page.
- Intro section content often appears as a snippet in search results, so it is a crucial piece of content to help searchers assess the relevance of a page for their need.
- Including an intro (or "Overview") section is recommended by our technical documentation Style Guide.
Metrics usage:
- Succinct score increases if the page has an intro section.
- ConsistentStructure score increases if TRUE. Output score also depends on the Next steps section.
- Input based on: "Is there an intro section preceding the first heading? Or, if the page has no headings, is there an intro section preceding the other content elements on the page?"
Implementation notes for future work:
- This may not be possible to assess based on automatic parsing of wikitext or HTML. Given that there are often various pieces of wikitext that appear before a section header but may not render as the first item on the page, it would be extremely complex, if not impossible, to assess whether the first piece of prose content that appears on a page is a useful intro section.
- Even if using the rendered HTML to try to assess this, we would likely be generating inaccurate data, because the position of elements on the page can vary so widely based on skins, devices, user preferences etc.
Navigation: layout grid
Done
This doc attribute helps us assess:
- Whether landing pages orient readers to the contents of a collection.
- Using a layout grid on collection-level landing pages orients readers to the collection as a unit, and provides navigation into and across the pages in the collection. Providing groupings of links to support navigation on landing pages is a best practice recommended by our Documentation toolkit.
Metrics usage:
- ConsistentFormat score increases if a page coded as a landing page uses a layout grid.
- CollectionOrientation score increases.
- ConsistentStructure score increases.
Implementation notes for future work:
- This doc attribute is overused in the test metrics implementation. It's a best practice, but it's not the only way to provide navigation and orientation to a collection. It's not such a strong requirement or essential element that it should influence three different metrics (oops).
Navigation: menu
Done
This doc attribute helps us assess:
- Whether a page includes a menu to help readers navigate related content, and orient the reader within collections of pages.
Metrics usage:
- CollectionOrientation metric score increases if the page has a nav menu. If the page itself is included in the nav menu, the score increases more.
Implementation notes for future work:
- Navigation menus are usually implemented as templates, but there's no consistently-applied standard naming or categorization of those templates (yet).
- It could be feasible to implement standard categorization for navigation templates; Category:Navigation_templates covers a lot, and we could probably achieve a reasonable amount of coverage if we put in some effort to audit and categorize menu templates on mediawiki.org
- Without a mechanism to differentiate nav templates from other templates, gathering this metric input automatically is currently infeasible.
Navigation: menu length
Not done
This doc attribute could help us assess:
- Whether nav menus are contributing to or detracting from "walls of text" on a page (#Succinct)
- Similar to See also length: whether the number of links provided in the menu is overwhelming, and thus detracting from the menu's usability and the findability of content.
We didn't implement this for the test because we chose to prioritize other navigation-related attributes, and we needed to limit the number of page elements we were manually assessing.
Navigation: coverage
Done
This doc attribute helps us assess:
- Whether the navigation menu includes all the pages it should.
- Nav menus can't (and shouldn't try to) link to every single page in a large collection, but if important pages are missing from the menu, findability suffers.
Metrics usage:
- CollectionOrientation metric score increases if page has a nav menu. If the page itself is included in the nav menu, the score increases more.
- ConsistentStructure metric score increases if page is in the nav menu.
Table of contents
Not done
This doc attribute could help us assess:
- Whether pages include a table of contents (__TOC__): one of the most commonly-used and expected mechanisms for providing an overview of page structure and navigation between page sections. This is an indicator of consistent organization and structure (#ConsistentStructure).
We were unable to measure this because of variance in how different skins display tables of contents, and in how/whether the MediaWiki software auto-generates a TOC.
Status templates
Done
This doc attribute helps us assess:
- Whether a page has a template indicating it is incomplete, obsolete, or in need of updating. This also reflects whether a page is likely to be accurate, findable, useful, and maintained. Examples:
Metrics usage:
- Freshness score decreases if a Draft template is present. If there is no Draft template, check for Archived, Historical, Outdated: the score decreases if any of those are on the page. The decrease is slightly smaller for Draft (-40) than for the archive-type templates (-50).
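A minimal sketch of this check, assuming template names arrive as a lowercased set. It reproduces the if/else quirk flagged under "Implementation notes for future work" below:

```python
def freshness_score_from_templates(templates: set[str]) -> int:
    """Freshness penalty from status templates (names lowercased).

    Draft is checked first, so a page carrying both Draft and an
    archive-type template is only penalized once -- the quirk flagged
    in the implementation notes below.
    """
    if "draft" in templates:
        return -40
    if templates & {"archived", "historical", "outdated"}:
        return -50
    return 0
```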
Related data sources:
- The Technical documentation dashboard already contains a Status templates chart, which shows whether pages within a collection have certain templates on specific wikis.
- Some templates, like Historical, may have built-in functionality that also adds a category. In such cases, you can find pages with the template through a category query, which then makes it easier to combine the page list with additional data.
Implementation notes for future work:
- Wikitech uses different templates.
- We should also consider TODOs, especially in docs stored with code.
- Change the if/else logic so the score doesn't decrement for only one, and not both, of Draft and Archive (there are surely pages that have both of those templates).
Revisions: month
Done
This data element used XTools as an intermediary data source to record whether there was more than one edit to a page in the last 30 days.
This doc attribute can help us assess:
- Whether a page is findable, relevant, and updated. However, there is much nuance to how we can interpret this data.
- Even if a page has been recently updated, that doesn't mean it is fully correct or covers all the most recent information. People often add single pieces of information without fully reviewing and updating the rest of the page's content. The inverse is also true: just because a page has not been updated in a while doesn't necessarily mean it is incorrect.
- As one metrics tester pithily summarized: "Edits can be just reverting vandalism; edits don’t guarantee freshness; and theoretically, accurate pages don’t need to be updated."
Metrics usage:
- Developers score increases if the page had more than one edit in the last 30 days.
- Freshness score increases if the page had more than one edit in the last 30 days.
Related data sources:
- The Technical documentation dashboard includes this data aggregated at the collection level, and has an option to display it on a page-by-page basis.
- For the purposes of metrics testing, I wanted to be able to understand whether a given number of edits was good/bad/neutral, so I established benchmarks based on the test dataset. Based on the benchmarks, I created bins and aligned each of them with incremental increases/decreases to the resulting metrics score.
- Another reason for gathering this data for the test (instead of just using the existing dashboard or other revisions data sources) was because I wanted to have page-level data and collection membership in the same place, so I could drill down from collection-level metrics scores to page-level data.
Revisions: top editor
Done
This data element used XTools as an intermediary data source to quickly and easily see the top editor for a page, and what percentage of page revisions that editor made.
This doc attribute can help us assess:
- Whether a single editor is maintaining a page as the primary maintainer and/or mostly without help.
- Whether a page's findability by other users/editors could be improved.
- Caveats: pages may be findable by many users but only edited by a few. Pages may be useful and accurate but only apply to a small subset of users/editors. To more accurately verify that a page is being found, maintained, and considered useful by all the people who could benefit from it, we probably need to combine multiple types of revision and traffic data.
Metrics usage:
- Developers score decreases if more than 50% of edits were made by a single editor.
- Freshness score decreases if more than 50% of edits were made by a single editor.
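A minimal sketch of the same check, computed from raw revision data rather than read from XTools (the input shape is an assumption):

```python
from collections import Counter

def spof_penalty(revision_users: list[str], penalty: int = -20) -> int:
    """Penalty if one editor made more than 50% of the revisions.

    Use penalty=-20 for the Developers metric, -10 for Freshness.
    """
    if not revision_users:
        return 0
    _, top_count = Counter(revision_users).most_common(1)[0]
    return penalty if top_count / len(revision_users) > 0.5 else 0
```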
Related data sources:
- There are many, including the Technical Documentation dashboard (see chart for "Most active editors in the last 90 days"). The main implementation challenge is integrating traffic and/or revisions data with content metrics.
Tables and lists
Done
This doc attribute helps us assess:
- Whether a page is at risk of being too text-heavy and difficult to skim. Tables and lists can help break up "walls of text", and simplify non-sequential information discovery. (#Succinct, #Usable)
- Tables can be hard to read or may not render well on mobile devices (depending on the device, skin, etc.). If we had pursued mobile usability as one of our test metrics, this attribute would've been relevant for that.
Metrics usage:
- Succinct score increases slightly if a list or a table is present on the page.
- Not every type of content can (or should) be formatted in a list or table, so there is no penalty for pages that don't happen to have a table or list -- only a score bonus for pages that do make use of them.
CSS, HTML
Not done
This doc attribute could help us assess:
- How custom CSS or HTML impacts page UX and consistency across pages. Writers occasionally use CSS and HTML to customize how wikitext renders the final page. This can impact the consistency of formatting across pages (#ConsistentFormat), and can negatively impact mobile usability and accessibility (#Usability).
We didn't include this attribute because it was beyond our capacity to assess how custom CSS/HTML impacts page display across many browsers, devices, skins, etc. However, Kbach has started some related work to improve mobile UX for key docs (tracked in phab:T383117).
Page length
Done
This doc attribute helps us assess:
- Whether a content page is very long, and thus potentially hard to read.
- Whether a content page is too short, and thus potentially obsolete, outdated, or could be combined with other pages.
- Whether a landing page is too long, and thus may contain details that should be moved to content pages.
Metrics usage:
- Succinct score depends on doc type. Input ranges are grouped based on benchmarks from test dataset.
Benchmarks from test data (all doctypes): Avg: 16,128; Min: 97; Max: 177,849
Because the quality indicated by page length differs by doctype, we should score pages of different types differently. In summary: pages that are less than 500 bytes or more than 15,000 bytes cause the page score to decrease. The score decreases more if the page is a landing page. Reference pages are omitted from the assessment. In detail, the metrics computation implements the following:
If doctype = landing page:
Benchmarks from test data (landing pages): Avg: 5,861; Min: 633; Max: 23,591
- 0-499: score -30
- 500-2,500: score 10
- 2,501-5,000: score 30
- 5,001-7,500: score 20
- 7,501-10,000: score 10
- 10,001-15,000: score 0
- Over 15k: score -20
Else if doctype != landing page AND doctype != reference #include doctype = NULL here
Benchmarks from test data (content pages): Avg: 17,122; Min: 97; Max: 177,849
- 0-499: score -30
- 500-2,500: score 0
- 2,501-5,000: score 20
- 5,001-7,500: score 30
- 7,501-10,000: score 20
- 10,001-15,000: score 10
- Over 15k: score 0
Max possible score: 30; Min possible score: -30
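A minimal sketch of this binning, with the doc_type strings as assumptions:

```python
# Upper bounds (exclusive) paired with scores, per the bins above.
LANDING_BINS = [(500, -30), (2_501, 10), (5_001, 30),
                (7_501, 20), (10_001, 10), (15_001, 0)]
CONTENT_BINS = [(500, -30), (2_501, 0), (5_001, 20),
                (7_501, 30), (10_001, 20), (15_001, 10)]

def byte_size_score(size: int, doc_type: str | None) -> int:
    """Succinct sub-score from page size in bytes (-30..30)."""
    if doc_type == "reference":
        return 0                                # reference pages not assessed
    landing = doc_type == "landing page"
    for upper, score in (LANDING_BINS if landing else CONTENT_BINS):
        if size < upper:
            return score
    return -20 if landing else 0                # over 15k
```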
Implementation notes for future work:
- We used page size in bytes as a proxy for page length, but this doesn't always reflect the actual rendered page length, especially if a page uses many templates or special formatting.
- Special:LongPages displays byte size in descending order for all pages on a wiki. This could be useful for tech doc metrics if we could filter it or group pages by collection. As it is, it's probably too much information with too little context.
- Counting the number of paragraphs on the fully rendered page would be a more accurate reflection of how users experience page length, regardless of what's in the wikitext.
Number of paragraphs
Not done
See instead: Number of sections and/or Page length.
Translations
Not done
This doc attribute could help us assess:
- Whether our tech docs are available in more than one language, which makes them more inclusive and thus more usable overall.
We gathered data about translations for our test dataset, but that data made it clear that we didn't need a metrics calculation on top of the raw data:
- In our test dataset of 140 pages, none of the pages were marked "DoNotTranslate".
- Roughly half of the pages had translation markup added to them. Of those 74 pages, only 4 had translations more than 50% complete for all languages shown in the languages menu.
Implementation notes for future work:
- We can simply use the presence of translate syntax on a page to assess whether it is translatable. The Technical documentation dashboard already has a Translations chart that shows the percentage of pages in a collection that have translate syntax.
- Doc maintainers have no control over which docs get translated and when. The most we can do is try to write documentation that is easy to translate. This type of writing is captured by the #Succinct metric. For example: shorter page titles, shorter headings, and more consistent language usage are easier to translate.
Code samples
Done
This doc attribute helps us assess:
- Whether a page contains code samples. In our annual surveys and other user research, developers consistently rank code samples as one of the most important and useful pieces of content in technical documentation.
- When generating the test dataset, we counted example queries and CLI commands as code for this question.
Metrics usage:
- Developers score increases if any code samples are present on the page. Not assessed for landing pages.
Code sample languages
Done
This doc attribute helps us assess:
- Whether a page contains code samples in more than one programming language. This may not always be possible, depending on the topic and context. However, if code samples are provided in more than one language, the benefit of that useful information extends to a wider audience of developers.
- Related info: programming languages used by WMCS users from the 2024 Developer Survey.
Metrics usage:
- Developers score increases if code samples are provided in more than one language. Not assessed for landing pages.
Code sample automation
Done
This doc attribute helps us assess:
- Whether a page contains code samples that are aligned with the actual source code. If code samples are automatically generated from, or stored with source code, they are more likely to be up-to-date and correct.
Metrics usage:
- CodeConnection score increases if code samples are in source control, or generated from code that is in source control. Not assessed for landing pages.
- Freshness score increases if code samples are in source control, or generated from code that is in source control. Not assessed for landing pages.
Incoming links
Done
This doc attribute helps us assess:
- Relevance for developers: if many other pages link to this page, that may indicate it is relevant, findable, and useful.
- (maybe) Whether many docs are covering the same information, and may need to be de-duplicated.
- (maybe) Whether docs use consistent organization and structure: if docs are consistently organized, heavy interlinking should be less necessary because the content is less sharded, and more easy to navigate based on the page structure or nav menu. (This may be untrue / untestable!)
Metrics usage:
Developers score decreases if a page has no incoming links. Score increases incrementally for ranges of incoming links, based on benchmarks from test data. Bins were necessary in order to make sense of wide ranges of raw input values, and because the interpretation of the number of incoming links can vary.
Benchmarks from test data: Avg: 31.85; Med: 17.5; Min: 0; Max: 271
- 0 links: -10
- 1-10 links: 0
- 11-20 links: 10
- 20-50 links: 20
- 50-100 links: 30
- More than 100: 40
Max possible score: 100; Min possible score: -10
- Input data comes from MediaWiki via Special:WhatLinksHere (see note below re: T353660). Link counts included incoming links and redirects; excluded translations and transclusions.
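A minimal sketch of this binning. The published bin edges overlap at 20 and 50, so the boundary handling below is an assumption; note also that the listed bins top out at 40, even though the stated max possible score is 100:

```python
def links_score(incoming: int) -> int:
    """Developers sub-score from same-wiki incoming links and redirects."""
    if incoming == 0:
        return -10
    if incoming <= 10:
        return 0
    if incoming <= 20:
        return 10
    if incoming <= 50:
        return 20
    if incoming <= 100:
        return 30
    return 40
```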
Related data sources:
Implementation notes for future work:
- Easily-accessible data sources for analyzing incoming links are limited to links that come from the same wiki. Analyzing incoming links across wikis, from code repositories, or from other referers is currently more complicated and/or not feasible. See Links from code repos to wiki pages for more details.
- The related data sources listed above provide useful info at the wiki-level: we need this at a lower level to be able to make sense of the data within a given collection of pages.
- Known issue that impacts data accuracy: phab:T353660 - Links that use Special:MyLanguage don’t appear in Special:WhatLinksHere.
Pageviews
Done
This doc attribute helps us assess:
- How relevant is a page to developers who know about its existence? For the metrics test, we looked at what percentage of page watchers had visited the page within the last month. This limits the data to users who "watch" pages, but provides us with a unique subset of users who have, at some point, visited the page and starred it. (#Developers)
- If page watchers have visited the page recently, it is more likely that the page is being kept up-to-date. (#Freshness)
Metrics usage:
- Input data was generated from the MediaWiki page info data: "Number of page watchers who visited in the last 30 days" divided by "Number of page watchers". (Example page info)
- Developers score and Freshness score increase incrementally for larger percentages of watcher visits.
- Ranges for scoring were based on benchmarks from the test dataset.
Benchmarks from test data: Avg: .36; Median: .25; Min: 0; Max: 1
- 0: score 0
- More than 0 to .25: score 10
- .26-.5: score 20
- .51-.75: score 30
- .76-1: score 40
Max possible score: 40; Min possible score: 0
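A minimal sketch of this computation and binning, treating a page with zero watchers as scoring 0 (an assumption):

```python
def watcher_visits_score(recent_watchers: int, total_watchers: int) -> int:
    """Sub-score from the share of watchers who visited in the last 30 days."""
    if total_watchers == 0 or recent_watchers == 0:
        return 0
    ratio = recent_watchers / total_watchers
    if ratio <= 0.25:
        return 10
    if ratio <= 0.50:
        return 20
    if ratio <= 0.75:
        return 30
    return 40
```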
Related data sources:
- There are many, including the Technical Documentation dashboard.
Implementation notes for future work:
- Use raw data sources from MediaWiki instead of massaging them into a new computed field for percentage.
- The main implementation challenges are:
- Extracting meaning from pageview metrics. Raw numbers don't enable us to assess a page's findability, relevance, etc. compared to other similar pages, nor contextualize it based on how often we would expect the page's content to be needed, whether it's good that views are trending up/down, etc.
- Integrating traffic and/or revisions data with content metrics into a single dashboard or tool.
Maintainer
Done
This doc attribute helps us assess:
- Whether landing pages for a specific product/technology contain contact information for an owner, maintainer, or steward. The owner/maintainer could be a team or an individual. If there was a link to an issue tracker but no other contact info, we coded this as TRUE.
Metrics usage:
- Developers score increases if TRUE. Depends on doc type = landing page.