Wikimedia Technical Documentation Team/Doc metrics/Research

As part of defining an initial set of documentation health metrics for MediaWiki core technical documentation, we need to answer the research questions outlined below. Our approach will involve a variety of research and analysis tasks, such as:

  1. Reviewing previous metrics-related projects, and ideas we had in the past about how to measure technical docs and create meaningful content collections.
  2. Investigating how Wikimedia context differs from the standard software development context, and what that means for the types of data signals that are meaningful to measure.
  3. Reviewing Developer Satisfaction survey results from the past 3 years to remind us of the issues people raise, the workflows they discuss, and what they say about documentation.
  4. Reviewing academic and industry literature about doc health measurement techniques and best practices.
  5. Surveying available data signals and the tools we can use to access them.

Research questions

RQ1: How do we define tech doc health for MediaWiki core?

Goal: Define what technical documentation health means in the Wikimedia context, for the MediaWiki core software. Identify where we can use industry-standard definitions of doc health, where we can use wiki-standard definitions of content quality, and where we may need to diverge from both of those metrics frameworks.

Questions:

  • While there are many recommended approaches to assessing tech docs, there isn't really an accepted industry standard. How should we define technical documentation quality in a way that is relevant and useful for the Wikimedia context? Some assessment techniques may not be appropriate: they're designed for technical documentation of software that has clearer component boundaries, more centralized ownership, a smaller set of use cases, a less diverse user/audience base, and more objective truth about how the software functions. For example, many standard approaches to measuring API docs describe much smaller collections of content, with clearer content types, and fewer (or less divergent) navigation paths through the content. Given the complexity of MediaWiki and its ecosystem of multiple APIs, extensions, gadgets, etc., can we clearly reason about what the documentation of MediaWiki core includes and doesn't include? Can a human with expertise even assess whether a given doc is "complete" and "correct"?
  • Which wiki-standard definitions of quality are relevant for us? In what ways can/can't we measure technical documentation the same way as Wikipedia articles? Much research and work has been (and continues to be) done around measuring the health of encyclopedic wiki projects. While some of those metrics translate to technical wikis, we need to think carefully about which types of data reflect the salient aspects of technical content and developer workflows. The information on mediawiki.org supports different types of knowledge-seeking and different use cases than projects like Wikipedia. Technical content is created via different processes, it serves different needs, and it has different quality indicators than encyclopedic content.

RQ1: Outcomes

  • Defined metrics criteria based on project requirements, research, and initial set of stakeholder interviews (more community interviews to follow).
  • Identified a set of metrics categories and doc characteristics that we think are most relevant and feasible for Wikimedia technical documentation, with an emphasis on MediaWiki software documentation.

RQ2: What data do we have, and what do we want?

Goal: Map existing data signals and tools to the quality dimensions we care about, and identify gaps.

RQ2: Outcomes

RQ3: What pieces of content are meaningful to measure?

Goal: Identify the scope of MediaWiki documentation collections to use for testing metrics.

Questions:

  • We have technical documentation published on multiple platforms, like mediawiki.org and doc.wikimedia.org. The contents of mediawiki.org extend far beyond just technical documentation of the MediaWiki software itself, so we can't just look at the wiki statistics if we really want to understand the health of the technical docs -- that scope would be too broad. We also may not be able to exclude doc.wikimedia.org, even though its differences from mediawiki.org add complexity to the task of measuring the content -- excluding it might make the scope too narrow.
  • What is the subset of content that we can meaningfully measure instead of trying to measure across the entire corpus?
  • Are there things we could measure but not act on? If we couldn't change a metric even after measuring it, gathering that data would waste effort.
  • What are the developer workflows that we are trying to improve? How can we identify the documentation necessary for those workflows? Once we identify the relevant docs, we can use various methods to measure their health as a collection.

RQ3: Outcomes

Defined and implemented the following collections of MediaWiki pages as PagePiles:

Doc collection | Collection contents as PagePile | # Pages on mediawiki.org | Content dimension
ResourceLoader | 62140 | 47 | Technology or system component
REST API docs | 61824 | 8 | Technology or system component
Developing extensions | 61822 | 47 | Task / workflow
MediaWiki developer tutorials | 61823 | 53 | Doc type
Local development setup | 61854 | 55 | Task / workflow
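
These PagePile IDs can be consumed programmatically when testing metrics against each collection. Below is a minimal sketch, assuming the PagePile tool's Toolforge API (pagepile.toolforge.org) and its get_data action; the response field names are my assumption and should be checked against the tool's documentation.

<syntaxhighlight lang="python">
# Sketch: list the pages in a PagePile collection (e.g., the ResourceLoader pile, ID 62140).
# Assumes the PagePile Toolforge API; response field names (e.g. "pages") may differ.
import requests

PAGEPILE_API = "https://pagepile.toolforge.org/api.php"

def get_pile_pages(pile_id: int) -> list[str]:
    """Return the page titles stored in a PagePile."""
    params = {"id": pile_id, "action": "get_data", "format": "json", "doit": 1}
    resp = requests.get(PAGEPILE_API, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json().get("pages", [])

if __name__ == "__main__":
    # Works the same for any pile ID in the table above.
    for title in get_pile_pages(62140):  # ResourceLoader collection
        print(title)
</syntaxhighlight>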

Literature review and prior art

This section summarizes my findings from surveying the academic / professional literature, and reviewing past and current Wikimedia projects related to measuring the health of wikis and of technical documentation. This research focused primarily on understanding the dimensions of doc health that have been used and validated (or rejected) as meaningful quality indicators.

Full list of sources

This page cites the written sources that directly influenced the v1 metrics design. For a full list of all the sources consulted during the course of this project, see Doc metrics/Sources.

You can access most of the academic research cited here freely online, or through the Wikipedia Library, which provides access to the ACM digital library.

Summary of literature and prior art

Note: Strimling (2019) and Berger (2020) both include meta-analyses that synthesize metrics categories from a wide range of technical communication studies, dating back as far as the 1990s. They did that work so I didn't have to :-)

Each metric below is listed with example data / notes, the sources that cite it, and the data domain it maps to.

Usage
  • Pageviews (see the sketch following this table)
  • Interactions / clicks
  Cited in: Berger[1]
  Data domain: Traffic (importance of the content)

Freshness
  • Time since last update; currency of the content
  • Time since published
  • Ownership / governance / maintenance
  • "Activity: the project’s updates translate to documentation changes"[2]
  Cited in: Berger[1]; WMCS Survey 2022[3]; Getto et al[4]; Hardin[5]; User:KBach[2]; User:Waldyrious[6]
  Data domain: Content, Contributions/Community

Findability
  • Return visits to a page, over time; bounce rate; time on page; pages per visit (TB: we can't track some of these standard web analytics in the Wikimedia context due to privacy)
  • Site search
  • Discoverability
  • Stored close to code[7][8][9]
  • "Structure: Is it easy to find what you are looking for? [documentation is] stored in a single structure that can be easily understood." "[Documentation is not...] present in multiple different places, in multiple different formats" [2] (emphasis added)
  • Structure or layout can make it hard to find important information
  • How all of the topics are presented as one collection; how the content is organized and presented[5]
  Cited in: Berger[1]; WMCS Survey 2020[10], 2021[11], 2022[3]; Hardin[5]; Developer Portal exploratory research[8]; Wikimedia Technical Conference 2019 notes[9]; Write the Docs[7]; User:KBach[2]; User:Pavithraes[12]
  Data domain: Content

Support
  • Visits to doc site with no tickets filed
  • Unanswered forum questions
  • Repeated requests to improve docs for X[12]
  • Doc visits vs. tickets vs. forum threads per week
  Cited in: Berger[1]; User:Pavithraes[12]
  Data domain: Contributions/Community

Completeness
  • Content audits looking for missing content (completeness; comprehensiveness)
  • "the documentation should describe as much of the software as possible and should clearly indicate when something is not documented"[13]
  • "all features and functionality documented"[2]
  Cited in: Hardin[5]; WMCS Survey 2020[10]; Write the Docs[7]; User:Zakgreant et al[13]; User:KBach[2]; User:Waldyrious[6]

Accuracy
  • Correctness of content
  • Versioning
  • True reflection of how the software works
  • Authoritative
  • Auto-updating correctly (where applicable)
  Cited in: Strimling[14]; Getto et al[4]; Hardin[5]; Write the Docs[7]; User:Zakgreant et al[13]; User:KBach[2]; User:Pavithraes[12]
  Data domain: Content

Relevance
  • Practical: help audience solve real problems; docs "answer my question"
  • Redundant, Outdated, or Trivial (ROT) analysis; content audits looking for "documentation bloat"
  • Audience focus
  Cited in: Strimling[14]; WMCS Survey 2021[11]; Hardin[5]; Developer Portal exploratory research[8]; User:Zakgreant et al[13]
  Data domain: Content

Easy to understand
  • Clarity, consistency
  • Support skimming
  • Use examples
  • Use consistent language and formatting; follow a style guide[4]
  • Avoid unnecessary duplication; assess "missed opportunities for content reuse"[5]
  • "Consistent – the documentation should save everyone time and effort by being consistent and predictable."[13]
  • "clear expectations for documentation structures ... (where it should live, what components it should cover)"[6]
  • "Readability:[...] documentation style facilitates understanding."[2]
  • Appropriateness of content for a particular audience[4]
  Cited in: Strimling[14]; WMCS Survey 2020[10], 2021[11], 2022[3]; Getto et al[4]; Hardin[5]; User:Waldyrious[6]; Write the Docs[7]; User:KBach[2]; User:Zakgreant et al[13]
  Data domain: Content

Accessibility / inclusivity
  • TB: Strimling uses this to mean the ability to access the content, so it's more like Findability. But accessibility in terms of device compatibility, WCAG, and multilinguality is important, so I'm keeping this line item.
  • "Inclusive: the documentation may be the first point of contact that a person has with the MediaWiki community and should be accessible, friendly and welcoming."[13]
  • "Readability:[...] documentation is accessible to users of assistive technologies, older hardware, and less stable internet connections."[2]
  Cited in: Strimling[14]; User:Zakgreant et al[13]; User:KBach[2]
  Data domain: Content
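
To ground the Usage and Freshness rows above in data sources we actually have, here is a minimal sketch that pulls two per-page signals for a mediawiki.org page: monthly pageviews from the Wikimedia Pageviews REST API, and days since last edit from the MediaWiki Action API. The page title, date range, project identifier, and 365-day staleness threshold are illustrative assumptions, not part of the v1 metrics design.

<syntaxhighlight lang="python">
# Sketch: two per-page signals for mediawiki.org docs.
#   Usage     -> monthly pageviews (Wikimedia Pageviews REST API)
#   Freshness -> days since last edit (MediaWiki Action API)
# Page title, date range, and the 365-day threshold are illustrative; the project
# identifier "www.mediawiki.org" is assumed (check the Pageviews API docs).
from datetime import datetime, timezone
import requests

ACTION_API = "https://www.mediawiki.org/w/api.php"
PAGEVIEWS_API = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "www.mediawiki.org/all-access/user/{title}/monthly/{start}/{end}"
)
HEADERS = {"User-Agent": "doc-metrics-sketch/0.1 (example only)"}

def monthly_pageviews(title: str, start: str, end: str) -> int:
    """Sum pageviews for the given page over the requested months."""
    url = PAGEVIEWS_API.format(title=title.replace(" ", "_"), start=start, end=end)
    resp = requests.get(url, headers=HEADERS, timeout=30)
    resp.raise_for_status()
    return sum(item["views"] for item in resp.json()["items"])

def days_since_last_edit(title: str) -> float:
    """Days elapsed since the page's most recent revision."""
    params = {
        "action": "query", "prop": "revisions", "rvprop": "timestamp",
        "rvlimit": 1, "titles": title, "format": "json", "formatversion": 2,
    }
    data = requests.get(ACTION_API, params=params, headers=HEADERS, timeout=30).json()
    ts = data["query"]["pages"][0]["revisions"][0]["timestamp"]
    last_edit = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - last_edit).total_seconds() / 86400

title = "ResourceLoader"
views = monthly_pageviews(title, "20240101", "20240131")
age = days_since_last_edit(title)
print(f"{title}: {views} views in Jan 2024; last edited {age:.0f} days ago"
      f"{' (possibly stale)' if age > 365 else ''}")
</syntaxhighlight>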

Industry and academic literature review

Raw docs data must be contextualized in order to support meaningful prioritization. Mixed-methods approaches are essential.

  • "...working from an analytics-generated list of pages and focusing on the worst-performing pages each month is not the best plan. A better approach is to audit the docs’ content and combine the findings with your own expertise about the product and content along with metrics such as page views, user sentiment, time on page, and scroll depth."[15]
  • "Often, there is not a one-to-one correlation between a metric and a specific [doc] change, but rather numerous metrics inform a change or an overarching goal, such as delivering the right content to the right user at the right time."[16]

According to the meta-analysis and survey completed by Strimling (2019), the following four information quality dimensions are the most important for documentation:

  • Accurate
  • Relevant
  • Easy to Understand
  • Accessible

Strimling (2019) also identifies the following criteria that any definition of documentation quality must meet:

  • "The definition must be from the readers’ point of view: Because it is the readers alone who determine if the document we give them is high quality or not, any definition of documentation quality must come from the readers’ perspective. Writers can come up with any number of quality attributes that they think are important, but, at the end of the day, what they think is not as important as what the readers think.
  • The definition must be clear and unequivocal: Both readers and writers have to “be on the same page” when it comes to what makes a document high quality. Misunderstandings of what readers actually want from the documentation are a recipe for unhappy readers.
  • The definition must cover all possible aspects of quality: Quality is a multidimensional concept, and we must be sure that any attempt to define it is as comprehensive as possible. A definition that emphasizes one dimension over another, or leaves one out altogether, cannot be considered to be a usable definition.
  • The definition must have solid empirical backing: To be considered a valid definition of documentation quality, serious research must be done to give it the proper theoretical underpinnings. Years of experience or anecdotal evidence can act as a starting point, but if we are serious about our professionalism and our documentation, we need more."[14]

Berger (2020) summarizes common technical content metrics as represented by papers published in Intercom, the Society for Technical Communication's (STC) practitioner-focused publication. The metrics fall into the following categories (I include only some of the examples in each category):

  • Usage
    • Pageviews
    • Interactions / events
  • Freshness
    • Time since last update
    • Time since published
  • Findability
    • Return visits to a page, over time (we can't track this in the Wikimedia context due to privacy)
    • Bounce rate
    • Site search
    • Time on page
    • Pages per visit
  • Support
    • Visits to doc site with no tickets filed
    • Unanswered forum questions
    • Doc visits vs. tickets vs. forum threads per week
  • Demographic
  • Behavior
    • (Tricia thinks this overlaps with the findability and interactions metrics)
  • Delivery
    • On-time delivery = doc updates in line with product / marketing release

The WMF Tech Docs team is following a similar process to what Berger and their team at IBM used[1]: an iterative approach that uses "Design Thinking" to assess the data we have, identify what we want data to help us understand, and identify gaps between what we want and what we have.

Hardin (2023) describes the variety of factors tech writers and content managers must consider when identifying priority areas: "I not only consult web metrics such as page views. [...], but I also tend to prioritize content areas that I know have specific needs. For example, content areas that might be suffering from documentation bloat, as well as areas that I know have not had active engineering subject matter expert (SME) involvement for some time. Inactivity or no clear ownership or governance leaves content prone to being out of date, stale, or mismanaged."[5]

Hardin goes on to describe a "mixed-methods approach" for doc content audits, which includes "a content quality (also known as a "qualitative" audit), a quantitative audit, and an ROT (redundant, outdated, or trivial) analysis". Hardin's guiding questions for content review include:

  • What content is outdated, inaccurate, or no longer relevant?
  • Is there any content missing that we could add to improve customer experience?
  • Information architecture review. SMEs review how all of the topics are presented as one collection and provide feedback regarding how the content is organized and presented.
  • Does the current workflow make sense for customers?
  • Do existing topics need to be broken up?
  • Are there any missed opportunities for reuse in the content set?[5]

These are excellent questions for a content audit, and they align with the criteria and process WMF technical writers use[17]. Content audits must be done by humans, usually writers and SMEs with knowledge of the subject area. They require manual page- and collection-level analysis; they don't correspond to metrics that can be elicited directly from documentation data. What if meaningful documentation metrics require human analysis? Is it worth spending our time gathering data signals if they don't actually tell us meaningful things about our docs? We need to test this. For now, we have defined Project Requirements and Metrics Criteria specifying that any metrics we define should be treated as signals, not goals in themselves. We should use metrics in context, along with other evaluation methods and sources of knowledge.

In their meta-analysis of content strategy best practices, Getto et al (2019)[4] note that "In assessing content, the best practice varies strongly as well, depending on the purpose of the audit... the closest we get to a definitive best practice is that auditors should begin with some criteria in mind. Some of the general categories of criteria from the literature include:

  • Effectiveness of visual design or layout of content
  • Currency of content
  • Authoritativeness of content
  • Adherence of content to a predefined style guide
  • Appropriateness of content for a particular audience"[4]

According to Write the Docs (the leading professional/industry organization for technical writers working on software documentation), content should:

  • Minimize duplication (but also acknowledge that it must exist)
  • Support skimming by following certain stylistic and structural conventions
  • Include examples (but not too many)
  • Use consistent language and formatting
  • Be correct
  • Be versioned
  • Be stored close to code
  • Be complete
  • ...and more[7]

Community input and prior work on tech doc metrics

The Wikimedia community has put much thought and effort into improving and measuring the docs, including creating doc strategies and processes, and holding many discussions that emphasize the challenges and priorities of this work. This section highlights key points from those efforts and related discussions that are relevant for how we measure doc quality.

Goals for the [MediaWiki] Developer Documentation[13]:
  • Easy-to-read – the documentation should be easy for the primary audience to skim, read and understand.
  • Accurate – the documentation should accurately reflect how the software is intended to work and how it is known to work.
  • Practical – the documentation should focus on helping the primary audience effectively solve real problems.
  • Complete – the documentation should describe as much of the software as possible and should clearly indicate when something is not documented.
  • Inclusive – the documentation may be the first point of contact that a person has with the MediaWiki community and should be accessible, friendly and welcoming.
  • Consistent – the documentation should save everyone time and effort by being consistent and predictable.

Conversation notes from the Wikimedia Technical Conference 2019 session[9] highlight the importance of documentation in the following areas (not specific to MediaWiki docs):

  • Discoverability
  • Maintenance
  • Translation

(See T234634 for lots of good ideas and a summary of the challenges in these areas.)

The Tech Docs team (and contributors) have defined criteria for how to improve documentation and assess its quality, though we haven't yet aligned those criteria with quantitative metrics (the focus of this project). These rubrics reflect documentation attributes that we can consider as part of measuring doc health.

Similarly, the questions we have asked about documentation when doing user surveys reflect criteria that are relevant for consideration as metrics. The annual WMCS survey and Developer Satisfaction surveys contain questions along the following dimensions:

  • 2020 WMCS Survey: Easy to find; Comprehensive; Clear
  • 2021 WMCS Survey: Find what I was looking for; Answers my question; Easy to read and understand
  • 2022 WMCS Survey: Easy to find; Up to date; Clear

The freeform survey responses from 2020 reflect doc quality criteria aligned with the categories outlined above, with emphasis on content types, formats, and overall findability of information (focused on Cloud Services, not MediaWiki):

  • Hard to understand (not user-friendly)
    • Beginner documentation is very hard to understand
    • Needs more graphics
  • Lacking (incomplete)
    • What should I use? What is supported?
    • Reference lists for features, instead of all tutorials
    • More documentation about effective workflows
  • Out of date
  • Structure

Freeform summary from 2021 (focused on Cloud Services, not MediaWiki):

  • Improve and update current documentation, and remove documentation that is no longer needed. (5 comments)
  • Documentation is not friendly to new users; it needs more guidelines or tutorials to make the onboarding process easier for newcomers. (3 comments)
  • Most information is hard to find; it feels fragmentary. (6 comments)
  • The layout isn't intuitive and can be hard to navigate; an easy search function is needed. (3 comments)

Exploratory conversations leading up to the development of the Developer Portal emphasized the following themes of doc quality:

  • Coverage of programming languages and technical areas, and making it all findable (more visible).
  • Audience focus as key to making docs relevant.
  • Stewardship (ownership, maintenance) directly relates to correctness and completeness of docs.
  • Consistent naming and structuring of docs helps writers and maintainers, improves findability, and helps avoid duplication.
  • Connection to code: The closer technical documentation is to the code that it covers, the more likely it is to be findable, accurate, and updated.

Measuring wiki quality

Extensive work and research have gone into measuring the health and quality of content on projects like Wikipedia. Our technical wikis are not entirely comparable to encyclopedic projects, so we can't directly adopt the types of metrics used for measuring them. However, we can consider whether some of the approaches or data signals are relevant or can inspire us.

High-level summary of Wikimedia movement metrics / essential metrics, as defined in the Data glossary:

  • Content interactions
    • User and automated pageviews
    • Unique devices
    • Unique devices by region
  • Active editors
    • New active editors
    • Returning active editors
    • Active editors by region
  • Net new content
    • Topics of new content
    • Quality of new content[18]
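
Some of these movement-level signals are also available for the technical wikis through the same Wikimedia Analytics REST endpoints. As a hedged sketch, the request below asks for monthly unique devices on www.mediawiki.org; the endpoint path follows the documented unique-devices route, but the response field names ("items", "devices") reflect my reading of the API and should be verified against its schema.

<syntaxhighlight lang="python">
# Sketch: monthly unique devices for www.mediawiki.org via the Wikimedia Analytics
# REST API (unique-devices endpoint). Date range is illustrative; response field
# names are assumptions to verify against the API documentation.
import requests

URL = (
    "https://wikimedia.org/api/rest_v1/metrics/unique-devices/"
    "www.mediawiki.org/all-sites/monthly/20240101/20240301"
)
resp = requests.get(URL, headers={"User-Agent": "doc-metrics-sketch/0.1 (example only)"}, timeout=30)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item["timestamp"], item["devices"])
</syntaxhighlight>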

Definitions from a previous round of metrics standardization work: https://meta.wikimedia.org/wiki/Category:Standardized_metric

Wikimedia descriptive statistics (Google doc with high-level metrics, published by WMF)

The Knowledge Gaps Index and its underlying research framework use the following standard content quality criteria:

  • An article is of standard+ quality if it meets at least 5 of the following 6 criteria:
    • It is at least 8 kB long
    • It has at least 1 category
    • It has at least 7 sections
    • It is illustrated with 1 or more images
    • It has at least 4 references
    • It has 2 or more intra-wiki links
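
For reference, the "at least 5 of 6" rule is easy to express in code. The sketch below assumes page features (size, categories, sections, images, references, internal links) have already been extracted by some other means; it is a literal reading of the criteria above, not a proposed metric for technical docs.

<syntaxhighlight lang="python">
# Sketch: the "standard+ quality" check from the Knowledge Gaps research framework,
# applied to pre-extracted page features. Field names are illustrative; 8 kB is
# interpreted here as 8192 bytes.
from dataclasses import dataclass

@dataclass
class PageFeatures:
    bytes_size: int        # page length in bytes
    categories: int        # number of categories
    sections: int          # number of sections
    images: int            # number of images
    references: int        # number of references
    intra_wiki_links: int  # number of internal wiki links

def is_standard_plus(p: PageFeatures) -> bool:
    """A page is standard+ quality if it meets at least 5 of the 6 criteria."""
    criteria = [
        p.bytes_size >= 8 * 1024,
        p.categories >= 1,
        p.sections >= 7,
        p.images >= 1,
        p.references >= 4,
        p.intra_wiki_links >= 2,
    ]
    return sum(criteria) >= 5

# Example: a short-on-references but otherwise well-structured page meets 5 of 6 criteria.
print(is_standard_plus(PageFeatures(9000, 2, 7, 1, 3, 5)))  # True
</syntaxhighlight>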

"For content-based measurements, we aggregate mappings according to two different sets of metrics:

  • Selection-Score (e.g., number of articles for each category of the gap), which reflects how much content exists for each category on a wiki.
  • Extent-Score (e.g., quality of articles based on length, # sections, # images) explains “how good” the articles in each category are."[19]

Ongoing research

Developer workflows

The following list of workflows was extracted from the past 3 years of Developer Satisfaction survey questions and responses, and ongoing work on MediaWiki developer journeys:

Discover:

  • Assess the quality of xyz code
  • Find and use APIs
  • Understand architecture

Setup:

  • Create developer and other accounts
  • Set up a personal development environment for MediaWiki
    • …and installing extensions in the PDE

Code:

  • Follow coding conventions
  • Find and use libraries
  • Write technical documentation

Review:

  • Find someone to review patches

Build & test:

  • Test your code in a shared environment before merge
    • A nice list of example uses of the Beta Cluster, from T215217:
      • Showcasing new work
      • End-to-end/unit testing of changes in isolation
      • Manual QA, quick iteration on bug fixes
      • Long-term testing of alpha features & services in an integrated environment
      • Test how changes integrate with a production-like environment before release
      • Test the deployment procedure
      • Test performance regressions
      • Test integration of changes with production-like data
      • Test with live traffic

Deploy:

  • Use continuous integration
    • especially: finding and resolving errors that caused CI failures
  • Deploy software to Wikimedia production using Kubernetes infrastructure

Monitor:

  • Handle Incidents

Operate:

  • Maintain technical documentation
  • Reply to and triage bugs / feature requests
  • Update as needed
  • Deprecate when desired

For the Developer Portal, we identified user journeys focused on a wider set of developer personas (beyond just MediaWiki developers). As part of preparing to launch the Developer Portal, we worked on improving the landing pages it links to. The set of wiki pages linked from the Developer Portal could be considered a collection, or a set of collections, that should be of relatively high quality. They could serve as a baseline for testing metrics and for comparing metric outputs against sets of content that haven't received the same amount of curation as the Dev Portal key landing pages.

TODO: Review this historical page that may have a useful breakdown of workflows specific to MediaWiki: https://www.mediawiki.org/wiki/Manual:Contents/To_do

References

  1. Arthur Berger. 2020. Designing an Analytics Approach for Technical Content. In Proceedings of the 38th ACM International Conference on Design of Communication (SIGDOC '20), October 3–4, 2020, Denton, TX, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3380851.3416742
  2. User:KBach-WMF/Collections/Conclusions#Grading criteria
  3. https://meta.wikimedia.org/wiki/Research:Cloud_Services_Annual_Survey/2022#Documentation
  4. Guiseppe Getto, Jack Labriola, and Sheryl Ruszkiewicz. 2019. A practitioner view of content strategy best practices in technical communication: a meta-analysis of the literature. In Proceedings of the 37th ACM International Conference on the Design of Communication (SIGDOC '19). Association for Computing Machinery, New York, NY, USA, Article 9, 1–9. https://doi-org.wikipedialibrary.idm.oclc.org/10.1145/3328020.3353943
  5. Ashley R. Hardin. 2023. Conducting Rolling Content Audits with Customer Journeys in Agile and Open Source Work Environments: A user-centered approach to planning and improving content. In The 41st ACM International Conference on Design of Communication (SIGDOC '23), October 26–28, 2023, Orlando, FL, USA. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3615335.3623045
  6. User:Waldyrious/Docs
  7. https://www.writethedocs.org/guide/writing/docs-principles/#content-should-be
  8. Developer Portal/History#Key themes from conversations
  9. Wikimedia Technical Conference 2019 Session
  10. https://meta.wikimedia.org/wiki/Research:Cloud_Services_Annual_Survey/2020
  11. https://meta.wikimedia.org/wiki/Research:Cloud_Services_Annual_Survey/2021#Documentation
  12. User:Pavithraes/Sandbox/Technical documentation prioritization#Prioritizing
  13. User:Zakgreant/MediaWiki Technical Documentation Plan
  14. Strimling, Yoel. 2019. Beyond Accuracy: What Documentation Quality Means to Readers. Technical Communication (Washington), 66, 7–29. Accessed via https://www.researchgate.net/publication/331088095_Beyond_Accuracy_What_Documentation_Quality_Means_to_Readers
  15. https://www.writethedocs.org/blog/newsletter-july-2024/#how-to-think-about-docs-metrics
  16. Megan Gilhooly. 2020. Product answers: Engineering content for freshness. Intercom (Jan. 2020). https://www.stc.org/intercom/2020/01/product-answers-engineering-content-for-freshness
  17. https://www.mediawiki.org/wiki/Documentation/Toolkit/Collection_audit/Assess
  18. https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary
  19. https://meta.wikimedia.org/wiki/Research:Knowledge_Gaps_Index/Measurement/Content#Standard_Quality_Criteria