Content translation/Product Definition/analytics/de

ContentTranslation is an extension developed by the Wikimedia Foundation to help multilingual Wikipedia editors create pages. To understand the impact of Content Translation, this page defines a set of metrics.

These measurements can be collected using EventLogging and other methods: analyzing Wikipedia dumps, direct queries of backend storage data, etc.

This document outlines what we need to measure, so that for each feature story we can plan an appropriate EventLogging schema, write appropriate logging functions, and prepare appropriate queries.

Current Metrics

Limn dashboard

Limn dashboards provide a 90-day view of some events.

Limn Dashboard: CX Beta Feature Enablements — The total number of beta feature enablements for CX including auto enroll enablements.
Limn Dashboard: CX Manual Beta Feature Enablements — The total number of beta feature enablements for CX excluding auto enroll enablements.

Special:ContentTranslationStats

Every wiki where CX is deployed has a Special:ContentTranslationStats page that shows statistics for that wiki as well as common statistics for all languages.

For example, for Portuguese Wikipedia: w:pt:Special:ContentTranslationStats.

Published translations — These are translations published as full-fledged articles in the main namespace. The growth in published articles over time is also shown as a graph.
In progress translations — Translations that were started and are saved as in-progress translations.
Number of translators — This is the number of users who have published at least one translation. Some translators translate into multiple languages. The sum in the table currently counts them once per language.
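
The double counting in the translator total can be illustrated with a small sketch. The record layout (user name, target language) and the sample values are assumptions made up for illustration; they do not reflect how the extension actually stores its data.

```python
from collections import defaultdict

# Hypothetical published-translation records: (translator, target language).
published = [
    ("Alice", "pt"),
    ("Alice", "es"),
    ("Bob", "pt"),
    ("Bob", "pt"),
    ("Carol", "de"),
]


def translators_per_language(records):
    """Map each target language to the set of users who published into it."""
    by_language = defaultdict(set)
    for user, language in records:
        by_language[language].add(user)
    return dict(by_language)


def summed_translator_count(records):
    """Sum of per-language counts: a user active in two languages counts twice."""
    return sum(len(users) for users in translators_per_language(records).values())


def distinct_translator_count(records):
    """Number of distinct users across all languages."""
    return len({user for user, _ in records})
```

With this sample data the per-language sum is 4, while only 3 distinct users published translations, because Alice is counted for both pt and es.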

Other metrics to add or fix, tracked in Phabricator:

Also:

Evolution in time for the above.

More queries and raw data

More queries are defined, and their results are available in real time, covering information such as deleted pages and pages created per day. There are no visual representations for these, but if you are interested in the raw data, there are plenty of numbers to look at.

Reports based on queries:

High priorities for product management

How often each source and target language is used. This is not immediately actionable, because the feature won’t be rolled out to all languages from the start, but it will help us get a sense of what the main language pairs are. Analysing this per language also makes it possible to identify the languages whose number of articles has grown the most thanks to the tool.
1. Technically: For the whole cluster, which source and target languages are used most often, per month.
How many users saved at least one draft translation. Take these as a cohort and:
1. Describe their CX activity: number of drafts created through CX.
2. How often the draft is edited before it is moved to the main namespace, and who edits it: the translator or other users.
3. Describe the draft creators’ overall editing activity within same time period.
4. Describe the draft creators’ cross-project behavior.
How many draft translations are created in each language.
How many pages are eventually moved to the main namespace as real articles.
1. What editing activity was done before moving the article to the main namespace.
How many people click the red link.
1. Of the people who click the link, how many people accept the invitation and go on to translate.
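
As an illustration of the first item above (source and target language usage per month), here is a minimal sketch. The record layout, the sample values, and the function names `language_pairs_per_month` and `top_pairs` are hypothetical; a real report would read from EventLogging tables or backend storage instead of an in-memory list.

```python
from collections import Counter
from datetime import date

# Hypothetical translation records: (start date, source language, target language).
translations = [
    (date(2015, 1, 5), "es", "ca"),
    (date(2015, 1, 20), "es", "ca"),
    (date(2015, 1, 22), "en", "pt"),
    (date(2015, 2, 3), "en", "pt"),
]


def language_pairs_per_month(records):
    """Count translations per (month, source, target), with month as 'YYYY-MM'."""
    counts = Counter()
    for started, source, target in records:
        counts[(started.strftime("%Y-%m"), source, target)] += 1
    return counts


def top_pairs(records, month, limit=3):
    """Return the most used (source, target) pairs for one month."""
    monthly = Counter()
    for (m, source, target), n in language_pairs_per_month(records).items():
        if m == month:
            monthly[(source, target)] += n
    return monthly.most_common(limit)
```

Calling `top_pairs(translations, "2015-01")` ranks the es to ca pair first for that month. The same grouping could be keyed on the target language alone to see which languages gained the most articles.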

General technical principles

Don’t log everything. Log only what’s useful for product metrics.
Try to make several schemas, one per feature or so. This would be easier to query and it may also help avoid changes that would break continuity (a new table is created every time we change a schema version).
Server-side logging doesn't directly reflect user actions, because we'll do a lot of caching and pre-fetching.
We are interested in how users move between paragraphs and segments, because this relates to caching.
If we use VE, then changes in the Document Model (in the browser) can be logged easily (they're already stored for undo purposes). This generally includes user selection events. The VE DataModel (DM) contains a representation of the cursor selection.
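
The "one schema per feature" principle could be sketched as follows. The schema names and fields are invented for illustration and are not actual EventLogging schemas, and the validation is a plain Python stand-in, not the EventLogging API.

```python
# Hypothetical per-feature schemas: schema name -> required fields and their types.
SCHEMAS = {
    "ContentTranslationDraft": {
        "user_id": int,
        "source_language": str,
        "target_language": str,
    },
    "ContentTranslationRedLink": {
        "user_id": int,
        "accepted_invitation": bool,
    },
}


def validate_event(schema_name, event):
    """Return a list of problems; an empty list means the event fits the schema."""
    schema = SCHEMAS.get(schema_name)
    if schema is None:
        return [f"unknown schema: {schema_name}"]
    problems = []
    for field, expected_type in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

Keeping each schema small means a change to one feature's fields only affects that feature's table, which is the continuity concern raised above.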

Tagging

To support measuring the metrics above, articles created by Content Translation will be tagged. The tags will make it possible to identify, directly or indirectly, the following context information:

The article was created by Content Translation.
The language of the translation.
The source article used for the translation.
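
The three pieces of context listed above could be carried by a small record like the one below. This is only a sketch: the `TranslationTag` class, its field names, the `"contenttranslation"` marker value, and the JSON encoding are all assumptions for illustration, not the tagging format the extension uses.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TranslationTag:
    """Hypothetical tag attached to an article created through translation."""
    tool: str             # which tool created the article
    source_language: str  # language of the source article
    source_title: str     # title of the source article
    target_language: str  # language of the translation


def encode_tag(tag):
    """Serialize a tag to a JSON string (an assumed storage format)."""
    return json.dumps(asdict(tag), sort_keys=True, ensure_ascii=False)


def decode_tag(text):
    """Rebuild a tag from its JSON string."""
    return TranslationTag(**json.loads(text))


def created_by_content_translation(tag):
    """True if the tag marks the article as created by Content Translation."""
    return tag.tool == "contenttranslation"
```

A tag such as `TranslationTag("contenttranslation", "es", "Barcelona", "ca")` survives a round trip through `encode_tag` and `decode_tag` unchanged, so queries can recover the source article and both languages from the stored string.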
