Content translation/Product Definition/analytics

Content Translation is an extension developed by the Wikimedia Foundation to help multilingual Wikipedia editors create pages. To understand the impact of Content Translation, several metrics are defined.

The following measurements can be collected using EventLogging and other methods: analyzing Wikipedia dumps, querying the backend storage data, and so on.

The idea of this document is to get an overview of what we need to measure, so that for each feature story we can plan the appropriate EventLogging schema, write the appropriate logging functions, and have the appropriate queries.

Core metrics defined for Content Translation

Current metrics

Limn dashboard

The Limn dashboard provides a 90-day view of several events.


Every wiki where CX is deployed has a Special:ContentTranslationStats page that shows statistics for that wiki as well as common statistics for all languages.

For example, for Portuguese Wikipedia: w:pt:Special:ContentTranslationStats.

  • Published translations: These are translations published as full-fledged articles in the main namespace. The growth of published articles over time is also shown as a graph.
  • In progress translations: Translations that were started and are saved as in-progress translations.
  • Number of translators: This is the number of users who published at least one translation. Some translators translate into multiple languages. The sum in the table currently counts them once per language.
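The note about translators who work in several languages can be illustrated with a small sketch. The record layout used here (a user name and a target language per published translation) is hypothetical and is not the extension's storage format; the sketch only shows why summing the per-language counts gives a larger number than counting distinct users.

```python
from collections import defaultdict


def translator_counts(published):
    """Return (sum of per-language counts, number of distinct translators).

    published: iterable of (user, target_language) pairs,
    one pair per published translation (hypothetical layout).
    """
    per_language = defaultdict(set)
    for user, language in published:
        per_language[language].add(user)
    # Summing per-language counts counts a user once for every language.
    summed = sum(len(users) for users in per_language.values())
    # Taking the union of all sets counts every user exactly once.
    distinct = len(set().union(*per_language.values()))
    return summed, distinct


# Hypothetical sample data: Ana publishes into two languages.
sample = [("Ana", "pt"), ("Ana", "es"), ("Budi", "id"), ("Ana", "pt")]
summed, distinct = translator_counts(sample)
print(summed, distinct)
```

With this sample the summed figure is 3 while only 2 distinct users published translations.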

Other metrics to add or fix, tracked in Phabricator:


  • Evolution in time for the above.

Other queries and raw data

You can check more of the defined queries and the resulting real-time data for different kinds of information, including deleted pages and pages created per day. There are no visual representations for these, but if you are interested in the raw data, you'll get lots of numbers to look at.

Reports based on queries:

High priority for product management

  • How often target and source languages are used. This is not immediately actionable, because the feature won’t be rolled out to all languages from the start, but it will help us get a sense of what the main language pairs are. Analysing this per language also makes it possible to identify the languages whose number of articles has grown the most thanks to the tool.
    1. Technically: For the whole cluster, which source and target languages are used most often, per month.
  • How many users saved at least one draft translation. Take these as a cohort and:
    1. Describe their CX activity: number of drafts created through CX.
    2. How often the draft is edited before it is moved to the main namespace, and who edits it: the translator or other users.
    3. Describe the draft creators’ overall editing activity within same time period.
    4. Describe the draft creators’ cross-project behavior.
  • How many draft translations are created in each language.
  • How many pages are eventually moved to the main namespace as real articles.
    1. What editing activity was done before moving the article to the main namespace.
  • How many people click the red link.
    1. Of the people who click the link, how many people accept the invitation and go on to translate.
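The first item in the list above asks which source and target languages are used most often per month. A minimal sketch of that aggregation follows, assuming each translation is available as a (start date, source language, target language) tuple. That layout and the function names are illustrative assumptions, not the actual CX database schema.

```python
from collections import Counter
from datetime import date


def pairs_per_month(translations):
    """Count translations per (month, source language, target language)."""
    counts = Counter()
    for started, source, target in translations:
        month = started.strftime("%Y-%m")
        counts[(month, source, target)] += 1
    return counts


def top_pairs(counts, month, n=3):
    """Return the n most used (source, target) pairs for one month."""
    monthly = Counter(
        {(s, t): c for (m, s, t), c in counts.items() if m == month}
    )
    return monthly.most_common(n)


# Hypothetical sample data.
sample = [
    (date(2015, 1, 5), "es", "ca"),
    (date(2015, 1, 9), "es", "ca"),
    (date(2015, 1, 20), "en", "pt"),
    (date(2015, 2, 2), "en", "pt"),
]
counts = pairs_per_month(sample)
print(top_pairs(counts, "2015-01"))
```

The same counter can be reduced to source-only or target-only totals to see which languages gain the most articles.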

Other metrics for created articles

The main goal of Content Translation is to increase the content available in all languages. New articles created with the tool will be the main element to observe.

Content quality

  • Number of articles created. Articles created per week, per user, per language.
  • Length of articles. This gives an idea of the kind of articles produced. It can be useful to compare to the length of the original article (e.g., "users translate only 30% of the original article on average").
  • Links to/from articles. Articles that include links to other articles may indicate more complete articles. New articles that become linked are also a sign of being considered usable articles.
  • Time spent in creating a translation (per paragraph). How fast can translators produce content?
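The length comparison mentioned above ("users translate only 30% of the original article on average") could be computed roughly as follows. This is a sketch under the assumption that lengths are measured in characters and that source and translation lengths are available as pairs. Character counts are only roughly comparable across languages, so the result is an indication rather than an exact measure.

```python
def translated_fraction(pairs):
    """Mean of translation_length / source_length.

    pairs: iterable of (source_length, translation_length).
    Pairs with an empty source are skipped; returns None if nothing is left.
    """
    ratios = [translated / source for source, translated in pairs if source > 0]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)


# Hypothetical sample data: (source length, translation length) in characters.
sample = [(10000, 3000), (4000, 2000), (2000, 200)]
fraction = translated_fraction(sample)
print(f"{fraction:.0%} of the original article on average")
```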

Impact and quality

  • Number of readers of the original article. Does the availability of an article in other languages reduce the readership of the original one?
  • Number of readers for new articles. How many people are accessing the new content?
  • Amount of machine translation.
  • Translations vs. regular edits per user. How many users contribute only as translators? Is translation being adopted by prolific editors or by users with fewer regular edits?
  • Number of translators (users who have created a translation).
  • Number of articles that translate an existing article (possibly trying to add a new paragraph to an existing translation).

Evolution over time

Deletion rate. How many articles produced by the tool are deleted by the community? It will be useful to correlate the deleted articles with other metrics (article length, amount of automatic translation, editing expertise of the user).
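A deletion rate and a simple breakdown by another metric can be sketched as below. The article records, the `mt_share` field and the 0.5 threshold are assumptions made for the example, and the grouping is a rough stand-in for a proper correlation analysis.

```python
from collections import defaultdict


def deletion_rate(articles):
    """Share of articles flagged as deleted, or None for an empty list."""
    if not articles:
        return None
    return sum(1 for a in articles if a["deleted"]) / len(articles)


def deletion_rate_by(articles, key):
    """Deletion rate per group, where key(article) names the group."""
    groups = defaultdict(list)
    for article in articles:
        groups[key(article)].append(article)
    return {name: deletion_rate(items) for name, items in groups.items()}


# Hypothetical sample data; mt_share is the fraction of machine-translated text.
sample = [
    {"deleted": False, "mt_share": 0.2},
    {"deleted": True, "mt_share": 0.9},
    {"deleted": False, "mt_share": 0.8},
    {"deleted": False, "mt_share": 0.1},
]
overall = deletion_rate(sample)
by_mt = deletion_rate_by(
    sample, lambda a: "high MT" if a["mt_share"] >= 0.5 else "low MT"
)
print(overall, by_mt)
```

The same `deletion_rate_by` helper can group by article length bucket or by the translator's edit count.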

General technical principles

  • Don’t log everything. Log only what’s useful for product metrics.
  • Try to make several schemas, one per feature or so. This would be easier to query and it may also help avoid changes that would break continuity (a new table is created every time we change a schema version).
  • Server-side logging doesn't directly reflect user actions, because we'll do a lot of caching and pre-fetching.
  • We are interested in moving between paragraphs and segments, because it relates to caching.
  • If we use VE, then changes in the Document Model (in the browser) can be logged easily (they're already stored for undo purposes). This generally includes user selection events. The VE DataModel (DM) contains a representation of the cursor selection.
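The first two principles (log only what is useful, and keep one schema per feature) can be sketched as a small validation step. The schema names and fields below are hypothetical, and the code does not use the EventLogging API; it only illustrates the idea of narrow, per-feature schemas that reject anything they do not name.

```python
# Hypothetical per-feature schemas: field name -> expected Python type.
SCHEMAS = {
    "TranslationPublish": {
        "user_id": int,
        "source_language": str,
        "target_language": str,
    },
    "LinkToolUse": {"user_id": int, "link_source": str},
}


def validate_event(schema_name, event):
    """Return a list of problems; an empty list means the event fits."""
    schema = SCHEMAS.get(schema_name)
    if schema is None:
        return [f"unknown schema: {schema_name}"]
    problems = []
    for field, expected in schema.items():
        if field not in event:
            problems.append(f"missing field: {field}")
        elif not isinstance(event[field], expected):
            problems.append(f"wrong type for {field}")
    # "Don't log everything": fields the schema does not name are rejected.
    for field in event:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems


print(validate_event("LinkToolUse", {"user_id": 1, "link_source": "wikidata"}))
print(validate_event("LinkToolUse", {"user_id": "1", "extra": True}))
```

Keeping each schema small means that a change to one feature's logging only affects that feature's table.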

Tagging

To support measuring the metrics above, the articles created by Content Translation will be tagged. Tags will make it possible to identify (directly or indirectly) the following context information:

  • The article was created by Content Translation.
  • The language of the translation.
  • The source article used for the translation.
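The three pieces of context listed above can be pictured as fields on an article record. The record type and field names below are hypothetical and only illustrate how such tags would let a query select the articles of interest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArticleRecord:
    title: str
    language: str                          # language of the translation
    created_by_cx: bool                    # created by Content Translation
    source_title: Optional[str] = None     # source article used
    source_language: Optional[str] = None  # language of the source article


def cx_articles(records, language=None):
    """Select CX-created articles, optionally for one target language."""
    return [
        r for r in records
        if r.created_by_cx and (language is None or r.language == language)
    ]


# Hypothetical sample data.
records = [
    ArticleRecord("Lua", "pt", True, "Moon", "en"),
    ArticleRecord("Sol", "pt", False),
    ArticleRecord("Bulan", "id", True, "Moon", "en"),
]
print([r.title for r in cx_articles(records, language="pt")])
```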

Other topics to measure - low priority

Translation center

  • Dashboard usage: Coming back to complete articles, leaving articles half baked, etc.

Translation editing interface

  • Buttons and links in translation view
    • Clear translation.
    • Paste source text.
    • "view article" (trivial, but we may find that nobody uses it)
  • Which types of interactive segments (suggestions, links, plain segments, templates, etc.) users access.
  • Time spent on the page: seconds per paragraph, word, article.
    • This can later be used to tell people something like “It will take 20 minutes to translate this article.”
  • It’s easy to track clicks, but we may also want something more complicated, like logging of selecting and deleting everything.
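The time estimate mentioned in the list above could be derived from logged sessions roughly as follows. The session layout (seconds spent and words translated) is an assumption for the sketch, and a simple average rate is used; a real estimate might need per-language or per-user rates.

```python
import math


def seconds_per_word(sessions):
    """Average translation speed over all sessions.

    sessions: iterable of (seconds_spent, words_translated).
    Returns None when no words were translated.
    """
    sessions = list(sessions)
    total_seconds = sum(seconds for seconds, _ in sessions)
    total_words = sum(words for _, words in sessions)
    if total_words == 0:
        return None
    return total_seconds / total_words


def estimate_minutes(word_count, rate):
    """Estimated time for an article, rounded up to whole minutes."""
    return math.ceil(word_count * rate / 60)


# Hypothetical sample data: two past sessions.
rate = seconds_per_word([(600, 300), (900, 450)])
minutes = estimate_minutes(600, rate)
print(f"It will take {minutes} minutes to translate this article.")
```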

Machine translation "abuse"

  • Does the user’s behavior change after a warning about too much machine translation?
  • How many users are shown the MT abuse warning?
  • Which percentage of machine translation did the article contain when the warning was shown? Which percentage of MT did the article contain when it was later published?
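One way to compare the machine translation percentage at warning time and at publication is sketched below. It assumes that each paragraph is known to be either unmodified MT output or not, and weights paragraphs by character count. Both the data layout and this definition of "percentage of MT" are assumptions, not the rule used by the tool.

```python
def mt_percentage(paragraphs):
    """Percentage of characters that sit in unmodified MT paragraphs.

    paragraphs: list of (text, is_unmodified_mt).
    """
    total = sum(len(text) for text, _ in paragraphs)
    if total == 0:
        return 0.0
    mt_chars = sum(len(text) for text, is_mt in paragraphs if is_mt)
    return 100.0 * mt_chars / total


def change_after_warning(at_warning, at_publish):
    """Difference in MT percentage between warning and publication."""
    return mt_percentage(at_publish) - mt_percentage(at_warning)


# Hypothetical sample data: the user rewrote part of the MT output
# after seeing the warning.
at_warning = [("a" * 80, True), ("b" * 20, False)]
at_publish = [("c" * 50, False), ("a" * 30, True), ("b" * 20, False)]
print(mt_percentage(at_warning), mt_percentage(at_publish))
print(change_after_warning(at_warning, at_publish))
```

A negative change suggests that the user edited the machine translation after the warning was shown.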

Automatic link insertion

  • How often each source is automatically chosen for the guessed link target:
    • Wikidata sitelinks
    • Wikidata labels
    • Wikidata aliases
    • manual interlanguage links
    • machine translation
    • Dictionary
  • How often do people choose a different source manually?
  • How does the (non-)availability of certain sources affect this choice?
  • How often do people remove the links (similar to measuring content change above)?
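The questions above can be answered from a log of link events. The sketch below assumes each event records the automatically chosen source and the source the user ended up with, with `None` meaning the link was removed. The event layout and the source labels are hypothetical.

```python
from collections import Counter


def link_source_stats(events):
    """Return three counters keyed by the automatically chosen source.

    events: iterable of (auto_source, final_source);
    final_source is None when the user removed the link.
    """
    events = list(events)
    chosen = Counter(auto for auto, _ in events)
    overridden = Counter(
        auto for auto, final in events if final is not None and final != auto
    )
    removed = Counter(auto for auto, final in events if final is None)
    return chosen, overridden, removed


# Hypothetical sample data.
sample = [
    ("wikidata-sitelink", "wikidata-sitelink"),
    ("wikidata-label", "dictionary"),
    ("machine-translation", None),
    ("wikidata-sitelink", "wikidata-sitelink"),
]
chosen, overridden, removed = link_source_stats(sample)
print(chosen, overridden, removed)
```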


  • How often is each dictionary used for each language

Entry points

  • How often is each entry point used?
  • How often do people create articles from scratch as opposed to translating using CX?