Universal Language Selector/Compact Language Links/metrics

This page describes the scripts and queries for collecting data about Interlanguage links, especially in the context of Compact Language Links. For the collected data, see the page Universal Language Selector/Compact Language Links/metrics/data.

edit

For getting information one day before, run:

$ ssh stat1002
$ ./all_projects.sh

For a particular date, run (note: no leading zeros in the date):

$ ./all_projects.sh 2016 7 7

The output will be in the files with names such as prev_wikipedia_2016_7_7.out and curr_wikipedia_2016_7_7.out, where "prev" is for the source language and "curr" is for target language.

Copy the results from the .out file to a text editor, and then copy them from the text editor to the relevant column in the spreadsheet.

Pageviews data

edit

Updating daily data

edit

THIS IS ON THE WAY TO DEPRECATION, AND MOVING TO QUERIES, SEE BELOW

Get the pageviews data for the projects that are measured:

Copy the results to the relevant cells in the spreadsheet.

Adding a new language for calculating percentage of clicks out of views

edit
  1. Find the right place, alphabetically, for the language to appear.
  2. Select all the rows for the language before which you want to add another one, including an empty row after it.
  3. Copy (Ctrl+C/⌘ Command+C).
  4. Right-click on the row numbers and click "Add rows above: 7".
  5. Select the added rows. Paste (Ctrl+P/⌘ Command+P).
  6. Change the language code in the first pasted row.
  7. Go to https://tools.wmflabs.org/siteviews/
  8. Select a range of initial dates that you want to backfill.
  9. Type the project name. The need chart will be shown.
  10. Click the CSV button and download the file. Open the CSV file in your desktop spreadsheet software.
  11. In your desktop spreadsheet software, select the numbers for the relevant dates and Copy (Ctrl+C/⌘ Command+C).
  12. Go back to the Google spreadsheet. Paste (Ctrl+P/⌘ Command+P) in some empty space and again Copy (Ctrl+C/⌘ Command+C).
  13. Right-click on the first cell in the "views" row for the new language. Paste special -> Transpose.
  14. Update the URLs for pageviews stats in the section on pageviews above.

Query for getting pageviews

edit

From Phab T142505

SELECT
    year,
    month,
    day,
    CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) AS date,
    SUM(view_count) AS views 
FROM
    wmf.pageview_hourly
WHERE
    year = 2016 AND
    month = 8 AND
    day = 16 AND
    project = 'ru.wikipedia' AND
    agent_type = 'user' AND
    access_method = 'desktop' AND
    NOT ( -- See https://phabricator.wikimedia.org/T141506
        user_agent_map['browser_family'] = 'Chrome' AND
        user_agent_map['browser_major'] = '41' AND
        user_agent_map['os_family'] LIKE 'Windows%' AND
        page_title = 'Заглавная_страница'
    )
GROUP BY
    year,
    month,
    day
ORDER BY
    year,
    month,
    day
LIMIT 1000;

The clicks query

edit

These queries run from the all_projects.sh script. They are here only for reference.

-- Assuming that your username is "kartik"...

-- Create your database (this only needs to be done once!)
CREATE DATABASE kartik;

-- Go to your database.
use kartik;

-- Create table of cross wiki navigation events.
-- Modify time parameters for your needs!
-- This takes around 5 minutes per day.
DROP TABLE IF EXISTS cross_wiki_navigation;
CREATE TABLE cross_wiki_navigation AS
 SELECT year,
        month,
        day,
        prev,
        curr,
        COUNT(*) AS n

   FROM (SELECT day,
                month,
                year,
                REGEXP_EXTRACT(parse_url(referer,'HOST'), '^([a-z-]{2,}).wikipedia.org', 1) AS prev,
                normalized_host.project AS curr

           FROM wmf.webrequest
                -- select a relevant timespan to query over
          WHERE year = ${hiveconf:yesterday_year}
                AND month = ${hiveconf:yesterday_month}
                AND day = ${hiveconf:yesterday_day}
                AND webrequest_source = 'text'
                AND is_pageview
                -- only consider wikipedia article requests from users
                AND agent_type = 'user'
                AND normalized_host.project_class = 'wikipedia'
                -- only consider wikipedia article referers (this is an approximation)
                AND parse_url(referer,'HOST') LIKE '%wikipedia.org'
                -- exclude .m.wikipedia.org and .zero.wikipedia.org
                AND parse_url(referer,'HOST') NOT LIKE '%.%.wikipedia.org'
                AND parse_url(referer,'PATH') LIKE '/wiki/%'

        ) log_with_parsed_referer
  WHERE prev != curr
  GROUP BY
        year,
        month,
        day,
        prev,
        curr
;

-- Interlanguage links clicks by source language.
-- This takes about a minute per day.
SELECT
    prev,
    sum(n)
FROM 
    cross_wiki_navigation
GROUP BY
    prev
ORDER BY
    prev
LIMIT
    1000;

-- Interlanguage links clicks by target language.
-- This takes about a minute per day.
SELECT
    curr,
    sum(n)
FROM 
    cross_wiki_navigation
GROUP BY
    curr
ORDER BY
    curr
LIMIT
    1000;