User:SSethi (WMF)/Sandbox/Eventlogging data access
(Borrowing instructions from Niharika's user page on office wiki)
Hive/Hadoop
editHost bast1002.wikimedia.org
# Direct connection for the bastion host
ProxyCommand none
ControlMaster auto
Host *.wikimedia.org *.wmnet !gerrit.wikimedia.org !git-ssh.wikimedia.org
User enter_your_username
# Everything else goes via bastion acting as a proxy
ProxyCommand ssh -a -W %h:%p bast1002.wikimedia.org
# Do not offer other identities loaded in ssh-agent
IdentitiesOnly yes
IdentityFile ~/.ssh/enter_your_key
ssh stat1007.eqiad.wmnet
hive
use event;
- Find your table and its partition information by using
describe UserFeedback;
- Example queries:
select count(*) as cnt from UserFeedback where year=2019;
select event.page_name, count(*) as cnt from UserFeedback where year=2019 and event.vote="yes" group by event.page_name;
select event.page_name, count(*) as cnt from UserFeedback where year=2019 and event.vote="no" group by event.page_name;
select event.page_name as pagename,count(*) as total,sum(case when event.vote = 'yes' then 1 else 0 end) as yes, sum(case when event.vote = 'no' then 1 else 0 end) as no from userfeedback where year=2019 group by event.page_name;
- To obtain quarterly stats for Toolforge docs in
Help
namespace on Wikitech wiki & Action API docs inAPI
namespace on MediaWiki.org:hive -e "use event; describe UserFeedback; select event.page_name as pagename, count(*) as total,sum(case when event.vote = 'yes' then 1 else 0 end) as yes, sum(case when event.vote = 'no' then 1 else 0 end) as no from userfeedback where month>=6 AND month<9 group by event.page_name;" > /home/srishakatux/temp.tsv
sed 's/\t/,/g' temp.tsv > temp.csv
scp srishakatux@stat1007.eqiad.wmnet:/home/srishakatux/temp.csv /Users/ssethi/Documents/temp.csv