Analytics/Server Admin Log/Archive/2015
15:23 ottomata: killing oozie legacy_tsv job 0102159-150605005438095-oozie-oozi-B to restart it without mobile, 5xx-mobile and zero outputs
03:14 ottomata: restarted eventlogging
14:40 ottomata: restarting eventlogging to see if it is ok after enabling firewall rules on kafka1014
15:51 joal: Change replication factor to 2 in cassandra per_article_flat keyspace
15:47 ottomata: deploying aqs
18:24 ottomata: deploying aqs
10:35 joal: Gzipped already archived pageview files
10:34 joal: restarted pageview job to archive gzipped files
10:34 joal: refinery deployed
19:16 joal: Downsizing cassandra replication from 3 to 2 on per_article_flat keyspace
19:07 joal: Restart load job (based on IMPORTED flag)
15:48 joal: Deploying refinery
15:40 joal: deploying refinery-source v0.0.22
19:06 ottomata: deploying aqs
18:24 joal: deploying refinery
16:46 joal: Releasing refinery-source v0.0.21
10:34 joal: manual aggregator launch after small bug correction
18:42 joal: refine bundle, pageview_hourly and projectview_hourly coord restarted
18:41 joal: refinery deployed on HDFS
14:33 joal: truncating "local_group_default_T_pageviews_per_article".data on aqs
09:58 joal: Restart cassandra on aqs1001
20:24 ottomata: deploying aqs
09:51 joal: restart cassandra on aqs1003
22:53 milimetric: deployed EventLogging and tried to backfill data lost on 2015.10.14 but failed
18:24 joal: Stopped per article loading in cassandra
13:39 ottomata: deploying aqs
10:12 joal: restart cassandra on aqs1002
18:35 ottomata: restarting eventlogging with change to parse schema names out of errored events
20:38 joal: restarted cassandra on aqs100[1,2,3]
12:17 joal: Refinery deploy needed before restart --> Deploying
12:12 joal: Restarting daily and monthly mobile unique coordinators with new patch
12:12 joal: Rerunning daily mobile unique jobs for days 2015-08-[03,04,11,12,12,14,17], 2015-09-16
12:10 joal: Stopped daily and monthly mobile unique coordinators
15:22 ottomata: restarting lagging eventlogging mysql consumer
19:26 ottomata: releasing refinery 0.20
15:19 ottomata: moved camus property files out of refinery repository and into puppet. Camus properties now live on an27 at /etc/camus.d, and camus log files are in /var/log/camus
14:54 joal: Cassandra restarted on aqs1003
09:15 joal: Restart cassandra on aqs1002
17:38 joal: Backfilling load from hadoop to cassandra from beginning of october
16:32 joal: Started cassandra load jobs from 2015-10-01
16:27 valhallasw`cloud: testing again
16:13 valhallasw`cloud: test
10:51 joal: cluster back to normal state. Some errors are still not explained, need to be careful.
14:56 joal: backfilling various load jobs that failed at stages earlier than check_sequence_statistics
13:03 joal: Errors on cluster, some refine jobs have failed, investigating.
18:20 ottomata: does this log work?
22:09 qchris: starting HDFS balance for unhealthy node analytics1016.eqiad.wmnet with healthy nodes analytics1037.eqiad.wmnet,analytics1040.eqiad.wmnet
02:10 qchris: Ran kafka leader re-election as analytics1021 dropped out of its partition leader role.
01:32 qchris: name nodes died with error "Java heap space" and did not come back up. Bumping the heap allowed us to resurrect them (See task T88871).
23:22 qchris: Manual failover of Hadoop namenode from analytics1001 to analytics1002, as analytics1001 had Heap space errors
07:49 qchris: Manual failover of Hadoop namenode from analytics1002 to analytics1001, as analytics1002 had Heap space errors
20:21 ottomata: deployed refinery 0.0.4
19:37 ottomata: released refinery 0.0.4
21:53 qchris: Marked raw text webrequest partition for 2015-01-24T00/1H ok (See task T87545)
22:46 qchris: Marked raw upload webrequest partition for 2015-01-16T12/1H ok (The partition only needed deduping)
22:23 qchris: Marked raw upload webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)
22:11 qchris: Marked raw upload webrequest partition for 2015-01-15T17/1H ok (The partition only needed deduping)
22:04 qchris: Marked raw text webrequest partition for 2015-01-15T15/1H ok (The partition only needed deduping)
22:01 qchris: Marked raw mobile webrequest partition for 2015-01-16T01/1H ok (The partition only needed deduping)
08:25 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders
12:15 qchris: Marked raw mobile+text webrequest partitions for 2015-01-05T17/1H ok (See task T85918)
12:06 qchris: Marked raw mobile and upload webrequest partition for 2015-01-03T10/1H ok (See task T85758)
21:21 qchris: Ran kafka leader re-election to bring analytics1021 back into the set of leaders
21:07 qchris: Marked raw bits, text, and upload webrequest partition for 2014-12-11T14/1H ok (See task T85712)
19:05 qchris: Marked raw text+upload webrequest partitions for 2014-12-26T06/1H ok (See task T85709)
15:51 qchris: Marked raw text webrequest partition for 2014-12-11T20/1H ok (See task T85699)
12:39 qchris: Marked raw mobile webrequest partition for 2014-12-29T17/1H ok (See task T85695)
11:21 qchris: Marked raw text webrequest partition for 2014-12-30T20/1H ok (See task T85692)
20:26 qchris: Marked raw webrequest partitions for 2014-12-10T14/2H ok (See task T85675)