Continuous integration meetings/2015-04-28/Minutes

#wikimedia-office: CI weekly meeting

Meeting started by hashar at 14:00:00 UTC. The full logs are available below.

MeetBot was down and the minutes have been manually crafted.

Meeting summary

Meeting ended at 15:10:00 UTC.

Action items

  • Antoine to publish last week's and the current meeting's minutes on the wiki
  • Antoine to copy-paste his IRC log to craft the minutes

IRC log

Times are CEST / UTC+2

[16:00] <hashar> #startmeeting CI weekly meeting
[16:00] <Krinkle> o/
[16:01] <hashar> zeljkof: Krinkle legoktm addshore jzerebecki :]
[16:01] <addshore> \o
[16:01] <hashar> so I have just realized I haven't formatted previous meeting minutes :/
[16:01] <zeljkof-meeting> hashar: and the google calendar event says the meeting is in releng channel
[16:02] <hashar> #link https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-04-21-14.01.html Past week minutes
[16:03] <hashar> #action Antoine to publish on wiki past week and current meeting minutes
[16:03] <hashar> 
[16:03] <jzerebecki> o/
[16:03] <hashar> there were barely any actions iirc besides https://phabricator.wikimedia.org/T96629 Convert pool from a few large slaves (4X) to more smaller slaves (1X)
[16:03] <zeljkof-meeting> hashar: I have updated the meeting in calendar to point to this channel
[16:04] <hashar> zeljkof-meeting: great thanks!
[16:04] <Krinkle> Yeah
[16:04] <hashar> so it seems to be quite easy to have new instance images in openstack. Andrew got us one very easily via https://phabricator.wikimedia.org/T96706
[16:05] <Krinkle> Andrew created an instance type, but needs to be re-done because we asked for the wrong one
[16:05] <Krinkle> 30GB is fine, but by default 20GB goes to /, so only 10GB is in /mnt
[16:05] <hashar> I am adding it to the releng weekly meeting in a couple hours
[16:05] <Krinkle> See my latest comment on blocking task https://phabricator.wikimedia.org/T96706
[16:05] <Krinkle> Thanks
[16:06] <hashar> easy :]
[16:06] <hashar> should we do some triage so?
[16:06] <Krinkle> Yes!
[16:06] <hashar> oh no
[16:07] <hashar> #topic next meeting
[16:07] <hashar> I am unavailable next tuesday 
[16:07] <hashar> attending some organization board meeting
[16:08] <hashar> either the meeting can be held without me, I am sure Krinkle can lead it just fine 
[16:08] <hashar> or we move it to another day (Monday or Wednesday)
[16:08] <Krinkle> I'd prefer we skip the meeting
[16:08] <Krinkle> Or someone else lead it
[16:09] <hashar> so we skip it 
[16:10] <hashar> #agreed to skip next meeting due to Antoine's unavailability. Next one will be Tuesday May 12th 14:00 UTC.
[16:10] <hashar> #topic Triage
[16:10] <Krinkle> https://phabricator.wikimedia.org/project/board/401/
[16:10] <hashar> #link https://phabricator.wikimedia.org/project/board/401/query/open/
[16:10] <hashar> for some reason closed tasks show up by default for me
[16:11] <Krinkle> Hm.. even on board/ ?
[16:11] <Krinkle> For me it only does so on tag/
[16:11] <hashar> ohhh
[16:11] <hashar> yeah indeed
[16:11] <zeljkof-meeting> hashar: you mean filter is set to all tasks instead of open tasks?
[16:11] <hashar> zeljkof-meeting: yeah that is rather annoying
[16:12] <hashar> but I have bookmarks :]
[16:12] <hashar> so
[16:12] <hashar> there is a bunch of patches for zuul-cloner that have been waiting for a long time
[16:12] <Krinkle> Before we go into the untriaged ones, I'd like to sync up on a few existing tasks that are in progress without activity
[16:12] <hashar> ah yeah
[16:13] <hashar> go ahead Krinkle !
[16:13] <Krinkle> #link https://phabricator.wikimedia.org/T91396 Have jenkins jobs logrotate their build history
[16:13] <Krinkle> It's mostly finished, can we wrap it up?
[16:13] <hashar> gotta verify that all jobs have the logrotate
[16:13] <legoktm> o/
[16:13] <hashar> most of them do for sure
[16:13] <hashar> legoktm: good morning :]
[16:14] <Krinkle> hashar: Beware that grepping on gallium may also include jobs that no longer "exist".
[16:14] <Krinkle> If you encounter one of those, deleting them in Jenkins UI is also good 
[16:14] <hashar> yeah which is another problem
[16:14] <hashar> so while doing the grep I cleaned up some jobs that are no more in jjb
[16:14] <hashar> and of course, some jobs are still used  / triggered by zuul but not in JJB :(((
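
The cleanup described above amounts to comparing three lists of job names. A minimal sketch, assuming the names have already been collected from Jenkins, from the JJB definitions, and from the Zuul layout; the function name and the sample job names are hypothetical, and how the lists are gathered is left out:

```python
def classify_jobs(jenkins_jobs, jjb_jobs, zuul_jobs):
    """Group Jenkins jobs that are not defined in JJB by whether Zuul uses them."""
    jenkins = set(jenkins_jobs)
    jjb = set(jjb_jobs)
    zuul = set(zuul_jobs)

    unmanaged = jenkins - jjb  # present in Jenkins but not defined in JJB
    return {
        # Nothing defines them and nothing triggers them: candidates for deletion.
        "stale": sorted(unmanaged - zuul),
        # Still triggered by Zuul: port to JJB (or file a task) before touching them.
        "needs_porting": sorted(unmanaged & zuul),
    }


if __name__ == "__main__":
    # Hypothetical job names, for illustration only.
    print(classify_jobs(
        jenkins_jobs=["job-a", "job-b", "job-c"],
        jjb_jobs=["job-a"],
        zuul_jobs=["job-a", "job-c"],
    ))
```

Keeping the two groups apart matters because deleting a job that Zuul still triggers would break the changes that depend on it.
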
[16:15] <Krinkle> Yeah, see https://phabricator.wikimedia.org/T91410#1197507
[16:15] <Krinkle> I deleted most garbage in Jenkins. Still a few left.
[16:15] <hashar> I have posted an update at https://phabricator.wikimedia.org/T91396#1241614
[16:16] <Krinkle> cool.
[16:16] <hashar> will finish the task tomorrow
[16:16] <hashar> I have added a note to my calendar
[16:16] <hashar> and file tasks or port jobs to JJB
[16:17] <hashar> next!
[16:17] <Krinkle> Alrighty
[16:18] <Krinkle> #agreed Antoine to finish tomorrow. The unmanaged tasks will be tracked under T91396.
[16:18] <Krinkle> #link https://phabricator.wikimedia.org/T97106 Zuul-cloner should use hard links when fetching from cache-dir
[16:18] <Krinkle> This is a nested dependency for "Jenkins must wipe workspace"
[16:18] <hashar> I have seen somewhere you gave it a try on one of the instances
[16:18] <Krinkle> Yeah
[16:19] <hashar> ah on https://phabricator.wikimedia.org/T97098
[16:19] <hashar> so many tasks
[16:19] <Krinkle> Workspace wipe means we need to do faster clones, which means we need git-cache.
[16:19] <Krinkle> The same applies to VM isolation of course, so I'm hoping you can do some of this later on.
[16:20] <hashar> so it is all about reusing https://review.openstack.org/#/c/117626/4 right ?
[16:20] <Krinkle> My availability for CI is shrinking as most of this is no longer explicitly beneficial to VE.
[16:20] <Krinkle> I'm still interested of course, so I'll help where I can, but not as expected/obligatory maintenance.
[16:20] <hashar> understood
[16:20] <hashar> all the zuul cherry picking has been blocked since December 
[16:20] <Krinkle> Yeah, CR came from upstream. Though it needs 2 CR+2 in openstack to merge.
[16:21] <hashar> merely because using pip  was a pain in the ass
[16:21] <hashar> with the .deb package that is much easier
[16:21] <hashar> so we have
[16:21] <Krinkle> So, I implemented cache-no-hardlinks
[16:21] <hashar> https://phabricator.wikimedia.org/T97098 Update jobs to use zuul-cloner with git cache via hard links
[16:21] <hashar> https://phabricator.wikimedia.org/T97106 Zuul-cloner should use hard links when fetching from cache-dir 
[16:21] <hashar> seems the latter is redundant
[16:21] <Krinkle> I also implemented git-cache-update script (see pending commit in integration/jenkins.git)
[16:21] <Krinkle> hashar: No, one is to implement the argument in zuul. The other is all our jobs.
[16:22] <Krinkle> Oh, right, you're saying it will become default
[16:22] <Krinkle> Right right
[16:22] <Krinkle> That wasn't always the case since the patch was initially for cache-force-hardlinks
[16:22] <Krinkle> we inverted it
[16:22] <Krinkle> I'll change it to "Update jobs to use zuul-cloner with git cache"
[16:23] <hashar> yeah it is much better this way
[16:23] <hashar> so the default is whatever git clone would do by default
[16:23] <Krinkle> We do need to configure cache-dir, which is blocked on deploying the git-cache script, and converting the instances to a smaller 1x executor, and running the cache update script from a periodic job
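
For readers unfamiliar with why hard links make fetching from a cache cheap: a hard link gives existing data a second name instead of copying it. When cloning from a local path, git hard-links object files where it can, and `--no-hardlinks` forces a copy. The sketch below illustrates only that general link-with-fallback idea in plain Python; it is not zuul-cloner's implementation, and the function and file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link src to dst; fall back to a real copy if linking fails.

    Returns "link" or "copy" so the caller can see which path was taken.
    """
    try:
        os.link(src, dst)  # no data is duplicated, only a new name is created
        return "link"
    except OSError:
        # For example, cache and workspace live on different filesystems.
        shutil.copy2(src, dst)
        return "copy"


def demo():
    """Create a file, link it, and report how it went and whether inodes match."""
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cache-object")         # hypothetical name
        workspace = os.path.join(tmp, "workspace-object")  # hypothetical name
        with open(cached, "w") as handle:
            handle.write("pack data")
        how = link_or_copy(cached, workspace)
        same_inode = os.stat(cached).st_ino == os.stat(workspace).st_ino
        with open(workspace) as handle:
            content = handle.read()
        return how, same_inode, content
```

The fallback matters because hard links only work within one filesystem, which is one reason the location of the cache directory on the instances is relevant.
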
[16:24] <hashar> so I can assign https://phabricator.wikimedia.org/T97106 to myself
[16:24] <hashar> grab https://review.openstack.org/#/c/117626/4 on the .deb packages
[16:24] <hashar> and upgrade zuul
[16:25] <Krinkle> I hope the dependency tree around this entire thing is clear. Feel free to ask anything 
[16:25] <hashar> yeah it is good to me
[16:25] <hashar> so 
[16:25] <hashar> zuul needs the patch
[16:26] <hashar> then the jobs have to be updated to pass git cache dir to zuul-cloner
[16:26] <hashar> then we need to setup the git cache
[16:26] <hashar> I am taking https://phabricator.wikimedia.org/T97106 
[16:26] <Krinkle> I'd set up git cache first to avoid zuul-cloner interacting with an incomplete cache.
[16:27] <hashar> #agreed Antoine to bump Zuul version to incorporate the zuul-cloner patch that hard links from git cache https://phabricator.wikimedia.org/T97106
[16:27] <Krinkle> Ah, cool. We can backport yeah
[16:27] <hashar> and we can probably come up with a partial cache
[16:27] <hashar> with just the basic repos / most heavy repos
[16:27] <hashar> next! :]
[16:29] <hashar> there is an in progress task by legoktm "Convert operations/mediawiki-config to use composer for phpunit and php linting"  https://phabricator.wikimedia.org/T85947
[16:29] <hashar> oh it is blocked
[16:29] <legoktm> it's blocked on https://phabricator.wikimedia.org/T92605 "Come up with non sucky solution for running "composer test" on repos that have vendor/ checked in"
[16:30] <legoktm> the problem right now is that we commit the "vendor" repo that composer generates with all the real dependencies
[16:30] <hashar> so can we run   composer test 
[16:30] <legoktm> so when you run "composer update" to get phpunit, phpcs, etc. vendor/ becomes dirty
[16:30] <hashar> without the composer install or composer update step ?
[16:30] <Krinkle> partial cache is fine, but I mean race condition corruption. E.g. if you pass cache-dir before it exists and then run the script it will interact with half-cache. And also to update the cache dir (once a day or once a week) we need to convert instances to 1x ci1.medium first as that's the only way to execute a job in a way that nothing else runs at that time.
[16:31] <Krinkle> Okay, next 
[16:31] <hashar> Krinkle: I am not too worried by the git cache system :]
[16:31] <hashar> legoktm: imho the aim is to test what will end up in prod
[16:31] <hashar> so using  whatever is shipped in the /vendor/ dir seems appropriate
[16:31] <hashar> hence, skip  composer update / install
[16:31] <legoktm> no, because we aren't committing phpunit. Just the "require" dependencies, not "require-dev"
[16:32] <hashar> arhgrhg
[16:32] <hashar> and I imagine you can't install/update just the require-dev, right?
[16:32] <legoktm> you can, but it will make local testing a pain
[16:32] <legoktm> because composer update will make your repo dirty
[16:32] <legoktm> so it ends up being
[16:32] <legoktm> "composer update --dev"
[16:32] <legoktm> "composer test"
[16:32] <legoktm> "composer update --no-dev"
[16:32] <legoktm> "git commit..."
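
The local cycle listed above can be sketched as a small wrapper that always restores vendor/ after the tests, even when they fail. This is only an illustration of the sequence quoted in the log, not an agreed solution: the function name is hypothetical, `composer test` is assumed to be a script defined in the repository's composer.json, and the final `git commit` is left to the developer.

```python
import subprocess

# Commands quoted from the discussion; "composer test" is assumed to be a
# script defined in the repository's composer.json.
PREPARE_AND_TEST = [
    ["composer", "update", "--dev"],  # pulls phpunit, phpcs, etc. into vendor/
    ["composer", "test"],
]
RESTORE = ["composer", "update", "--no-dev"]  # drops the dev dependencies again


def run_local_tests(run=subprocess.check_call):
    """Run the test cycle and always restore vendor/ afterwards.

    `run` is injectable so the sequence can be inspected without composer.
    Returns the list of commands that were issued, in order.
    """
    issued = []
    try:
        for cmd in PREPARE_AND_TEST:
            issued.append(cmd)
            run(cmd)
    finally:
        # Runs even when the tests fail, so vendor/ is clean before committing.
        issued.append(RESTORE)
        run(RESTORE)
    return issued
```

Calling `run_local_tests()` with no arguments would shell out to composer; passing a stub such as `calls.append` records the commands instead.
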
[16:33] <hashar> can we git add  the require-dev ?
[16:33] <hashar> that will solve the problem imho
[16:33] <hashar> but not sure we want to deploy them :/
[16:33] <legoktm> it would negatively affect performance
[16:34] <marktraceur> ...is meetbot not in this channel anymore? 
[16:34] <hashar> marktraceur: ahah thanks for noticing :]
[16:34] <Krinkle> O_O
[16:34] <Krinkle> Wops
[16:35] <hashar> #action Antoine to copy-paste his IRC log to craft the minutes
[16:35] <hashar> legoktm: what do you mean by performance? 
[16:35] <hashar> is it related to having to sync a lot more materials?
[16:35] <legoktm> I'm mainly thinking of adding a few hundred extra entries to the autoloader that it has to go through and never use
[16:36] <hashar> ah yeah good point :/
[16:36] <hashar> though if it is class name -> file  array
[16:36] <Krinkle> legoktm: We wouldn't deploy that though, right?
[16:36] <hashar> the look up is done as a hash so that is quite fast
[16:37] <hashar> if only there was  /vendor/ and /vendor-dev/
[16:37] <hashar> we could .gitignore  the dev dependencies
[16:37] <legoktm> but they'll still be in the generated composer autoloader
[16:38] <hashar> you could have two autoloaders
[16:38] <hashar> the general one would skip the dev autoloader if it is missing
[16:39] <hashar> maybe_include( '/vendor-dev/autoloader.php' );
[16:39] <Krinkle> Is this for performance in jenkins or prod?
[16:39] <legoktm> Krinkle: well he was suggesting committing and deploying the extra libraries so I was thinking about prod
[16:40] <Krinkle> Why?
[16:40] <Krinkle> You mean PHPUnit?
[16:40] <legoktm> yes
[16:40] <Krinkle> Why would we commit that to git, we just fetch in Jenkins, no?
[16:40] <legoktm> fetching in jenkins is fine and already works, it just sucks for developers trying to test locally right now
[16:41] <Krinkle> Why?
[16:41] <Krinkle> We always do composer-update/composer-test locally
[16:41] <legoktm> because you have to run "composer update --dev", "composer test", "composer update --no-dev" (to get rid of the dev dependencies) and then "git commit"
[16:41] <hashar> yeah locally a dev needs the --dev components
[16:41] <Krinkle> Because the repo hard codes vendor/
[16:41] <hashar> that will pollute the vendor dir
[16:42] <legoktm> you have to fetch dev deps, which makes vendor/ dirty, run tests, then get rid of them so vendor/ is clean again
[16:42] <Krinkle> One solution would be to move out vendor/ in a separate repo, like for mw
[16:42] <hashar> legoktm: at this point, maybe you want to summarize on the task https://phabricator.wikimedia.org/T92605 and reach out to wikitech-l to figure out a good solution?
[16:43] <Krinkle> Then developers locally use plain composer
[16:43] <legoktm> yeah, I think using submodules/separate repos is a good idea
[16:43] <hashar> Krinkle's model is what is used for parsoid
[16:43] <hashar> maybe it can be generalized
[16:43] <Krinkle> and for Jenkins we clone vendor/ and clobber it with --dev
[16:43] <hashar> but then we need to run tests against the source repo with composer update
[16:43] <legoktm> ok, I can do that
[16:44] <hashar> and the /deploy/ tests by NOT using composer update but relying solely on /deploy/
[16:44] <hashar> that is a pain in the ass though
[16:44] <hashar> cause the code has to end up in the source repo
[16:44] <hashar> then one has to submodule update the deploy repo
[16:44] <Krinkle> By the way, we can't re-use mediawiki-vendor probably. I talked about this with bd808 and the problem is basically that mediawiki-config also runs in mediawiki-core in prod.
[16:44] <Krinkle> This is already a problem in disguise in prod right now with CDB
[16:45] <Krinkle> the one in mediawiki-vendor is shadowed by the mwconfig one
[16:45] <hashar> that is two operations for deployment and up to four tests being run
[16:45] <Krinkle> just FYI 
[16:45] <hashar> ah
[16:45] <hashar> damn dependencies
[16:45] <legoktm> hashar: you can't run tests just with /deploy/ because it won't have phpunit committed.
[16:45] <Krinkle> and we cannot require that all mediawiki versions use the same dependencies. newer wmf branch can be newer.
[16:46] <hashar> mediawiki-config should use the  vendor/ directories from our mediawiki-core /vendor dirs 
[16:47] <hashar> I am not sure how we can handle it
[16:47] <hashar> might need some serious work to figure out all the use cases
[16:48] <legoktm> the problem is that PHP's namespace is global :P
[16:49] <Krinkle> We cannot use mediawiki's one because then we could never change any dependency in master without backporting to all branches. E.g. if I change dependency in master, and dont backport, it would silently apply to previous branch when the next branch is cut. And we wouldn't be able to deploy config change without also updating mediawiki to latest master.
[16:49] <legoktm> if we ever have to update cdb, it's going to be a very sticky challenge. But I don't see that happening soon...
[16:50] <Krinkle> We'll probably need a separate vendor/ and document the ones that are shadowed (like CDB)
[16:50] <Krinkle> legoktm: btw, we could just keep it in mediawiki-config repo and add the entire thing to gitignore
[16:50] <Krinkle> On the rare time you have to update vendor in config, git add -f and ensure you clean dev dependencies first.
[16:50] <Krinkle> similar to when you'd have to update the vendor repo.
[16:51] <Krinkle> Just an idea 
[16:51] <Krinkle> #link https://phabricator.wikimedia.org/T92871 Parsoid patches don't update Beta Cluster automatically
[16:51] <Krinkle> (oldest untriaged)
[16:52] <hashar> hmm
[16:52] <hashar>     beta-parsoid-update-eqiad Change has been deployed on the EQIAD beta cluster in 2s
[16:52] <hashar> on the change https://gerrit.wikimedia.org/r/#/c/194452/
[16:53] <hashar> the job console log is gone though
[16:53] <legoktm> (thanks for discussing, I have a good idea on how to move forward)
[16:54] <hashar> legoktm: the dependency problem is a tough one. We might want to list potential use cases such as: how would I bump the cdb version
[16:54] <hashar> legoktm: then find out a nice / clean way to push the cdb version bump on the cluster granted it is provided in both wmf versions and in mediawiki-config :/
[16:54] <hashar> legoktm: or we could come up with a global vendor dir :]
[16:55] <legoktm> mhm
[16:56] <hashar> Krinkle: that parsoid bug I have no idea
[16:56] <Krinkle> stalled on their decision?
[16:57] <Krinkle> I've moved it to config
[16:57] <hashar> the job might be wrong 
[16:57] <hashar> looking at https://integration.wikimedia.org/ci/job/beta-parsoid-update-eqiad/961/console
[16:57] <hashar> it seems to refresh both mediawiki/services/parsoid and mediawiki/services/parsoid/deploy
[16:58] <hashar> the latter having the parsoid code as a submodule
[16:58] <hashar> and I am not sure what code the daemon is running
[17:01] <hashar> Krinkle: I am going to look at that parsoid issue
[17:01] <hashar> anything else ?
[17:02] <Krinkle> hashar: Ah, yeah. Zuul update.
[17:02] <Krinkle> Do we know of any breaking  changes?
[17:02] <Krinkle> Maybe we should update our Zuul to latest upstream before we add our patches
[17:02] <hashar> what do you mean by zuul update?
[17:03] <Krinkle> separate deployment
[17:03] <hashar> ah
[17:03] <hashar> I looked at our pending patches
[17:03] <hashar> they are for zuul-cloner which barely has changed for the last few months
[17:03] <Krinkle> yes
[17:03] <hashar> so the patches can be cherry picked on our current stalled version
[17:03] <Krinkle> but I mean there are bugs in zuul
[17:04] <Krinkle> and one patch in zuul-cloner did get merged (git.clean)
[17:04] <hashar> I definitely want to upgrade Zuul to catch up with upstream but I haven't looked at the history of changes
[17:04] <Krinkle> git.reset
[17:04] <Krinkle> Yeah, not sure it's clean upgrade.
[17:04] <hashar> yeah that git.clean stuff should apply cleanly
[17:06] <jzerebecki> i triaged https://phabricator.wikimedia.org/T97040 and https://phabricator.wikimedia.org/T96014
[17:06] <hashar> Krinkle: I will get the pending patches we have and add them to our .deb packages
[17:06] <hashar> Krinkle: and one day start catching up with upstream changes
[17:06] <Krinkle> hashar: OK. Don't worry about my upstream patches for Status dashboard though.
[17:06] <hashar> legoktm: ha thanks
[17:07] <hashar> jzerebecki: ah thanks
[17:07] <Krinkle> but zuul-cloner/git-reset, status/last-modified, and zuul-cloner/no-hardlinks would be good to get landed
[17:07] <hashar> jzerebecki: the yamllint job has to be phased out, I have filed a tracking bug for that https://phabricator.wikimedia.org/T95890 and filed a bunch of bugs to ask developers to lint their yaml files via their test suite
[17:08] <hashar> Krinkle: yeah they are all on my radar :]
[17:08] <hashar> Krinkle: will give them a try tomorrow afternoon
[17:08] <Krinkle> Nice 
[17:09] <hashar> ok the end
[17:09] <hashar> wish me luck in formatting the above irc log :]
[17:09] <Krinkle> That's all folks. The end. https://www.youtube.com/watch?v=0FHEeG_uq5Y
[17:10] <legoktm> o/