Integration: Merge requests and patchsets on Talk:GitLab/2020 consultation

One aspect of a migration to GitLab that has been touched on in other discussions is that of integration.

Integration is the process of combining a new piece of code into a mainline codebase. Our mainline codebase under Gerrit and, presumably, under GitLab will be to a mainline branch: a shared sequence of commits that record changes.

The mechanism by which code is integrated under Gerrit is a patchset. A patchset is a single commit that represents the difference between itself and a target branch, typically the target branch is the mainline branch. The mechanisms of combining patchsets with mainline vary by repo but a patchset may either be merged (creating a merge commit) or fast-forwarded (no merge-commit) meaning the patchset is the only commit added to the target branch. In addition to a git commit SHA1, each patchset has a "Change-Id" which is an ID that is searchable in Gerrit and points to a given patchset. Patchsets may be chained. When one commit is the parent of another commit pushing those commits to Gerrit creates a chain of patchsets. The child patchsets may not be merged independently without the parent patchsets having merged. The mechanism to update a patchset is to push a new commit (or chain of commits) with the same Change-Ids as those patchsets you're wishing to update to the special refs/for/[target branch] reference.

The mechanism by which code is integrated under GitLab is the merge request. Each merge request consists of a source branch and a destination branch. The source branch contains 1 or more commits not contained in the destination branch along with a description of the intent of the merge request. The mechanism of combining merge requests is a combination of the repository settings and the merge-requests settings. A merge request may either be squashed (creating a single commit) or each commit may be left seperate. Additionally, a merge-request may be combined by merging (creating a merge-commit) or fast-forwarded: adding the commits in the merge request to the tip of the mainline branch without creating a merge commit. The mechanism to update a merge request is to push a new commit to the source branch or to --force push to the source branch. Generally, force pushing a source branch is not recommended as review discussion may become confusing.

The branching pattern most frequently used with merge-requests is feature branching; that is, putting all work for a particular feature into a branch and integrating that branch with the mainline branch when the feature is complete.

The branching pattern most frequently used with patchsets is what Martin Fowler has termed continuous integration with reviewed commits. That is, there is no expectation that a patchset implements a full feature before integrating it with the mainline branch, only that it is healthy and has had its commits reviewed.

The branching pattern is not necessarily tightly coupled with the tooling, for example, a merge-request could be created with a single commit that itself does not implement an entire feature: this is a merge-request that is not using feature branching. Each tool does, however, encourage using their respective branching mechanisms.

There are various aspects to explore here:

Workflow/branching pattern changes
Review changes
Integration frequency changes
Necessary tooling changes

Reply 00:27, 9 September 2020 4 years ago

TCipriani (WMF) (talkcontribs)

Per working group discussion I've added a few definitions from this topic to the main page.

Reply 17:04, 21 September 2020 4 years ago

GLederrey (WMF) (talkcontribs)

For context, the previous discussion in Topic:Vpbonspnaaafjdwq is probably relevant here. It describes some of the current use cases about strings of patches and potential ways of having similar workflows in Gitlab (but it looks like currently there isn't an obvious way to implement a similar workflow).

Reply 12:31, 14 September 2020 4 years ago

Adamw (talkcontribs)

The way I've seen this work in Gitlab and Github is that CI tooling will automatically check the head of the branch. You can choose to test the head with or without a rebase. If three patches are submitted at once, only the final patch is tested. If a new patch is submitted as a test is running, that test is canceled and the new patch is tested.

Testing can also be configured to gate the branch merge to master.

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

Reply 09:23, 18 September 2020 4 years ago

BBearnes (WMF) (talkcontribs)

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

This emphasizes a really important point here: Should every commit to the master / main branch represent a known-good, deployable state (as far as we're capable of achieving that)? We do our best to achieve that currently, at least on repos like core, which does seem like it militates in favor of squashing commits by default.

Reply 20:40, 21 September 2020 4 years ago

EBernhardson (WMF) (talkcontribs)

To me squashing depends on what those patches were. If the patch chain is the following, it should probably be squashed:

Primary feature implementation -> typo in Foo.php -> correct comments in Bar

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting:

Refactor to allow implementation -> Implement feature

Reply 20:54, 21 September 2020 4 years ago

Jdforrester (WMF) (talkcontribs)

Unfortunately, in my experience, in the real world in systems where force-rewrite of open PRs isn't available (most FLOSS GitHub and GitLab repos), people end up mushing multiple feature commits and fixups into the same chain.

A 'simple' example, with committer shown in square brackets ahead of each commit:

[A] First feature > [B] typo fix > [B] addition of test suite > [C] Second, dependent feature > [A] failing expansion and modification of test suite based on feedback from the second feature > [C] fix of first feature

Squashing this stack is pretty bad, throwing away the separation of the two main features, and the authorship blame. Not squashing this stack is also pretty bad, littering the history with nonsense commits, making git bisect vastly harder, and creating toxic, test-failing branch points.

Theoretically you can dump the MR, git rebase -i the stack to make history "sensible" with just two commits, and then re-push it as pair of a MRs (one with the first feature commit, the other with the first and second), the second of which screams "DO NOT MERGE UNTIL YOU MERGE MR X FIRST!" manually, but this loses this history of the discussion on what's best to do, still loses the kudos/blame of some of the contributors, and is an epic extra piece of work.

Of course, GitLab could be extended (either by us or upstream) to add features to manage this (turning the 'squash' button into a complex form to let the merge select arbitrary squash/fixup/rebase actions on a per-commit basis), but that's a huge undertaking, taking GitLab quite far away from the simple branch model it's built around so upstream may well not be keen, and said code has to be written and supported by someone.

This workflow is one that I personally do up to a few times a day, pretty much every day. It's the core of MW's development model. I know that a few areas of our codebase don't use this model and don't have the complexity of multi-feature inter-relation development, but they're the simple exceptions, and it feels like we're focussing on toy development rather than the main stream of our work in all the analysis. It's not an "oh well" to lose it, it's going to be pretty seriously disruptive.

Reply 11:19, 22 September 2020 4 years ago

EBernhardson (WMF) (talkcontribs)

I haven't run into the issue of force-rewrite on open PRs being disabled, but indeed that would make my current experiments with finding a reasonable workflow completely useless. If the only option in a PR is to continually add more code that will be squashed into a single patch, I worry the development history and general experience of performing code review is going to suffer for anything of complexity.

Reply 15:02, 22 September 2020 4 years ago

Adamw (talkcontribs)

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting: Refactor to allow implementation -> Implement feature

Good point, in this case with a squash workflow the feature would have to be split into two branches.

Reply 07:54, 22 September 2020 4 years ago

EBernhardson (WMF) (talkcontribs)

How does that work though? As far as I can tell gitlab has no affordance to split a PR into two branches. If branch A is the refactor, and branch B is the new feature, then as far as gitlab is concerned a PR on B is a PR for A+B and it can be merged without consideration of the A PR.

Reply Edited 14:54, 22 September 2020 4 years ago

TCipriani (WMF) (talkcontribs)

There is a feature in the premium version for merge request dependencies that is needed here.

I'm not entirely satisfied with any other mechanisms (aside from merge request dependencies) for splitting merge-requests and having them depend on one another. The "smartest" thing I could think to do is to have dependent merge requests target other merge-request branches. For example, I split !4 into !4 and !5. In !5 I targeted the master branch and in !4 I targeted the work/thcipriani/beautiful-soup-dependency branch (the branch from !5). After merging !4 the merge showed up in !5 rather than in master where it could cause issues. I suppose that's desirable in terms of behavior, but there are a few problems with this:

History becomes messy. Maybe this could have been avoided had I used some other options in merging.
It's non-obvious that it's not merged to master
I wasn't prevented from merging the dependent patchset, it merely mitigated any risk of merging it

With the general advice on getting speedy code review being to split your patchsets it'd be nice to have this be a more supported path. It's noteworthy that there are many open issues about creating a merge-request splitting tool.

Reply 17:19, 22 September 2020 4 years ago

Adamw (talkcontribs)

We're just talking about the gitlab UI, I think? From the commandline, let's say you have patches (1, 2, 3, 4) that make up a branch "A", and you want to split (1, 2) into its own merge request. To do that, check out patch 2 then "git branch <new name>" or "git checkout -b", and push that.

Agreed that stacking merge requests can get tricky--but you can usually get the desired effect by carefully choosing the merge target for your PR. If I have branches A and B stacked on each other, then A will be merged to master but B will be "merged" to A. This prevents the UI from showing all of the branch A commits as if they were part of B.

Reply 18:25, 22 September 2020 4 years ago

AKosiaris (WMF) (talkcontribs)

Let me add a workflow that SRE uses in gerrit and is pertinent I believe to the integration topic.

An SRE pushes a topic branch in the puppet repo. Every single one of the commits in the topic branch needs to be merged and deployed individually, after having been reviewed (hopefully) and CI has +2ed it. Rebasing might be needed but it's also expected in the current workflow. The reason for that is that every single one of those commits has state changing consequences for at least part of the server fleet and the SRE in question is expected to merge, "deploy" it and perhaps even trigger multiple puppet runs (alternatively they can also wait for the full 30mins that currently puppet changes to reliably be distributed to the entire fleet).

The most recent example I can think of is https://gerrit.wikimedia.org/r/q/topic:%22623773%22+(status:open%20OR%20status:merged).

How will SRE have to adapt that workflow for gitlab? Create a separate MR per change? Using a single MR clearly doesn't cut it (right?), but on the other hand having to go through the process of manually creating 4 or 5 MRs for something that is automatic in Gerrit isn't great either.

Reply Edited 20:25, 22 September 2020 4 years ago

TCipriani (WMF) (talkcontribs)

I made a concrete example of this on our gitlab-test instance

Of Note

I used merge request labels in the place of topics
This is a series of patchsets, but they have no semantic relationship to one-another
My interaction with this repo was purely through the git client and no other programs

From my side the steps were:

Create my work locally as a series of commits
Use push options to make a merge-request for each patchset

This looked like:

$ echo '5' > COUNTDOWN
$ git commit -a -m 'Start countdown (1/5)'
$ echo '4' > COUNTDOWN
$ git commit -a -m 'Decrement countdown (2/5)'
...
$ git push \
  -o merge_request.create \
  -o merge_request.target=production \
  -o merge_request.remove_source_branch \
  -o merge_request.title="COUNTDOWN (1/5)" \
  -o merge_request.label='T1234' \
  gitlab-test \
  HEAD~4:work/thcipriani/T1234

Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 327 bytes | 327.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: ========================================================================
remote:
remote:     A test instance of GitLab for the Wikimedia technical community.
remote:   Data may disappear. Consider everything here potentially public, and
remote:                     do not push sensitive material.
remote:
remote: ========================================================================
remote:
remote: View merge request for work/thcipriani/T1234:
remote:   https://gitlab-test.wmcloud.org/thcipriani/operations-puppet/-/merge_requests/1
remote:
To gitlab-test.wmcloud.org:thcipriani/operations-puppet.git
 * [new branch]            HEAD~4 -> work/thcipriani/T1234

As is already mentioned, this could be wrapped in a friendlier interface.

Reply 13:53, 23 September 2020 4 years ago

AKosiaris (WMF) (talkcontribs)

I 've iterated a bit on your take. Mostly a for loop to go through all the changes. What I got

is at https://gitlab-test.wmcloud.org/akosiarisgroup/simplegroupstuff/-/merge_requests?label_name%5B%5D=T42

$ for i in 5 4 3 2 1 ; do \

git push -o merge_request.create \

-o merge_request.target=main \

-o merge_request.remove_source_branch \

-o merge_request.title="We are leaving together ($i/5)" \

-o merge_request.description="And still we stand tall ($i/5)" \

-o merge_request.label="T42" \

origin HEAD~$((i - 1)):refs/heads/Europe$((i - 1)) ; done

Couple of notes:

The gitlab label for this specific use case seems to supplant the gerrit topic branch adequately
CI is run on every MR, which is what we want
While some merge_request git push options are static, some others like title and description aren't. A wrapper tool will need to extract them from the git commit messages I guess. That adds a bit to the complexity of the tool that needs to be written, but it's arguably doable and probably maintainable. It will be however akin to git-review (does anyone cringe already?) albeit only for those that require this workflow

The big issue I see is the fact we don't got dependent MRs in this case. So any of the later MRs can be merged at any point in time by mistake causing the expected mayhem. And it seems like this is a Paid feature and not a Community edition feature per Wikimedia Release Engineering Team/GitLab/Features, which isn't a great sign. The notes in that table say "Important for our workflows, but can be re-implemented as a CI job.". Not sure what that means? We 'll create CI that checks something, but what exactly? A "Depends-on"? That's opt-in (as in the user must actively write it), it will probably not safeguard us much.

Reply 14:51, 23 September 2020 4 years ago

Nikerabbit (talkcontribs)

I would imagine that doing a wrapper that emulates `git review` behavior in a way that creates a separate merge request for each commit wouldn't be too hard. The real issue is lack of nice UI in GitLab to automatically rebase merge requests on top of the target branch.

Reply 06:48, 23 September 2020 4 years ago

Adamw (talkcontribs)

Gitlab will show you when a merge request is going to conflict with master, and a successful merge includes rebase. Is there a reason we need to explicitly rebase the merge request before merging, or maybe that's just a habit carried over from our Gerrit workflow?

Reply 06:54, 23 September 2020 4 years ago

Nikerabbit (talkcontribs)

If the merge requests depend on the previous one, at least the merge base needs to be updated.

If there are conflicts during a rebase, I would like the test pipeline to run again on the rebased merge request before it is merged to the master.

Reply Edited by Nemo bis 08:40, 23 September 2020 4 years ago

DLynch (WMF) (talkcontribs)

I just want to chime in to agree that losing / making-more-complicated the ability to separate out commits into logical units of work seems like a bad thing for our ongoing code health. Workflow forces pushing us towards squashing only semi-related commits together when we merge looks unambiguously bad.

I'd be less concerned if we had the premium merge dependencies, though it looks like they lack some of the convenience features of gerrit's topic chains.

Writing a `git review`-like tool seems like a maybe-viable compromise... but if we're writing custom tooling and will expect any contributor who makes more than an utterly trivial change to be using it, are we gaining so much from a migration any more?

Reply 15:15, 28 September 2020 4 years ago

Cscott (talkcontribs)

Given the importance of this stack-of-patches workflow to many current developers, I would have liked to see a more concrete plan, including timeline, for implementing this in WMF's gitlab migration. Especially as the required gitlab features for this are not expected to be included in the Community Edition of gitlab WMF is planning to use? More clarify would be helpful.

Reply 17:01, 26 October 2020 4 years ago

TCipriani (WMF) (talkcontribs)

In terms of timeline the GitLab/Roadmap speaks to the chronology of events. Tying the chronology to real world times: the "Utility project migration" heading in that roadmap is where we hope to be in 8 months. I've called out the dependent patchsets explicitly in that step.

We've raised this with GitLab as well. So far they've provided some workarounds that are a bit clunky. I'd encourage developers that care about this feature to poke the upstream ticket as a signal about the feature's importance: https://gitlab.com/gitlab-org/gitlab/-/issues/251227

Reply 19:44, 26 October 2020 4 years ago

Topic on Talk:GitLab/2020 consultation

Integration: Merge requests and patchsets