Topic on Talk:Wikimedia Release Engineering Team/GitLab

ESanders (WMF) (talkcontribs)

Here are some features of Gerrit our team use regularly that we would very much like to see in any future system:

  • Being able to easily create a stack of dependent commits, each one representing a single commit in the final tree, and being able to modify those commits, and review changes between those modifications.
  • Being able to easily re-order that stack using an interactive rebase
  • Creating cross-repo dependencies using the "Depends-On" tag or similar.
Aron Manning (talkcontribs)

> Being able to easily create a stack of dependent commits, each one representing a single commit in the final tree, and being able to modify those commits, and review changes between those modifications

Yes, please! I miss that feature so much, especially when CI flatlines under the load of the 10 patches I submit in a chain. :-)

Cscott (talkcontribs)

+1. Our current CI handles quite complicated dependencies, ie: introduce new API in core, patch dozens of extensions to use this new API (depends-on: first patch), then deprecate and remove the original API (depends-on: all the deployed extension patches).

My own working style on "feature branches" also involves incrementally factoring out and rebasing necessary-but-independent fixes to other places in the code as earlier patches in the stack. Github's review process has always made viewing the history of and reviewing such patch stacks very awkward.

TCipriani (WMF) (talkcontribs)

Depends-On exists (in some form) in GitLab, unfortunately it's in the enterprise edition (EE). On the Features page we talked about re-implementation via CI.

EBernhardson (WMF) (talkcontribs)

Mostly a +1 to ESanders comment, I extensively use the ability to create a stack of commits that are separately reviewed, and re-ordering that stack is not so uncommon. My experience attempting this sort of workflow in GitHub (no GitLab experience, but it seems pretty similar) was pretty bad, but that could be user ierror.

GLavagetto (WMF) (talkcontribs)

+1 to all of the above

SRE tend to break down important changes in subsequent patchsets to better control application of a change (so for instance - first on one server, then on the canary pool, then everywhere).

If we lose the ability to do so with ease, and to rebase/resubmit those stacks of changes, our workflow will need to change radically compared to what we have today.

We also need the ability to apply such stacks of changes to a working tree with ease (for instance for the puppet compiler).

Reedy (talkcontribs)

>On the Features page we talked about re-implementation via CI.

This somewhat irks me.

I understand not wanting to necessarily pay GitLab for these features, but developing them in the first place and then debugging and maintaining them is not free either. I doubt they're going to take them back upstream, so we're left maintaining them long term. We might find some other people to contribute, but maybe we won't.

It seems odd to switch tools to make things some things easier, then because we lose some fairly major features, we then reinvent the wheel. Aren't these custom changes and the "wikimedia only" ways of doing such things part of the current issues with CI as is?

Cscott (talkcontribs)

I believe "marge-bot" was the canonical replacement for the paid features, but it doesn't seem like marge-bot actually handles our use cases particularly well.

DLynch (WMF) (talkcontribs)

I was wondering whether we could use the "merge request dependencies" thing in the premium version to fake up the stack-of-commits model by having the a bunch of merge requests on a single project, which all depend on each other. However, it explicitly doesn't support "a “deep”, or “nested” graph of dependencies", so even if we had it, it might not actually work for that.

That said, assuming that merge requests function approximately the same as github merge requests, we can get much of what Ed wants... if we agree that feature-branches can have changes force-pushed to them. That gets us the stack and the reordering, but not the individual review/merge of commits.

I also agree that developing our own set of complicated behaviors on top of our new tool does seem like we're just jumping head-first back into the issues we're apparently leaving gerrit to avoid.

BBearnes (WMF) (talkcontribs)

We might find some other people to contribute, but maybe we won't.

I think this is a good place to mention (or reiterate) that there's a set of projects with values and goals broadly aligned to ours using GitLab's Community Edition.  We've reached out to organizations including Debian, KDE,  Gnome, and about their implementations, and generally received a very encouraging response.  There's shared technical work being done in this space and (hopefully) an opportunity to collaborate with a broader community.

Roan Kattouw (WMF) (talkcontribs)

I think it's worth mentioning here that Github PRs technically do support multi-commit branches, but that's used to reflect changes made in response to review, and the commits are squashed into one before being merged. This means any solution that addresses the "stack of multiple commits" use case also needs to address the "amendments in response to code review" use case. You also need to be able to rebase a change onto an update version of the change it depends on.

I suppose it's possible that some of these things are already supported, and support can be added for the others, but I'm somewhat skeptical that this will work as well as it does in Gerrit.

BBearnes (WMF) (talkcontribs)

I think it's worth mentioning here that Github PRs technically do support multi-commit branches

To elaborate on this a bit, multi-commit branches are the long-time default on GitHub (or I should say in the PR/MR model), and while it's my impression there's been more of a trend towards squash-and-merge, it's not abnormal in many projects to develop and land feature branches with multiple commits - be that in order of work done & feedback received, or rewritten into logical units of work. Preferences about the structure of commit history vary considerably in the wild. Generally those preferences can be modeled within a workflow where branches are first-order units of work (GitHub, GitLab), just as they can in one where iteratively re-written patches are the core abstraction (Gerrit).

GitLab's UI lets the submitter or merger check a box to squash commits and edit the commit message. I think that a single commit is the appropriate result of most merges from a branch, but there are probably exceptions, including some of the simple cases for a chain of dependent commits. (As a builtin abstraction for grouping changes, branches do have real affordances here that, if not precisely lacking in Gerrit, are at least located differently.)

I note that GitLab merge review comments appear to persist through a force-pushed rewrite of history on a branch, so that's an option for responding to feedback where rewritten commit history is desired, even as "add more commits in response to feedback and squash at the end" is probably a more natural way of working for many problems.

This means any solution that addresses the "stack of multiple commits" use case also needs to address the "amendments in response to code review" use case. You also need to be able to rebase a change onto an update version of the change it depends on.

This raises some important subtleties. My current mental model of this is that all/most of it's possible but some of it requires further thought to articulate a good workflow, and yeah, some things get undeniably clunkier even as others get easier. To sorta echo BDavis elsewhere, this thread is probably a good starting point for a "cookbook" style collection of workflow problems & solutions.

Roan Kattouw (WMF) (talkcontribs)

One of the subtleties this raises in particular is that you might have a chain of dependent commits that you intend to be merged as-is without squashing, but then you also need to make some tweaks to some of the commits in response to code review. Then you have a branch where some of the commits need to be squashed, but not all of them. This may well be resolvable by manually squashing the ones you want to squash, then merging the result, but the tooling would need to support that without causing too many problems (for example, it'd be really annoying if you needed to create a new change submission that's not/poorly linked to the original, or if diffs that show you what changed between iterations of the change are not available).

TCipriani (WMF) (talkcontribs)

It does seem GitLab supports this sort of workflow by allowing either the submitter or the reviewer to choose to squash merge-requests on a per-merge-request basis.

That is, for some merge-requests there may be a logical order of commits and for others that may be less critical. The unit of change for the GitLab merge is the merge request as opposed to the patchset/changelist.

Tgr (talkcontribs)

I'd also be curious about how well GitLab UIs scale. The most complex project I have been involved in was probably AuthManager, where the main commit went through nearly two hundred revisions, had hundreds of code review comment threads, and several dozen comments depended on it, often across repositories. Gerrit didn't handle that super well, but it was functional. Github, at the time, would have been crippling. (I understand they have added some features aimed at large code reviews since then.) Can we try recreating something like that in GitLab and see how it looks?

Tgr (talkcontribs)

Another feature I rely on a lot is the ability to compare different patchsets (to see how a comment got resolved, for example). At least on the Github interface, I don't think either force pushing or multi-commit branches really support that (although of course you can construct the diff URL manually).

KHarlan (WMF) (talkcontribs)

Yeah, I think in the GitHub/GitLab paradigm it is preferable to have multiple commits in the branch as feedback is addressed, in part so it's easier for reviewers to see how things got fixed. The problem is that if the idea is to squash when merging, there is more work added to the reviewer to write a new commit message that incorporates all of the messages in the branch.

Roan Kattouw (WMF) (talkcontribs)

In particular, Gerrit lets you version and review the commit message itself across multiple iterations of the commit, which is really useful both for correcting mistakes in the commit message and for the use case @KHarlan (WMF) mentions where the commit message evolves along with the contents of the change.

DLynch (WMF) (talkcontribs)

I went and did some quick experimenting on gitlab to see whether branching from existing merge request branches would work for the stack-of-commits case.

It sort of works! You have to strictly stick to merging the stack from the bottom up. I.e. if you have branches `master > A > B > C` you have to merge C, B, A in order. If you do that, it updates the merge request based on the branch you just merged into to include the commit you just merged (and its merge commit, which may get a bit noisy).


  • if you deviate from that ordering, it becomes impossible to automatically merge any of the requests below it since the branch they're merging to no longer exists, and manually doing the merge from the command line loses the association with the merge request
  • this ordering requirement means that the practical experience is the opposite of the gerrit flow, where you can at any time decide to commit the top commit in a chain
  • lots of merge-commit noise compared to the gerrit flow -- I couldn't see a way to do a fast-forward merge from the web UI, though it may exist somewhere
TCipriani (WMF) (talkcontribs)

The fast-forward merge option is per-repo and is on the settings page (see screenshot at:

While the UI is different in Gerrit vs GitLab, the underlying mechanisms here are the same—that is, in Gerrit you are merging a ref (e.g., refs/changes/yy/xxxyy/1) into a branch (e.g., main) and in GitLab you merge a ref (refs/heads/fix/T1) — which is itself is a "branch" — into a branch (e.g., main).

The creation of those refs in Gerrit is automatic and we rarely think about it. Pushing via e.g., git push origin HEAD:refs/for/main causes Gerrit to make changesets for every commit from origin/main..HEAD. In GitLab making each ref has to happen locally rather than automatically. This can be accomplished via something like:

while read changeset; do
    git push origin "$changeset":refs/heads/thcipriani/changes/"$changeset"
done < <(git log --format=%h @{u}..HEAD)

to create a branch for each change. It is noteworthy that this is fighting the "feature branch" strategy the drives merge-requests. That is, if all your changes are related (i.e., a feature branch) then, in a GitLab world, these become a single merge-request.

Merging from the bottom up isn't necessary for chains of merge-requests as far as I can tell. For example, let's say:

  • you have 3 changes: A, B, C
  • you create 3 merge requests (one for each change): mA, mB, and mC
  • mA's merge target branch is main
  • To get a small review diff mB's merge target is mA
  • mC's merge target is mB

From here, the logical thing might be to merge from the bottom up: merge mC -> mB; mB to mA; mA to main—this works; however, it's not the only thing you could do.

You could merge mA to main (which deletes the mA branch by default), and then change your merge-target for mB from mA to main. This is, in essence, a "rebase".

This has been a long-winded way of saying: all git workflows are possible in git.

In many ways Gerrit aligns more closely with how git works (branches weren't always things) and the abstraction it provides makes some workflows easier. The merge-request workflow is a different abstraction than Gerrit's patchset-based workflow and it requires users to make explicit what was once an implementation detail of the tooling: namely that there are refs that represent a set of changes that we're trying to merge to a branch. This will have an effect on how we write changes.

If GitLab's abstractions makes a certain workflows more onerous then we should document those workflows to see if we can work with GitLab/the GitLab community to make them easier.

DLynch (WMF) (talkcontribs)

> The fast-forward merge option is per-repo and is on the settings page

Per-repo for that setting makes it impossible to replicate the gerrit flow, I'd think? Because outright replicating it requires that we fast-forward the merge-into-another-merge, but then have a merge commit for the final merge into master. (I don't actually have an opinion about whether we should have that final merge commit or not, I'm just looking at differences.)

> This has been a long-winded way of saying: all git workflows are possible in git.

I do agree with this! However, I was trying to examine it from a standpoint of what Gitlab's built-in tools would allow without needing to drop down to the command line or manually intervening with the merge request settings after every merge.

EBernhardson (WMF) (talkcontribs)

I think about this more in terms of the goals, and why i submit a 5 patch chain instead of a 1k+ line patch to be merged. These patches are between 100 and 200 lines each. They are small and easy to review and think about. They do individual small things, and the sum of all the patches achieves my goal. I often write these initially as perhaps one or two patches, and then refactor the commits into individual patches for ease of review. Each patch in the chain is focused, in my experience this greatly simplifies review

The simplest example I can think of is a two patch chain, the first patch refactors what exists. This should have no functionality changes. This patch can stand on its own as an improvement even without the followup patch adding functionality (typically, at least). Especially when it comes to puppet i want to deploy that refactor without the additional functionality, as further proof that the refactor made no changes to the outputs, only to the structure. The patch adding functionality should be merged and deployed separately. This is in part about isolating the failure domain, so we know which bit of what broke things.

I don't need the gerrit workflow, but i do need a workflow that allows me to write 1k+ lines of code, and not force someone else to have to decide in an all-or-none approach if this is mergable. I need a way to write pieces of code that are related and build on top of the previous patch, but deploy them as individual units to isolate the causes of failure.

Roan Kattouw (WMF) (talkcontribs)

To add onto that: whether you end up merging one big change or several smaller ones also impacts what the history looks like. That's not just a cosmetic thing, it has real-life impacts on how useful tools like blame, bisect and revert are. In Erik's example, if the refactor change and the new functionality change are separate, and stay separate in the merged history, we'll be able to identify which of them caused a particular bug, and we won't need to do baby-with-the-bathwater reverts as often.

ArielGlenn (talkcontribs)

+1 to being able to keep the chain of small patches separate; when I'm looking back through puppet or MediaWiki core or other repos, small commits are easier to eyeball, search through the commit history, etc to find the source of a specific change or narrow down when a particular behavior changed. And especially in the case of commit messages, I look at the related patchset to be sure the code does what the change says, or to understand what is being said; that's much more onerous with a thousand line squashed commit.

Cscott (talkcontribs)

There are CI implications as well. In this chain-of-small-patches approach, *each patch in the chain* is expected to pass CI. In the feature-branch approach, usually CI is only run on the end result. That's fine if you're going to squash all the commits at the end, but causes issues for git-bisect if you don't squash commits. (Take a look at the bisect alias here for the complicated workaround to make git-bisect work as we're used to.)

KHarlan (WMF) (talkcontribs)

From what I've seen in GitHub, each commit in a branch is run through CI. I assume the same could be done (is already the default?) for GitLab as well.

The way I've been thinking about the PR model as it applies to our current gerrit workflows is that each branch would be the "patchset chain" that we have in gerrit. For example, looking at this patch:

  • `WelcomeSurveyHooks: Convert to HookHandler` would be the first commit in the merge request for "Add welcome survey language question"
  • `WelcomeSurvey: Add languages question` would be the second commit in the merge request for "Add welcome survey language question"

Then this patch which is in a different repository would have a merge request ("Use MobileFrontend overlay for language question") that depends on the merge request for "Add welcome survey language question"

In other words, rather than thinking of each patch in our existing chains as separate merge requests, I think it might be a bit more intuitive if they are separate units (commits) within a single merge request.

ESanders (WMF) (talkcontribs)

> From what I've seen in GitHub, each commit in a branch is run through CI

It is possibly configurable, but by default only the latest commit at the time of pushing gets tested. See this example where my first push was two commits, and so the first commit ("Fix documentation whitespace") was never tested.

KHarlan (WMF) (talkcontribs)

Ah, it seems like it's not possible in GitLab, not yet anyway; there are a few issues and some WIP merge requests around it.

Greg (WMF) (talkcontribs)
Greg (WMF) (talkcontribs)

Those following along here, please see the comment from Brennen on the primary consultation talk page (sorry for forking it...) on how to test out the workflow in our test GitLab instance. The consultation WG would greatly appreciate a few people to give it a go and giving feedback on how to best utilize GitLab workflow options to meet our needs. Thanks!

Reply to "Feature requests"