Talk:GitLab consultation

About this board

Don't create new Github repos

3
Framawiki (talkcontribs)

I don't really understand why some of the code written by the WMF is not hosted on its own git platform, currently Gerrit. So I hope that all the reasons used to justify the exceptions (for example, notebook previews) will be solved for the official Wikimedia GitLab.

Would it be possible to define a rule saying that all development done during WMF employees' work time must happen on our GitLab instance exclusively? Of course, excepting pull requests that improve external repositories hosted elsewhere.

AKlapper (WMF) (talkcontribs)

Okay, but how is this related to the GitLab consultation...?

Hashar (talkcontribs)

When we migrated from Subversion to git, we selected Gerrit as the code review system. As part of that project we also had the repositories mirrored to GitHub https://phabricator.wikimedia.org/T37429.

Why? Well, I am not quite sure, but most likely to open the possibility of submitting a pull request via GitHub: https://phabricator.wikimedia.org/T37497 . At the time (2012), some wanted additional tooling to make it very easy to contribute. I would argue the complexity of the reviewing workflow itself is more to blame as a barrier to entry than the tooling itself, but that is really just my point of view.


Before the Subversion migration, we already had repositories on GitHub, mostly for mobile applications.

And after the migration to git/Gerrit, we still had repositories created on GitHub instead of Gerrit. For example Limn, a data visualization framework: https://github.com/wikimedia/limn . Groups were created, people were added to them, and eventually more repositories were created.

In short, we do not have a policy enforcing Gerrit as the canonical code hosting place. Although anything touching MediaWiki in production is definitely on Gerrit (we do not deploy from GitHub-hosted repositories), anything else is a gray area at the discretion of the team, and sometimes due to technical limitations such as testing the iOS-based applications.


The point you have raised about a rule to exclusively host on GitLab is covered on the consultation page:

  • What happens to repositories developed on GitHub if we move to GitLab?
    • Given that GitLab provides a very similar workflow and feature set, we will strongly encourage all developers to use GitLab instead of GitHub for all development. Repositories will still be mirrored to GitHub, for visibility purposes.

So it is essentially the same situation: still mirroring, and GitHub is not explicitly forbidden. Then, given that GitLab and GitHub have essentially the same workflow, one can imagine that repositories might want to migrate from GitHub to GitLab unless they rely on tooling which is only available on GitHub (such as the issue tracker; see https://github.com/issues?q=is%3Aissue+org%3Awikimedia for currently open issues in the GitHub organization).

Reply to "Don't create new Github repos"

Self-service continuous integration

3
Krinkle (talkcontribs)

The second of three listed "Why"s is easy and self-service continuous integration configuration.

This has indeed been a point of friction for many years. This wasn't related to Gerrit, but rather because we didn't resource/prioritise setting up something that could securely run unreviewed code for installing arbitrary packages and running arbitrary shell commands.

Between 2013 and 2015 we invested in this. We got rid of the hardcoded Jenkins jobs and instead defer all package and command selection to a file in the source repository/branch, just like Travis CI and GitLab. These files are package.json, composer.json, and Gemfile. Just like Travis, the entry-point commands are just "npm install + npm test" or "composer install + composer test". Fully self-serviced.
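
As a concrete illustration of that contract, the repository only needs a manifest declaring its test entry point; a minimal sketch (the package name and test command here are invented, not from any actual repository):

```json
{
  "name": "example-wikimedia-project",
  "private": true,
  "scripts": {
    "test": "eslint ."
  },
  "devDependencies": {
    "eslint": "^7.0.0"
  }
}
```

CI then only ever runs npm install followed by npm test, whatever those happen to do for the repository.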

There are some cases where, for security or performance reasons, we bypass the base image and instead provision something custom ahead of time for a specific repository. I assume this will still be possible in GitLab, and would require similar effort either way.

From an end-user perspective, what is the difference?

(I do want to recognise that RelEng currently spend significant time maintaining the Docker base images that drive this. I believe GitLab has similar preset images, that would save RelEng time. However, the consultation lists ease of use for end-users. And, of course, changing the backend of CI to GitLab was already approved months ago and is out of scope here. Also, whether we can/should use GitLab's base images remains to be seen since I believe we generally prefer to match OS and package backports with prod.)

TCipriani (WMF) (talkcontribs)
From an end-user perspective, what is the difference?

I might take issue with your characterization of our current CI as "Fully self-serviced". Only 19 people out of all users of Gerrit can fully set up CI without any help.


I'm just going to stumble through getting something running, as a person experienced at stumbling through this process.

GitLab

  • Click "Set up CI/CD" on the repo
  • Click the "Apply a template" dropdown
  • Click "Commit changes" button
  • Jobs run in CI
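
For comparison, what those clicks produce is a committed .gitlab-ci.yml file. A minimal sketch (the image, job name, and commands are illustrative, not the actual template contents):

```yaml
# .gitlab-ci.yml -- minimal pipeline definition; image and commands illustrative
image: node:6

test:
  stage: test
  script:
    - npm install
    - npm test
```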

Current CI

  • git clone ssh://gerrit.wikimedia.org:29418/integration/config.git
  • $EDITOR zuul/layout.yaml -- grep around for "npm" and find repos using "reponame"-npm-node-6-docker
  • git grep 'node-6-docker'
  • $EDITOR jjb/job-templates.yaml -- There's a job-template that seems to do what I want...ok...got to use that template -- I'll git grep for projects using that template
  • $EDITOR jjb/mediawiki-services.yaml -- there appear to be a lot of projects using the template I want here...maybe this is where I add my project:
    - {project: {name: 'tyler-test', jobs: {name}-npm-node-6-docker}}
  • So that should create the job, now I need to add the job to the repo
  • $EDITOR zuul/layout.yaml (again)
    - {name: tyler-test, test: tyler-test-npm-node-6-docker}
  • Send for code review
  • self-merge (27 people can do this currently, including you and me)
  • Deploy the job (19 people can do this currently, including you and me)
  • Deploy a new zuul configuration (19 people can do this currently, including you and me)

This is the perspective when we already have the functionality to do something simple. As you mention, adding new Docker images (something only the same 19 contint-admins can do) adds complexity to this step. You currently need to add a new Docker image if you want to, say, install a library that your node project is using -- it's not uncommon.

I stumbled my way through to a working CI in gitlab without reading any documentation. I've been maintaining CI via zuul/jjb for 5 years and I still had to do a lot of grepping.

Once you've got your CI set up you can change things through the npm test entry point, it's true, but this is different from what I mean by self-service CI.

Nikerabbit (talkcontribs)

I did not find the CI self-service setup easy on the GitLab test instance. If you look at https://gitlab-test.wmcloud.org/translatewiki.net/repong/-/commits/master it took me four commits to get it working. I could not have done it without finding a working example in another repo on the test instance. A possible caveat is that on the actual instance the images might be unrestricted, so the premade templates would actually work.

I found no way to actually test the pipeline without committing it to the repo first. Now there are a bunch of useless and broken commits in the history. Even if there is a way to test before committing, it is definitely not obvious, as I spent a lot of time trying to find it.

Reply to "Self-service continuous integration"

Did you consider git-hosting platforms not linked to commercial entities?

2
78.54.178.98 (talkcontribs)

Looking back, corporate-sponsored FOSS projects seem to be somewhat at risk of being abandoned (Gerrit by Google Inc. is not atypical?). Did you consider and evaluate "grassroots-movement", community-driven open-source alternatives like Gogs or Gitea, so that future development does not depend on a single commercial sponsor? Platforms like NotABug or Codeberg.org seem to prove that these are approaching maturity, scale easily to several thousand repos, and would easily meet the requirements listed above. Have such alternatives been discussed and evaluated?

BBearnes (WMF) (talkcontribs)

I'll preface this by noting that I use Gitea personally, and find it to be pretty good software. That said, though this consultation is specifically about whether or not to use GitLab for code review, we initially evaluated GitLab in the context of looking at alternatives for our continuous integration system, and that's still a problem we need to solve. Gogs/Gitea is essentially a lightweight replication of the GitHub-style code forge, not a platform with components like the full-fledged CI system that motivated us to investigate GitLab in the first place.

The shorter version of this answer is: Not really, but not for lack of awareness.

Reply to "Did you consider git-hosting platforms not linked to commercial entities?"
Notes from Product Analytics

MPopov (WMF) (talkcontribs)

Hello, I'm writing on behalf of Product Analytics. From our discussion:

  • THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).
  • We frequently need to read and search code, and Gerrit has extremely poor support for this. Many of us use GitHub to search the mirrored repositories.
  • We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).
  • Conversations on Gerrit can be difficult to navigate since comments are tied to specific patchsets, so there may be an active discussion happening about something in patchset 3 while the patch is already on patchset 9. If CR in GitLab is similar to GitHub (in terms of how comments/conversations happen & are displayed) that is nice.
  • In the past we've used GitHub Pages for sharing reports. For example, if generating an HTML document from the R Markdown source document where the analysis is done it's easy to enable GH Pages to have the "rendered" version of the report available via URL (example); GitLab appears to also have this feature and we'd like it available if possible.

From my own perspective, as author & maintainer of several R packages the team uses in our workflows, GitLab's support for CI for R packages (more info) is very appealing. There have been efforts made in the past (T153856), but modern CI tools (especially with availability of r-base Docker image) will make it possible for us to have proper CI (which I have on my personal R packages on GitHub).

Tgr (WMF) (talkcontribs)

Since Gerrit 3 we have a Comment Threads tab which is fairly similar to how conversations are displayed in Github.


The consultation page says In addition [to issue tracking] we would turn off repository wikis, GitLab Pages, and other features overlapping with currently provided tooling. (which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only available to a few people, and using Toolforge for this purpose would have a ridiculous level of overhead. Doc page generation via CI, maybe? It's not quite the same thing - you can use Pages to generate a webpage from your repo code, but also in a number of other ways. And in any case, doc generation via CI seems even more arcane and complex to set up than Toolforge.)

BBearnes (WMF) (talkcontribs)

Re: Pages:

which I find a bit confusing: sure, we have a - probably superior - existing alternative for issue tracking and wikis, but what's the currently provided tooling for GitLab Pages-like functionality? people.wikimedia.org is only awailable to a few people and using Toolforge for this purpose would have a ridiculous level of overhead

FWIW, I don't think we in the consultation WG have analyzed that particular aspect of things deeply. If there's a strongly felt use case for a Pages-like feature, then I think that's probably a reasonable discussion to have. We've called out wikis and issue tracking explicitly to prevent fragmentation in those domains, and I don't have a strong feeling as to whether Pages presents a similar risk. Would be curious what others think.

Neil Shah-Quinn (WMF) (talkcontribs)

@MPopov (WMF) said above:

We have generally chosen to use GitHub for our code/analysis repositories since we find it much easier to use, and creating repositories is much easier (since we can do it ourselves without requesting).

To expand on that, it's not just that we have the rights to create GitHub repositories in the wikimedia and wikimedia-research organizations. It's also that we can create repositories under personal GitHub accounts and later move them effortlessly to the main organization.

For example, I originally created wmfdata-python to streamline my personal analysis workflows, so I naturally stored it in my personal GitHub namespace. Over time, others on my team and, later, researchers on other teams started using it too. Eventually, we decided we should move it to a more official location. With GitHub's repository-move feature, it literally took 1 minute to accomplish this, and the automatic redirection (for both web and Git access) makes it completely seamless for users.

From what I understand, GitLab has these exact same abilities natively. Some comments here have pointed out that it would be theoretically possible to create user namespaces in Gerrit, which would be an improvement on the current situation, but as @BBearnes (WMF) said it would be "fighting the design of the system" and wouldn't be nearly as good as the GitLab/GitHub model.

Neil Shah-Quinn (WMF) (talkcontribs)

Also let me emphasize another point that Mikhail made:

THE biggest differentiator from Gerrit for us is GitHub's ability to render Jupyter notebooks (example); GitLab can do this and we just want to make sure that this feature is enabled (and maybe coupled with an internally-hosted nbviewer service for the actual rendering).

Jupyter notebooks have nearly become the standard format for data science (for example, GitHub's State of the Octoverse report says that their use on GitHub has grown more than 100% in each of the last three years).

Gerrit can only display Jupyter notebooks as long JSON blobs, but GitLab can show them in their rich, rendered format. This is a hugely important feature for us; if we switch to GitLab, we can start using it to host our analysis code, but if we stick with Gerrit, we will have no choice but to continue the fractured status quo ("production"/"library" code on Gerrit, analysis code on GitHub).

Reply to "Notes from Product Analytics"
Re: Why "not" Gerrit?

QEDK (talkcontribs)

I think some folks have rightly pointed out the reasons not to switch, but I think those have to be viewed in the context of the future - how long can we keep using Gerrit with its annoyingly high learning curve is the primary question. Is the transitionary, painful period for present Gerrit developers worth the potential new contributors? I believe it has a nuanced answer, but I still think it's one that's positive. Some people like GitHub and some people like GitLab, but what we do know is that a lot of people definitely don't like Gerrit, and I think we are in a good place to start moving towards something better (at least relatively).

EMcnaughton (WMF) (talkcontribs)

As someone who has used Gerrit for around 5 years, GitHub for a bit longer, and GitLab for a lot less, I still find Gerrit the hardest to work with by a strong margin - I suspect I still don't know how to use it 'properly', but the number one thing I hate is the way discussions on commits don't really flow. Since I'm often involved in upstreaming, I often link to upstream PRs, with screenshots & all the UI niceness around discussions on PRs (I prefer the GitHub UI but will go with GitLab at a pinch), and try to redirect discussion there rather than try to parse it out of Gerrit.


I do like the +1 & +2 system in gerrit & the download links

ESanders (WMF) (talkcontribs)

As a counterpoint, I also use Github and Gerrit regularly and find Gerrit much easier for actually managing my commits. It would be nice if the discussion system was better, but you can always use Phabricator.

Mutante (talkcontribs)

I find working with Gerrit much easier than working with Github. If you forced me to start using Github tomorrow it would be an "annoyingly high learning curve".

AKlapper (WMF) (talkcontribs)

Isn't that to some extent muscle memory? I'm asking because I still have to open Gerrit/Tutorial for all the commands to run, every time I plan to push something for review into Gerrit.

It is the same in GitLab for me, but the number of steps isn't very different: If I don't fork I'd end up in GitLab with git checkout -b mybranch origin/master, edit, git add, git commit, and git push origin mybranch, and then create a merge request in the web UI.

EBernhardson (WMF) (talkcontribs)

Personally, the annoyingly high learning curve of gitlab isn't just about muscle memory (but it is in part!). It is about a completely different workflow for patches beyond a certain level of complexity.

Tgr (WMF) (talkcontribs)

Gerrit concepts map to git concepts very well, so if you understand git, Gerrit is super intuitive to use (not the UI, of course, but the patch wrangling part). Gerrit is basically just a git repo with a fancy web interface; changesets are basically just reflogs etc. The same is true for Github as well, except the Gerrit workflow models git rebase, and the Github workflow models git merge, and rebase is the one you actually want for any large projects because otherwise history becomes a mess and your ability to understand the codebase suffers. So with Github, you end up using a merge-like workflow while making sure that internally it's actually not merge-based at all. That mismatch makes it unintuitive, IMO.

QEDK (talkcontribs)

I don't really agree with what Tgr said. If you're familiar with Git, Gerrit adds git-review on top of that to make it all the more complicated. I personally don't think it's "too" complicated, but the notion that Gerrit is somehow more in line with the standard Git workflow is misplaced. (And I'm saying this as someone who became aware of Gerrit/GitLab/GitHub in the last 5 years.) If anything, GitLab/GitHub makes simple commits, or even multiple simple commits, that much easier to handle. For more complicated scenarios, I believe the experience is pretty much on par, but fixing even a few of the myriad issues with using Gerrit makes it that much easier for our new folks. Furthermore, I don't agree that GitHub has to be a merge-based workflow; it's just more common. It differs from maintainer to maintainer, and some repos even have settings to disallow merges in protected branches; most OSS repos will require you to rebase your pull request commits before they are squashed/rebased in.

Tgr (WMF) (talkcontribs)

Every git-review command (except for the one setting up the commit hook) is a convenience wrapper around a basic git command. git-review -x does a cherry-pick. git-review -d does a checkout. git-review without an option does a push. It's more things to memorize, sure (especially with the short options not being particularly sensible), but conceptually it is still a simple git command. You don't have to juggle multiple repositories, either (although in the end repositories and remotes are also simple concepts if you properly understand git's graph model).

As for rebase workflows in GitHub, the UX didn't really support them the last time I checked (you can do it, but you won't have a sensible audit trail, old comments won't really be useful...). And GitLab Community Edition does not seem to support stacked merge requests, much less rebasing them, but that has already been discussed in plenty of other threads.

QEDK (talkcontribs)

You're definitely correct in saying so. But I believe the short-term cost of transitioning is much lower than the long-term churn rate of potential new contributors because of Gerrit.

Nemo bis (talkcontribs)

I think the main difference is that we know Gerrit pretty well, so we are sort of able to estimate how big an investment it would need to fix certain issues (rather big). With GitLab we don't know yet how many issues and how much friction we'll encounter in the first few years after adopting it, so it's easy to underestimate the costs.

Mmullie (WMF) (talkcontribs)

I find GitHub to have a much more pleasing, inviting interface & workflow compared to Gerrit. However, once things become a little more complex, I (currently - possibly biased because I still use it way more often) heavily favour Gerrit.

It's easy to advertise a GitHub(-like) interface & workflow as being more inclusive & newcomer-friendly. It obviously is, but that means nothing unless those contributions end up getting merged (if not, we're only creating more frustration). Being more friendly to newcomers is not a selling point until we can be assured that the workflow has no negative implications for repo maintainers (or that they are outweighed by the positives).

I.e. pushing commit after commit onto a feature branch sure does seem simpler than having to carefully maintain, amend & rebase a few dependent patches, but it's simply moving the cost up to the repository maintainer: code review becomes harder (code spread all over the place), and history becomes broken (unless commits get squashed after all).

Someone has to pay for complexity: if it's not newcomers, then it's the maintainers (and if they are not willing or able to do it, the extra patches still aren't going anywhere.) I'm not currently sure how much of this "newcomer cost" GitLab would actually remove, and how much it would simply move the burden...

Reply to "Re: Why "not" Gerrit?"

Gerrit is multi-site and its implementation is open-source

7
Summary by TCipriani (WMF)

The Gerrit multi-site plugin README was out of date and misstated its support for multi-write. See: https://gerrit-review.googlesource.com/c/plugins/multi-site/+/285782/3/README.md

Lucamilanesio (talkcontribs)

What is written in the conclusion is not accurate, with regards to the multi-site capabilities:

"We are unique in the community of Gerrit users which include large companies such as SAP, Ericsson, Qualcomm, and Google. Google, in particular, is singular in their use of Gerrit for projects like Android and Chromium. To support these large, open projects multi-site capabilities are needed; however, much of that work is either closed-source or does not support multi-site writes".

If you follow the multi-site link you will see that the multi-site plugin is open-source, supports multiple writes from all sites and is able to prevent split-brains.

GerritHub.io has been multi-site for over one year.

Paladox (talkcontribs)
TCipriani (WMF) (talkcontribs)

Looking at the README on the multisite page the quote on the page is, "Currently, the only mode supported is one primary read/write master and multiple read-only masters but eventually the plan is to support multiple read/write masters."

We currently have one main Gerrit sync'd with a replica. It seems that the multi-site plugin will not currently support two Gerrits being written to simultaneously without partitioning: is that accurate?

Paladox (talkcontribs)
Lucamilanesio (talkcontribs)

That comment in the README.md is stale and misaligned with the DESIGN.md. The multi-site plugin supports multiple sites in read/write and correctly prevents split-brain in case two users push concurrently to the same branch of the same repo from two remote sites across the globe.

I have addressed the stale comment with a change for review, thanks for pointing that out :-)

With regards to the problems in migrating to newer versions of Gerrit, I do recognise that it has been difficult until v3.0. You guys are not far from the "tipping point" and I would be more than happy to help, as I did with the Eclipse Foundation and I am doing with the OpenStack project.

Also, the multi-site setup allows Gerrit canary deployments because, from v3.0 onwards, it supports different sites running different versions of Gerrit (typically version +1).

Since the introduction of multi-site on GerritHub.io, we went from 99.9% uptime to > 99.99% uptime, and never declared a "planned outage" for any of our upgrades.

I would be more than happy to help the Wikimedia Foundation to get there as well.

Luca.

Lucamilanesio (talkcontribs)

Hi @TCipriani (WMF) the multi-site README.md has been updated, thanks for the reviews. Can you also update the relevant section in the GitLab consultation? Thanks a lot for pointing this out.

Luca.

TCipriani (WMF) (talkcontribs)

Done. Thanks for the update.

Integration: Merge requests and patchsets

21
TCipriani (WMF) (talkcontribs)

One aspect of a migration to GitLab that has been touched on in other discussions is that of integration.

Integration is the process of combining a new piece of code into a mainline codebase. Our mainline codebase under Gerrit and, presumably, under GitLab will be a mainline branch: a shared sequence of commits that records changes.

The mechanism by which code is integrated under Gerrit is a patchset. A patchset is a single commit that represents the difference between itself and a target branch; typically the target branch is the mainline branch. The mechanisms of combining patchsets with mainline vary by repo, but a patchset may either be merged (creating a merge commit) or fast-forwarded (no merge commit), meaning the patchset is the only commit added to the target branch. In addition to a git commit SHA1, each patchset has a "Change-Id", which is an ID that is searchable in Gerrit and points to a given patchset.

Patchsets may be chained. When one commit is the parent of another commit, pushing those commits to Gerrit creates a chain of patchsets. The child patchsets may not be merged independently without the parent patchsets having merged. The mechanism to update a patchset is to push a new commit (or chain of commits) with the same Change-Ids as the patchsets you wish to update to the special refs/for/[target branch] reference.
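
A minimal, runnable sketch of those push mechanics using plain git. A local bare repository stands in for the Gerrit server here: the refs/for/* review magic and the commit-hook-generated Change-Id only have meaning on a real Gerrit, so this demonstrates the commands only, and all names are invented.

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q --bare -b main server.git        # stand-in for the Gerrit server
git clone -q server.git client 2>/dev/null   # warning about empty repo is harmless
cd client
git config user.email dev@example.org
git config user.name Dev

# A patchset is a single commit; the Change-Id trailer is what ties later
# revisions of the commit back to the same change in Gerrit.
git commit -q --allow-empty -m "Add feature X

Change-Id: I0123456789abcdef0123456789abcdef01234567"

# Propose the commit for review against the main branch via the special ref
git push -q origin HEAD:refs/for/main
```

On a real Gerrit, that push would open (or update) a change for review rather than creating a literal refs/for/main ref as it does against this plain repository.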

The mechanism by which code is integrated under GitLab is the merge request. Each merge request consists of a source branch and a destination branch. The source branch contains one or more commits not contained in the destination branch, along with a description of the intent of the merge request. The mechanism of combining merge requests is a combination of the repository settings and the merge-request settings: a merge request may either be squashed (creating a single commit) or each commit may be left separate. Additionally, a merge request may be combined by merging (creating a merge commit) or fast-forwarded: adding the commits in the merge request to the tip of the mainline branch without creating a merge commit. The mechanism to update a merge request is to push a new commit to the source branch or to --force push to the source branch. Generally, force pushing a source branch is not recommended, as review discussion may become confusing.
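
The combining modes described above can be sketched with plain git in a throwaway repository (branch names and commit messages are invented; this shows the resulting history shapes, not GitLab's UI):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main repo && cd repo
git config user.email dev@example.org
git config user.name Dev
git commit -q --allow-empty -m "initial mainline commit"
base=$(git rev-parse HEAD)

# Source branch of a hypothetical merge request, with two commits
git checkout -q -b feature
git commit -q --allow-empty -m "feature: part 1"
git commit -q --allow-empty -m "feature: part 2"

# 1) Merge: both commits are kept, plus an explicit merge commit
git checkout -q main
git merge -q --no-ff --no-edit feature
merge_count=$(git rev-list --count HEAD)    # 4 = initial + 2 feature + merge commit

# 2) Fast-forward: the two commits are appended, no merge commit
git checkout -q -b main-ff "$base"
git merge -q --ff-only feature
ff_count=$(git rev-list --count HEAD)       # 3 = initial + 2 feature

# 3) Squash: the whole branch collapses into a single new commit
git checkout -q -b main-squash "$base"
git merge -q --squash feature
git commit -q --allow-empty -m "feature (squashed)"
squash_count=$(git rev-list --count HEAD)   # 2 = initial + squashed commit
echo "$merge_count $ff_count $squash_count"
```

The echoed counts (4, 3, 2) show why the choice matters for bisecting and history readability, as discussed below.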

The branching pattern most frequently used with merge-requests is feature branching; that is, putting all work for a particular feature into a branch and integrating that branch with the mainline branch when the feature is complete.

The branching pattern most frequently used with patchsets is what Martin Fowler has termed continuous integration with reviewed commits. That is, there is no expectation that a patchset implements a full feature before integrating it with the mainline branch, only that it is healthy and has had its commits reviewed.

The branching pattern is not necessarily tightly coupled with the tooling, for example, a merge-request could be created with a single commit that itself does not implement an entire feature: this is a merge-request that is not using feature branching. Each tool does, however, encourage using their respective branching mechanisms.

There are various aspects to explore here:

  1. Workflow/branching pattern changes
  2. Review changes
  3. Integration frequency changes
  4. Necessary tooling changes
TCipriani (WMF) (talkcontribs)

Per working group discussion I've added a few definitions from this topic to the main page.

GLederrey (WMF) (talkcontribs)

For context, the previous discussion in Topic:Vpbonspnaaafjdwq is probably relevant here. It describes some of the current use cases about strings of patches and potential ways of having similar workflows in Gitlab (but it looks like currently there isn't an obvious way to implement a similar workflow).

Adamw (talkcontribs)

The way I've seen this work in Gitlab and Github is that CI tooling will automatically check the head of the branch. You can choose to test the head with or without a rebase. If three patches are submitted at once, only the final patch is tested. If a new patch is submitted as a test is running, that test is canceled and the new patch is tested.

Testing can also be configured to gate the branch merge to master.

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

BBearnes (WMF) (talkcontribs)

The norm is to squash all patches on a branch anyway, but TCipriani's question highlights that we might need to *enforce* squashing, otherwise we could end up with non-passing commits and potentially someone might roll back to an inconsistent point. But maybe this is already handled by a simple merge commit, which makes it clear that the intermediate patches are *not* on master.

This emphasizes a really important point here: Should every commit to the master / main branch represent a known-good, deployable state (as far as we're capable of achieving that)? We do our best to achieve that currently, at least on repos like core, which does seem like it militates in favor of squashing commits by default.

EBernhardson (WMF) (talkcontribs)

To me squashing depends on what those patches were. If the patch chain is the following, it should probably be squashed:

Primary feature implementation -> typo in Foo.php -> correct comments in Bar

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting:

Refactor to allow implementation -> Implement feature

Jdforrester (WMF) (talkcontribs)

Unfortunately, in my experience, in the real world in systems where force-rewrite of open PRs isn't available (most FLOSS GitHub and GitLab repos), people end up mushing multiple feature commits and fixups into the same chain.

A 'simple' example, with committer shown in square brackets ahead of each commit:

[A] First feature > [B] typo fix > [B] addition of test suite > [C] Second, dependent feature > [A] failing expansion and modification of test suite based on feedback from the second feature > [C] fix of first feature

Squashing this stack is pretty bad, throwing away the separation of the two main features, and the authorship blame. Not squashing this stack is also pretty bad, littering the history with nonsense commits, making git bisect vastly harder, and creating toxic, test-failing branch points.

Theoretically you can dump the MR, git rebase -i the stack to make history "sensible" with just two commits, and then re-push it as a pair of MRs (one with the first feature commit, the other with the first and second), the second of which screams "DO NOT MERGE UNTIL YOU MERGE MR X FIRST!" manually, but this loses the history of the discussion on what's best to do, still loses the kudos/blame of some of the contributors, and is an epic extra piece of work.

Of course, GitLab could be extended (either by us or upstream) to add features to manage this (turning the 'squash' button into a complex form to let the merger select arbitrary squash/fixup/rebase actions on a per-commit basis), but that's a huge undertaking, taking GitLab quite far away from the simple branch model it's built around, so upstream may well not be keen, and said code has to be written and supported by someone.


This workflow is one that I personally do up to a few times a day, pretty much every day. It's the core of MW's development model. I know that a few areas of our codebase don't use this model and don't have the complexity of multi-feature inter-relation development, but they're the simple exceptions, and it feels like we're focussing on toy development rather than the main stream of our work in all the analysis. It's not an "oh well" to lose it, it's going to be pretty seriously disruptive.

EBernhardson (WMF) (talkcontribs)

I haven't run into the issue of force-rewrite on open PRs being disabled, but indeed that would make my current experiments with finding a reasonable workflow completely useless. If the only option in a PR is to continually add more code that will be squashed into a single patch, I worry the development history and general experience of performing code review is going to suffer for anything of complexity.

Adamw (talkcontribs)

If the patch chain is the following, it needs to not be squashed because this is a useful separation point for future bisecting: Refactor to allow implementation -> Implement feature

Good point, in this case with a squash workflow the feature would have to be split into two branches.

EBernhardson (WMF) (talkcontribs)

How does that work though? As far as I can tell gitlab has no affordance to split a PR into two branches. If branch A is the refactor, and branch B is the new feature, then as far as gitlab is concerned a PR on B is a PR for A+B and it can be merged without consideration of the A PR.

TCipriani (WMF) (talkcontribs)

There is a merge request dependencies feature in the premium version that provides what's needed here.

I'm not entirely satisfied with any other mechanism (aside from merge request dependencies) for splitting merge requests and having them depend on one another. The "smartest" thing I could think to do was to have dependent merge requests target other merge-request branches. For example, I split !4 into !4 and !5. In !5 I targeted the master branch, and in !4 I targeted the work/thcipriani/beautiful-soup-dependency branch (the branch from !5). After merging !4, the merge showed up in !5 rather than in master, where it could cause issues. I suppose that's the desirable behavior, but there are a few problems with it:

  1. History becomes messy. Maybe this could have been avoided had I used some other options in merging.
  2. It's non-obvious that it's not merged to master
  3. I wasn't prevented from merging the dependent patchset, it merely mitigated any risk of merging it

With the general advice for getting speedy code review being to split your patchsets, it'd be nice for this to be a better-supported path. It's noteworthy that there are many open issues about creating a merge-request splitting tool.

Adamw (talkcontribs)

We're just talking about the gitlab UI, I think? From the commandline, let's say you have patches (1, 2, 3, 4) that make up a branch "A", and you want to split (1, 2) into its own merge request. To do that, check out patch 2 then "git branch <new name>" or "git checkout -b", and push that.

Agreed that stacking merge requests can get tricky--but you can usually get the desired effect by carefully choosing the merge target for your PR. If I have branches A and B stacked on each other, then A will be merged to master but B will be "merged" to A. This prevents the UI from showing all of the branch A commits as if they were part of B.
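The recipe above can be sketched locally; branch and file names here are illustrative, and the push step is shown only as a comment since it needs a remote:

```shell
# Sketch of Adamw's recipe: branch "A" holds patches 1-4; peel patches
# 1 and 2 off into their own branch for a separate merge request.
set -e
git init -q split-demo && cd split-demo
git config user.email dev@example.org
git config user.name Dev
git commit -q --allow-empty -m 'initial'
git branch -M A
for i in 1 2 3 4; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "patch $i"
done

# Check out patch 2 and give it its own branch name...
git checkout -q -b A-part1 A~2   # A~2 is "patch 2"
# ...then push that branch as its own merge request, e.g.:
#   git push origin A-part1
# Patches 3 and 4 remain on branch A, whose MR can target A-part1
# (rather than master) until A-part1 merges.
git log --oneline A-part1
```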

AKosiaris (WMF) (talkcontribs)

Let me add a workflow that SRE uses in gerrit and is pertinent I believe to the integration topic.


An SRE pushes a topic branch in the puppet repo. Every single one of the commits in the topic branch needs to be merged and deployed individually, after having been reviewed (hopefully) and after CI has +2ed it. Rebasing might be needed, but that's expected in the current workflow. The reason is that every single one of those commits has state-changing consequences for at least part of the server fleet, and the SRE in question is expected to merge it, "deploy" it, and perhaps even trigger multiple puppet runs (alternatively, they can wait the full 30 minutes it currently takes for puppet changes to be reliably distributed to the entire fleet).


The most recent example I can think of is https://gerrit.wikimedia.org/r/q/topic:%22623773%22+(status:open%20OR%20status:merged).


How will SRE have to adapt that workflow for gitlab? Create a separate MR per change? Using a single MR clearly doesn't cut it (right?), but on the other hand having to go through the process of manually creating 4 or 5 MRs for something that is automatic in Gerrit isn't great either.

TCipriani (WMF) (talkcontribs)

I made a concrete example of this on our gitlab-test instance

Of Note

  • I used merge request labels in the place of topics
  • This is a series of patchsets, but they have no semantic relationship to one-another
  • My interaction with this repo was purely through the git client and no other programs

From my side the steps were:

  1. Create my work locally as a series of commits
  2. Use push options to make a merge-request for each patchset

This looked like:

$ echo '5' > COUNTDOWN
$ git commit -a -m 'Start countdown (1/5)'
$ echo '4' > COUNTDOWN
$ git commit -a -m 'Decrement countdown (2/5)'
...
$ git push \
  -o merge_request.create \
  -o merge_request.target=production \
  -o merge_request.remove_source_branch \
  -o merge_request.title="COUNTDOWN (1/5)" \
  -o merge_request.label='T1234' \
  gitlab-test \
  HEAD~4:work/thcipriani/T1234

Enumerating objects: 4, done.
Counting objects: 100% (4/4), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 327 bytes | 327.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote:
remote: ========================================================================
remote:
remote:     A test instance of GitLab for the Wikimedia technical community.
remote:   Data may disappear. Consider everything here potentially public, and
remote:                     do not push sensitive material.
remote:
remote: ========================================================================
remote:
remote: View merge request for work/thcipriani/T1234:
remote:   https://gitlab-test.wmcloud.org/thcipriani/operations-puppet/-/merge_requests/1
remote:
To gitlab-test.wmcloud.org:thcipriani/operations-puppet.git
 * [new branch]            HEAD~4 -> work/thcipriani/T1234

As is already mentioned, this could be wrapped in a friendlier interface.

AKosiaris (WMF) (talkcontribs)

I've iterated a bit on your take, mostly adding a for loop to go through all the changes. What I got is at https://gitlab-test.wmcloud.org/akosiarisgroup/simplegroupstuff/-/merge_requests?label_name%5B%5D=T42

$ for i in 5 4 3 2 1 ; do \
  git push \
    -o merge_request.create \
    -o merge_request.target=main \
    -o merge_request.remove_source_branch \
    -o merge_request.title="We are leaving together ($i/5)" \
    -o merge_request.description="And still we stand tall ($i/5)" \
    -o merge_request.label="T42" \
    origin HEAD~$((i - 1)):refs/heads/Europe$((i - 1)) ; \
  done


Couple of notes:

  • The gitlab label for this specific use case seems to supplant the gerrit topic branch adequately
  • CI is run on every MR, which is what we want
  • While some merge_request git push options are static, some others like title and description aren't. A wrapper tool will need to extract them from the git commit messages I guess. That adds a bit to the complexity of the tool that needs to be written, but it's arguably doable and probably maintainable. It will be however akin to git-review (does anyone cringe already?) albeit only for those that require this workflow
  • The big issue I see is that we don't get dependent MRs in this case. So any of the later MRs can be merged at any point in time by mistake, causing the expected mayhem. And it seems this is a paid feature and not a Community Edition feature per Wikimedia Release Engineering Team/GitLab/Features, which isn't a great sign. The notes in that table say "Important for our workflows, but can be re-implemented as a CI job." Not sure what that means? We'll create CI that checks something, but what exactly? A "Depends-on"? That's opt-in (as in the user must actively write it), so it will probably not safeguard us much.
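As a purely hypothetical sketch of what such a CI job might look like (no such tool exists today): a pipeline step could scan the MR's commit messages for opt-in "Depends-On:" footers, then query the GitLab merge requests API and fail the pipeline for any dependency not yet merged. Only the footer parsing is shown; the API call is left as a comment. As noted above, being opt-in, this would only protect changes whose authors remember to write the footer.

```shell
# Hypothetical CI-job sketch (this tool does not exist): extract opt-in
# "Depends-On:" footers from a commit message on stdin. A real job would
# then call GET /projects/:id/merge_requests/:iid for each extracted ID
# and exit non-zero for any MR whose state is not "merged".
extract_depends_on() {
  # Print the value of every "Depends-On:" footer line.
  sed -n 's/^Depends-On: *//p'
}

printf 'Implement feature\n\nDepends-On: !41\nDepends-On: !42\n' \
  | extract_depends_on
# prints:
# !41
# !42
```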

Nikerabbit (talkcontribs)

I would imagine that doing a wrapper that emulates `git review` behavior in a way that creates a separate merge request for each commit wouldn't be too hard. The real issue is lack of nice UI in GitLab to automatically rebase merge requests on top of the target branch.

Adamw (talkcontribs)

Gitlab will show you when a merge request is going to conflict with master, and a successful merge includes rebase. Is there a reason we need to explicitly rebase the merge request before merging, or maybe that's just a habit carried over from our Gerrit workflow?

Nikerabbit (talkcontribs)

If the merge requests depend on the previous one, at least the merge base needs to be updated.

If there are conflicts during a rebase, I would like the test pipeline to run again on the rebased merge request before it is merged to the master.
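The merge-base update described above can be sketched locally (branch and file names are illustrative): once the parent branch merges, the dependent branch is rebased onto the new target, and force-pushing the rebased branch is what would re-run the pipeline.

```shell
# Local sketch: branch B is stacked on branch A; after A's MR merges,
# B is rebased onto the updated master so its MR shows only its own
# commits. All names are illustrative.
set -e
git init -q stack-demo && cd stack-demo
git config user.email dev@example.org
git config user.name Dev
git commit -q --allow-empty -m 'initial'
git branch -M master

git checkout -q -b A
echo refactor > a.txt && git add a.txt
git commit -q -m 'Refactor to allow implementation'

git checkout -q -b B
echo feature > b.txt && git add b.txt
git commit -q -m 'Implement feature'

# A's MR merges (simulated here with a local merge to master)...
git checkout -q master
git merge -q --no-ff --no-edit A

# ...then B's merge base is updated by rebasing onto the new master;
# force-pushing the result would re-run CI on the rebased commits.
git checkout -q B
git rebase -q master
git log --oneline master..B   # only "Implement feature" remains
```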

DLynch (WMF) (talkcontribs)

I just want to chime in to agree that losing, or making more complicated, the ability to separate out commits into logical units of work seems like a bad thing for our ongoing code health. Workflows that force us to squash only semi-related commits together when we merge look unambiguously bad.

I'd be less concerned if we had the premium merge dependencies, though it looks like they lack some of the convenience features of gerrit's topic chains.

Writing a `git review`-like tool seems like a maybe-viable compromise... but if we're writing custom tooling and will expect any contributor who makes more than an utterly trivial change to be using it, are we gaining so much from a migration any more?

Cscott (talkcontribs)

Given the importance of this stack-of-patches workflow to many current developers, I would have liked to see a more concrete plan, including a timeline, for implementing this in WMF's gitlab migration. Especially as the required gitlab features for this are not expected to be included in the Community Edition of gitlab that WMF is planning to use. More clarity would be helpful.

TCipriani (WMF) (talkcontribs)

In terms of timeline the GitLab/Roadmap speaks to the chronology of events. Tying the chronology to real world times: the "Utility project migration" heading in that roadmap is where we hope to be in 8 months. I've called out the dependent patchsets explicitly in that step.

We've raised this with GitLab as well. So far they've provided some workarounds that are a bit clunky. I'd encourage developers that care about this feature to poke the upstream ticket as a signal about the feature's importance: https://gitlab.com/gitlab-org/gitlab/-/issues/251227

Reply to "Integration: Merge requests and patchsets"

Lowering the barrier to contributing

11
MusikAnimal (talkcontribs)

While Gerrit may have features that are arguably superior in terms of code review (depending on your workflow), to me, it poses too great of a barrier to contributing, and is a constant source of confusion. I've been using it for 4 years and I still find myself occasionally having to ask for help. I can't help but wonder just how many volunteer developers we've lost because of this. Let's say as a new developer I wanted to fix a simple typo, or add a new line to a config file -- why do I need to read a manual on how to do this? Unless our goal is to increase the barrier to contributing, I'd say there's really no contest here... GitLab/GitHub/BitBucket are all scores more user-friendly. Sure, once you are familiar with Gerrit, its powerful features start to shine, but I think we should do our best to foster open-source development by keeping the barrier to contributing as low as possible, just like we try to do on the wiki. It's for these reasons that I would never host my own Wikimedia tools on Gerrit.

That said, if we do stay with Gerrit, I think there are some small improvements we could make to improve the user experience. For instance, I had +2 rights when I first started using Gerrit. On my first attempt at reviewing code, I of course hit the pretty blue "Code Review +2" button, as it would seem that would 'start' the code review process. Two members of my team at WMF did the same thing when they first joined. I think the button should instead say "+2 Merge", and perhaps have a confirmation modal. Or, say the build gets stuck. You might see another pretty blue "Submit" button. I would have expected that to re-submit the jobs, or something, not merge and bypass CI entirely! Again, "Merge" might be the better wording. It's weird that all the buttons have tooltips except the one that actually can cause problems, and the problematic buttons are so easy and inviting to click on. These are just minor examples. I also struggle to navigate the codebase through the UI, can't ever remember how to follow projects, not to mention those secret commands to control CI via comments... the list goes on and on. Left to my own devices, I always use the GitHub mirrors to browse and share code.

I hope my wording does not come off as too strong. A lot of people have put immense work into Gerrit, and I know it works exceedingly well for some people. Perhaps GitLab seems like a toy to some. I suppose it's just a trade-off between power and usability, and I hope we don't neglect the usability aspect when making our final decision.

Nikerabbit (talkcontribs)

I fully agree that we should lower the barrier to contributing, but we should be conscious about the trade-offs. If we switch

  • productivity of some developers, like me, would likely decrease temporarily as we learn and adapt.
  • productivity of some developers, like me, could possibly decrease permanently, if GitLab does not support certain kind of workflows as fluently.

In addition, a lower barrier to entry has to be balanced with managing the incoming stream of contributions, not all of them valuable. We know from Wikipedia that this can only work if sufficient tooling and resourcing is present to filter out spam and vandalism and to improve contributions which do not quite meet the requirements. Are we prepared to fight the spam, vandalism and drive-by contributions that are not mergeable without further work? Do we have sufficient guidance for contributors so that they can work with us, and not (unknowingly) against us?

I don't have answers to any of these questions, but I hope that there will be by the end of this consultation. Personally, I will try to figure out the first part, how much would my productivity be affected by the switch.

WDoran (WMF) (talkcontribs)

From my limited experience here, managing the flow of inbound work is already a significant issue at least for our team. This involves making hard choices and trying to balance resources. On Platform Engineering, we've tried to adopt processes that give clear interfaces for other teams but the volume is already quite high.

I do not at all mean to discount this point, I think it's valuable and prescient but above all something we should already have impetus to address. Building up a better experience both for our internal teams and external contributors should absolutely be a focus.

I'm not sure if it's possible but it might be worth reviewing the practices of other large scale groups and seeing what we can adopt or if there is a willingness to knowledge share with us. I know our own team had an excellent experience working with Envoy recently to contribute upstream changes.

Hashar (talkcontribs)

I am pretty convinced it is a social problem rather than a tooling issue. We had the same problem in the CVS/Subversion era: new commits were sent to a mailing list and reviewed after the fact. In 2008, Brion sprinted on Extension:CodeReview (GitHub was just starting at that time), which at least made it easier to process the backlog. I came back as a volunteer in 2010 and went on a review frenzy, but we still had glitches.

Others can correct me, but the main incentive was to switch to git. Gerrit came with the nice addition of holding the flood of patches as pending changes, which nicely fitted MediaWiki: patches were on hold until reviewed, thus protecting production.

Gerrit surely has its flaws, but I don't think the review issue is a tooling issue; it is entirely social, related to our "bad" (but improving) development practices and community as a whole.

For the tooling consultation, we might be able to look at repositories maintained by Wikimedia on GitHub and see whether the reviews are better handled there. But the corpus of repositories is vastly different (in my experience interactions for a given Github repository are mostly from a single wmf team).

MusikAnimal (talkcontribs)

Will GitLab login require a Wikimedia developer account, like Gerrit does? If so I think that alone would cut out a lot of drive-by garbage, at least spam and vandalism. I can't imagine it'd be much worse than what we see on Phabricator, no? Even if there was an approval process to get access, that might be okay... my issue is good-faith, competent developers (volunteer and staff alike) who already have access still struggling to use the software. It's not just about making patches, but participating in code review, and doing basic things like watching projects and navigating the code, or even finding the command to clone a repository (though downloading an individual patch I think is easy enough to figure out). Or say I click on a Change-Id: it forwards me to the patch, and all of a sudden my browser's history is polluted with redirects, making it hard to get back to the previous page. It's all the little things that, combined with the confusing CI system, can turn routine tasks into headaches. This is all of course just my opinion/experience. I am fairly confident these days with Gerrit, but it took a long time for me to get here.

BBearnes (WMF) (talkcontribs)

Will GitLab login require a Wikimedia developer account, like Gerrit does?

Yeah, that's the plan.


(Edit: Well, that's my assumption as to what the plan would be. Specifics will need work, but GitLab CE supports LDAP.)

Tgr (WMF) (talkcontribs)

Like others, I'm worried we are misidentifying the problem here. I agree in theory that we should prioritize a low barrier of entry and good learning curve above power-user-friendliness - both for pragmatic reasons (we can always use more hands, and the Wikimedia open source projects seem very far below the potential that being a top10 website and the top free knowledge management tool should grant them) and because it fits well with our values of openness and equity.

In practice, though, I agree with Hashar that the main bottleneck is human. This is something the "why" section of the consultation doesn't engage with as well as it should - yes, surveys have shown code review to be the biggest pain point, but we don't have any good reason to think Gerrit was the main reason for that. Resoundingly, the biggest complaint is the lack of reviewer response; the WMF has so far chosen not to invest significant resources into fixing that. So I worry that 1) this will be a distraction (we feel good that we are now doing something about developer retention, so addressing the real problem is delayed even further); 2) maybe even harmful if GitLab is worse at supporting efficient code review (one thing Gerrit excels at is finding patches; as such it's reasonably okay at supporting our somewhat unusual situation of a huge pile of repos with unclear or lacking ownership, and some repos which are too large for repo-level ownership to be meaningful); 3) it will just lead to more churn (if you have a social system with a limited capacity for supporting newcomers which is already overloaded, and you make the technical means of joining that system easier, you'll end up with the same amount of successfully integrating users but much more deflected ones, who have negative experiences with the Wikimedia developer community and it will be harder to reach them later once we improved things).

To phrase things more actionably, I'd really like to see Gerrit and GitLab compared specifically in terms of their ability to support code review if it remains a largely voluntary activity, not incentivized or rewarded by management. Will it become easier or harder to find unreviewed patches across repos, by various criteria like "recently registered user" or "productive volunteer contributor"? Will it be easier or harder to track code review health on a global or repo level? Will code review take less or more time?

Tgr (WMF) (talkcontribs)

I'd add that CI is IMO the one area where tooling can efficiently support code reviewers - tests and linters basically provide automated code review, and they reduce the reviewer burden as long as they present it in a comprehensible format. This is something our current system is really bad at - patch authors need to figure out what went wrong by parsing dozens of pages of console logs, a terrible experience for new developers (and an annoyance for experienced ones). I'm not sure how much that is an issue with Gerrit though. It has had the ability for years to filter out bot noise from review conversations, for example, and we haven't bothered to make use of it until recently. More recently it has gained the ability to assign test errors to specific lines and show them in context, and there is no organized, resourced effort to convert our test tooling. So again I don't know if the switch would address the real issue there. Does GitLab even support inline CI comments? From speed-skimming the docs, my impression is it does not (interactive CI debugging OTOH sounds like a really cool feature, but it is not for beginners). Making sure all of our major test/lint tools play nice with Gerrit features like inline comments and fix suggestions could IMO be more impactful for new developer retention while being a less ambitious (ie. less risky) project.

Hashar (talkcontribs)
Tgr (WMF) (talkcontribs)

@Hashar yes, and it is not on any team's roadmap (much less on the annual plan) to do so. Kosta has done an amazing job with SonarCloud, and there is a working group doing great work, but it's mostly a personal effort that is happening due to the dedication of the participants, and to the extent they can find free time for it. Meanwhile we are considering this moonshot project to address a problem when there are bigger problems that could be addressed with far less effort.

I don't want to downplay Gerrit's UX weaknesses, it is certainly a serious problem for developer retention. I find the arguments that we should at some point migrate away from it convincing, and as a superficial first impression GitLab seems like a decent place to move to. But given there are problems which are more severe and can be addressed with less cost and less risk, it feels a bit like a prioritization fail.

ProcrastinatingReader (talkcontribs)

I have no comment on all the nuances described elsewhere on this talk, but I can say that Gerrit is a huge bar to contributing. I don't understand any of it (to be fair, I haven't tried, and don't intend to learn) -- I know two commands and I get by on them. So maybe it's not the biggest bar in practice, but it's a psychological / "can I really be bothered" bar. Versus just knowing what to do, and being able to spend your time on the code rather than on learning Gerrit. Most devs, especially volunteer ones, will not be exclusively contributing to MW. And I would hypothesise that most other projects they contribute to are on GitHub, or use the GH flow. Hence it's more intuitive and a lower barrier to entry.


I think it would certainly help improve contributions. Admittedly, last I used GitLab I didn't have that much love for it (many years ago now), but it is certainly a big improvement, and I think it's better in the long term. I do not think Gerrit is sustainable if we think about the years ahead, when I think these kinds of tools will become more and more forgotten. My opinion: the quicker MediaWiki moves on from Gerrit, the better. And I hope one day something is done about phab too, although that is more a preference rather than a problem.

Btw, respect for everyone who has made Gerrit work this long and tried to abstract away the barrier to entry. Not trying to diminish that work, by any means. But I think there's only so far you can go.

Reply to "Lowering the barrier to contributing"

The MR/PR model is probably inevitable

4
BBearnes (WMF) (talkcontribs)

There's a point I've argued in conversation that I'm not sure has been articulated explicitly as part of this consultation, so I'll do my best to lay it out here.

Briefly: It seems likely to me that we're getting the PR/MR model whether we want it or not. My thinking is as follows:

  • The current status quo is not that everything lives on Gerrit. Per the "Why" section, it's Gerrit plus 150-odd repos on GitHub.
  • If we didn't have a requirement that things deployed to production be hosted on Gerrit, the GitHub number would almost certainly be higher.
  • If we don't provide standard code review & CI tooling that meets some basic expectations, projects and teams will continue drifting to other platforms.
  • Eventually, we're going to reach a crisis point with Gerrit. It'll be brought to us by one or more of:
    • Our ability to maintain a public Gerrit instance (already stretched to the breaking point in terms of people and resources)
    • The upstream health / responsiveness of Gerrit as a project
    • Pressure from developers and projects/teams to ratify the de facto migration away from Gerrit which is already underway

And at that point, my expectation is that we're going to wind up scrambling to adapt, locked into a fully-proprietary monopoly platform (GitHub) with little control over the decision, and cleaning up a few years' worth of additional fragmentation. We'd still be adapting to PR-style workflows and tooling, just less deliberately, not on our own terms, and at a greater remove from the path taken by other projects that share a great many of our values and concerns.

In thinking this through, it's also become clear that if we elect not to migrate away from Gerrit at this time, we're still going to have to spend substantial money and person-hours on the technical problems of our code review infrastructure. There's just not a viable option to do nothing here. (I specify "technical problems" because this consultation is first and foremost about improving an unsustainable software situation, not about whether our culture and priorities around code review need help. The latter is a very important question, but it is not the problem we set out to solve with this process.)

TCipriani (WMF) (talkcontribs)

In addition to the 152 GitHub projects you mention there are several additional GitHub organizations that contain repositories used in people's day-to-day. Not to mention the tools that exist under individual user accounts that folks are using for day-to-day work.

Many repos are created outside Gerrit because it's easier to create them elsewhere. Or easier to set them up elsewhere. Or easier to access them elsewhere. I, personally, don't put small projects on Gerrit because I don't want to think about where they fit in the giant hierarchy of things in Gerrit before I can even start on a README.

I am a Gerrit workflow fan, but I worry that if we don't address the real issues with Gerrit we'll just end up slouching into whatever's easiest, without regard for guiding principles or preserving workflows or CI or deployment or anything other than what's expedient.

Adamw (talkcontribs)

I'm slowly coming to the same realization, if for different reasons. We discovered that force-pushing to a branch leaves no record of the previous history. This is a dangerous situation because an accidental push could irreversibly destroy work and break auditing. If the branch is associated with a merge request however, the patchset comparison tools become available. We very much would want to use this workflow, since most of us have been conditioned by years of force pushing and I expect that we'll find ourselves continuing to do so.

KHarlan (WMF) (talkcontribs)

> Pressure from developers and projects/teams to ratify the de facto migration away from Gerrit which is already underway

Isn't this a social issue, in that teams are largely free to pick whatever code review platform they like to do their work -- similar to how different teams used different chat mediums, or I think in the not too distant past there were various combinations of Asana / Trello and perhaps other bug trackers in use by team. Similar to how in theory everyone is supposed to use phabricator to organize and document their work, there should probably be a similar effort to have people use the same code review tooling. Otherwise I could easily see, of all those repositories listed as being used on GitHub, the majority staying on GitHub since GitHub !== GitLab.

> Our ability to maintain a public Gerrit instance (already stretched to the breaking point in terms of people and resources)

My understanding is that GitLab is more complex to host and maintain, would it require fewer resources?

Reply to "The MR/PR model is probably inevitable"

Gitlab's community edition relies on nonfree proprietary software to combat spam & abuse

6
Ian Kelling (talkcontribs)

It relies on the proprietary Akismet and Google's recaptcha. It is a known target for spammers; without turning those on, it will quickly be overloaded with spam. The main page mentions "GitLab is a system used successfully by many other members of the Free Software community (Debian, freedesktop.org, KDE, and GNOME)." freedesktop.org and Debian turned on recaptcha, so their instances cannot be used in freedom: they require users to run proprietary Google code. KDE and GNOME don't allow user registration. I've looked around, and there is no instance that runs the community edition and is open to the public for general use other than gitlab.com (which is running a proprietary version). It has been this way for several years. Gitlab has paid lip service to at least removing recaptcha, but so far has done nothing. It also optionally "integrates" each repo with over 10 different nonfree programs or services ("settings, integrations"), so unless you trusted all your users to avoid using those, you would need to patch the software to use it in freedom. So, where the main page says "it adheres to the foundation's guiding principle of Freedom and open source", I don't think that is correct.

Then you have what some might consider more minor issues: people who want to contribute will have to do it upstream and run nonfree recaptcha to register, and they will have to do it in a repo containing all the nonfree code and make sure their contribution fits in with the nonfree parts of gitlab. There is only one version of the documentation, and it includes the docs for all their nonfree features. Most instances of gitlab use nonfree code (including gitlab.com, debian and freedesktop.org), so calling your instance a gitlab instance would have the effect of promoting gitlab and proprietary software use. Gitlab's new-repo license recommendation UI is at odds with the FSF's recommendations: see https://libreplanet.org/wiki/FSF_2020_forge_evaluation.

Hashar (talkcontribs)

> Most instances of gitlab use nonfree code (including gitlab.com, debian and freedesktop.org), so calling your instance a gitlab instance would have an effect of promoting gitlab and proprietary software use. Gitlab's new repo license recommendation UI are at odds with the FSF's recommendations: see https://libreplanet.org/wiki/FSF_2020_forge_evaluation.

Hello Ian. I have looked at the instances for Debian ( https://salsa.debian.org/help ), KDE ( https://invent.kde.org/help ) and Gnome ( https://gitlab.gnome.org/help ); they all list the community edition. Do you have any hints as to whether they are using nonfree code, or was that referring solely to recaptcha? We would most certainly not use that :)

Ian Kelling (talkcontribs)

> Do you have any hints as whether they are using nonfree code or was that referring solely to recaptcha?

All I can see is the nonfree captcha. Hopefully that is all. All the gitlab "integrations" that call out to other nonfree services are still available for their users to use.

Nikerabbit (talkcontribs)

These issues were raised in the thread Topic:Vt99ei7sjd0i9f62. Recaptcha is not going to be enabled if we setup a gitlab instance.

Nemo bis (talkcontribs)

Indeed all past migrations of big projects to GitLab have been a failure for software freedom so far. If we manage to keep the service running properly without proprietary software, we'll be a first. It might be possible but it will require a big investment.

Tgr (WMF) (talkcontribs)

As discussed elsewhere (e.g. Topic:Vu7w5ouu1khiztrd), we'd keep using our own SSO system, so at least login captchas are not a concern. (Captchas for rate throttling, maybe. But then Gerrit doesn't have anything like that, so it won't be worse than the status quo.)

Reply to "Gitlab's community edition relies on nonfree proprietary software to combat spam & abuse"
Return to "GitLab consultation" page.