Source control considerations

For the latest information on source control, see Git.

This page contains some thoughts related to MediaWiki's possible migration to a different source/version control system. Currently there are no such plans, but the possibility is discussed unofficially quite often.

Why migrate?

Deficiencies of our current version control

  • The current mediawiki/trunk/ is a huge pile of different projects; some are outdated, and for others nobody remembers what they are for. Reorganization is long overdue.
  • Extension:CodeReview is lacking as a code review tool, largely because it is tied to SVN. For example, commits are regularly made and then undone (sometimes multiple times back and forth) as part of the standard "code review process" O.o. Tools associated with other (distributed) version control systems have a more sensible workflow.

Centralised vs. distributed

This section discusses only our situation; there is plenty of general information on this matter elsewhere.

Centralised, pros:
  • Allows restricting write access to different parts of the repo, which we do for SVN (not really critical).
  • Has partial checkouts.
  • Easier to learn, and already known by our current developers.
Centralised, cons:
  • Slower.
  • More restrictive on workflow.
Distributed, pros:
  • Faster.
  • Fewer restrictions on workflow.
  • More convenient features (bisect, etc.).
Distributed, cons:
  • Would require splitting our repository into multiple pieces, which is less convenient.
  • Would also require upgrading or replacing CodeReview.
  • No partial checkouts, or they are badly supported.
  • Harder to learn and, as some say, to use, but many people are familiar with it anyway.

Requirements

  1. Must make things easier, not more complex.
  2. If we decide to split extensions into separate repositories, there must be a sane way to include them all in /extensions and update them automatically, just as we currently do with symlinking.
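
As a rough illustration of requirement 2, the sketch below symlinks every extension checkout into a single extensions directory. It is only a sketch under assumptions: the function name link_extensions and the directory layout (one sub-directory per extension repository) are hypothetical, not an existing MediaWiki tool.

```python
import os
from pathlib import Path


def link_extensions(checkouts_dir: Path, extensions_dir: Path) -> list[str]:
    """Symlink every checkout under checkouts_dir into extensions_dir.

    Hypothetical helper: assumes each extension repository is checked out
    as its own sub-directory of checkouts_dir.
    Returns the names of the links that were newly created.
    """
    extensions_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for checkout in sorted(checkouts_dir.iterdir()):
        if not checkout.is_dir():
            continue  # skip stray files next to the checkouts
        link = extensions_dir / checkout.name
        if link.is_symlink() or link.exists():
            continue  # already linked, or something else is in the way
        os.symlink(checkout.resolve(), link, target_is_directory=True)
        created.append(checkout.name)
    return created
```

Running the function again after a new extension repository has been cloned only adds the missing link, so it could be called from whatever job updates the checkouts.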

Comparison

This section covers possible candidates along with our current VCS, Subversion. Because migration to another client-server VCS seems highly unlikely, only distributed systems are considered. Less popular DVCSes are likewise left out. Sorry, Darcs and Monotone guys.

All of the current candidates are mature, have sufficient developer communities and are supported by enough free hosting services for open-source projects. See also w:en:Comparison of revision control software and Comparison of Bzr/Git/Hg.

Subversion

Subversion[1] was started in 2000 specifically as a replacement for CVS. The result is a project that, although it fixed many of CVS's flaws, did not address its client-server nature.

Pros

  1. Most popular source control system.
  2. Simple.
  3. Lots of GUI clients.
    TortoiseSVN[2] especially rocks.
  4. Mature and stable API with bindings for virtually every programming language.

Cons

  1. Requires a network connection for most operations.
  2. Due to the above, slow in many cases. Even worse, can be über-slow for blames on files with a long history.
  3. Branching and merging are horrid operations.
  4. Doesn't have some functions DVCS users are used to, such as bisect or stash.
  5. Greater load on server.
  6. Inconvenient .svn directories everywhere.
  7. History is unreliable.
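
Item 4 above mentions bisect. For readers who have not used it, the idea is a binary search over history for the first revision that shows a bug. The sketch below illustrates that idea only; it is not how any particular VCS implements it, and the names first_bad_revision and is_bad are hypothetical.

```python
from typing import Callable, Sequence


def first_bad_revision(revisions: Sequence[int], is_bad: Callable[[int], bool]) -> int:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are in chronological order, the first one is good,
    the last one is bad, and once a revision is bad all later ones are bad.
    """
    # Indexes of a revision known to be good and one known to be bad.
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        middle = (good + bad) // 2
        if is_bad(revisions[middle]):
            bad = middle  # the bug was introduced at or before middle
        else:
            good = middle  # the bug was introduced after middle
    return revisions[bad]
```

With N candidate revisions this needs roughly log2(N) test runs instead of N, which is why the feature is missed when it is absent.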

Git

Git is a free distributed revision control system (or software source code management project) with an emphasis on speed. Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server.

Pros

  1. Fast. Generally, the fastest DVCS around.
  2. Distributed, with all that comes along with that, like network-less blame operations, trustworthy history, etc.
  3. Surprisingly, has a good GUI client for Windows, TortoiseGit[3].
  4. Lots of features, though sometimes too many, which makes learning harder.
  5. Can import SVN history into Git, fixing problems along the way (such as one person with multiple committer names, splitting repositories, etc.).
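
The "one person with multiple committer names" fix in item 5 is usually done with an authors file, which git svn accepts through its --authors-file option and which has one "login = Name <email>" line per SVN login. The sketch below generates such a file from a mapping of people to their logins. The function name authors_file and the sample identities in the usage note are made up for illustration.

```python
def authors_file(identities: dict[str, list[str]]) -> str:
    """Build authors-file text mapping each SVN login to one identity.

    identities maps a canonical "Name <email>" string to every SVN login
    that person has committed under (hypothetical input format).
    """
    lines = []
    for identity, logins in identities.items():
        for login in logins:
            # One line per login, in the "login = Name <email>" format.
            lines.append(f"{login} = {identity}")
    return "\n".join(sorted(lines)) + "\n"
```

For example, authors_file({"Jane Doe <jane@example.org>": ["jane", "jdoe"]}) yields two lines that both point at the same identity, so commits made under either login end up attributed to one person after the import.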

Cons

  1. Has cross-platform problems. Windows users can use either Cygwin, which is a horrible monster, or msysGit, which patches the original Git sources. The latter has no git-daemon, and both have problems with localised filenames (not a problem for the current MediaWiki codebase, but still). Performance on Windows is noticeably worse than on POSIX systems.
  2. Git-gui simply sucks. Those who disagree are advised to try using TortoiseSVN for a couple of weeks, setting aside SVN's deficiencies. There are, however, alternative GUIs.
  3. Has problems with huge code trees. The standard advice is to divide the codebase along boundaries where code is no longer shared (this should be done anyway), and use submodules as needed to glue the smaller repositories back together.

Mercurial

Mercurial is a cross-platform, distributed revision control tool for software developers. It is implemented primarily in the Python programming language, but includes a binary diff implementation written in C.

Mercurial's major design goals include high performance and scalability, decentralized, fully distributed collaborative development, robust handling of both plain text and binary files, and advanced branching and merging capabilities, while remaining conceptually simple.

Pros

  1. User-friendly interface; basic functionality is easy to use. Easy to learn, good documentation.
  2. Repository design is safe: append-only storage guarantees history invulnerability, and compressed repository storage is efficient.
  3. Fast (Git is faster by half an ångström).
  4. Permissions are highly customisable.
  5. Easy to extend for anyone who knows Python.
  6. Built-in repository conversion from most SCMs out there.

Cons

  1. TortoiseHg is pretty fugly compared to the fully native TortoiseSVN and TortoiseGit.
  2. No native GUI (but many general GUIs support it, and most people are fine with the command line).

Bazaar

Bazaar is a distributed revision control system sponsored by Canonical Ltd., designed to make it easier for anyone to contribute to free and open source software projects.

The development team's focus is on ease of use, accuracy and flexibility, with a particular focus on branching and merging[citation needed]. Bazaar can be used by a single developer working on multiple branches of local content, or by teams collaborating across a network.

Bazaar is written in the Python programming language, with packages for major GNU/Linux distributions, Mac OS X and MS Windows.

(Emacs recently switched from CVS to Bazaar and has provided workflow suggestions on the EmacsWiki.)

Pros

  1. Allows mixing with SVN: commits to a bzr master are synchronized to SVN and vice versa.
  2. Said to have great merging capabilities.
  3. Easy to learn for people coming from SVN. Lots of similarities in commands.

Cons

  1. Widely considered the slowest DVCS around, though its developers advertise huge improvements in this field.
  2. Smaller userbase ("penetration").