Wikipedia.org Portal/Migration to gerrit

Technical issue: Shifting the portal code to Gerrit

NOTE: This section is a DRAFT

Although the existing system (storing the portal content in a template on Meta-Wiki) has worked until now, it cannot effectively support the workflows needed to make the desired improvements. The portal needs to be treated more like a piece of code and less like static content.

Working closely with the long-time portal maintainer (Minh Nguyễn), the Discovery team came up with a plan to shift the portal code and content into a Git repository managed through Gerrit. This would allow multiple developers to work on the portal without interfering with each other, using tools and practices already in use on MediaWiki and other Wikimedia projects.

Benefits of shifting to Gerrit

Easier manipulation of files. The portal consists of multiple files, including HTML, JavaScript, and CSS. In the current system, each file is a separate wiki page, so a change that involves multiple files requires editing and saving each page separately, and each save is recorded as a separate revision. With Git, the changes to all the affected files would be stored as a single, atomic commit, which could easily be reviewed, merged, or rolled back.

Also, it would be useful to split some of the existing files into multiple files. Git would handle that easily, whereas in the current system every additional page would make multi-page changes even more cumbersome and risky.
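As a rough illustration of the single-commit workflow described above, the following Python sketch stages several portal files and records them as one commit by calling the standard `git add`, `git commit`, and `git revert` commands. The function names, the example file names, and the injectable `run` parameter are assumptions made for this sketch; they are not part of any existing portal tooling.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def commit_together(
    repo: Path,
    files: Sequence[str],
    message: str,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Stage every listed file, then record them all as one commit.

    `run` defaults to subprocess.run. It is injectable (an assumption of
    this sketch) so the command sequence can be inspected without Git.
    """
    # One `git add` stages all affected files together.
    run(["git", "add", "--", *files], cwd=repo, check=True)
    # One `git commit` records the staged changes as a single unit.
    run(["git", "commit", "-m", message], cwd=repo, check=True)


def roll_back(
    repo: Path,
    commit: str,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Undo one commit by adding a new commit that reverses it."""
    run(["git", "revert", "--no-edit", commit], cwd=repo, check=True)
```

A call such as `commit_together(Path("portal"), ["index.html", "portal.js", "portal.css"], "Update search box")` (hypothetical paths) would record the HTML, JavaScript, and CSS changes together, so a reviewer sees them as one change and `roll_back` can undo them in one step.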

Allows development on multiple branches. With the existing system, experimental branches would have to be maintained as separate copies of the template pages. Merging work in either direction, between the mainline and a branch or between two branches, would be extremely painful. Git has excellent branching features, which are widely known and used.

Allows the use of standard and modern software development tools. The existing Meta-Wiki template system takes advantage of the easy editing of a wiki, but requires a special back-end pipeline ("extract2.php") that converts the templates into files that can be served. Moving to Git would allow a more conventional deployment system that copies files from a repository onto a server.
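To make the idea of a conventional copy-based deployment concrete, here is a minimal Python sketch that copies a repository checkout into the directory a web server serves. The function name and both directory paths are hypothetical, and this is not a description of the actual Wikimedia deployment tooling.

```python
import shutil
from pathlib import Path


def deploy(checkout: Path, docroot: Path) -> None:
    """Copy a repository checkout into the served directory.

    Both paths are hypothetical. Git's own metadata directory is skipped
    so that it is never exposed by the web server.
    """
    shutil.copytree(
        checkout,
        docroot,
        ignore=shutil.ignore_patterns(".git"),
        dirs_exist_ok=True,  # Python 3.8+: overwrite files in an existing target
    )
```

This sketch only adds and overwrites files. It does not delete files that were removed from the repository, which a real deployment tool would need to handle.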

Development would also become easier, because there are so many tools for working with locally stored files, including code formatters and style validators, post-processors, previewers, and debuggers. Changes submitted to Gerrit can be run through a test suite automatically, making it harder for bugs to get through. Most developers are familiar with Git, and Gerrit is the standard code review system for MediaWiki-related software.

Easier to replace manually updated values with code that updates them automatically. Although this is NOT a part of the Discovery portal improvement initiative, it would be a nice side benefit. The portal maintainer believes that shifting to Gerrit would make it easier for him (or others) to automate some of the content on the page.

Objections to shifting to Gerrit

Limits on who will be allowed to contribute. Commit rights will be granted liberally. The goal is not to limit who can work on the page, so every effort will be made to allow interested people to contribute.

Higher barrier for user contributions. It is true that committing via Gerrit is substantially more difficult than simply editing a template on Meta-Wiki. However, these pages don't receive many external contributions, so the actual effect shouldn't be large.

Changes won't show up on the site as quickly. The current mechanism allows edits to the page to be viewed in production almost immediately. Initially, with the new mechanism, commits will not appear in production until they are manually deployed. That could be done within minutes of a change being merged, or could be scheduled at regular times, or could be delayed if there were some reason to hold back the changes temporarily.

A new deployment tool (scap3) is being developed, which might eventually allow changes to be deployed automatically. In that case, it might deploy new portal code hourly, or at whatever interval makes sense. Release Engineering does not want automatic deployment implemented without scap3.

"Gerrit has less monitoring and integration with Meta-Wiki". The Discovery team and the long-time portal maintainer will both be monitoring commits in Gerrit. While the current integration with the Meta-Wiki community is helpful, the trade-off is that the current system is less integrated with the engineering community.

Why not add functionality through the existing template system first? The Discovery team is responsible for achieving a quarterly goal of starting to improve the wikipedia.org portal in measurable ways. Two developers are ready to add event logging, an A/B test framework, and actual UI changes to the page. All of that work would be easier (and safer) in Git, so it makes sense to switch now.