Jenkins is a Java tool used to handle recurring tasks such as running tests or building packages. Our primary install is at https://integration.wikimedia.org/ci/.
The tool is permanently connected to our review tool (Gerrit) and can be made to react to changes submitted to Gerrit. A typical example is running MediaWiki unit tests whenever a change is submitted to the
The main repository for the layout of Jenkins itself is
integration/jenkins.git (installed in
You can add new Jenkins agents.
These services run alongside Jenkins on the same server, as a result of certain jobs that publish results outside the Jenkins realm for other uses:
Nightly snapshots of MediaWiki core are generated at 3am UTC by the Jenkins job "Nightly - MediaWiki core". The ant target is
nightly-mediawiki-core and is straightforward: it gets the latest version of master and uses
git-archive to generate a zip file. The zip file is then copied under the
/org/mediawiki/integration/nightly hierarchy, which is publicly available via https://integration.wikimedia.org/nightly/.
The latest snapshot is always available at https://integration.wikimedia.org/nightly/mediawiki/core/mediawiki-latest.zip (it is a symbolic link to the latest snapshot).
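The snapshot step described above can be sketched in shell. This is only an illustration, not the actual ant target: the function name make_snapshot, its arguments, and the dated file name mediawiki-STAMP.zip are assumptions; only git-archive, the master branch, and the mediawiki-latest.zip symbolic link come from the description above.

```shell
# Sketch only: function name, arguments, and the dated file name are hypothetical.

# make_snapshot REPO OUTDIR STAMP
# Archives the master branch of REPO into OUTDIR/mediawiki-STAMP.zip and
# repoints OUTDIR/mediawiki-latest.zip at the new file.
make_snapshot() {
    repo=$1
    outdir=$2
    stamp=$3
    zip="mediawiki-$stamp.zip"

    mkdir -p "$outdir" || return 1
    # Resolve OUTDIR to an absolute path, because `git -C` changes directory.
    outdir=$(cd "$outdir" && pwd) || return 1

    # git-archive writes the tree of the given ref straight into a zip file.
    git -C "$repo" archive --format=zip --output="$outdir/$zip" master || return 1

    # Relative symlink, so the link stays valid wherever OUTDIR is served from.
    ln -sfn "$zip" "$outdir/mediawiki-latest.zip"
}

# Usage (not run here):
# make_snapshot ~/src/mediawiki-core /tmp/nightly "$(date -u +%Y-%m-%d)"
```

Replacing the symbolic link only after the archive has been written means the "latest" URL never points at a missing or half-written file.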
Wikipedia Android alpha app builds
The mobile apps team uses Jenkins to build its alpha release of the Wikipedia app, available at http://android-builds.wmflabs.org/. Each time a change to /apps/android/wikipedia is merged in Gerrit, an apps-android-wikipedia-publish job is triggered in Jenkins. Upon successful completion, the resulting APK and a JSON file containing the build timestamp are preserved as build outputs.
Every 15 minutes, a script invoked by a cron job running on the
android-builder instance in Wikimedia Cloud VPS checks for a new APK in the job outputs, and publishes it at the above web site if found.
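The periodic check described above might look roughly like the following. This is a minimal sketch, not the script that actually runs on android-builder: the function name publish_if_new, the content comparison with cmp, and the wrapper path in the crontab comment are all assumptions made for illustration.

```shell
# Sketch only: names and paths are hypothetical.

# publish_if_new SRC_APK DEST_APK
# Copies SRC_APK over DEST_APK only when DEST_APK is missing or differs.
# Returns 0 if something was published, 1 if there was nothing new.
publish_if_new() {
    src=$1
    dest=$2
    [ -f "$src" ] || return 1          # no build output yet
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        return 1                       # identical file is already published
    fi
    # Copy under a temporary name, then rename, so the web site never
    # serves a partially written APK.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}

# Hypothetical crontab entry running a wrapper script every 15 minutes:
# */15 * * * * /usr/local/bin/publish-alpha-apk
```

Comparing file contents keeps the check idempotent, so running it every 15 minutes does no work when no new build has completed.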
For more info, see Wikimedia_Apps/Team/Android/App_hacking/Alpha_build_server.
This article is outdated.
To make the install go faster, it helps to have a mediawiki-core checkout in ~/src/mediawiki-core. If this repository exists, the install makes a local clone of it. If it doesn't, it downloads from Gerrit instead (slow!).
git clone https://gerrit.wikimedia.org/r/integration/jenkins.git ~/.jenkins
~/.jenkins is the default Jenkins configuration directory
- WM-specific configuration patch I -
ln -s $HOME/.jenkins /var/lib/jenkins
- because some jobs assume jenkins is installed in
- Download jenkins and place it in ~/.jenkins
- install the following plugins (download into
- Download jenkins-job-builder and its configuration ‒ see Continuous_integration/Jenkins job builder for more information. You don't need a password when you install Jenkins locally.
- WM-specific configuration patch II - Patch the JJB configuration that depends on Zuul (see the mkjenkins script for a diff)
- If you already have a checkout of mediawiki-core:
git clone --mirror -l -- your_existing_checkout /var/lib/jenkins/git/mw-core-bare. Otherwise,
git clone --mirror -- https://gerrit.wikimedia.org/r/mediawiki/core.git /var/lib/jenkins/git/mw-core-bare
- Start Jenkins:
cd ~/.jenkins && java -jar jenkins.war&
- When Jenkins is running, install the JJB jobs:
rm -f $HOME/.cache/jenkins_jobs/jenkins_jobs_cache.yml && jenkins-jobs --conf jenkins_jobs.ini update config/.
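The choice between a local checkout and Gerrit in the mirror step above can be expressed as a small helper. This is a sketch under assumptions: the function name mirror_source is hypothetical, and it treats any directory containing a .git entry as a usable checkout.

```shell
# Sketch only: the helper name is hypothetical.

# mirror_source [CHECKOUT]
# Prints CHECKOUT (default ~/src/mediawiki-core) if it looks like a git
# checkout, otherwise prints the Gerrit URL of mediawiki/core.
mirror_source() {
    checkout=${1:-$HOME/src/mediawiki-core}
    if [ -d "$checkout/.git" ]; then
        printf '%s\n' "$checkout"
    else
        printf '%s\n' "https://gerrit.wikimedia.org/r/mediawiki/core.git"
    fi
}

# Usage (not run here):
# git clone --mirror -- "$(mirror_source)" /var/lib/jenkins/git/mw-core-bare
```

The -l flag is left out of the usage line because git already applies its local-clone optimisation when the source is given as a filesystem path.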
Hung beta code/db update
This deadlock seems to happen more often than not following or during a database update that is taking a while to complete.
- Take deployment-deploy01 offline in Jenkins https://integration.wikimedia.org/ci/computer/deployment-deploy01/markOffline
- Kill any Jenkins jobs running on deployment-deploy01 via the Jenkins UI
- Kill all pending jobs in the Jenkins queue that are "waiting on executors"
- Disconnect deployment-deploy01 https://integration.wikimedia.org/ci/computer/deployment-deploy01/disconnect
- Bring deployment-deploy01 back online (button labeled "Bring this node back online")
- Launch slave agent (there's a button that says this)
- Check agent log to see that it connected https://integration.wikimedia.org/ci/computer/deployment-deploy01/log
Sometimes you have to do this whole dance several times before Jenkins realizes that there are a bunch of executors that it can use.
- Go to https://integration.wikimedia.org/ci/manage
- Go to "Configure System"
- Search page for "Enable Gearman"
- Un-check the checkbox
- Wait 30s
- Check the "Enable Gearman" checkbox
This second method may interrupt communication between running Jenkins jobs and Zuul but it seems to work even when the offline/online method fails to clear the deadlock.
Alternate alternate method:
- Login to https://integration.wikimedia.org/ci
- Hit the red
[x] to cancel one pending
- That's it!
It seems that this is some conflict between the Jenkins native scheduler and the Gearman scheduler. Cancelling the build seems to fix the problem. Subsequent builds will deploy the same code to production.
Zuul should not be restarted. It preserves its queue and continues once Jenkins has restarted.
Via web interface
Apply the self-serve Jenkins repair!
With safeRestart, any currently running jobs block the restart until they finish or are canceled, so any long-running jobs should be killed. Check for jobs on the main Jenkins dashboard and cancel any long-running jobs there. Bonus points: make a note of the patches for which you have canceled jobs on the Zuul dashboard, and comment "recheck" on any patches in the test queue that you have aborted.
- Head to https://integration.wikimedia.org/ci/safeRestart
- Log in with a labs account that is part of the 'wmf' LDAP group
- Press "Yes"
- in connect: "!log restarting stuck Jenkins".
On contint1001.wikimedia.org either the legacy:
sudo /etc/init.d/jenkins restart
Or the new:
sudo service jenkins restart
Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump
To get the same information from the CLI:
jstack -l -F <pid of jenkins>
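Because a restart discards the evidence, it can help to save the jstack output to a file before restarting. The helper below is a minimal sketch: the function name save_thread_dump, the output file naming, and the JSTACK override variable are assumptions; only the jstack -l -F invocation comes from the command above.

```shell
# Sketch only: helper name, file naming, and the JSTACK variable are hypothetical.

# save_thread_dump PID [DIR]
# Runs `jstack -l -F PID`, stores the output in
# DIR/jenkins-threads-<UTC timestamp>.txt, and prints that file name.
save_thread_dump() {
    pid=$1
    dir=${2:-.}
    out="$dir/jenkins-threads-$(date -u +%Y%m%dT%H%M%SZ).txt"
    # JSTACK can be overridden, for example to point at a specific JDK.
    if ! "${JSTACK:-jstack}" -l -F "$pid" > "$out"; then
        rm -f "$out"                   # do not leave an empty dump behind
        return 1
    fi
    printf '%s\n' "$out"
}

# Usage (not run here), with the Jenkins PID filled in by hand:
# save_thread_dump <pid of jenkins> /tmp
```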
Last time this happened (2017-05-20) a restart of Jenkins "fixed" the problem, but we were unable to troubleshoot without a stacktrace from
Sometimes, changes in other repositories may cause your builds to fail. You can check the Shared Build Failure board to see if any existing issues are similar to your build failure; if there aren’t any, and you’re reasonably certain that your build failures are unrelated to your own changes, you can create a new task.
Agent remote call failed
Errors such as "11:53:37 FATAL: Remote call on integration-agent-docker-1001 failed" are caused by problems in the Java agent process running on each agent machine.
To fix these errors try restarting the agent on the target machine.
- Take agent offline in Jenkins at https://integration.wikimedia.org/ci/computer/AGENTNAME/markOffline
- Disconnect the agent: https://integration.wikimedia.org/ci/computer/AGENTNAME/disconnect
- ssh into the agent and kill the java
thcipriani@integration-agent-docker-1001:~$ ps aux | grep -i jav[a]
jenkins+ 31931 0.9 1.9 12195832 483676 ? Ssl Feb19 158:55 java -jar slave.jar
thcipriani@integration-agent-docker-1001:~$ sudo kill -9 31931
- Bring node back in Jenkins web ui: https://integration.wikimedia.org/ci/computer/AGENTNAME/toggleOffline
- Relaunch the agent on the machine via Jenkins web ui: https://integration.wikimedia.org/ci/computer/AGENTNAME/launchSlaveAgent
- Ensure the agent has launched on the agent itself, i.e., ensure that there is a new PID for the
thcipriani@integration-agent-docker-1001:~$ ps aux | grep -i jav[a]
jenkins+ 10618 27.8 0.5 10419524 141168 ? Ssl 16:43 0:05 java -jar slave.jar
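To avoid editing AGENTNAME into each link by hand, the URLs used in the steps above can be printed for a given agent. This is a convenience sketch only: the function name agent_restart_urls is hypothetical, and it only prints the links in the order they are used rather than calling Jenkins. The kill step on the agent itself still happens between the disconnect and toggleOffline links.

```shell
# Sketch only: the helper name is hypothetical; it prints URLs and performs no requests.

# agent_restart_urls AGENTNAME
# Prints, in order, the Jenkins web UI links used to take an agent
# offline, disconnect it, bring it back, and relaunch it.
agent_restart_urls() {
    base="https://integration.wikimedia.org/ci/computer/$1"
    for action in markOffline disconnect toggleOffline launchSlaveAgent; do
        printf '%s/%s\n' "$base" "$action"
    done
}

# Usage (not run here):
# agent_restart_urls integration-agent-docker-1001
```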
Start Jenkins with Java option:
Text thread dump: https://integration.wikimedia.org/ci/monitoring?part=threadsDump