Continuous integration/Jenkins

Jenkins is a Java tool used to handle recurring tasks such as running tests or building packages. Our primary install is at https://integration.wikimedia.org/ci/.

The tool is permanently connected to our review tool (Gerrit) and can be made to react to changes submitted to Gerrit. A typical example is running the MediaWiki unit tests whenever a change is submitted to the mediawiki/core.git repository.

The configuration of individual jobs is abstracted via Jenkins Job Builder. The jobs are triggered by Zuul (which processes Gerrit events) and configured in the integration/config.git repository.

Most of the infrastructure is detailed on wikitech.

Local Installation

Issue?

Hung beta code/db update

This deadlock seems to happen more often than not following or during a database update that is taking a while to complete.

Sometimes you have to do this whole dance several times before Jenkins realizes that there are a bunch of executors it can use.

Alternate method:

This second method may interrupt communication between running Jenkins jobs and Zuul, but it seems to work even when the offline/online method fails to clear the deadlock.

Alternate alternate method:

It seems that this is some conflict between the Jenkins native scheduler and the Gearman scheduler. Cancelling the build seems to fix the problem. Subsequent builds will deploy the same code to production.

Restart

Zuul should not need to be restarted. It preserves the queue and continues after the Jenkins restart.

Via web interface

Apply the self-serve Jenkins repair!

With a safeRestart, any currently running jobs block the restart until they finish or are canceled, so long-running jobs should be killed first. Check the main Jenkins dashboard for running jobs and cancel any long-running ones. Bonus points: use the Zuul dashboard to note the patches whose jobs you canceled, and comment "recheck" on any patches in the test queue that you aborted.

  1. Head to https://integration.wikimedia.org/ci/safeRestart
  2. Log in with a labs account that is a member of the 'wmf' LDAP group
  3. Press "Yes"
  4. In #wikimedia-operations, log the restart: "!log restarting stuck Jenkins"
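
The "check for running jobs" step before a safeRestart can be sketched in Python. This is only an illustration under assumptions: it expects the JSON returned by the Jenkins remote API at /computer/api/json to have been fetched separately with an authenticated client, and the tree query, the helper name busy_builds, and the sample payload are all made up here rather than taken from our tooling.

```python
# Assumed query string for /computer/api/json; verify the field names
# against the running Jenkins version before relying on them.
TREE = "computer[displayName,executors[idle,likelyStuck,currentExecutable[url]]]"


def busy_builds(payload):
    """Return (agent, build_url, likely_stuck) for every non-idle executor."""
    busy = []
    for computer in payload.get("computer", []):
        for executor in computer.get("executors", []):
            current = executor.get("currentExecutable")
            # Skip executors that are idle or report no running build.
            if executor.get("idle", True) or not current:
                continue
            busy.append(
                (
                    computer.get("displayName", "?"),
                    current.get("url", ""),
                    bool(executor.get("likelyStuck")),
                )
            )
    return busy


if __name__ == "__main__":
    # Hypothetical payload, shaped like the assumed API response.
    sample = {
        "computer": [
            {
                "displayName": "example-agent-1",
                "executors": [
                    {"idle": True, "currentExecutable": None},
                    {
                        "idle": False,
                        "likelyStuck": True,
                        "currentExecutable": {"url": "https://example.org/job/demo/1/"},
                    },
                ],
            }
        ]
    }
    for agent, url, stuck in busy_builds(sample):
        flag = " (likely stuck)" if stuck else ""
        print(f"{agent}: {url}{flag}")
```

The returned build URLs are the ones to cancel by hand and later "recheck"; the sketch only reads state and does not cancel or restart anything.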

Shell

On contint1001.wikimedia.org, run either the legacy init script:

sudo /etc/init.d/jenkins restart

Or the newer service command:

sudo service jenkins restart

OOM Issues

Troubleshooting

Whenever Jenkins appears to be stuck or facing high CPU usage, you will want to look at the Java threads: https://integration.wikimedia.org/ci/threadDump

To get the same thread dump from the CLI:

   jstack -l -F <pid of jenkins>

The last time this happened (2017-05-20), a restart of Jenkins "fixed" the problem, but we were unable to troubleshoot it without a stack trace from jstack.
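
Once a dump has been saved before restarting, a quick tally of thread states gives a first hint of whether Jenkins is blocked or busy. The following Python sketch is one way to do that. It assumes the dump is plain jstack text, matches the "java.lang.Thread.State:" lines of a regular dump, and also tries a "(state = ...)" pattern that forced (-F) dumps appear to use; that second pattern, the helper name thread_states, and the sample text are assumptions made for illustration.

```python
import re
from collections import Counter

# First pattern: regular jstack output. Second pattern: assumed layout of
# forced (-F) dumps; adjust it if your JDK prints something different.
STATE_PATTERNS = (
    re.compile(r"java\.lang\.Thread\.State: (\w+)"),
    re.compile(r"\(state = (\w+)\)"),
)


def thread_states(dump_text):
    """Count how many threads in a jstack text dump are in each state."""
    counts = Counter()
    for line in dump_text.splitlines():
        for pattern in STATE_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # one state per line is enough
    return counts


if __name__ == "__main__":
    # Hypothetical excerpt, not taken from a real Jenkins dump.
    sample = (
        '"worker-1" #12 prio=5\n'
        "   java.lang.Thread.State: BLOCKED (on object monitor)\n"
        '"worker-2" #13 prio=5\n'
        "   java.lang.Thread.State: RUNNABLE\n"
    )
    for state, count in thread_states(sample).most_common():
        print(f"{state}: {count}")
```

A large share of BLOCKED threads points toward lock contention, while many RUNNABLE threads fits the high CPU case; either way the full dump is still needed to find the culprit.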

Build failures look unrelated

Sometimes, changes in other repositories may cause your builds to fail. You can check the Shared Build Failure board to see if any existing issues are similar to your build failure; if there aren’t any, and you’re reasonably certain that your build failures are unrelated to your own changes, you can create a new task.

Agent remote call failed

Errors like "11:53:37 FATAL: Remote call on integration-agent-docker-1001 failed" are caused by problems in the Java agent process running on each agent machine.

To fix these errors, try restarting the agent on the target machine.
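
To find out which agents need a restart, the agent names can be pulled out of a build's console log. The Python sketch below assumes the message always has the form shown above ("FATAL: Remote call on <agent> failed"); the helper name failed_agents and the second agent name in the sample are hypothetical.

```python
import re
from collections import Counter

# Matches the error line quoted above; the agent name is assumed to
# contain no whitespace.
REMOTE_CALL_FAILED = re.compile(r"FATAL: Remote call on (\S+) failed")


def failed_agents(console_log):
    """Count 'Remote call ... failed' errors per agent in a console log."""
    return Counter(REMOTE_CALL_FAILED.findall(console_log))


if __name__ == "__main__":
    sample = (
        "11:53:37 FATAL: Remote call on integration-agent-docker-1001 failed\n"
        "11:53:38 Finished: FAILURE\n"
        "12:01:02 FATAL: Remote call on example-agent-2 failed\n"
    )
    for agent, count in failed_agents(sample).most_common():
        print(f"{agent}: {count} failure(s)")
```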

Debugging

Start Jenkins with Java option:

-Dhudson.plugins.git.GitSCM.verbose="true"


Text thread dump: https://integration.wikimedia.org/ci/monitoring?part=threadsDump

See also