L.2. Connecting Jenkins to the containers

Prev

To properly cooperate with the jenkins-dynamatrix project driving regular NUT CI builds, each build environment should be exposed as an individual agent with labels describing its capabilities.

Agent Labels

With the jenkins-dynamatrix, agent labels are used to calculate a large "slow build" matrix to cover numerous scenarios for what can be tested with the current population of the CI farm, across operating systems, make, shell and compiler implementations and versions, and C/C++ language revisions, to name a few common "axes" involved.

Labels for QEMU

Emulated-CPU container builds are CPU-intensive, so for them we define as few capabilities as possible: here CI is more interested in checking how binaries behave on those CPUs, not in checking the quality of recipes (distcheck, Make implementations, etc.), shell scripts or documentation, which is more efficient to test on native platforms.

Still, we are interested in results from different compiler suites, so specify at least one version of each.

Note

Currently the NUT Jenkinsfile-dynamatrix only looks at various COMPILER variants for qemu-nut-builder use-cases, disregarding the versions and just using one that the environment defaults to.

The reduced set of labels for QEMU workers looks like:

qemu-nut-builder qemu-nut-builder:alldrv
NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit
OS_FAMILY=linux OS_DISTRO=debian11 GCCVER=10 CLANGVER=11
COMPILER=GCC COMPILER=CLANG
ARCH64=ppc64le ARCH_BITS=64

Labels for native builds

For contrast, a "real" build agent’s set of labels, depending on presence or known lack of some capabilities, looks something like this:

doc-builder nut-builder nut-builder:alldrv
NUT_BUILD_CAPS=docs:man NUT_BUILD_CAPS=docs:all
NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit=no
OS_FAMILY=bsd OS_DISTRO=freebsd12 GCCVER=10 CLANGVER=10
COMPILER=GCC COMPILER=CLANG
ARCH64=amd64 ARCH_BITS=64
SHELL_PROGS=sh SHELL_PROGS=dash SHELL_PROGS=zsh SHELL_PROGS=bash
SHELL_PROGS=csh SHELL_PROGS=tcsh SHELL_PROGS=busybox
MAKE=make MAKE=gmake
PYTHON=python2.7 PYTHON=python3.8

Generic agent attributes

Name: e.g. ci-debian-altroot--jenkins-debian10-arm64 (note the pattern for "Conflicts With" detailed below)
Remote root directory: preferably unique per agent, to avoid surprises; e.g.: /home/abuild/jenkins-nut-altroots/jenkins-debian10-armel
- Note it may help that the system home directory itself is shared between co-located containers, so that the .ccache or .gitcache-dynamatrix are available to all builders with identical contents
- If RAM permits, the Jenkins Agent working directory may be placed in a temporary filesystem not backed by disk (e.g. /dev/shm on modern Linux distributions); roughly estimate 300Mb per executor for NUT builds.
Usage: "Only build jobs with label expressions matching this node"
Node properties / Environment variables:
- PATH+LOCAL ⇒ /usr/lib/ccache

Where to run agent.jar

Depending on circumstances of the container, there are several options available to the NUT CI farm:

Java can run in the container, efficiently (native CPU, different distro) ⇒ the container may be exposed as a standalone host for direct SSH access (usually by NAT, exposing SSH on a dedicated port of the host; or by first connecting the Jenkins controller with the host as an SSH Build Agent, and then calling SSH to the container as a prefix for running the agent; or by using Jenkins Swarm agents), so ultimately the build agent.jar JVM would run in the container. Filesystem for the abuild account may be or not be shared with the host.
Java can not run in the container (crashes on emulated CPU, or is too old in the agent container’s distro — currently Jenkins requires JRE 17+, but eventually will require 21+) ⇒ the agent would run on the host, and then the host would ssh or chroot (networking not required, but bind-mount of /home/abuild and maybe other paths from host would be needed) called for executing sh steps in the container environment. Either way, home directory of the abuild account is maintained on the host and shared with the guest environment, user and group IDs should match.
Java is inefficient in the container (operations like un-stashing the source succeed but take minutes instead of seconds) ⇒ either of the above

Note

As time moves on and Jenkins core and its plugins get updated, support for some older run-time features of the build agents can get removed (e.g. older Java releases, older Git tooling). While there are projects like Temurin that provide Java builds for older systems, at some point a switch to "Jenkins agent on new host going into older build container" approach can become unavoidable. One clue to look at in build logs is failure messages like:

Caused by: java.lang.UnsupportedClassVersionError:
  hudson/slaves/SlaveComputer$SlaveVersion has been compiled by a more
  recent version of the Java Runtime (class file version 61.0), this version
  of the Java Runtime only recognizes class file versions up to 55.0

Using Jenkins SSH Build Agents

This is a typical use-case for tightly integrated build farms under common management, where the Jenkins controller can log by SSH into systems which act as its build agents. It injects and launches the agent.jar to execute child processes for the builds, and maintains a tunnel to communicate.

Methods below involving SSH assume that you have configured a password-less key authentication from the host machine to the abuild account in each guest build environment container. This can be an ssh-keygen result posted into authorized_keys, or a trusted key passed by a chain of ssh agents from a Jenkins Credential for connection to the container-hoster into the container. The private SSH key involved may be secured by a pass-phrase, as long as your Jenkins Credential storage knows it too. Note that for the approaches explored below, the containers are not directly exposed for log-in from any external network.

For passing the agent through an SSH connection from host to container, so that the agent.jar runs inside the container environment, configure:
- Launch method: "Agents via SSH"
- Host, Credentials, Port: as suitable for accessing the container-hoster
  Note
  The container-hoster should have accessed the guest container from the account used for intermediate access, e.g. abuild, so that its .ssh/known_hosts file would trust the SSH server on the container.
- Prefix Start Agent Command: content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure java is usable) in the agent’s log. Note that it ends with un-closed quote and a space char:
```
ssh jenkins-debian10-amd64 '( java -version & uname -a ; getconf LONG_BIT; getconf WORD_BIT; wait ) &&
```
- Suffix Start Agent Command: a single quote to close the text opened above:

The other option is to run the agent.jar on the host, for all the network and filesystem magic the agent does, and only execute shell steps in the container. The solution relies on overridden sh step implementation in the jenkins-dynamatrix shared library that uses a magic CI_WRAP_SH environment variable to execute a pipe into the container. Such pipes can be ssh or chroot with appropriate host setup described above.
Note
In case of ssh piping, remember that the container’s /etc/ssh/sshd_config should AcceptEnv * and the SSH server should be restarted after such configuration change.
- Launch method: "Agents via SSH"
- Host, Credentials, Port: as suitable for accessing the container-hoster
- Prefix Start Agent Command: content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure it is accessible) in the agent’s log. Note that it ends with a space char, and that the command here should not normally print anything into stderr/stdout (this tends to confuse the Jenkins Remoting protocol):
```
echo PING > /dev/tcp/jenkins-debian11-ppc64el/22 &&
```
- Suffix Start Agent Command: empty

Node properties / Environment variables:

CI_WRAP_SH ⇒

ssh -o SendEnv='*' "jenkins-debian11-ppc64el" /bin/sh -xe

Using Jenkins Swarm Agents

This approach allows remote systems to participate in the NUT CI farm by dialing in and so defining an agent. A single contributing system may be running a number of containers or virtual machines set up following the instructions above, and each of those would be a separate build agent.

Such systems should be "dedicated" to contribution in the sense that they should be up and connected for days, and sometimes tasks would land.

Configuration files maintained on the Swarm Agent system dictate which labels or how many executors it would expose, etc. Credentials to access the NUT CI farm Jenkins controller to register as an agent should be arranged with the farm maintainers, and currently involve a GitHub account with Jenkins role assignment for such access, and a token for authentication.

The jenkins-swarm-nutci repository contains example code from such setup with a back-up server experiment for the NUT CI farm, including auto-start method scripts for Linux systemd and upstart, illumos SMF, and OpenBSD rcctl.

Sequentializing the stress

Running one agent at a time

Another aspect of farm management is that emulation is a slow and intensive operation, so we can not run all agents and execute builds at the same time.

The current solution relies on https://github.com/jimklimov/conflict-aware-ondemand-retention-strategy-plugin to allow co-located build agents to "conflict" with each other — when one picks up a job from the queue, it blocks neighbors from starting; when it is done, another may start.

Containers can be configured with "Availability ⇒ On demand", with shorter cycle to switch over faster (the core code sleeps a minute between attempts):

In demand delay: 0;
Idle delay: 0 (Jenkins may change it to 1);
Conflicts with: ^ci-debian-altroot--.*$ assuming that is the pattern for agent definitions in Jenkins — not necessarily linked to hostnames.

Also, the "executors" count should be reduced to the amount of compilers in that system (usually 2) and so avoid extra stress of scheduling too many emulated-CPU builds at once.

Sequentializing the git cache access

As part of the jenkins-dynamatrix optional optimizations, the NUT CI recipe invoked via Jenkinsfile-dynamatrix maintains persistent git reference repositories that can be used to cache NUT codebase (including the tested commits) and so considerably speed up workspace preparation when running numerous build scenarios on the same agent.

Such .gitcache-dynamatrix cache directories are located in the build workspace location (unique for each agent), but on a system with numerous containers these names can be symlinks pointing to a shared location.

To avoid collisions with several executors updating the same cache with new commits, critical access windows are sequentialized with the use of Lockable Resources plugin. On the jenkins-dynamatrix side this is facilitated by labels:

DYNAMATRIX_UNSTASH_PREFERENCE=scm-ws:nut-ci-src
DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME=gitcache-dynamatrix:SHARED_HYPERVISOR_NAME

The DYNAMATRIX_UNSTASH_PREFERENCE tells the jenkins-dynamatrix library code which checkout/unstash strategy to use on a particular build agent (following values defined in the library; scm-ws means SCM caching under the agent workspace location, nut-ci-src names the cache for this project);
The DYNAMATRIX_REFREPO_WORKSPACE_LOCKNAME specifies a semi-unique string: it should be same for all co-located agents which use the same shared cache location, e.g. guests on the same hypervisor; and it should be different for unrelated cache locations, e.g. different hypervisors and stand-alone machines.