This version of the page reflects NUT release v2.8.0-rc1 with codebase committed as 86af0b98c at 2022-04-01T02:02:27+02:00.
Options, features and capabilities in current development (and future releases) are detailed on the main site and may differ from ones described here.
Due to some historical reasons including earlier personal experience, the Linux container setup implemented as described below was done with persistent LXC containers wrapped by LIBVIRT for management. There was no particular use-case for systems like Docker (and no firepower for a Kubernetes cluster), since a build environment intended for testing non-regression against a certain release does not need to be regularly updated — its purpose is to be stale and represent what users still running that system for whatever reason (e.g. embedded, IoT, corporate) have in their environments.
Prepare LXC and LIBVIRT-LXC integration, including an "independent" (aka "masqueraded") bridge for NAT, following https://wiki.debian.org/LXC and https://wiki.debian.org/LXC/SimpleBridge
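For example, on a Debian/Ubuntu host the needed tooling might be installed like this (package names here are assumptions and may vary by release):

:; apt-get install lxc lxc-templates libvirt-daemon-system \
     libvirt-daemon-driver-lxc libvirt-clients dnsmasq-base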
For dnsmasq integration on the independent bridge (lxcbr0 following the documentation examples), be sure to mention:

* LXC_DHCP_CONFILE="/etc/lxc/dnsmasq.conf" in /etc/default/lxc-net
* dhcp-hostsfile=/etc/lxc/dnsmasq-hosts.conf in/as the content of /etc/lxc/dnsmasq.conf
* touch /etc/lxc/dnsmasq-hosts.conf which would list simple name,IP pairs, one per line (so one per container; see the sample file contents after this list)
* systemctl restart lxc-net to apply config (is this needed after setup of containers too, to apply new items before booting them?)
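As an illustration, the resulting files might look like this (the USE_LXC_BRIDGE setting and the sample name,IP entry are assumptions for this example):

# /etc/default/lxc-net (excerpt)
USE_LXC_BRIDGE="true"
LXC_DHCP_CONFILE="/etc/lxc/dnsmasq.conf"

# /etc/lxc/dnsmasq.conf
dhcp-hostsfile=/etc/lxc/dnsmasq-hosts.conf

# /etc/lxc/dnsmasq-hosts.conf -- one "name,IP" pair per line/container
jenkins-debian11-mips,10.0.3.245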
Prepare QEMU user-mode emulation: the /usr/bin/qemu-*-static binaries and their registration in /var/lib/binfmt
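A sketch of how this might be set up and verified on a Debian host (the package names are assumptions; the exact registration mechanism varies by release):

:; apt-get install qemu-user-static binfmt-support
:; ls /usr/bin/qemu-*-static
:; update-binfmts --display | grep -i qemu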
Prepare the /srv/libvirt location and create a /srv/libvirt/rootfs directory to hold the containers
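For example (a plain directory here; a dedicated ZFS dataset or LVM volume mounted at this path works just as well):

:; mkdir -p /srv/libvirt/rootfs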
Prepare /home/abuild on the host system (preferably in ZFS with dedup); the account user and group ID numbers are 399, as on the rest of the CI farm (historically, inherited from OBS workers)
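A minimal sketch for creating such an account on the host (the shell and home location here are assumptions; adjust to local policy):

:; groupadd -g 399 abuild
:; useradd -u 399 -g abuild -m -d /home/abuild -s /bin/bash abuild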
It may help to generate an ssh key without a passphrase for abuild that it would trust, to sub-login from CI agent sessions into the container. Then again, it may not be required if CI logs into the host by SSH using authorized_keys and an SSH agent, and the inner ssh client forwards that auth channel to the original agent.
abuild$ ssh-keygen   # accept defaults
abuild$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
abuild$ chmod 640 ~/.ssh/authorized_keys
Edit the ~/.profile of root (or whoever manages libvirt) to default the virsh provider with:
LIBVIRT_DEFAULT_URI=lxc:///system
export LIBVIRT_DEFAULT_URI
If the host root filesystem is small, relocate the LXC download cache to the (larger) /srv/libvirt partition:
:; mkdir -p /srv/libvirt/cache-lxc
:; rm -rf /var/cache/lxc
:; ln -sfr /srv/libvirt/cache-lxc /var/cache/lxc
Similarly, maybe relocate /home/abuild to reduce strain on rootfs?
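If so, one possible sketch mirrors the cache relocation above (assuming /home/abuild is still empty, or that its contents are copied to the new location first):

:; mkdir -p /srv/libvirt/home-abuild
:; chown abuild:abuild /srv/libvirt/home-abuild
:; rm -rf /home/abuild
:; ln -sfr /srv/libvirt/home-abuild /home/abuild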
Note that completeness of qemu CPU emulation varies, so not all distros can be installed; e.g. "s390x" failed for both debian10 and debian11 to set up the openssh-server package, or once even to run /bin/true (it seems to have installed an older release though, to match the outdated emulation?)
While the lxc-create tool does not really specify the error cause and deletes the directories after failure, it shows the pathname where it writes the log (also deleted). Before re-trying the container creation, this file can be watched with e.g. tail -F /var/cache/lxc/.../debootstrap.log
You can find the list of LXC "template" definitions on your system by looking at the contents of the /usr/share/lxc/templates/ directory, e.g. a script named lxc-debian for the "debian" template. You can see further options for each "template" by invoking its help action, e.g.:
+
:; lxc-create -t debian -h
Install containers like this:
:; lxc-create -n jenkins-debian11-mips64el -P /srv/libvirt/rootfs -t debian -- \
    -r bullseye -a mips64el
…or pass a MIRROR setting to specify a particular mirror (not everyone hosts everything — so if you get something like
"E: Invalid Release file, no entry for main/binary-mips/Packages"
then see https://www.debian.org/mirror/list for details, and double-check the chosen site to verify if the distro version of choice is hosted with your arch of choice):
:; MIRROR="http://ftp.br.debian.org/debian/" \ lxc-create -n jenkins-debian10-mips -P /srv/libvirt/rootfs -t debian -- \ -r buster -a mips
…or for EOLed distros, use the archive, e.g.:
:; MIRROR="http://archive.debian.org/debian-archive/debian/" \ lxc-create -n jenkins-debian8-s390x -P /srv/libvirt/rootfs -t debian -- \ -r jessie -a s390x
…Alternatively, other distributions can be used (as supported by your LXC scripts, typically in /usr/share/debootstrap/scripts), e.g. Ubuntu:
:; lxc-create -n jenkins-ubuntu1804-s390x -P /srv/libvirt/rootfs -t ubuntu -- \
    -r bionic -a s390x
For distributions with a different packaging mechanism from that on the LXC host system, you may need to install corresponding tools (e.g. yum4, rpm and dnf on Debian hosts for installing CentOS and related guests). You may also need to pepper with symlinks to taste (e.g. yum => yum4), or find a pacman build to install Arch Linux or a derivative, etc. Otherwise, you risk seeing something like this:
root@debian:~# lxc-create -n jenkins-centos7-x86-64 -P /srv/libvirt/rootfs \
    -t centos -- -R 7 -a x86_64
Host CPE ID from /etc/os-release:
'yum' command is missing
lxc-create: jenkins-centos7-x86-64: lxccontainer.c: create_run_template: 1616 Failed to create container from template
lxc-create: jenkins-centos7-x86-64: tools/lxc_create.c: main: 319 Failed to create container jenkins-centos7-x86-64
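For reference, a hedged sketch of preparing a Debian host before creating CentOS guests (the package names and their availability are assumptions and vary by host release):

:; apt-get install rpm dnf
# The CentOS template expects a "yum" command; if only dnf (aka yum4)
# is available, a symlink may suffice:
:; command -v yum >/dev/null || ln -s "$(command -v dnf)" /usr/local/bin/yum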
Note also that with such "third-party" distributions you may face other issues; for example, the CentOS helper did not generate some fields in the `config` file that were needed for conversion into libvirt "domxml" (as found by trial and error, and comparison to other `config` files):
lxc.uts.name = jenkins-centos7-x86-64
lxc.arch = x86_64
Also note the container/system naming without an underscore in "x86-64" -- the deployed system discards that character when assigning its hostname. Using "amd64" is another reasonable choice here.
Add the "name,IP" line for this container to /etc/lxc/dnsmasq-hosts.conf
on the host, e.g.:
jenkins-debian11-mips,10.0.3.245
Don’t forget to eventually systemctl restart lxc-net to apply the new host reservation!
Convert a pure LXC container to be managed by LIBVIRT-LXC (and edit config markup on the fly — e.g. fix the LXC dir:/ URL schema):
:; virsh -c lxc:///system domxml-from-native lxc-tools \
    /srv/libvirt/rootfs/jenkins-debian11-armhf/config \
    | sed -e 's,dir:/srv,/srv,' \
    > /tmp/x && virsh define /tmp/x
You may want to tune the default generic 64MB RAM allocation, so your launched QEMU containers are not OOM-killed as they exceed their memory cgroup limit. In practice they do not eat that much resident memory, they just want to have it addressable by the VMM, I guess (swap is not much used either), at least not until active builds start (then it depends on the make program parallelism level you allow, e.g. by the MAXPARMAKES envvar for ci_build.sh, and on the number of Jenkins "executors" assigned to the build agent).
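A sketch of one way to raise the limit for all the Jenkins containers at once (the 2 GiB value, passed as KiB, is an arbitrary assumption; a virsh edit of each domain's memory element works just as well):

:; for X in $(virsh list --all | awk '{print $2}' | grep jenkins) ; do \
    virsh setmaxmem "$X" 2097152 --config ; \
    virsh setmem "$X" 2097152 --config ; \
   done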
It may be needed to revert the generated "os/arch" to x86_64 (and let QEMU handle the rest) in the /tmp/x file, and re-try the definition:
:; virsh define /tmp/x
Then virsh edit jenkins-debian11-armhf (and other containers) to bind-mount the common /home/abuild, adding this tag to their "devices":
    <filesystem type='mount' accessmode='passthrough'>
      <source dir='/home/abuild'/>
      <target dir='/home/abuild'/>
    </filesystem>
Monitor deployed container rootfs’es with:
:; du -ks /srv/libvirt/rootfs/*
(should have non-trivial size for deployments without fatal infant errors)
Mass-edit/review libvirt configurations with:
:; virsh list --all | awk '{print $2}' \
   | grep jenkins | while read X ; do \
     virsh edit --skip-validate $X ; done
Skip the --skip-validate option when markup is initially good :)
Mass-define network interfaces:
:; virsh list --all | awk '{print $2}' \
   | grep jenkins | while read X ; do \
     virsh dumpxml "$X" | grep "bridge='lxcbr0'" \
     || virsh attach-interface --domain "$X" --config \
        --type bridge --source lxcbr0 ; \
   done
Verify that unique MAC addresses were defined (e.g. 00:16:3e:00:00:01 tends to pop up often, while 52:54:00:xx:xx:xx are assigned to other containers); edit the domain definitions to randomize, if needed:
:; grep 'mac add' /etc/libvirt/lxc/*.xml | awk '{print $NF" "$1}' | sort
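If you need a fresh locally-administered MAC value to paste into virsh edit, a quick sketch (assuming a bash-like shell that provides RANDOM):

:; printf '52:54:00:%02x:%02x:%02x\n' \
    $((RANDOM % 256)) $((RANDOM % 256)) $((RANDOM % 256))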
Make sure at least one console device exists (end of file, under the network interface definition tags), e.g.:
    <console type='pty'>
      <target type='lxc' port='0'/>
    </console>
Populate with the abuild account, as well as with the bash shell and sudo ability, reporting of assigned IP addresses on the console, and SSH server access complete with envvar passing from CI clients by virtue of ssh -o SendEnv='*' container-name:
:; for ALTROOT in /srv/libvirt/rootfs/*/rootfs/ ; do \
    echo "=== $ALTROOT :" >&2; \
    grep eth0 "$ALTROOT/etc/issue" || ( echo '\S{NAME} \S{VERSION_ID} \n \l@\b ; Current IP(s): \4{eth0} \4{eth1} \4{eth2} \4{eth3}' >> "$ALTROOT/etc/issue" ) ; \
    grep eth0 "$ALTROOT/etc/issue.net" || ( echo '\S{NAME} \S{VERSION_ID} \n \l@\b ; Current IP(s): \4{eth0} \4{eth1} \4{eth2} \4{eth3}' >> "$ALTROOT/etc/issue.net" ) ; \
    groupadd -R "$ALTROOT" -g 399 abuild ; \
    useradd -R "$ALTROOT" -u 399 -g abuild -M -N -s /bin/bash abuild \
    || useradd -R "$ALTROOT" -u 399 -g 399 -M -N -s /bin/bash abuild \
    || { if ! grep -w abuild "$ALTROOT/etc/passwd" ; then \
            echo 'abuild:x:399:399::/home/abuild:/bin/bash' \
            >> "$ALTROOT/etc/passwd" ; \
            echo "USERADDed manually: passwd" >&2 ; \
         fi ; \
         if ! grep -w abuild "$ALTROOT/etc/shadow" ; then \
            echo 'abuild:!:18889:0:99999:7:::' >> "$ALTROOT/etc/shadow" ; \
            echo "USERADDed manually: shadow" >&2 ; \
         fi ; \
       } ; \
    if [ -s "$ALTROOT/etc/ssh/sshd_config" ]; then \
        grep 'AcceptEnv \*' "$ALTROOT/etc/ssh/sshd_config" || ( \
            ( echo ""; echo "# For CI: Allow passing any envvars:"; echo 'AcceptEnv *' ) \
            >> "$ALTROOT/etc/ssh/sshd_config" \
        ) ; \
    fi ; \
   done
Note that for some reason, in some of those other-arch distros useradd fails to find the group anyway; then we have to "manually" add them.
Let the host know names/IPs of containers you assigned:
:; grep -v '#' /etc/lxc/dnsmasq-hosts.conf | while IFS=, read N I ; do \
    getent hosts "$N" >&2 || echo "$I $N" ; \
   done >> /etc/hosts
See NUT docs/config-prereqs.txt about dependency package installation for Debian-based Linux systems.
It may be wise to not install e.g. documentation generation tools (or at least not the full set for HTML/PDF generation) in each environment, in order to conserve space and run-time stress.
Still, if there are significant version outliers (such as using an older distribution due to vCPU requirements), the full set can be installed there just to ensure non-regression — i.e. that when adapting Makefile rule definitions to modern tools, we do not lose the ability to build with older ones.
For this, chroot from the host system can be used, e.g. to improve the interactive usability for a population of Debian(-compatible) containers (and to use the host's networking, while the operating environment in the containers may not yet be configured or may still struggle to access the Internet):
:; for ALTROOT in /srv/libvirt/rootfs/*/rootfs/ ; do \
    echo "=== $ALTROOT :" ; \
    chroot "$ALTROOT" apt-get install \
        sudo bash vim mc p7zip p7zip-full pigz pbzip2 git \
    ; done
Similarly for yum-managed systems (CentOS and relatives), though specific package names can differ, and additional package repositories may need to be enabled first (see link:config-prereqs.txt for details).
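A hedged sketch of the equivalent loop for such guests (the package names here are assumptions and may differ per distro):

:; for ALTROOT in /srv/libvirt/rootfs/*/rootfs/ ; do \
    echo "=== $ALTROOT :" ; \
    chroot "$ALTROOT" yum install -y \
        sudo bash vim mc pigz pbzip2 git \
    ; done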
Note that technically (sudo) chroot ... can also be used from the CI worker account on the host system to build in the prepared filesystems without the overhead of running containers and several copies of the Jenkins agent.jar.
Also note that the set-up of some packages, including ca-certificates and the JDK/JRE, requires that the /proc filesystem is usable in the chroot. This can be achieved with e.g.:
:; for ALTROOT in /srv/libvirt/rootfs/*/rootfs/ ; do \
    for D in proc ; do \
      echo "=== $ALTROOT/$D :" ; \
      mkdir -p "$ALTROOT/$D" ; \
      mount -o bind,rw "/$D" "$ALTROOT/$D" ; \
    done ; \
   done
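When the chroot-based installations are done, you may want to unmount these bind-mounts again before starting the containers; a sketch mirroring the loop above:

:; for ALTROOT in /srv/libvirt/rootfs/*/rootfs/ ; do \
    for D in proc ; do \
      echo "=== $ALTROOT/$D :" ; \
      umount "$ALTROOT/$D" ; \
    done ; \
   done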
TODO: Test and document a working NAT and firewall setup for this, to allow SSH access to the containers via dedicated TCP ports exposed on the host.
Q: Container won’t start, its virsh console says something like:
+
Failed to create symlink /sys/fs/cgroup/net_cls: Operation not permitted
A: According to https://bugzilla.redhat.com/show_bug.cgi?id=1770763 (skip to the end for summary) this can happen when a newer Linux host system with cgroupsv2 runs an older guest distro that only knows about cgroupsv1, such as CentOS 7. One workaround is to ensure that the guest systemd does not try to join host facilities, by setting an explicit empty list:
+
:; echo 'JoinControllers=' >> "$ALTROOT/etc/systemd/system.conf"
To cooperate with the jenkins-dynamatrix driving NUT CI builds, each build environment should be exposed as an individual agent with labels describing its capabilities.
Emulated-CPU container builds are CPU-intensive, so for them we define as few capabilities as possible: here CI is more interested in checking how binaries behave on those CPUs, not in checking the quality of recipes (distcheck, Make implementations, etc.), shell scripts or documentation, which is more efficient to test on native platforms.
Still, we are interested in results from different compiler suites, so specify at least one version of each.
Currently the NUT Jenkinsfile-dynamatrix only looks at various COMPILER variants for this use-case, disregarding the versions and just using the one that the environment defaults to.
The reduced set of labels for QEMU workers looks like:
qemu-nut-builder qemu-nut-builder:alldrv NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit OS_FAMILY=linux OS_DISTRO=debian11 GCCVER=10 CLANGVER=11 COMPILER=GCC COMPILER=CLANG ARCH64=ppc64le ARCH_BITS=64
For contrast, a "real" build agent’s set of labels, depending on presence or known lack of some capabilities, looks like:
doc-builder nut-builder nut-builder:alldrv NUT_BUILD_CAPS=docs:man NUT_BUILD_CAPS=docs:all NUT_BUILD_CAPS=drivers:all NUT_BUILD_CAPS=cppunit=no OS_FAMILY=bsd OS_DISTRO=freebsd12 GCCVER=10 CLANGVER=10 COMPILER=GCC COMPILER=CLANG ARCH64=amd64 ARCH_BITS=64 SHELL_PROGS=sh SHELL_PROGS=dash SHELL_PROGS=zsh SHELL_PROGS=bash SHELL_PROGS=csh SHELL_PROGS=tcsh SHELL_PROGS=busybox MAKE=make MAKE=gmake PYTHON=python2.7 PYTHON=python3.8
An example agent name would be ci-debian-altroot--jenkins-debian10-arm64 (note the pattern for "Conflicts With" detailed below), with a remote work directory like /home/abuild/jenkins-nut-altroots/jenkins-debian10-armel, and these Node properties / Environment variables:

* PATH+LOCAL ⇒ /usr/lib/ccache
Depending on circumstances of the container, there are several options available to the NUT CI farm:
* The agent.jar would run in the container. The filesystem for the abuild account may or may not be shared with the host.
* The agent.jar would run on the host, with ssh or chroot (networking not required, but a bind-mount of /home/abuild and maybe other paths from the host would be needed) called for executing sh steps in the container environment.

Either way, the home directory of the abuild account is maintained on the host and shared with the guest environment, so user and group IDs should match.
Methods below involving SSH assume that you have configured password-less key authentication from the host machine to the abuild account in the container. This can be an ssh-keygen result posted into authorized_keys, or a trusted key passed by a chain of ssh agents from a Jenkins credential for connection to the container-hoster into the container.
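As an illustration, an ~/.ssh/config entry on the container-hoster for the account that connects into the guest might look like this (the host alias and options shown are assumptions):

# ~/.ssh/config on the container-hoster (illustrative)
Host jenkins-debian10-amd64
    User abuild
    SendEnv *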
For passing the agent through an SSH connection from host to container, so that the agent.jar runs inside the container environment, configure:

Prefix Start Agent Command: the content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure java is usable) in the agent's log. Note that it ends with an un-closed quote and a space char:
ssh jenkins-debian10-amd64 '( java -version & uname -a ; getconf LONG_BIT; getconf WORD_BIT; wait ) &&
'
The other option is to run the agent.jar on the host, for all the network and filesystem magic the agent does, and only execute shell steps in the container. The solution relies on an overridden sh step implementation in the jenkins-dynamatrix shared library that uses a magic CI_WRAP_SH environment variable to execute a pipe into the container. Such pipes can be ssh or chroot with appropriate host setup described above.
In case of ssh piping, remember that the container's /etc/ssh/sshd_config should AcceptEnv * and the SSH server should be restarted after such change.
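For example, from inside the container (the service name is an assumption and varies by distro: typically "ssh" on Debian/Ubuntu and "sshd" on CentOS and relatives):

:; systemctl restart ssh || systemctl restart sshd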
Prefix Start Agent Command: content depends on the container name, but generally looks like the example below to report some info about the final target platform (and make sure it is accessible) in the agent’s log. Note that it ends with a space char, and that the command here should not normally print anything into stderr/stdout (this tends to confuse the Jenkins Remoting protocol):
echo PING > /dev/tcp/jenkins-debian11-ppc64el/22 &&
Node properties / Environment variables:

* CI_WRAP_SH ⇒ `ssh -o SendEnv=* "jenkins-debian11-ppc64el" /bin/sh -xe `
Another aspect of farm management is that emulation is a slow and intensive operation, so we can not run all agents and execute builds at the same time.
The current solution relies on https://github.com/jenkinsci/jenkins/pull/5764 to allow build agents to conflict with each other — one picks up a job from the queue and blocks others from starting; when it is done, another may start.
Containers can be configured with "Availability ⇒ On demand", with shorter cycle to switch over faster (the core code sleeps a minute between attempts):
* In demand delay: 0;
* Idle delay: 0 (Jenkins may change it to 1);
* Conflicts With: ^ci-debian-altroot--.*$ assuming that is the pattern for agent definitions in Jenkins — not necessarily linked to hostnames.
Also, the "executors" count should be reduced to the amount of compilers in that system (usually 2) and so avoid extra stress of scheduling too many emulated-CPU builds at once.