Originally posted on screwdriver.cd
Screwdriver is a scalable CI/CD solution that uses Kubernetes to manage user builds. Screwdriver build workers interface with Kubernetes using either “executor-k8s” or “executor-k8s-vm”, depending on the required build isolation.
executor-k8s runs builds directly as Kubernetes pods, while executor-k8s-vm uses HyperContainers along with Kubernetes for stricter build isolation with containerized Virtual Machines (VMs). This setup was ideal for running builds in an isolated, ephemeral, and lightweight environment. However, HyperContainer is now deprecated and unsupported, is based on an older Docker runtime, and requires a non-native Kubernetes setup for build execution. Therefore, it was time to find a new solution.
Why Kata Containers?
Kata Containers is an open-source project and community that builds a standard implementation of lightweight virtual machines (VMs) that perform like containers but provide the workload isolation and security advantages of VMs. It combines the benefits of a hypervisor, such as enhanced security, with the container orchestration capabilities provided by Kubernetes. It comes from the same team behind HyperD, which successfully merged the best parts of Intel Clear Containers with Hyper.sh RunV. As a Kubernetes runtime, Kata enables us to deprecate executor-k8s-vm and use executor-k8s exclusively for all Kubernetes-based builds.
Screwdriver’s journey to Kata
As we faced a growing number of instabilities with the current HyperD setup, such as network and devicemapper issues and IP cleanup workarounds, we started our initial evaluation of Kata in early 2019 (https://github.com/screwdriver-cd/screwdriver/issues/818#issuecomment-482239236) and identified two major blockers to moving ahead with Kata:
1. Security concerns with privileged mode (required to run the Docker daemon in Kata)
2. Disk performance
We started reevaluating Kata in early 2020, based on a fix to “add flag to overload default privileged host device behaviour” provided by containerd/cri (https://github.com/containerd/cri/pull/1225). That fix addressed the privileged-mode concern, but we still faced issues with disk performance until we switched from overlayfs to devicemapper, which yielded a significant improvement. With our two major blockers resolved and initial tests with Kata looking promising, we moved ahead with Kata.
Screwdriver Build Architecture
Replacing Hyper with Kata led to a simpler build architecture. We were able to remove the custom build setup scripts that launched the Hyper VM and rely on the native Kubernetes setup.
To use Kata Containers for running user builds in a Screwdriver Kubernetes build cluster, a cluster admin needs to configure Kubernetes to use the containerd container runtime with the CRI plugin.
Nodes in the Screwdriver build Kubernetes cluster (minimum version: 1.14) must have the following components set up to use Kata Containers for user builds:
Containerd is a container runtime that manages the complete lifecycle of a container.
Cri-Containerd is a containerd plugin that implements the Kubernetes Container Runtime Interface (CRI). The CRI plugin interacts with containerd to manage the containers.
Reference: https://github.com/containerd/cri/blob/master/docs/installation.md, https://github.com/containerd/containerd/blob/master/docs/ops.md
Crictl is a command-line tool used to debug, inspect, and manage pods, containers, and container images.
Kata Containers builds lightweight virtual machines that seamlessly plug into the container ecosystem.
Routing builds to Kata in Screwdriver build cluster
Screwdriver uses Runtime Class to route builds to Kata nodes in Screwdriver build clusters. The Screwdriver plugin executor-k8s config handles this based on:
1. Pod configuration:
apiVersion: v1
kind: Pod
metadata:
  name: kata-pod
  namespace: sd-build-namespace
  labels:
    sdbuild: "sd-kata-build"
    app: screwdriver
    tier: builds
spec:
  runtimeClassName: kata
  containers:
    - name: "sd-build-container"
      image: <<image>>
      imagePullPolicy: IfNotPresent
2. Update the plugin to use k8s in your buildcluster-queue-worker configuration
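The pod configuration in step 1 sets runtimeClassName: kata, which Kubernetes can only resolve if a RuntimeClass object with that name exists in the cluster. The following is a minimal sketch of such an object, not the exact manifest used by Screwdriver. The handler name kata is an assumption: it has to match whatever runtime handler the containerd CRI plugin is configured with on the Kata nodes.

```yaml
# Sketch of a RuntimeClass that the pod's `runtimeClassName: kata` refers to.
# node.k8s.io/v1beta1 is served by Kubernetes 1.14 through 1.24;
# node.k8s.io/v1 is available from 1.20 onward.
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  # Must match spec.runtimeClassName in the build pod.
  name: kata
# Assumption: the containerd CRI plugin on the Kata nodes defines a
# runtime handler with this name; adjust to your containerd config.
handler: kata
```

RuntimeClass is cluster-scoped, so it needs no namespace. Pods that omit runtimeClassName keep using the cluster's default runtime.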
The below tables compare build setup and overall execution time for Kata and Hyper when the image is pre-cached or not cached.
While the new Kata implementation offers a lot of advantages, there are some known problems, with fixes or workarounds:
- Images based on RHEL6 containers don’t start and exit immediately.
- Pre-2.15 glibc: we enabled kernel_params = "vsyscall=emulate". Refer to the Kata issue if you have trouble running pre-2.15 glibc.
- Yum install hangs forever: we enabled kernel_params = "init=/usr/bin/kata-agent". Refer to the Kata issue on getting a better boot time and a smaller footprint.
- 32-bit executables cannot be loaded (refer to the Kata issue): as a workaround, we maintain a container exclusion list and route those builds to the current HyperD setup. We plan to EOL these containers by Q4 of this year.
- Containerd IO snapshotter (overlayfs vs. devicemapper as the storage driver): devicemapper gives better performance. Overlayfs took 19.325605 seconds to write 1GB, while devicemapper took only 5.860671 seconds.
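To put the snapshotter numbers above in perspective, the two timings can be turned into a speedup and an approximate throughput. This is only a back-of-the-envelope calculation from the two reported timings. It assumes 1GB means 1000 MB; if the test wrote 1 GiB, the throughput figures are about 7% higher while the ratio stays the same.

```latex
\[
\text{speedup}
  = \frac{t_{\text{overlayfs}}}{t_{\text{devicemapper}}}
  = \frac{19.325605\ \text{s}}{5.860671\ \text{s}}
  \approx 3.3
\]
\[
\text{throughput}_{\text{overlayfs}}
  \approx \frac{1000\ \text{MB}}{19.325605\ \text{s}}
  \approx 51.7\ \text{MB/s},
\qquad
\text{throughput}_{\text{devicemapper}}
  \approx \frac{1000\ \text{MB}}{5.860671\ \text{s}}
  \approx 170.6\ \text{MB/s}
\]
```

In other words, devicemapper completed the same write roughly 3.3 times faster than overlayfs in this test.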
In order to use this feature, you will need these minimum versions:
- API – v0.5.902
- UI – v1.0.515
- Build Cluster Queue Worker – v1.18.0
- Launcher – v6.0.71
Thanks to the following contributors for making this feature possible:
- Lakshminarasimhan Parthasarathy
- Suresh Visvanathan
- Nandhakumar Venkatachalam
- Pritam Paul
- Chester Yuan
- Min Zhang
Questions & Suggestions
We’d love to hear from you. If you have any questions, please feel free to reach out here. You can also visit us on GitHub and Slack.