Written by Lakshminarasimhan Parthasarathy
Introduction
Screwdriver is a scalable CI/CD solution that uses Kubernetes to manage user builds. Screwdriver build workers interface with Kubernetes using either “executor-k8s” or “executor-k8s-vm”, depending on the required build isolation.
executor-k8s runs builds directly as Kubernetes pods, while executor-k8s-vm uses HyperContainers along with Kubernetes for stricter build isolation via containerized virtual machines (VMs). This setup was ideal for running builds in an isolated, ephemeral, and lightweight environment. However, hyperd is now deprecated and unsupported, is based on an older Docker runtime, and requires a non-native Kubernetes setup for build execution. It was therefore time to find a new solution.
Why Kata Containers?
Kata Containers is an open source project and community that builds a standard implementation of lightweight virtual machines (VMs) that perform like containers but provide the workload isolation and security advantages of VMs. It combines the benefits of using a hypervisor, such as enhanced security, with the container orchestration capabilities provided by Kubernetes. It comes from the same team behind HyperD, who successfully merged the best parts of Intel Clear Containers with Hyper.sh RunV. As a Kubernetes runtime, Kata enables us to deprecate executor-k8s-vm and use executor-k8s exclusively for all Kubernetes-based builds.
Screwdriver’s Journey to Kata
As we faced a growing number of instabilities with HyperD, such as network and devicemapper issues and IP cleanup workarounds, we started our initial evaluation of Kata in early 2019 (https://github.com/screwdriver-cd/screwdriver/issues/818#issuecomment-482239236) and identified two major blockers to moving ahead with Kata:
1. Security concerns with privileged mode (required to run the Docker daemon in Kata)
2. Disk performance
We started reevaluating Kata in early 2020 based on a containerd/cri fix to “add flag to overload default privileged host device behaviour” (https://github.com/containerd/cri/pull/1225). We still faced disk performance issues, so we switched the storage driver from overlayfs to devicemapper, which yielded a significant improvement. With our two major blockers resolved and initial tests with Kata looking promising, we moved ahead with Kata.
Screwdriver Build Architecture
Replacing Hyper with Kata led to a simpler build architecture. We were able to remove the custom build setup scripts that launched the Hyper VM and rely on a native Kubernetes setup.
Setup
To use Kata Containers for running user builds in a Screwdriver Kubernetes build cluster, a cluster admin needs to configure Kubernetes to use the containerd container runtime with the CRI plugin.
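For illustration, here is a minimal sketch of what pointing the kubelet at containerd looks like, assuming the default containerd socket path and the Kubernetes 1.14-era remote runtime flags (paths and flag names may differ per distribution):

# Tell the kubelet to use a remote CRI runtime served by containerd
kubelet --container-runtime=remote \
  --container-runtime-endpoint=unix:///run/containerd/containerd.sock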
Components
Screwdriver build Kubernetes cluster (minimum version: 1.14+) nodes must have the following components set up to run user builds in Kata Containers.
Containerd:
Containerd is a container runtime that manages the complete lifecycle of containers on its host system.
Reference: https://containerd.io/docs/getting-started/
CRI-Containerd plugin:
CRI-Containerd is a containerd plugin that implements the Kubernetes Container Runtime Interface (CRI). The CRI plugin interacts with containerd to manage pods and containers.
Reference: https://github.com/containerd/cri
[Image: CRI plugin overview. Image credit: containerd/cri, licensed under CC-BY-4.0.]
Architecture:
[Image: CRI plugin architecture. Image credit: containerd/cri, licensed under CC-BY-4.0.]
Installation:
Reference:
https://github.com/containerd/cri/blob/master/docs/installation.md
https://github.com/containerd/containerd/blob/master/docs/ops.md
Tarball: https://storage.googleapis.com/cri-containerd-release/cri-containerd-1.3.3.linux-amd64.tar.gz
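As a sketch, installing from the release tarball linked above generally amounts to unpacking it onto the node and starting the service (the extraction flags and unit name below follow the containerd/cri installation doc referenced above; verify there first):

# Unpack the cri-containerd release onto the root filesystem
tar --no-overwrite-dir -C / -xzf cri-containerd-1.3.3.linux-amd64.tar.gz
# Start containerd via systemd
systemctl daemon-reload
systemctl start containerd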
Crictl:
Crictl is a command-line interface for CRI-compatible container runtimes, used to debug, inspect, and manage pods, containers, and container images on a node.
Reference: https://github.com/containerd/cri/blob/master/docs/crictl.md
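A few representative commands (these subcommands come from the crictl doc referenced above):

# List pod sandboxes, containers, and images known to the runtime
crictl pods
crictl ps -a
crictl images
# Inspect and fetch logs for a specific container
crictl inspect <container-id>
crictl logs <container-id>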
Kata:
Kata builds lightweight virtual machines that seamlessly plug into the container ecosystem.
Architecture:
[Image: Kata Containers architecture. Image credit: kata-containers project, licensed under Apache License Version 2.0.]
Installation:
- https://github.com/kata-containers/documentation/blob/master/Developer-Guide.md#run-kata-containers-with-kubernetes
- https://github.com/kata-containers/documentation/blob/master/how-to/containerd-kata.md
- https://github.com/kata-containers/documentation/blob/master/how-to/how-to-use-k8s-with-cri-containerd-and-kata.md
- https://github.com/kata-containers/documentation/blob/master/how-to/containerd-kata.md#kubernetes-runtimeclass
- https://github.com/kata-containers/documentation/blob/master/how-to/containerd-kata.md#configuration
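Tying these together, the containerd-kata.md configuration doc above registers Kata as a runtime handler in the containerd config. A minimal sketch for a containerd 1.3-era CRI plugin follows (section names changed in later containerd releases, so verify against the doc):

# /etc/containerd/config.toml
[plugins.cri.containerd.runtimes.kata]
  # Use the Kata containerd shim v2 runtime
  runtime_type = "io.containerd.kata.v2"

The handler name here (“kata”) is what a Kubernetes RuntimeClass refers to in the next section.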
Routing builds to Kata nodes in Screwdriver build cluster
Screwdriver uses the Kubernetes RuntimeClass to route builds to Kata nodes in Screwdriver build clusters. The Screwdriver executor-k8s plugin handles this through the RuntimeClass resource, the build pod spec, and the plugin configuration shown below.
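First, a RuntimeClass maps a name to the Kata handler configured in containerd. A minimal sketch (node.k8s.io/v1beta1 is the API version available on Kubernetes 1.14):

apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  # The name that pod specs reference via runtimeClassName
  name: kata
# Must match the runtime handler name configured in containerd
handler: kata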
- Pod configuration:
apiVersion: v1
kind: Pod
metadata:
  name: kata-pod
  namespace: sd-build-namespace
  labels:
    sdbuild: "sd-kata-build"
    app: screwdriver
    tier: builds
spec:
  runtimeClassName: kata
  containers:
  - name: "sd-build-container"
    image: <<image>>
    imagePullPolicy: IfNotPresent
- Update the plugin to use k8s in your buildcluster-queue-worker configuration:
---
executor:
  # Default executor
  plugin: k8s
  k8s:
    exclusion:
      - 'rhel6'
    weightage: 0
    options:
      kubernetes:
        # The host or IP of the kubernetes cluster
        host: kubernetes.default
        # Privileged mode, default restricted, set to true for trusted container runtime use-case
        privileged: false
        automountServiceAccountToken: false
        dockerFeatureEnabled: false
        resources:
          cpu:
            # Number of cpu cores
            micro: "0.5"
            low: 2
            high: 6
            turbo: 12
          memory:
            # Memory in GB
            micro: 1
            low: 2
            high: 12
            turbo: 16
        # Default build timeout for all builds in this cluster
        buildTimeout: 90
        # Default max build timeout
        maxBuildTimeout: 120
        # k8s node selectors for appropriate pod scheduling
        nodeSelectors: {"dedicated":"screwdriver-kata"}
        preferredNodeSelectors: {}
        annotations: {}
        # support for kata-containers-as-a-runtimeclass
        runtimeClass: "kata"
      # Launcher image to use
      launchImage: screwdrivercd/launcher
      # Container tags to use
      launchVersion: stable
      # Circuit breaker config
      fusebox:
        breaker:
          # in milliseconds
          timeout: 10000
      # requestretry configs
      requestretry:
        # in milliseconds
        retryDelay: 3000
        maxAttempts: 5
Production rollout
- Test the new setup with pilot users.
- Route a percentage of traffic to Kata nodes using the weightage configuration.
- Because the Kata default guest kernel does not support IA32-bit binaries, maintain a list of containers to exclude, and route builds to Kata nodes only when the container is not on that list.
Performance
The tables below compare build setup and overall execution time for Kata and Hyper, with and without the image pre-cached on the node.
| Image: node12 (cached on node) | Kata (with 1 min wait in build) | Hyper (with 1 min wait in build) |
| --- | --- | --- |
| Setup time | 28 secs | 50 secs |
| Overall execution time | 1 min 32 secs | 1 min 56 secs |
| Image: node12 (not cached on node) | Kata (with 1 min wait in build) | HyperD (with 1 min wait in build) |
| --- | --- | --- |
| Setup time | 51 secs | 1 min 32 secs |
| Overall execution time | 1 min 55 secs | 2 min 40 secs |
Known problems
While the new Kata implementation offers many advantages, there are some known problems, each with a fix or workaround:
- Images based on RHEL 6 containers don’t start and immediately exit.
  - Fix: enable kernel_params = "vsyscall=emulate" if you have trouble running pre-2.15 glibc; see kata issue https://github.com/kata-containers/runtime/issues/1916 and the configuration sketch after this list.
- Yum install hangs forever.
  - Fix: enable kernel_params = "init=/usr/bin/kata-agent" for better boot time and a smaller footprint; see kata issue https://github.com/kata-containers/runtime/issues/1916.
| | Before fix | After fix |
| --- | --- | --- |
| time yum remove wget -y | real 6m22.190s, user 2m38.387s, sys 3m38.619s | real 0m4.774s, user 0m0.783s, sys 0m0.123s |
| time yum install wget -y | real 6m23.407s, user 2m39.387s, sys 3m42.606s | real 0m2.169s, user 0m1.760s, sys 0m0.298s |
- 32-bit executables cannot be loaded; see kata issue https://github.com/kata-containers/runtime/issues/886.
  - Workaround: we maintain a container exclusion list and route those builds to the current HyperD setup; we plan to EOL these containers by Q4 of this year.
- Containerd IO snapshotter: overlayfs vs. devicemapper as the storage driver.
  - Devicemapper gives better performance with Kata:
| Overlayfs | Devicemapper |
| --- | --- |
| 1024000000 bytes (976.6MB) copied, 19.325605 seconds, 50.5MB/s | 1024000000 bytes (976.6MB) copied, 5.860671 seconds, 166.6MB/s |
- Images are stored in both the sys-root and the devicemapper volume, consuming disk space on both.
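For reference, both kernel_params fixes above live in the Kata runtime configuration file. A minimal sketch, assuming the common /etc/kata-containers/configuration.toml location (your install may instead use /usr/share/defaults/kata-containers/configuration.toml):

# /etc/kata-containers/configuration.toml
[hypervisor.qemu]
  # vsyscall=emulate: allow pre-2.15 glibc (RHEL 6) binaries to run
  # init=/usr/bin/kata-agent: boot straight into the agent for faster startup
  kernel_params = "vsyscall=emulate init=/usr/bin/kata-agent"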
Compatibility List
In order to use this feature, you will need these minimum versions:
- API – v0.5.902
- UI – v1.0.515
- Build Cluster Queue Worker – v1.18.0
- Launcher – v6.0.71
Contributors
Thanks to the following contributors for making this feature possible:
- Lakshminarasimhan Parthasarathy
- Suresh Visvanathan
- Pritam Paul
- Chester Yuan
- Nandhakumar Venkatachalam
- Min Zhang
Questions & Suggestions
We’d love to hear from you. If you have any questions, please feel free to reach out here. You can also visit us on GitHub and Slack.
CDF Newsletter – May 2020 Article
By Rosalind Benoit
Don’t worry. As long as you hit that wire with the connecting hook at precisely eighty-eight miles per hour the instant the lightning strikes the tower…everything will be fine.
– Dr. Emmett Brown, “Back To The Future”
If you’re reading this, you’ve probably experienced the feeling of your heart racing — hopefully with excitement, but more likely, with anxiety — as a result of your involvement in the software development lifecycle (SDLC). At most organizations, artifacts must traverse a complex network of teams, tools, and constraints to come into being and arrive in production. As software becomes more and more vital to social connection and economic achievement, we feel the pressure to deliver transformational user experiences.
No company has influenced human expectations for reliably delightful software experiences more than Netflix. After 10 years of supporting large-scale logistics workloads with its mail-order business, Netflix launched an addictive streaming service in 2007. It soon experienced SDLC transformation at an uncommonly rapid pace, and at massive scale. After pioneering a new entertainment standard, Netflix survived and innovated through all the learnings that come with growth.
We’ll soon have one more reason to be glad it did; Back to the Future arrives on Netflix May 1!
https://www.youtube.com/watch?v=KqYvQchlriY
Jenkins at Netflix
You may know Netflix as the birthplace of open source Spinnaker, but it is also a perennial Jenkins user. As early cloud adopters, Netflix teams quickly learned to automate build and test processes, and heavily leveraged Jenkins, evolving from “a single massive Jenkins master in our datacenter, to running 25 Jenkins masters in AWS” as of 2016.
Jenkins changed the software development and delivery game by freeing teams from rigid, inflexible build processes and moving them into continuous integration. With test and build automation, “it works on my laptop” became a moot point. A critical leap for software-centric businesses like Netflix, this ignited a spark of the possible.
As Jenkins became an open source standard, engineers leveraged it to prove the power of software innovation, and the difference that velocity makes to improving user experiences and business outcomes. This approachable automation still works, and most of us still use it, over 15 years after its first release.
Over time, Netflix teams found it increasingly difficult to meet velocity, performance, and reliability demands when deploying their code to AWS with Jenkins alone. Too much technical debt had accumulated in their Jenkins and its scripts, and developers, feeling the anxiety, craved more deployment automation features. So, Netflix began to build the tooling that evolved into today’s Spinnaker.
Spinnaker & Delegation
Much like what Jenkins did for testing and integration, Spinnaker has done for release automation. It allows us to stitch together the steps required to safely deliver updates and features to production; it delegates pipeline stages to systems across the toolchain, from build and test, to monitoring, compliance, and more. Spinnaker increasingly uses its plugin framework to integrate tools. However, its foundational Jenkins integration exists natively, using triggers to pick up artifacts from it, and stages to delegate tasks to it. With property files to pass data for use in variables further down the pipeline, and concepts like Jenkins’ “unstable build” built in, Spinnaker can leverage the power of existing Jenkins assets.
Then, out of the box, Spinnaker adds the “secret sauce” pioneered by companies like Netflix to deliver the software experiences users now expect. With Spinnaker, you can skip change approval meetings by adding manual judgments to pipelines where human decisions are required. You can perform hotfixes with confidence and limit the blast radius of experiments by using automated canary analysis or your choice of deployment strategy. Enjoy these features when deploying code or functions to any cloud and/or Kubernetes, without maintaining custom scripts to architect pipelines.
As a developer, I found that I had the best experience using Jenkins for less complicated jobs and pipelines; even with much of the process defined as code, I didn’t always have enough context to fully understand the progression of the artifact or debug. Since joining the Spinnaker community, I’ve learned to rely on Jenkins stages for discrete steps like applying a Chef cookbook or signalling a Puppet run. I can manage these steps from Spinnaker, where, along with deployment strategies and native infrastructure dashboards, I can also experiment with data visualization using tools like SumoLogic, and even run terraform code.
It’s simple to get started with the integration. I use Spinnaker’s Halyard tool to add my Jenkins master, and boom:
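For those following along, the Halyard side is just a couple of commands. A minimal sketch with a hypothetical master name, address, and user (see the Spinnaker docs for the full flag list):

# Enable Jenkins as a CI provider in Spinnaker
hal config ci jenkins enable
# Register a Jenkins master (prompts for the password)
hal config ci jenkins master add my-jenkins-master \
  --address http://jenkins.example.com \
  --username spinnaker-bot \
  --password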
If Jenkins is a Swiss Army knife, Spinnaker is a magnetic knife strip. Their interoperability story is the story of continuous delivery’s evolution, and allows us to use the right tool for the right job:
- Jenkins: not only do I have all the logic and capability needed to perform your testing, integration, and deployment steps, I’m also an incredibly flexible tool with a plugin for every special need of every development team under the sun. I’m game for any job!
- Spinnaker: not only can I give your Jenkins jobs a context-rich home, I also delegate to all your other SDLC tools, and visualize the status and output of each. My fancy automation around deployment verifications, windows, and strategies makes developers happy and productive!
My first real experience with DevOps was a Jenkins talk delivered by Tracy Ragan at a conference in Albuquerque, where I worked as an (anxious) sysadmin for learning management systems at UNM. It’s amazing to have come full circle and joined the CDF landscape as a peer from a fellow member company. I look forward to aiding the interoperability story as it unfolds in our open source ecosystem. We’re confident the tale will transform software delivery, yet again.
Join Spinnaker Slack to connect with other DevOps professionals using Jenkins and Spinnaker to deliver software with safety and velocity!

