Skip to main content


A case for declarative configurations for ML training

By Blog, Community

Contributed by Benedikt Koller

Original article posted on May 17, 2020

No way around it: I am what you call an “Ops guy”. In my career I admin’ed more servers than I’ve written code. Over twelve years in the industry have left their permanent mark on me. For the last two of those I’m exposed to a new beast – Machine Learning. My hustle is bringing Ops-Knowledge to ML. These are my thoughts on that.

Deploying software into production

Hundreds of thousands of companies deploy software into production every day. Every deployment mechanism has someone who built it. Whoever it was (The Ops Guy™, SRE-Teams, “Devops Engineers”), all follow tried-and-true paradigms. After all, the goal is to ship code often, in repeatable and reliable ways. Let me give you a quick primer on two of those.

Infrastructure-as-code (IaC)

Infrastructure as code, or IaC, applies software engineering rules to infrastructure management. The goal is to avoid environment drift, and to ensure idempotent operations. In plain words, read the infrastructure configuration and you’ll know exactly how the resulting environment looks like. You can rerun the provisioning without side effects, and your infrastructure has a predictable state. IaC allows for version-controlled evolution of infrastructures and quick provisioning of extra resources. It does so through declarative configurations.

Famous tools for this paradigm are Terraform, and to a large degree Kubernetes itself.

Immutable infrastructure

In conjunction with IaC, immutable infrastructure ensures the provisioned state is maintained. Someone ssh’ed onto your server? Its tainted – you have no guarantee that it still is in the identical shape to the rest of your stack. Interaction between a provisioned infrastructure and new code happens only through automation. Infrastructure, e.g. a Kubernetes cluster, is never modified after it’s provisioned. Updates, fixes and modifications are only possible through new deployments of your infrastructure.

Operational efficiency requires thorough automation and handling of ephemeral data. Immutable infrastructure mitigates config drift and snowflake server woes entirely.

ML development

Developing machine learning models works in different ways. In a worst case scenario, new models begin their “life” in a Jupyter Notebook on someones laptop. Code is not checked into git, there is no requirements file, and cells can be executed in any arbitrary order. Data exploration and preprocessing are intermingled. Training happens on that one shared VM with the NVIDIA K80, but someone messed with the CUDA drivers. Ah, and does anyone remember where I put those matplotlib-screenshots that showed the AUROC and MSE?

Getting ML models into production reliably, repeatedly and fast remains a challenge, and large data sets become a multiplying factor. The solution? Learn from our Ops-brethren.

We can extract key learnings from the evolution of infrastructure management and software deployments:

  1. Automate processing and provisioning
  2. Version-control states and instructions
  3. Write declarative configs

How can we apply them to a ML training flow?

Fetching data

Automate fetching of data. Declaratively define the datasource, the subset of data to use and then persist the results. Repeated experiments on the same source and subset can use the cached results.

Thanks to automation, fetching data can be rerun at any time. The results are persisted, so data can be versioned. And by reading the input configuration everyone can clearly tell what went into the experiment.

Splitting (and preprocessing data)

Splitting data can be standardized into functions.

  • Splitting happens on a quota, e.g. 70% into train, 30% into eval. Data might be sorted on an index, data might be categorized.
  • Splitting happens based on features/colums. Data might be categorized, Data might be sorted on an index.
  • Data might require preprocessing / feature engineering (e.g. filling, standardization).
  • A wild mix of the above.

Given those, we can define an interface and invoke processing through parameters – and use a declarative config. Persist the results so future experiments can warm-start.

Implementation of interfaces makes automated processing possible. The resulting train/eval datasets are versionable, and my input config is the declarative authority on the resulting state of the input dataset.



Standardizing models is hard. Higher-level abstractions like Tensorflow and Keras already provide comprehensive APIs, but complex architectures need custom code injection.

A declarative config will, at least, state which version-controlled code was used. Re-runs on the same input will deliver the same results, re-runs on different inputs can be compared. Automation of training will yield a version-controllable artefact – the model – of a declared and therefore anticipatable shape.


Surprisingly, this is the hardest to fully automate. The dataset and individual usecase define the required evaluation metrics. However, we can stand on the shoulders of giants. Great tools like Tensorboard and the What-If-Tool go a long way. Our automation just needs to account for enough flexibility that a.) custom metrics for evaluation can be injected, and b.) raw training results are exposed for custom evaluation means.


Serving is caught between the worlds. It would be easy to claim that a trained model is a permanent artifact, like you might claim that a Docker container acts as an artifact of software development. We can borrow another learning from software developers – if you don’t understand where your code is run, you don’t understand your code.

Only by understanding how a model is served will a ML training flow ever be complete. For one, data is prone to change. A myriad of reasons might be the cause, but the result remains the same: Models need to be retrained to account for data drift. In short, continuous training is required. Through the declarative configuration of our ML flow so far we can reuse this configuration and inject new data – and iterate on those new results.

For another, preprocessing might need embedding with your model. Automation lets us apply the same preprocessing steps used in training to live data, guaranteeing identical shape of input data.


Outside academia, performance of machine learning models is measured through impact – economically, or by increased efficiency. Only reliable and consistent results are true measures for the success of applied ML. We as a new and still growing part of software engineering have to make sure of this success. And the reproducibility of success hinges on the repeatability of the full ML development lifecycle.

Introducing Our Newest CDF Ambassador – Shlomo Bielak

By Blog, Staff

Shlomo Bielak here. I am a CTO at a boutique SI in Canada trying to help anyone and everyone understand how to implement Continuous Deployment.

I think we have got down Continuous Delivery and Integration. Not-so-much the auto-deploy to production; that **** is complicated. I enjoy sharing on our webinars or with customers our Star-Trek approach to governance and CI/CD pipeline fitness.

Demo…Demo…Demo = Believe. I am all about sharing the practicing side of DevSecOps within CI/CD. Having invented governance engineering which is the operating model for security within CI/CD we know the complexities of making security fast within a DevOps mode. I share the model, process, milestones, and clear tasks to achieve those milestones, using an inner-source training plan to simplify your CI/CD journey.

I love publishing whitepapers and archetypes/exemplars to the market to better understand CI/CD which is their thought-leadership’s starting point – knowing your goal.I have been the keynote at CDF sponsor companies where they are looking to understand what the enterprise requires to be successful. I make sure my content hits the pain points and some self-deprecating laugh-points.

Today my desk is set for COVID. Tomorrow it is ready for conferences. Happy to be here supporting the CDF. Its business goal is at the core of our practice.

Introducing Our Newest CDF Ambassador – Tiffany Jachja

By Blog, Staff

Hi Readers,

2020 has been a crazy year, yet the opportunities remain to connect, learn, and share throughout our communities, and so I’m thrilled to join the Continuous Delivery Foundation. As a newly minted member of the CDF Ambassadors program, I look forward to getting to know everyone. 

A little bit more about me: my name is Tiffany Jachja. I’ve lived in Maryland almost all my life (go Old Bay!). One of my goals is to become a catalyst for better software delivery. 

Me, the one time I decided to leave Maryland and live 2,000 miles away from home. 

I work as an evangelist at Harness. This is my team.

We believe in empowering developers to move fast without breaking things.

I joined at the start of 2020, excited to travel, connect, and share my experiences around software delivery. 

Of course with the shelter in place policies, the travel bit did not pan out. But I’m grateful and fortunate for the opportunities to contribute to digitally! 

Observe2020 was a day-long conference held in April about Observability. 

ONUG Digital Live was ONUG’s first virtual event held in May 2020. 

I’ve been enjoying the fact that many industry events and sessions are now free to attend. It gives people who normally would not be able to attend an event, the opportunity to grow new skills and learn more about specific topics.

As you can tell, I do enjoy being on stage.I look forward to a healthier and safer time. 

I’m grateful for all the had opportunities I’ve had to help organizations and teams accelerate their DevOps journeys. It’s very rewarding to be a part of a team that’s hit their stride and can deliver effectively.

Before joining Harness, I was a consultant at Red Hat. I focused on cloud-native application development, so helping enterprises adopt and work with applications living in the cloud. I spent the latter half of my time at Red Hat, focusing on DevOps practices and culture. 

It’s important to work with your people, processes, and technology properly when going on transformation journeys.

An area we can improve on within the tech space is sharing stories and leveraging the experiences of others. 

I believe becoming a CDF Ambassador gives me the opportunities to help drive that mission further. 

Stay passionate, caring, and safe during these times. 



From Jenkins – Join us for online UI/UX hackfest on May 25-29!

By Blog, Staff

Originally published by Oleg Nenashev on the Jenkins blog.

On behalf of the Jenkins User ExperienceDocumentation and Advocacy and Outreach special interest groups, we are happy to announce the online UI/UX hackfest on May 25-29! Everyone is welcome to participate, regardless of their Jenkins development experience.

The goal is to get together and work on improving Jenkins user experience, including but not limited to user interface and user documentation. We also invite you to share experiences about Jenkins and to participate in UX testing. The event follows the Jenkins is the Way theme and the most active contributors will get special edition swag and prizes!

register button

Event plan

This hackfest is NOT a hackathon. We do not expect participants to dedicate all their time during the event timeframe, but hop-in/hop-out as their time allows. Everybody can spend as much time as they are willing to dedicate. Spending a few days or just a few hours is fine, any contributions matter regardless of their size. Jenkins development experience is not required, we have newcomer-friendly stories for those who want to start contributing to the project. We will also have a 24/7 jenkinsci/hackfest Gitter chat for Q&A and coordination between contributors.

There will be 3 main tracks:

  • User Interface – Improve look&feel and accessibility for Jenkins users, work on new read-only interface for instances managed with configuration as code, create and update Jenkins themes, and many other topics. This track is coordinated by the UX SIG.
  • User Documentation – Improve and create new user documentationtutorials and solution pages. Also, there is ongoing documentation migration from Wiki to and plugin repositories. This track is coordinated by the Documentation SIG.
  • Spread the word – Write user stories for Jenkins Is The Way site and the Jenkins blog, post about your Jenkins user experience and new features, record overview and HOWTO videos, etc. This track is coordinated by the Advocacy and Outreach SIG.

We are working on publishing project ideas and issues for the listed tracks. The current list can be found on the UI / UX hackfest event page, this list will be finalized by the beginning of the hackfest. You are welcome to propose your own projects within the User Experience theme.

During the event, we will organize online meetups and ad-hoc training sessions in different timezones. All these sessions will be recorded and shared on our YouTube channel. There are no mandatory sessions you must attend, you are welcome to join ones remotely or watch the recordings. After the event we will invite participants to demo their projects at online meetings or recorded sessions.


register button

P.S: Note that the registration form has a question top 3 things we could change in Jenkins to improve your user experience. We would appreciate your response there!


Please use the following contacts to contact organizers:


Swag and Prizes

Thanks to our sponsors (CloudBees, Inc. and Continuous Delivery Foundation), we are happy to offer swag to active contributors!

  • 50 most-active contributors will get an exclusive “Jenkins Is The Way” T-shirt and stickers
  • Active contributors will get Jenkins stickers and socks
  • We are working on special prizes for top contributors, to be announced later
Jenkins Is The Way T-shirt
Jenkins Stickers


We thank all contributors who participate in this event as committers! We especially thank all reviewers, organizers and those who participated in the initial program reviews and provided invaluable feedback. In particular, we thank User ExperienceDocumentation and Advocacy and Outreach SIG members who heavily contributed to this event.

We also thank sponsors of the event who make the swag and prizes possible: CloudBees, Inc. and Continuous Delivery Foundation (CDF). In addition to swag, CloudBees donates working time for event hosts and reviewers. CDF also sponsors our online meetup platform which we will be using for the event.

9 CD Foundation Projects Are Participating in this Year’s Google Summer of Code

By Blog, Staff

The CD Foundation has joined the list of organizations participating in Google’s Summer of Code (GSoc) this year. GSoC is an annual program aimed at bringing more student developers into open source software development. The CD Foundation projects Spinnaker and Screwdriver joined long-time participant Jenkins in providing mentors for a number of projects for students interested in continuous delivery and software pipeline infrastructure.

In total, 7 Jenkins projects, 2 Spinnaker and 1 Screwdriver project were accepted in this summer’s program. Mentors from many different organizations around the world are pitching in, including CD Foundation ambassadors.

“The CD Foundation is dedicated to supporting open source continuous delivery projects worldwide. Part of that mission includes supporting and encouraging the next generation of talented developers worldwide, said Tara Hernandez, Senior Engineering Manager, Google Cloud Platform and CD Foundation Technical Oversight Committee member. “Thank you to the students and mentors who work tirelessly to create and innovate for the GSoC. We hope everyone has a fantastic time coding and learning this summer. Congratulations!”

The following is a list of the projects accepted and links to each project description and associated mentors.

Jenkin’s Projects 

Loghi Perinpanayagam – Jenkins Machine Learning Plugin for Data Science

This project provides a plugin for data scientists to integrate Machine Learning Workflow with Jenkins.

Kezhi Xiong – GitHub Checks API for Jenkins Plugins

The GitHub Checks API allows developers to report the CI integrations’ detail information rather than binary pass/fail build status on GitHub pages.

stellargo – External Fingerprint Storage for Jenkins

File fingerprinting is a way to track which version of a file is being used by a job/build, making dependency tracking easy.

Rishabh Budhouliya – Git Plugin Performance Improvement

The principles of micro-benchmarking were used to create and execute a test suite which involves comparison of GitClient APIs implemented by CliGitAPIImpl and JGitAPIImpl using “average execution time per operation” as a performance metric.

Buddhika Chathuranga – Jenkins Windows Services: YAML Configuration Support

Enhance Jenkins master and agent service management on Windows by offering new configuration file formats and improving settings validation.

Zixuan Liu – Jenkins X: Consolidate the use of Apps / Addons

The main aim of the project is to consolidate Apps and Addons inside Jenkins X to avoid confusion.

Sladyn Nunes Custom Jenkins Distribution Build Service

The main idea behind the project is to build a customizable Jenkins distribution service that could be used to build tailor-made Jenkins distributions.

Spinnaker Projects

Victor Odusanya – Drone CI type for Spinnaker pipeline stage

Add Drone build type as a Spinnaker pipeline stage type.

Moki Daniel – “Continuous Delivery, Continuous Deployments with Spinnaker” 

This project idea will aim at ensuring continuous delivery and continuous deployments, bringing up automated releases, undertaking deployments across multiple cloud providers, and mastering the best built-in deployments practices from Spinnaker.

Screwdriver Project

Supratik Das – Improve SCM Integration

The two key areas where Screwdriver will be improved are introduction of deployment keys for seamless handling of private repositories and triggering of builds from external SCM repositories.

Thank you to all participants! We look forward to getting updates and information on progress over the summer. For more details, please continue to visit the CD Foundation blog.

From Red Hat – Part 1: Building Multiarch imageStream with the NFD Operator and OpenShift 4

By Blog, Member

Originally posted on the OpenShift 4.3 blog by Eduardo Arango


Using general available packages (in the form of container images) from an official source or a certified provider comes with a big caveat in relation to performance-sensitive workloads. 

These packages may provide ABI compatibility, but they are not optimized for our specialized hardware (like GPUs or high-performance NICs), nor our CPU chip architecture. The best way to address this is to compile your packages (build your images) on your own deployment.

OpenShift provides a way to seamlessly build images based on defined events called BUILDS. A build is the process of transforming input parameters into a resulting object. Most often, the process is used to transform input parameters or source code into a runnable image. A BuildConfig object is the definition of the entire build process.

The missing part to building hardware-specific images is to orchestrate the build process over the different available resources. In this post you will learn about the Node Feature Discovery (NFD) operator and how to tie it to OpenShift builds to have a hardware-specific image build.

The first part describes the NFD operator and how you can use it to manage the detection of hardware features in the cluster. The second part describes how to create an imageStream from a GitHub webhook and how to use the information from the NFD operator to schedule node-specific builds. The third part presents a sample app to test what you have learned.

The Node Feature Discovery Operator

The Node Feature Discovery Operator (NFD) manages the detection of hardware features and configuration in an OpenShift cluster by labeling the nodes with hardware-specific information. NFD will label the host with node-specific attributes, like PCI cards, kernel, or OS version, and many more. See (Upstream NFD) for more info.

The NFD operator can be found on the Operator Hub by searching for “Node Feature Discovery”:

After following the install steps, you can go to “Installed Operators” in the OpenShift  cluster and see:

Inside, a card instructs you to create an instance:

Click on “Create Instance” to get help from the OpenShift web console, which will auto-generate the needed YAML file and allow you to create the object from the console.

Once the NFD operator is deployed, you can go to a node dashboard and check all the Node_labels generated by the operator. Here is a sample excerpt of NFD labels applied to the node:

By reading the generated labels, you can understand hardware information of the OpenShift node; for example, we are running an amd64 architecture with multithreading enabled: (“”, “”)

Defining a BuildConfig

BuildConfig is a powerful tool in OpenShift. OpenShift Container Platform uses Kubernetes by creating containers from build images and pushing them to a container image registry.

The first step is to create a specific namespace to allocate the builds:

apiVersion: v1
kind: Namespace
	name: multiarch-build
	labels: "true"

For this example, you are pointing the builds to a repository on GitHub. First, you need to generate a secret for the generated GitHub hook:

kind: Secret
apiVersion: v1
  name: arch-dummy-github-webhook-secret
  namespace: multiarch-build
  WebHookSecretKey: bXVsdGlhcmNoLWJ1aWxk
kind: Secret
apiVersion: v1
  name: arch-dummy-generic-webhook-secret
  namespace: multiarch-build
  WebHookSecretKey: bXVsdGlhcmNoLWJ1aWxk

With the namespace and secret in place, you can now create the imageStream and BuildConfig to continuously watch for user-defined triggers to keep the image up to date. Image streams are part of the OpenShift extension APIs. Image streams are named references to container images. The OpenShift extension resources reference container images indirectly, using image streams.

The following YAML files can be generated via the OpenShift Developer web console. Once you have a generated imageStream and BuildConfig YAML, you need to make sure they look like the following:

kind: ImageStream
	app: arch-dummy
  name: arch-dummy
spec: {}
kind: BuildConfig
  name: arch-dummy
  namespace: multiarch-build
  selfLink: >-
	app: arch-dummy arch-dummy arch-dummy arch-dummy-app
  annotations: master ''
  nodeSelector: ""
  	cpu: "100m"
  	memory: "256Mi"
  	kind: ImageStreamTag
  	name: 'arch-dummy:latest'
  resources: {}
  successfulBuildsHistoryLimit: 3
  failedBuildsHistoryLimit: 3
	type: Docker
  	dockerfilePath: build/Dockerfile
  postCommit: {}
	type: Git
  	uri: ''
  	ref: master
	contextDir: /
	- type: ImageChange
  	ImageChange: {}
	- type: GitHub
      	name: arch-dummy-github-webhook-secret
	- type: ConfigChange
  runPolicy: Parallel

There are three lines worth noting in the above YAML (not auto-generated via the Developer web console), where you leverage the NFD operator labels to orchestrate the image builds on top of nodes with specific features, by using the nodeSelector key. For example, only schedule container builds on worker nodes with amd64 architecture:

  nodeSelector: ""

Now with the BuildConfig created, you can check out the GitHub URL hook: 

[eduardo@fedora-ws image_stream]$ oc describe bc/arch-dummy
Name:   	 arch-dummy
Namespace:    multiarch-build
Created:    5 days ago
Labels:   	 app=arch-dummy
Latest Version:    2

Strategy:   	 Docker
Ref:   		 master
ContextDir:   	 /
Dockerfile Path:    build/Dockerfile
Output to:   	 ImageStreamTag arch-dummy:latest

Build Run Policy:    Serial
Triggered by:   	 Config
Webhook Generic:
    AllowEnv:    false
Webhook GitHub:
Builds History Limit:
    Successful:    5
    Failed:   	 5

Build   	 Status   	 Duration    Creation Time
arch-dummy-1     complete     1m37s    	 2020-03-31 17:35:32 -0400 EDT

Events:    <none>


With this URL, you can then follow GitHub Webhook instructions for a ready-to-work imageStream..

To learn more about OpenShift Builds and more advanced use cases, you can go here.

Deploy an Example

To test what you just learned today, you can create a buildConfig of Arch-Dummy as a didactic confirmation that the feature-specific build is working. To do so, deploy the image as detailed on by selecting the built image “arch-dummy:latest”.

This image was built from the repo as seen in the imageStream.yaml.

This application generates a small API service with three endpoints:

/ -> Will retrieve information about the app

/version -> Will retrieve information about the app binary and where it was built


/ -> Will retrieve information about the app

/version -> Will retrieve information about the app binary and where it was built
{Git Commit:"6825a2f2a5b6a60278868260d8cdb51d192d9e63",CPU_arch:"Intel(R) Xeon(R) CPU E5-2686 v4 @",Built:"Tue Mar 31 21:43:15 UTC 2020",Go_version:"go1.12.8 linux/amd64"}

/cpu -> will retrieve information about the node on which the app is currently running

{name:"Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz",model:"79",family:"6"}

This dummy arch app will allow you to test whether the image was built correctly and orchestrated correctly.


Building hardware-specific images is easy by leveraging internal OpenShift tooling like imageStreams and coupling with the Node-Feature-Discovery Operator to manage the detection of hardware features and configuration in the OpenShift cluster. OpenShift simplicity allows developers to define the nodeSelector key to orchestrate image builds over target hardware. These could prove to be of great use when you consider image-build processes that involve AI/ML training requiring GPU and other special resources.

Future Work

On this blog post, you saw a quick example on how to tie together the Node-Feature-Discovery Operator and Openshift imageStreams for simple hardware-specific image builds. The following post goes deeper into OpenShift replacing the imageStream build with OpenShift Pipelines and another operator, the Special-resource-operator, to build more complex images and deploy them in the cluster.

Introducing our Newest CDF Ambassador – Zhao Xiaojie (Rick)

By Blog, Staff
This image has an empty alt attribute; its file name is pBNUl3G56Uduy8qq9Pi2IrEuBsHcSazVkg9f4d4MGiwOFDI0stRb61AQO8_GSATSwnF9b10AhMud2V1U2F2KwhB7j_qahb5jyucGpqTjErtu6ZLV6SOxcYnLlma20k66wT_kaODU

My name is Zhao Xiaojie (Rick). I’m a software engineer at Alauda, which is responsible for developing a CI/CD platform. I’m the leader of the Chinese Localization SIG and a press contact for Jenkins in China, too, where a large developer community exists invisible from the west! 

I am passionate about promoting the Jenkins community and have done so in several ways, such as running Jenkins official social media channels, encouraging people to contribute tech articles, giving speeches about Jenkins at related conferences, and maintaining the Chinese Jenkins website. 

I am also the author of several open source projects such as the Simplified Chinese Plugin, Jenkins CLI. And I have participated in Google Summer of Code (GSoC) twice as a mentor.

I’m a very active author and contributor in open source. I believe that CI/CD can speed business value shipping for all teams. Advocating CI/CD open source projects is an excellent way to help other teams and individuals adopt DevOps best practices. I enjoy giving public speeches or organizing meetups related to CI/CD. In my opinion, working with the CDF offers me a lot of opportunities to spread information about open source projects. The CDF ambassador program can help us to gather much more CI/CD fans.

You can find me on GitHub.

From Armory – Safety Is No Accident

By Blog, Member

Originally posted on the Armory blog by Chad Tripod

Continuous Delivery and Deployment is changing the way organizations deliver software. Over the years, software delivery has morphed into a time consuming process.  With countless validations and approvals to ensure the code is safe to present to users. And with good reason, releasing bad software can severely impact a business’s brand, popularity, and even revenue. In this day and age, with customer sentiment immediately feeding back into public visibility, companies are taking even stricter measures to ensure the best software delivery and user experience.

When deploying software to production, we use words like “resilience” to talk about how the code runs in the wild. For the optimists, we use words such as “Availability Zones,” and for those more pessimistic about deployments, we say, “Failure Domains.” When I was architecting and deploying applications for Apple, eBay, and others, I always built for failure. I was always more interested in how things behave when we break things, and less so on the steady state. I’d relish in unleashing tools like Simian Army to wreak havoc on what we had built to ensure code and experience weren’t impaired. 

Nowadays, there is a much better approach to ensuring safety. Continuous Delivery (CD) has enabled organizations to shift left. Empowering developers with access to deploy directly to production, but with the guardrails needed to make sure safety isn’t compromised for speed. Luckily, the world-class engineers at Netflix and Google have built a platform, Spinnaker. Spinnaker addresses deployment resiliency concerns and empowers developers with toolsets to validate and verify as a built-in part of delivering code.

Now, let’s break down the modern model and review the tools available in the CI/CD workflow. 

Spinnaker – Spinnaker is a high scale multicloud continuous delivery (CD) tool.  While leveraging the years of software delivery best practices that Netflix and Google built into Spinnaker, users get to serialize and automate all the decisions that they have baked into their current software delivery process. Approvals, environments, testing, failures, feature flagging, ticketing, etc., are all completely automated and shared across the whole organization. The end result? Built-in safety that allows DevOps teams to deploy software with great velocity.

Continuous Verification – Leveraging real-time KPIs and log messages to dictate the health of code and environment. Spinnaker’s canary deployments ingest real-time metrics from data platforms including Datadog, NewRelic, Prometheus, Splunk, and Istio into a service called Kayenta. Kayenta runs these time series metrics into the Mann Whitney algorithm developed by Netflix and Google, and compares  release metrics to current production metrics. Spinnaker will then adjust or roll back deployments automatically based on success criteria. This allows math and data, rather than manual best-guesses, to dictate in real time if the user is getting the best experience from the service.

Chaos Engineering – Why wait for things to break in production to fix them? There are better ways. Chaos Engineering is the practice of breaking things in pre prod environments to understand how the code behaves when it’s exercised. What happens when a dependent service goes offline? How do the other services in the application behave? How does Kubernetes deal with it? What about shutting off a process in a service? These are the measures Gremlin and Chaos Monkey give your developers. Now testing is much more than what your CI Server does, it takes into consideration the environments in which they are deployed. 

Service Mesh – Service Meshes are a Kubernetes traffic management solution. Kubernetes applications can traverse many clusters, regions, and even clouds. Service Meshes are a way to manage traffic flows, traceability, and most importantly with ephemeral workloads, observability. There are many flavors of service meshes to choose from. Istio/Envoy has the most visibility, but you can also implement service meshes from nginx, or even get enterprise support from companies like F5/NGINX+ or Citrix, which offer elevated ingress features. Service Meshes in the context of software delivery provide a very granular canary release. Instead of blindly sending traffic to a canary version for testing, you can instead programmatically use layer 7 traffic characteristics such as URI, host, query, path, and cookie to steer traffic. This allows you to switch only certain users, business partners, or regions to new versions of software.

DevSecOps – In my years seeing changes in technology and how we deliver software to end users, one thing is for sure: security wants to understand the risks in what you’re doing. And with good reason. Security exploits can leak sensitive information or, even expose an organization to malicious hackers. Luckily this new deployment world allows security to process their scans and validations in an automated fashion. Solutions range from TwistlockArtifactory XrayAquaSignal Science, etc. There are many DevSecOps solutions, so it is a good thing to know that Spinnaker supports them all!

Spinnaker stages automate developer tools:

End Result – As you put together your new cloud native tool chain, there are many ways you can improve the way you release software. I urge you to deploy the tools you need for the service you are providing, not only based on what a vendor is saying. Over time, implementing guardrails will increase your innovation and time to market. For many this will be a competitive advantage against those who move slowly, and investing in these areas will, over time, improve the hygiene of your software code, which will provide stability in your future releases. By de-risking the release processes and improving safety, the end users are given the best possible experience with your software.