From Spinnaker – Future of SRE: Robert Keng Builds a DeploymentBot #withSpinnaker


Originally posted on the Spinnaker Community blog, by Rosalind Benoit

Coming soon from Chime to OSS, a software delivery chatbot which uses Slack to deploy apps via Spinnaker

Last month I had the pleasure of chatting with Robert Keng, a Lead SRE at Chime, about a Slack-integrated ChatBot he recently built to facilitate lightweight, direct deployments for developers. Chime’s continuous delivery is based on Spinnaker, driven with signal-based GitOps. Via pipelines, merged release branches are auto-deployed from a continuous integration (CI) solution, through QA to production, with no human interaction.

However, it hasn’t always been this way; Chime has roots in a legacy build environment, largely for Ruby-on-Rails development. It’s based on configuration management tools such as Salt, and thus not containerized, but pointed at long-lived infrastructure. So, containerization formed an important milestone in Chime’s continuous delivery adoption. Luckily, according to Robert, its high-trust, growth-minded culture and workflows have supported the evolution.

Chime’s culture also provides flexibility that highlights Spinnaker’s power to accelerate digital transformation. Robert explains that, in some instances, it makes sense for developers to deploy straight to a test environment, bypassing CI. When adding a small feature to a mobile app, for example, I might want to bypass CI wait time to deploy and experiment with behavior (raise your hand if you’ve built an app and never done that… didn’t think so!).

Meeting Chime devs where they’re @

“We’re cutting the straight-to-prod patch fix deployments down to zero,” Robert clarifies, and he’s done it by creating a flexible system with Spinnaker that models Chime’s culture of trust. At any time, if the devs he enables would rather execute commands in Slack to deploy branches to environments of their choosing, they can. Robert has created a tool that allows them that agency, while empowering them to address complex use cases, for example, adding logic into the Slack commands to deploy dynamic environments into different Kubernetes clusters. In production, “If we need to scale customers on the Z-axis, and build multiple app versions with different backends to target different service providers” as deployment targets, with Spinnaker, Chime can. Robert points out:

“Spinnaker offers a lot of agility in that respect. It would be hard to accommodate gitOps and chatOps in the same place without it.”
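To make the chat-driven path a little more concrete, here is a minimal sketch of how a deploy command typed in Slack might be turned into a structured request. It is not Chime's implementation: the command syntax, the `DeployRequest` type, the `parse_deploy_command` function, and the environment-to-cluster mapping are all assumptions invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping of environment names to Kubernetes clusters.
# Production is deliberately absent: chat-driven deploys target test environments.
CLUSTERS = {"qa": "k8s-qa", "staging": "k8s-staging"}


@dataclass(frozen=True)
class DeployRequest:
    app: str
    branch: str
    environment: str
    cluster: str


def parse_deploy_command(text: str) -> DeployRequest:
    """Parse 'deploy <app> <branch> to <environment>' (an invented syntax)."""
    words = text.split()
    if len(words) != 5 or words[0] != "deploy" or words[3] != "to":
        raise ValueError("usage: deploy <app> <branch> to <environment>")
    _, app, branch, _, environment = words
    if environment not in CLUSTERS:
        raise ValueError(f"unknown environment: {environment}")
    return DeployRequest(app, branch, environment, CLUSTERS[environment])
```

A request parsed this way could then be handed to whatever triggers the matching Spinnaker pipeline, so the same guardrails apply whether a deployment starts from Git or from chat.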

In a prime example of the problem-solving opportunities that Spinnaker provides as a platform, Robert has created a golden path which allows Chime’s teams to iterate in a safe environment. To create it, Robert analyzed workflows as they are and designed an alternative workflow that mapped what he observed in Spinnaker. This, combined with the auto-deploy strategy, tells the story, written in pipelines, of how Chime engineers deliver software. This way, as an SRE, he can rely on automated guardrails for safety regardless of the deployment path. As Kelsey Hightower says, it “serializes the culture in tools” in a way that’s seamless, painless, and purposefully abstracted.

Because at the end of the day, it’s not about the tools. It’s about your story, which in Chime’s case, is all about changing the way people feel about banking. What products and services do you delight your customers with? What’s your story? You can tell it #withSpinnaker

One DeploymentBot, Headed for OSS Spinnaker

The tool, in a multi-service design, has a component which handles the request/response communication with Slack, a frontend that leverages Okta user groups to control who can access Spinnaker, and a Python backend which processes the request data in batches. This architecture evolved from using webhooks to, at Armory’s suggestion, using client certs for faster authentication, and from a monolith version to microservices, because of constraints encountered in the bot’s development. The top constraint: the Slack Events API’s requirement that a response to requests arising from message actions be received within 3 seconds.

This constraint presented challenges in actions like querying Vault for certificates to authenticate against Spinnaker, and even in token exchange with Slack. Breaking the chatbot into pieces allowed Robert to create a responsive, extensible service to deliver a full-featured experience for Chime devs. “It’s turned into a monster,” he grins. “I have tons of feature requests for additional functionality already” (because his devs love using it).
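The usual way to live within a 3-second response window is to acknowledge the request immediately and do the slow work elsewhere. The sketch below illustrates that pattern with an in-process queue and a worker thread standing in for the separate services described above. The names `handle_slack_request`, `worker`, `jobs`, and `results` are hypothetical, and the calls to Vault and Spinnaker are reduced to a placeholder; this is an assumption about the general shape, not the bot's actual code.

```python
import queue
import threading

jobs = queue.Queue()   # requests waiting for the slow backend work
results = []           # stand-in for follow-up messages posted back to Slack


def handle_slack_request(payload: dict) -> dict:
    """Enqueue the slow work and return an acknowledgement right away."""
    jobs.put(payload)
    return {"text": f"Deploying {payload['app']} to {payload['environment']}..."}


def worker() -> None:
    """Drain the queue; fetching certs and calling Spinnaker would happen here."""
    while True:
        payload = jobs.get()
        if payload is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        # Placeholder for the slow steps (Vault lookup, Spinnaker API call).
        results.append(f"deployed {payload['app']} to {payload['environment']}")
        jobs.task_done()
```

Because the handler only enqueues and returns, its latency does not depend on how long authentication or the deployment itself takes; the worker reports the outcome afterwards.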

Next steps for Robert’s Bot include developing it against the entire Spinnaker API to leverage all features available, and adding more dynamic capability. He wants to enable devs to use the bot to deal with existing pipelines and executions, and adjust parameters and other configuration via a scripted payload directly from Slack.

Another important next step? Open-sourcing the DeploymentBot! Robert’s very busy with projects right now (read more below), but I’ll hook him up with support from Armory engineers, if needed, to help get this invention to the masses.

The Future of Site Reliability, Platforms, and DevOps Engineering

As he describes his plans for the Bot, we start talking about the myth of NoOps. I have my own words about the opportunities and fallacies of Dev + Ops, but here, Robert’s voice speaks for itself:

“My team isn’t DevOps, it’s SRE (Site Reliability Engineering). DevOps is just part of what we do. As tech stacks mature, we’re seeing less dependency on direct hardware interaction, but that doesn’t mean the management complexity goes away; it actually gets worse. Here’s an easy example: We have this awesome thing called Kubernetes. Given config maps and secrets, where is the source of truth? Ask anyone in the community, and they’ll say, ‘Umm… build it yourself!’ I know HashiCorp released a sidecar method to inject values, but none of that is complete. This is why there’s a lot of custom work in the community, and companies are building their own mutating webhook controllers, for example, which is what we’re doing. You can’t buy this stuff, because it doesn’t exist.

We have our own way of injecting Vault secrets which 100% bypasses Kubernetes stuff, because we can’t version it, and we can’t manage it from any source of truth, as it’s scattered across 1000 namespaces. It’s impossible to manage in one place. So in our environment, we put everything in Vault, whether it’s configuration, or secrets. That gives us a common interface to code against. In V1, we’re using init containers, which is exactly what HashiCorp’s sidecar does. In V2, depending on the environment, we’ll grab values from different Vault clusters, since storing production and non-production values in the same place is just, suicide. You’ll get a huge ban hammer from your security team, and no-one wants that.

So we’re building, and we’re operating it at the same time. And are developers ever going to touch these [tools]? No! There are a lot of these instances in Kubernetes where things just don’t exist, so what do you do? Same thing for EC2, and ECS even. Then, moving into Knative, and Lambdas, and serverless computing and functions, it’s even worse. It’s a free-for-all. We’re designing our own framework.

The next thing we’re looking at is building plugins that will plug in our code, and use Spinnaker to deploy it [on that infra]. I heard Armory is working on something similar for deploying Lambdas, and I’m desperately waiting, because it’s going to make my life easier. Functions in general are kind of useless. The ecosystem around them is more important; you’ve got to think about API gateways, API management, queues, load balancers, etc. How do I wrap that into a sane framework where we can consistently build, integrate, test, and deploy? I don’t want to use 10 different ways to do the same thing. I’d rather just have everything work in Spinnaker.”

Then we start talking about making that happen. I tell Robert about the Community Gardening Days I’m planning for Spinnaker this Spring (keep your eyes peeled! Announcement forthcoming on Spinnaker.io and social), and he gets psyched about Chime’s involvement. Music to my ears!

Look out for more articles from me on the Spinnaker developer and contributor experience. I’ll shine a light on the way Open Source Heroes like Robert are getting into the ecosystem as they enable the delivery of software products and services. Hang on, the latest industrial revolution (where software truly changes the freaking world for the better!) is just taking off.

Please share this on Twitter, LinkedIn, and HackerNews and give Robert some glory : )

Announcing New Course: DevOps and SRE Fundamentals – Implementing Continuous Delivery


SAN FRANCISCO, August 14, 2019 – The Linux Foundation, the nonprofit organization enabling mass innovation through open source, announced today that enrollment is now open for the new DevOps and SRE Fundamentals – Implementing Continuous Delivery eLearning course. The course will help an organization be more agile and deliver features rapidly while still meeting non-functional requirements such as availability, reliability, scalability, and security.

According to Chris Aniszczyk, CTO of the Cloud Native Computing Foundation, “The rise of cloud native computing and site reliability engineering are changing the way applications are built, tested, and deployed. The past few years have seen a shift towards having Site Reliability Engineers (SREs) on staff instead of just plain old sysadmins; building familiarity with SRE principles and continuous delivery open source projects are an excellent career investment.”

The open containers ecosystem with Docker and Kubernetes at the forefront is revolutionizing software delivery. Developed by Gourav Shah, founder of the School of Devops, the DevOps and SRE Fundamentals – Implementing Continuous Delivery (LFS261) course introduces learners to the fundamentals of Continuous Integration (CI) and Continuous Delivery (CD) within an open container ecosystem. The course takes a project-based approach to help learners understand and implement key practices.

Software Developers – will learn how to deliver software more safely, quickly, and reliably.

Quality Analysts – will learn how to set up automated testing, leverage disposable environments, and integrate it with CI tools such as Jenkins and Docker.

Operations Engineers, System Administrators, DevOps/SRE practitioners – will learn how to reliably deploy software and securely manage production environments.

Build and Release Engineers – will learn how to deploy software safely and continuously.

DevOps and SRE Fundamentals – Implementing Continuous Delivery teaches the skills to deploy software with confidence, agility and high reliability using modern practices such as Continuous Integration and Continuous Delivery, and tools such as git, Jenkins, Docker, Kubernetes, and Spinnaker.

This video-based course teaches the following:

  • What Continuous Integration and Continuous Delivery are and why they are needed
  • How the container ecosystem is revolutionizing software delivery and the role played by Docker and Kubernetes
  • How to use Git and GitHub for revision control and to support collaborative development
  • How to install and configure Jenkins as a Continuous Integration platform
  • How to write a pipeline as code using declarative syntax with Jenkinsfiles
  • How to create and enforce development workflows as code reviews
  • How to standardize application packaging and distribution with Docker and Docker Registry
  • Continuous Deployment and Delivery, and how they compare with Continuous Integration
  • How to use Kubernetes to deploy applications with high availability, scalability and resilience
  • How to use Spinnaker to set up multi-cloud deployment pipelines
  • How to safely release software with Blue/Green, Highlander, and Canary release strategies.

The 2018 Open Source Jobs Report from Dice and the Linux Foundation highlighted the strong popularity of DevOps practices, along with cloud and container technologies. DevOps skills are in high demand, and DevOps jobs are among the highest paid tech jobs. This online eLearning course allows participants to be at the forefront of revolutionary technology advancements and ahead of the learning curve. 

DevOps and SRE Fundamentals – Implementing Continuous Delivery is available for $299. Visit here to learn more details.

About The Linux Foundation

The Linux Foundation is the organization of choice for the world’s top developers and companies to build ecosystems that accelerate open technology development and industry adoption. Together with the worldwide open source community, it is solving the hardest technology problems by creating the largest shared technology investment in history. Founded in 2000, The Linux Foundation today provides tools, training and events to scale any open source project, which together deliver an economic impact not achievable by any one company. More information can be found at www.linuxfoundation.org.

The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page: https://www.linuxfoundation.org/trademark-usage.

Linux is a registered trademark of Linus Torvalds.