Originally posted on the Armory blog by Rosalind Benoit
I met Edgar Magana at Spinnaker Summit last year, when he spoke during Armory’s keynote as one of six Spinnaker champions. The energy and enthusiasm he brings to advocating for Spinnaker contrast with the intensity he brings to his role as operator of mega-scale cloud infrastructure. But the more I get to know him, the more I understand that they are one and the same. To enable Salesforce’s application owners to safely evolve software, he must ensure homogeneous, predictable models for continuous delivery. Spinnaker has helped him make that a reality.
“Across multiple environments, we have to enable different models for Spinnaker based on security requirements,” he explains as he shares his standardization strategy.
“We templatize all the pipelines for consistency across services, with two types: EC2 instance deployments, and Kubernetes cluster pipelines. For Kubernetes, we require a lot of security hardening, and we need to use the same logging and monitoring mechanisms for all of our clusters.”
Every time Edgar’s team discovers a new configuration for Kubernetes, they upgrade that pipeline, which should trigger a service owner to relaunch the pipeline with new parameters, or sometimes, destroy their Kubernetes cluster and create a new one. “We make all these changes in development and staging first, of course,” he says, noting the dev, pre-prod, and prod Spinnaker instances his team maintains.
A security requirement that artifacts not be created in the same place that deploys services imposes an added complexity. It required Edgar’s team to innovate further, and split the baking process across two different Spinnaker instances. Luckily, they could configure this out of the box with Spinnaker by overriding parameters; Edgar appreciates that Spinnaker doesn’t hard-code a lot of configuration, as rigidity wouldn’t support Salesforce’s unique requirements. That has also allowed his team to create a “heavy layer of automation on top of Spinnaker,” providing guardrails for application owners.
Our conversation turns to the Spinnaker Ops SIG (Special Interest Group), which Edgar recently founded. A solid kickoff meeting produced several action items to be completed before the next scheduled meeting on February 27th at 10 AM (always a good sign).
Most importantly at this stage, Edgar says:
“We want to reach out to more operators, people who are either struggling or evaluating Spinnaker. We need operators in different stages — super experts who control everything, like those at Netflix and Airbnb, operators that are getting there, like us at Salesforce, and those in the initial evaluation stage. The goal of the SIG is to have a place where operators can exchange use cases, and have a unified voice, just like other Spinnaker SIGs, and a path for specific features we want incorporated into Spinnaker. The community needed a place to discuss how to operate Spinnaker better. As an operator of large-scale infrastructure, I don’t want to share this system with only a few companies. We want to welcome new users and operators, and facilitate their transition from the POC (proof-of-concept) environment to the real thing. This will help us understand what kinds of features are more important.”
Does the Ops SIG also provide a place to vent and empathize? I sure hope so! “That’s the life of a cloud operations architect,” Edgar says when he has to reschedule our meeting, “we get called all the time, from account issues, to Spinnaker and Kubernetes configuration,” and lots more. Indeed, once when I ping Edgar about this blog, he’s in a “war room” (boy, I sure don’t miss my Ops days right now!). But just like Armory, Edgar values empowering developers, and safely pushing control of applications and their infrastructure to the edge to fuel innovation. Better software is worth the hard work!
Another of Edgar’s goals for the Ops SIG: create reference architecture documentation for HA (high availability) and disaster recovery. “I want new users of Spinnaker to say, ‘I don’t need to reinvent the wheel; I’ll just follow these HA guidelines.’” Architecture collateral will help Platforms and DevOps teams convince leadership that Spinnaker is a good investment for the company’s continuous delivery of software. This is where Edgar’s warm enthusiasm and operator’s intensity meet: empowering developers, empowering the community, empowering the planet.
I look forward to working more with Edgar and his team at Salesforce as part of the Ops SIG, our April Spinnaker Gardening Days online hackathon, and more. This is the kind of open-source heroism that will usher in the new industrial revolution!
With Spinnaker Operator, define all the configurations of Spinnaker in native Kubernetes manifest files, as part of the Kubernetes kind “SpinnakerService” defined in its own Custom Resource Definition (CRD). With this approach, you can customize, save, deploy and generally manage Spinnaker configurations in a standard Kubernetes workflow for managing manifests. No need to learn a new CLI like Halyard, or worry about how to run that service.
Spinnaker is automatically exposed with Kubernetes service load balancers (optional).
Experimental: Accounts can be provisioned and validated individually by using a different SpinnakerAccount manifest, so that adding new accounts involves creating a new manifest instead of having everything in a single manifest.
Let’s look at an example workflow.
Assuming you have stored SpinnakerService manifests under source control, you have a pipeline in Spinnaker to apply these manifests automatically on source control pushes (Spinnaker deploying Spinnaker) and you want to add a new Kubernetes account:
Save the kubeconfig file of the new account in a Kubernetes secret, in the same namespace where Spinnaker is installed.
Check out the Spinnaker config repository from source control.
Add a new Kubernetes account to the SpinnakerService manifest file, referencing the kubeconfig in the secret:
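As a rough illustration of this step, the Python sketch below adds an account entry to a SpinnakerService manifest held as a plain dictionary. The field layout (`spec.spinnakerConfig.config.providers.kubernetes.accounts`), the `encryptedFile:k8s!n:<secret>!k:<key>` reference format, and the names `kube-staging` and `spin-secrets` are assumptions made for illustration; check them against the Operator version you run.

```python
import copy


def add_kubernetes_account(manifest, name, secret_name, secret_key):
    """Return a copy of the manifest with one more Kubernetes account.

    The nested key path and the secret reference format are assumptions
    for illustration, not a definitive description of the Operator schema.
    """
    updated = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    accounts = (
        updated.setdefault("spec", {})
        .setdefault("spinnakerConfig", {})
        .setdefault("config", {})
        .setdefault("providers", {})
        .setdefault("kubernetes", {})
        .setdefault("accounts", [])
    )
    if any(account.get("name") == name for account in accounts):
        raise ValueError(f"account {name!r} already exists")
    accounts.append(
        {
            "name": name,
            # Points at the kubeconfig stored in a Kubernetes secret (assumed format).
            "kubeconfigFile": f"encryptedFile:k8s!n:{secret_name}!k:{secret_key}",
        }
    )
    return updated


if __name__ == "__main__":
    base = {"kind": "SpinnakerService", "metadata": {"name": "spinnaker"}}
    print(add_kubernetes_account(base, "kube-staging", "spin-secrets", "kube-staging-kubeconfig"))
```

In the workflow described above, the resulting change would be committed and pushed, and the existing pipeline would apply the updated manifest.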
At Armory, we are intensely focused on building our culture, not just building our product. Our culture is the operating system of our company, underpinning and supporting everything that we do.
From the get-go, we decided that Armory’s culture was going to be designed with intentionality to be remote work friendly. Armory is built on Spinnaker, an open source project created by Netflix and Google. One of the incredible features of open source software is that it is built through the collaboration of talented individuals and teams distributed all over the world. To align with the vibrant Spinnaker open source community, we doubled down on building a strong remote work culture. Today, more than half of our company works remotely, with the remainder working at our HQ in San Mateo, CA.
What Does this Mean in Practice?
For many companies, the experience of remote workers is a secondary concern, if it is even thought of at all. But remote workers face their own unique experiences, benefits, and challenges. At Armory, we acknowledge and welcome this unique experience, approach it with empathy and understanding, and create feedback mechanisms to ensure that our on-site and remote tribals are all having the best possible experience.
Some of the things we do at Armory include:
Set up Zoom stations all over the office to make it seamless to hop on a video chat and have synchronous, “face-to-face” communication
This includes the always-on camera and screen in the main section of our office so that we can all eat lunch “together”
Rotate meeting leadership between on-site and remote tribals, to continually ensure that the needs of remote tribals are being met
Provide a flexible expense policy for remote workers to ensure that they have the best, most productive remote setup for their individual needs
Organize frequent in-person events (team and company-wide offsites, holiday parties, the Spinnaker Summit, and others) to provide everyone with the opportunity for in-person relationship building
Default to having conversations in a public Slack channel to ensure that everyone can participate, voice their perspectives, and share in the knowledge
Create unstructured time for “high-bandwidth” communication (over Zoom and in-person) for people to get to know each other outside of a business context
Specifically within the engineering team, many of the scrum teams dedicate a chunk of their daily standup time to the type of general team syncing and catch-up that happens more naturally in-person in the office
A high-risk moment for Armory! Most of the engineering team in one single elevator, on the way to dinner during a 3-day offsite. Everything turned out fine, and dinner was delicious.
Why Do We Invest in the Remote Experience?
All of these initiatives take time and effort. Why do we bother, instead of leaving it up to remote tribals to conform to the working cadence of the tribals who are at HQ?
One simple answer is that it comes down to people. We want to work with the best people all over the world, not just the best people in the Bay Area. And that means embracing remote work and maximizing the remote work experience.
For me personally, my main role as the Head of Engineering is to empower the engineering team. That means creating the visibility and shared context so that engineers have the information that they need to make great decisions to positively impact the company and the Spinnaker community. Furthermore, continual growth and improvement is a core value at Armory. It is imperative that each of us continue to learn and grow, not just in our technical depth but also in understanding how each of us works, how we can function as a better, more cohesive team, and how we can strive together for larger goals.
If I am not investing in the right tools, processes, and culture to foster that shared context and continual growth across the entire engineering team, then I am not doing my job.
Armory is hiring polyglot engineers, as well as tribals across the entire organization. Check out open roles and learn more about what life is like at Armory at armory.io/careers. We’d love to hear from you!
Google Summer of Code is much more than a summer internship program; it is a year-round effort for the organization and some community members. Now, after the DevOps World | Jenkins World conference in Lisbon and final retrospective meetings, we can say that GSoC 2019 is officially over. We would like to start by thanking all participants: students, mentors, subject matter experts and all other contributors who proposed project ideas, participated in student selection, in community bonding and in further discussions and reviews. Google Summer of Code is a major effort which would not be possible without the active participation of the Jenkins community.
In this blog post we would like to share the results and our experience from the previous year.
In the following sections, we present a brief summary of each project, links to the coding phase 3 presentations, and to the final products.
Role Strategy Plugin Performance Improvements
Role Strategy Plugin is one of the most widely used authorization plugins for Jenkins, but it has never been famous for performance due to architecture issues and regular expression checks for project roles. Abhyudaya Sharma was working on this project together with his mentors: Oleg Nenashev, Runze Xia and Supun Wanniarachchi. He started the project by creating a new Micro-benchmarking Framework for Jenkins Plugins based on JMH, created benchmarks and achieved a 3501% improvement on some real-world scenarios. Then he went further and created a new Folder-based Authorization Strategy Plugin which offers even better performance for Jenkins instances where permissions are scoped to folders. During his project Abhyudaya also fixed the Jenkins Configuration-as-Code support in Role Strategy and contributed several improvements and fixes to the JCasC Plugin itself.
Jenkins UI and frontend framework are a common topic in the Jenkins project, especially in recent months after the new UX SIG was established. Jack Shen was working on exploring new ways to build Jenkins Web UI together with his mentor Jeff Pearce. Jack updated the Working Hours Plugin to use UI controls provided by standard React libraries. Then he documented his experience and created a template for plugins with React-based UI.
Remoting over Apache Kafka with Kubernetes features
Long Le Vu Nguyen was working on extended Kubernetes support in the Remoting over Apache Kafka Plugin. His mentors were Andrey Falco and Pham vu Tuan, who was our GSoC 2018 student and the plugin creator. During this project Long added a new agent launcher which provisions Jenkins agents in Kubernetes and connects them to the master. He also created a Cloud API implementation for it and a new Helm chart which can provision Jenkins as an entire system in Kubernetes, with Apache Kafka enabled by default. All these features were released in Remoting over Apache Kafka Plugin 2.0.
Running the GSoC program at our organization level
Here are some of the things our organization did before and during GSoC behind the scenes. To prepare for the influx of students, we updated all our GSoC pages and wrote down all the knowledge we accumulated over the years of running the program. We started preparing in October 2018, long before the official start of the program. The main objective was to address the feedback we got during GSoC 2018 retrospectives.
Project ideas. We started gathering project ideas in the last months of 2018. We prepared a list of project ideas in a Google doc, and we tracked ownership of each project in a table of that document. Each project idea was further elaborated in its own Google doc. We find that when projects get complicated during the definition phase, perhaps they are really too complicated and should not be done.
Since we wanted all the project ideas to be documented the same way, we created a template to guide the contributors. Most of the project idea documents were written by org admins or mentors, but occasionally a student proposed a genuine idea. We also captured contact information in that document such as GitHub and Gitter handles, and a preliminary list of potential mentors for the project. We embedded all the project documents on our website.
Mentor and student guidelines. We updated the mentor information page with details on what we expect mentors to do during the program, including the number of hours that are expected from mentors, and we even have a section on preventing conflict of interest. When we recruit mentors, we point them to the mentor information page.
We also updated the student information page. We find this is a huge time saver as every student contacting us has the same questions about joining and participating in the program. Instead of re-explaining the program each time, we send them a link to those pages.
Application phase. Students started to reach out very early on as well, many weeks before GSoC officially started. This was very motivating. Some students even started to work on project ideas before the official start of the program.
Project selection. This year the org admin team had some very difficult decisions to make. With lots of students, lots of projects and lots of mentors, we had to request the right number of slots and try to match the projects with the most chances of success. We were trying to form mentor teams at the same time as we were requesting the number of slots, and it was hard to get responses from all mentors in time for the deadline. Finally we requested fewer slots than we could have filled. When we request slots, we submit two numbers: a minimum and a maximum. The GSoC guide states that:
The minimum is based on the projects that are so amazing they really want to see these projects occur over the summer,
and the maximum number should be the number of solid and amazing projects they wish to mentor over the summer.
We were awarded the minimum. So we had to make very hard decisions: we had to decide between “amazing” and “solid” proposals. For some proposals, the very outstanding ones, it’s easy. But for the others, it’s hard. We know we cannot make the perfect decision, and from experience, we know that some students or some mentors will not be able to complete the program due to uncontrollable life events, even for the outstanding proposals. So we have to make the best decision knowing that some of our choices won’t complete the program.
Community Bonding. We have found that the community bonding phase was crucial to the success of each project. Usually projects that don’t do well during community bonding have difficulties later on. In order to get students involved in the community better, almost all projects were handled under the umbrella of Special Interest Groups so that there were more stakeholders and communications.
Communications. Every year we have students who contact mentors via personal messages. Students, if you are reading this, please do NOT send us personal messages about the projects; you will not receive any preferential treatment. In open source we want all discussions to be public, so students have to be reminded of that regularly. In 2019 we used Gitter chat for most communications, but from an admin point of view this is more fragmented than mailing lists, and it is also harder to search. Chat rooms are very convenient because they are focused, but the lack of threads in Gitter makes it hard to get an overview. Gitter threads were added recently (Nov 2019) but do not yet work well on Android and iOS. We adopted Zoom Meetings towards the end of the program and we are finding it easier to work with than Google Hangouts.
Status tracking. It was also hard to get an overview of how all the projects were doing once they were running. We made extensive use of Google Sheets to track lists of projects and participants, to rank projects, and to track the status of project phases (community bonding, coding, etc.). Each project involves several people and several links, so we found it time consuming and a bit hard to keep these sheets up to date, accurate and complete, especially up until the start of the coding phase.
Perhaps some kind of objective tracking tool would help. We used Jenkins Jira for tracking projects, with each phase representing a separate sprint. It helped a lot for successful projects. In our organization, we try to get everyone to beat the deadlines by a couple of days, because we know that there might be events such as power outages, bad weather (happens even in Seattle!), or other uncontrolled interruptions, that might interfere with submitting project data. We also know that when deadlines coincide with weekends, there is a risk that people may forget.
Retrospective. At the end of our project, we also held a retrospective and captured some ideas for the future. You can find the notes here. We already addressed the most important comments in our documentation and project ideas for the next year.
Last year, we wanted to thank everyone who participated in the program by sending swag. This year, we collected all the mailing addresses we could and sent everyone we could the 15-year Jenkins special edition T-shirt and some stickers. This was a great feel-good moment. I want to personally thank Alyssa Tong for her help in setting aside the T-shirts and stickers.
Each year Google invites two or more mentors from each organization to the Google Summer of Code Mentor Summit. At this event, hundreds of open-source project maintainers and mentors meet together and have unconference sessions targeting GSoC, community management and various tools. This year the summit was held in Munich, and we sent Marky Jackson and Oleg Nenashev as representatives there.
Apart from discussing projects and sharing chocolate, we also presented Jenkins there, gave a lightning talk and hosted an unconference session about automation bots for GitHub. We did not take a team photo there, so try to find Oleg and Marky in this photo:
GSoC Team at DevOps World | Jenkins World
We traditionally use GSoC organization payments and travel grants to sponsor student trips to major Jenkins-related events. This year four students traveled to the DevOps World | Jenkins World conferences in San Francisco and Lisbon. Students presented their projects at the community booth and at the contributor summits, and their presentations got a lot of traction in the community!
Thanks a lot to Google and CloudBees, who made these trips possible. You can find a travel report from Natasha Stopa here; more travel reports are coming soon.
This year, five projects were successfully completed. We find this to be normal and in line with what we hear from other participating organizations.
Taking the time early to update our GSoC pages saved us a lot of time later because we did not have to repeat all the information every time someone contacted us. We find that keeping track of all the mentors, the students, the projects, and the meta information is a necessary but time consuming task. We wish we had a tool to help us do that. Coordinating meetings and reminding participants of what needs to be accomplished for deadlines is part of the cheerleading aspect of GSoC, and we need to keep doing this.
Lastly, I want to thank all participants again: we could not do this without you. Each year we are impressed by the students who do great work and bring great contributions to the Jenkins community.
Yes, there will be Google Summer of Code 2020! We plan to participate, and we are looking for project ideas, mentors and students. Jenkins GSoC pages have already been updated for next year, and we invite everybody interested to join us!
Tracy Ragan re-elected to serve a 2nd year on the Continuous Delivery Foundation Governing Board
Santa Fe, NM – April 17, 2020 – DeployHub, creators of the first microservice management platform, today announced that the Continuous Delivery Foundation (CDF) Board has re-elected Tracy Ragan as the General Membership Board Representative.
“It has been an honor to serve as the General Member Representative for the CDF over the last year,” said Tracy Ragan. “This is an area that I have devoted my entire career to. To have the opportunity to work with other member companies who really get this space has been an amazing community experience. I promise to continue working as hard this year as I did last.”
Moving to microservices breaks the way we assemble and configure software. DeployHub puts it back together by providing a central ‘hub’ for cataloging, versioning, sharing and releasing microservices across the organization. DeployHub empowers your high performing software engineers to easily move from monolithic to microservices. For more information on DeployHub, go to www.DeployHub.com
About the CD Foundation
The Continuous Delivery Foundation (CDF) serves as the vendor-neutral home of many of the fastest-growing projects for continuous integration/continuous delivery (CI/CD). It fosters vendor-neutral collaboration between the industry’s top developers, end users and vendors to further CI/CD best practices and industry specifications. Its mission is to grow and sustain projects that are part of the broad and growing continuous delivery ecosystem.
DeployHub is a registered trademark of DeployHub, Inc. All other trademarks used in this document are the property of their respective owners.
We are excited to announce that Rosalind Benoit has been elected the CD Foundation Outreach Committee Chairperson.
Rosalind is Director of Community at Armory, where she works to enable and energize the Spinnaker ecosystem. Rosalind holds an MSIS in Database & Internet Technologies from Northwestern University. Her passion for enacting change via software comes from a varied background in system administration, development, project management, and education, along with a lifelong love of Linux. She makes and facilitates Spinnaker contributions that improve the developer experience and share the secrets of the optimized Software Development Life Cycle (SDLC).
The Outreach Committee is responsible for the overall marketing and outreach for CDF projects, ultimately managing and guiding CDF marketing for the Governing Board. Rosalind’s election to the chairperson role is a recognition of her substantial contributions to the marketing of Spinnaker and CDF community efforts.
Rosalind Benoit said,
“I’m thrilled to be elected as Outreach Committee Chair. We have an amazing opportunity to make CDF projects stand out in the industry. I look forward to working with the rest of the CDF community this year! I’d like to thank Alyssa Tong for her hard work as the CDF Outreach Chair and look forward to this coming year!”
As the Outreach chairperson, Rosalind will continue to be a strong voice representing the perspectives of the broader CDF community, especially to the governing board. The CDF is excited to see her continue to help make CDF the definitive destination for the continuous delivery ecosystem.
The Continuous Delivery Foundation (CDF) Technical Oversight Committee (TOC) approved the formation of Special Interest Group (SIG) Interoperability on January 14, 2020. SIG Interoperability aims to increase integration and interoperability across different tools and technologies in the open source CI/CD ecosystem. One of the prerequisites to achieve this is to provide a neutral forum, enabling dialog between projects and end-users so they can come together and discuss their use cases, needs, and challenges. This will allow projects and communities to explore additional collaboration opportunities and increase the visibility of ongoing work.
One of the means the SIG adopted to provide a forum for discussion is to invite representatives of project and end-user communities to regular SIG meetings so they can present what they are doing. The presentations are then followed by open discussions, which allow community members to ask questions, raise concerns, and more importantly start talking with each other. However, one of the things the community noticed is the lack of shared terminology and vocabulary, as the tools and technologies employ different terms to describe what is often the same thing.
This is actually not a surprising finding. There are many ways to greet someone, and as humans, if we do not understand the word being used, we can observe body language, process tone, and even touch. These many different natural inputs allow us to establish a shared vocabulary upon which we have been able to build successful components relevant to our way of living and our social norms of interacting.
Unfortunately, this process is not so easy for machines: we humans have to decide whether to establish norms, which in the context of machine interactions usually surface as protocols, best practices, or requirements.
Continuous Integration (CI) and Continuous Delivery (CD) practitioners have many tools at their disposal, but it is often the case that what we call a pipeline in today’s tool of choice is not called the same thing in the tool we use tomorrow. Again, within our sphere of influence and interaction we can adjust for these nuances, but machines talking to one another do not necessarily have that same luxury.
These are the thoughts that led contributors to the SIG to work on vocabulary and terminology as the first thing right after the SIG was approved. We believe that if we can establish a shared vocabulary across the industry in the CI/CD domain, we can remove the barriers between humans and then start tackling the problem of getting machines to talk to each other. The plan is to collect the existing terms used by CI/CD tools and technologies in a document and create a mapping of the terms across projects, essentially making the Rosetta Stone for the CI/CD domain. We think that we can continue this work and look for possibilities to come up with a shared vocabulary in a collaborative manner.
The document the SIG is working on is available in the SIG Interoperability repository on GitHub, and it currently contains terms for 10 CI/CD projects, as shown in the table below.
When organizations establish CI/CD pipelines, they employ not just CI/CD tools but also Software Configuration Management (SCM) systems, Artifact Repository Managers (ARM) and so on. That is why we included terminology for SCM tools such as Gerrit, GitHub, and GitLab, and we expect to collect terms used by other tools in adjacent areas as well.
It is important to highlight that we consider this work still ongoing, and we encourage and welcome everyone to add terminology used by the projects they use or are involved in, so that the document covers more tools and technologies. If you notice things that can be improved, feel free to send a pull request to the CDF SIG Interoperability repository and improve the existing documentation.
Jenkins X is an automated CI/CD platform built on Kubernetes. Jenkins X enables users to harness the power of Kubernetes without needing to be Kubernetes experts. How does a CI/CD platform do this? Jenkins X forms an abstraction layer over Kubernetes, simplifying the developer experience of building, deploying, and running Kubernetes applications. Under the hood, Jenkins X combines best-of-breed open source tools, creating a Kubernetes-native CI/CD platform that facilitates developer and GitOps best practices.
In this post, we’ll look at how Jenkins X uses Kubernetes Custom Resource Definitions (CRDs) and the Kubernetes API to bring together these best-of-breed open source projects, creating a cutting edge continuous delivery platform on Kubernetes. We’ll highlight two Kubernetes design principles that help us understand how Jenkins X natively extends Kubernetes:
Kubernetes API is declarative
Kubernetes has no hidden APIs
Kubernetes itself is decomposed into multiple components which interact through the Kubernetes API. Kubernetes’ declarative, API driven infrastructure enables it to be composable and extensible.
Kubernetes API is declarative
The Kubernetes API is declarative rather than imperative: as a user, you declare the desired state of your application and the Kubernetes system drives to make it so. One important benefit of this is automatic recovery. If something happens to your application, for example, a node crashes, then Kubernetes will restore the desired state.
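To make the declarative idea concrete, here is a minimal Python sketch of a reconciliation pass. It is a toy model, not Kubernetes code: the function names `plan` and `apply_actions` and the replica-count dictionaries are hypothetical, chosen only to show how a declared desired state drives recovery.

```python
def plan(desired, actual):
    """Compare desired and actual replica counts and list the corrective actions."""
    actions = []
    for app in sorted(set(desired) | set(actual)):
        diff = desired.get(app, 0) - actual.get(app, 0)
        if diff > 0:
            actions.append(("start", app, diff))
        elif diff < 0:
            actions.append(("stop", app, -diff))
    return actions


def apply_actions(actual, actions):
    """Return the new actual state after carrying out the planned actions."""
    new_state = dict(actual)
    for verb, app, count in actions:
        change = count if verb == "start" else -count
        new_state[app] = new_state.get(app, 0) + change
    return new_state


if __name__ == "__main__":
    desired = {"web": 3}          # what the user declared
    actual = {"web": 3}           # what is currently running
    actual["web"] -= 1            # a node crashes and takes one replica with it
    actions = plan(desired, actual)
    print(actions)
    print(apply_actions(actual, actions))
```

The user never issues a "restart" command here; the desired state stays the same and the reconciliation pass works out what needs to happen.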
Kubernetes has no hidden APIs
The Kubernetes API is exposed by the Kubernetes API server, which is a component of the Kubernetes control plane. The Kubernetes control plane is transparent in that there are no hidden internal APIs in Kubernetes: Kubernetes components interact through the same API that Kubernetes exposes to its users.
A declarative, API driven infrastructure
Kubernetes’ declarative, API driven infrastructure means that components, such as nodes, talk to the Kubernetes API server to figure out what their state ought to be. Instead of having the decision centralised and sent out, each node is responsible for its own health, and figuring out its desired behaviour. If a node fails and is brought back up, the newly created node can query the API server to figure out what it’s supposed to do.
The declarative way the Kubernetes API server communicates with remote nodes is in contrast to traditional client-server relationships, where the client tells the server what to do in an imperative manner and the server does it. Building the Kubernetes API server this way would have meant it grew as more functionality was added; the API server would have been brittle and difficult to extend.
Kubernetes uses a pattern called level triggering, as opposed to edge triggering. An edge-triggered system responds to events; if it misses an event, that event needs to be replayed for the system to recover. A level-triggered system instead acts on the current state, so a missed event is corrected the next time the state is observed.
“If you are edge triggered you run risk of compromising your state and never being able to re-create the state. If you are level triggered the pattern is very forgiving, and allows room for components not behaving as they should to be rectified. This is what makes Kubernetes work so well.”
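The difference can be sketched in a few lines of Python. Both classes below are hypothetical toys made up for this illustration (a "light" that should be on); the dropped event is simulated by clearing the event list before delivery.

```python
class EdgeTriggeredLight:
    """Changes state only when an event arrives; a lost event is lost for good."""

    def __init__(self) -> None:
        self.on = False

    def handle_event(self, event: str) -> None:
        self.on = (event == "turn_on")


class LevelTriggeredLight:
    """Compares itself with the desired state on every sync, whatever it missed."""

    def __init__(self) -> None:
        self.on = False

    def sync(self, desired_on: bool) -> None:
        if self.on != desired_on:
            self.on = desired_on


desired_on = True        # the declared state
events = ["turn_on"]     # the single event announcing it
events.clear()           # simulate the event being dropped before delivery

edge, level = EdgeTriggeredLight(), LevelTriggeredLight()
for event in events:     # nothing arrives, so the edge-triggered light never changes
    edge.handle_event(event)
level.sync(desired_on)   # the next sync notices the difference anyway

print(edge.on, level.on)  # False True
```

The edge-triggered light stays wrong until someone replays the event, while the level-triggered one corrects itself on its next look at the desired state.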
In Kubernetes, if any component goes down, when it comes back up, it requests the desired state from the Kubernetes API server and works to match that state. Components that can recover in this way tend to be more robust and the overall system is more reliable. This is especially true in distributed systems, where there are so many components in the system that the expectation is that there will always be components failing. Distributed systems need to be designed to tolerate the failure of components. If your system has one central manager component, which tells all the parts of the system what they should be doing, and that central manager component goes down, your system is down. Distributing that responsibility, so every component can figure out what it should be doing, makes the system more reliable. No longer is there a single point of failure.
What happens when the Kubernetes API server, which acts as a central point, goes down? All the components will continue to operate on the last information they received. When the API server comes back up, the components will then operate on the new state if there were any changes. If any of the components go down, the other components can continue to function independently of that failure. When failed components come back up, they can read the state they should work towards from the API server.
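The outage behaviour described above can be sketched as follows. `FakeApiServer` and `Component` are hypothetical names for this example only; the sketch assumes a component simply keeps its last successfully read desired state when a read fails.

```python
class ApiServerUnavailable(Exception):
    """Raised by the fake API server while it is 'down'."""


class FakeApiServer:
    def __init__(self, desired: str) -> None:
        self.desired = desired
        self.up = True

    def get_desired(self) -> str:
        if not self.up:
            raise ApiServerUnavailable()
        return self.desired


class Component:
    def __init__(self, api: FakeApiServer) -> None:
        self.api = api
        self.last_known = None

    def tick(self):
        """Refresh the desired state if possible; otherwise keep the last one."""
        try:
            self.last_known = self.api.get_desired()
        except ApiServerUnavailable:
            pass  # keep operating on the last information received
        return self.last_known


api = FakeApiServer(desired="config-v1")
node = Component(api)

seen = [node.tick()]                       # normal operation
api.up = False
seen.append(node.tick())                   # outage: still working from config-v1
api.desired, api.up = "config-v2", True    # server returns with a changed state
seen.append(node.tick())                   # component picks up the new state

print(seen)  # ['config-v1', 'config-v1', 'config-v2']
```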
These design choices make Kubernetes reliable. They also make Kubernetes very composable and extensible. Because all components use the same Kubernetes API as you do as an end user, you can replace any default component with your own. You can also add new components to enable new functionality. This extensibility has helped create a vibrant ecosystem of Kubernetes-native open source projects that, like Jenkins X, are built on Kubernetes using Kubernetes resources and the Kubernetes API machinery.
Custom Resource Definitions (CRDs)
Kubernetes is extended through Custom Resource Definitions (CRDs). A Kubernetes resource is an endpoint in the Kubernetes API that stores API objects of a certain type. Kubernetes uses API objects to represent the state of your cluster.
To create your own custom Kubernetes API object type, define a new CRD of your type and define the schema. Then you can create your own objects against the Kubernetes API server. In this way, a custom resource extends the Kubernetes API: creating CRDs is like embedding your own APIs inside Kubernetes itself. To use the custom API objects you have created, you write your own custom controllers that act on the data contained in your custom object types. Kubernetes controllers are the mechanism by which Kubernetes reconciles the actual state of your cluster with the state declared in the Kubernetes API.
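To make this concrete, here is a minimal sketch that builds a CRD manifest and one custom object as plain Python dictionaries. The `example.com` group, the `Widget` kind, and its `size` field are invented for illustration; the surrounding field names follow the `apiextensions.k8s.io/v1` CustomResourceDefinition format. The `reconcile_widget` function is only a toy stand-in for a real controller.

```python
import json

GROUP, VERSION = "example.com", "v1"   # hypothetical API group and version

crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": f"widgets.{GROUP}"},   # must be <plural>.<group>
    "spec": {
        "group": GROUP,
        "scope": "Namespaced",
        "names": {"plural": "widgets", "singular": "widget", "kind": "Widget"},
        "versions": [{
            "name": VERSION,
            "served": True,
            "storage": True,
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {"spec": {
                    "type": "object",
                    "properties": {"size": {"type": "integer"}},
                }},
            }},
        }],
    },
}

# An object of the new type, as a user would declare it.
widget = {
    "apiVersion": f"{GROUP}/{VERSION}",
    "kind": "Widget",
    "metadata": {"name": "demo"},
    "spec": {"size": 3},
}


def reconcile_widget(obj: dict, actual_sizes: dict) -> None:
    """Toy controller step: make the 'actual' record match the declared spec."""
    actual_sizes[obj["metadata"]["name"]] = obj["spec"]["size"]


actual = {}
reconcile_widget(widget, actual)
print(json.dumps(crd, indent=2))
```

Since `kubectl` accepts JSON manifests as well as YAML, the printed output could in principle be saved to a file and applied to a test cluster, though that step is not shown here.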
How do CRDs relate to Kubernetes built-in types? Tim Hockin, co-founder of the Kubernetes project, has said, “If we had CRDs on day zero of Kubernetes there would be no built-in types.” If CRDs had existed from the start, pods and nodes and everything else would also be a CRD!
If they weren’t part of the original design, why were CRDs created? CRDs were first created as a way to extend Kubernetes functionality to enable rapid prototyping.
“That’s what fascinates me about CRD. It started as a prototyping tool. K8s API machinery was not intended to be a framework, but that is what shook out. If we did that intentionally we would have messed it up.”
It’s extremely interesting that CRDs, which started as a prototyping mechanism, are now the main resource definition mechanism in Kubernetes. This enables Kubernetes to be more modular, and many core Kubernetes functions are now built using custom resources.
The Kubernetes API machinery is now distilled such that it can be used as API machinery for any project, not just Kubernetes. The extensible nature of the Kubernetes API enables higher level applications and platforms to be built on Kubernetes. Jenkins X runs directly on Kubernetes, uses the Kubernetes API, and defines CRDs for its workflow. Moreover, the same Kubernetes API machinery that makes Kubernetes extensible also enables Kubernetes-native applications to integrate well with each other. Jenkins X both creates its own CRDs and integrates with other Kubernetes-native applications through the Kubernetes API to form a Kubernetes-native CI/CD platform.
Jenkins X High Level Architecture:
As seen in the diagram above, Jenkins X integrates with a number of open source projects such as Tekton, Prow, and Vault, among others, to create an automated Kubernetes-native CI/CD platform. Jenkins X relies on CRDs to create new resources and extend the Kubernetes API. The Kubernetes API machinery enables Jenkins X to integrate with other open source projects through the Kubernetes API server.
Tekton, the pipeline execution engine for Jenkins X
Tekton is the pipeline execution engine for Jenkins X. Like Jenkins X, Tekton is Kubernetes-native and extends Kubernetes using CRDs. Jenkins X leverages Prow, or Jenkins X’s own Lighthouse, to signal to Tekton to run builds. Lighthouse is a lightweight webhook handler that listens for Git webhook events and uses them to create Tekton PipelineRun resources, which Tekton uses to perform builds. Tekton then generates a status update, which Jenkins X communicates back to source code management providers such as GitHub.
The integration between Jenkins X as a CI/CD platform and Tekton as the execution engine for Jenkins X happens within Kubernetes using CRDs and the Kubernetes API. That both projects are Kubernetes-native enables them to seamlessly integrate using the Kubernetes API machinery.
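The webhook-to-PipelineRun handoff can be sketched roughly like this. This is not Lighthouse's actual implementation: the function name, the parameter names `repo-url` and `revision`, and the trimmed payload are assumptions made for illustration. The payload keys mirror the shape of a GitHub push event, and the manifest uses the `tekton.dev/v1beta1` PipelineRun layout.

```python
import json


def pipeline_run_from_push(payload: dict, pipeline_name: str) -> dict:
    """Turn a (trimmed) Git push webhook payload into a PipelineRun manifest."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        # generateName lets the API server pick a unique suffix per run.
        "metadata": {"generateName": f"{pipeline_name}-"},
        "spec": {
            "pipelineRef": {"name": pipeline_name},
            "params": [
                {"name": "repo-url", "value": payload["repository"]["clone_url"]},
                {"name": "revision", "value": payload["after"]},
            ],
        },
    }


# Illustrative payload; a real push event carries many more fields.
payload = {
    "ref": "refs/heads/main",
    "after": "0123abc",
    "repository": {"clone_url": "https://example.com/acme/app.git"},
}

run = pipeline_run_from_push(payload, "build-and-deploy")
print(json.dumps(run, indent=2))
```

Submitting the resulting object to the Kubernetes API server is what hands the build over to Tekton; the handler itself never executes the pipeline.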
“Tekton Pipelines lets us power Jenkins X’s execution and management of pipelines natively within Kubernetes.”
– Andrew Bayer, Software Engineer, CloudBees, and creator of Jenkins X Pipeline Syntax
Tekton is a project that evolved from an internal Google tool that used Knative to build and deploy software. In 2018, it was spun out as an independent project, and it was later donated to the Continuous Delivery Foundation.
The core component, Tekton Pipelines, runs as a controller in a Kubernetes cluster. It registers several custom resource definitions which represent the basic Tekton objects with the Kubernetes API server, so the cluster knows to delegate requests containing those objects to Tekton. These primitives are fundamental to the way Tekton works. Tekton’s building block approach starts with the smallest atom of work, the Step, aggregates Steps together in Tasks, and aggregates Tasks together in Pipelines.
* *Task*: a collection of sequential steps you would want to run as part of your continuous integration flow. A Task runs inside a pod on your cluster.
* *ClusterTask*: similar to a Task, but with cluster scope.
* *Pipeline*: a stateless, reusable, parameterized collection of Tasks. Tasks are linked together in a Pipeline, which describes the end-to-end deployment for an application.
* *PipelineRun*: an instantiation of a Pipeline definition, filling in the Pipeline’s parameters with concrete values.
* *PipelineResource*: objects that will be input to or output from Tasks.
* *Trigger*: Tekton Triggers is a Kubernetes Custom Resource Definition (CRD) controller that lets you extract information from event payloads (a “trigger”) to create Kubernetes resources.
A notable omission from the CRD list is the Step, which doesn’t have its own CRD because it’s the smallest unit of execution and is always contained inside a Task. The Conditions and Dashboard Extension CRDs are still optional and experimental, but very exciting!
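The building-block structure above (Steps inside Tasks, Tasks inside Pipelines) can be sketched with plain Python dictionaries that follow the `tekton.dev/v1beta1` manifest layout. The task names, container images, and scripts are placeholders chosen for illustration, and `missing_tasks` is a hypothetical helper, not part of Tekton.

```python
API_VERSION = "tekton.dev/v1beta1"

# A Task aggregates Steps; each Step is a container plus something to run in it.
build_task = {
    "apiVersion": API_VERSION,
    "kind": "Task",
    "metadata": {"name": "build"},
    "spec": {"steps": [
        {"name": "compile", "image": "golang:1.21", "script": "go build ./..."},
        {"name": "unit-test", "image": "golang:1.21", "script": "go test ./..."},
    ]},
}

deploy_task = {
    "apiVersion": API_VERSION,
    "kind": "Task",
    "metadata": {"name": "deploy"},
    "spec": {"steps": [
        {"name": "apply", "image": "bitnami/kubectl", "script": "kubectl apply -f k8s/"},
    ]},
}

# A Pipeline aggregates Tasks by reference and orders them with runAfter.
pipeline = {
    "apiVersion": API_VERSION,
    "kind": "Pipeline",
    "metadata": {"name": "build-and-deploy"},
    "spec": {"tasks": [
        {"name": "build", "taskRef": {"name": "build"}},
        {"name": "deploy", "taskRef": {"name": "deploy"}, "runAfter": ["build"]},
    ]},
}


def missing_tasks(pipeline: dict, tasks: list) -> list:
    """Return the names of Tasks the Pipeline references but that are not defined."""
    defined = {task["metadata"]["name"] for task in tasks}
    return [
        entry["taskRef"]["name"]
        for entry in pipeline["spec"]["tasks"]
        if entry["taskRef"]["name"] not in defined
    ]


print(missing_tasks(pipeline, [build_task, deploy_task]))  # []
print(missing_tasks(pipeline, [build_task]))               # ['deploy']
```

Note that Steps appear only as entries inside a Task's `spec.steps`, which matches their not having a CRD of their own.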
Tekton’s approach is particularly interesting from a tool interoperability standpoint. By focusing on these building blocks and the concrete representation of them as declarative configuration, Tekton creates a standard platform for CD in the same way that Kubernetes provides a platform for application runtimes. This allows user-facing tools to build on the platform rather than reinventing these primitives. Several projects have already taken up this approach:
* Jenkins X uses Tekton as its execution engine. It’s been an option for a while now, but recently the project announced it was moving to using Tekton exclusively. Jenkins X provides pipeline definitions and GitOps workflows that are tailored for cloud-native CD.
* Kabanero is a project that enables teams to develop and deploy applications on Kubernetes, so architects can provide pre-approved application stacks for developers to work from. It uses Tekton Pipelines and several associated projects like Tekton Dashboard and Triggers; indeed the developers building the Dashboard are largely working on Kabanero and the IBM Cloud Devops Pipeline product.
* Relay by Puppet is a hosted service that uses Tekton as the execution engine for event-triggered DevOps and deployment workflows. (Full disclosure, this is the product I am working on!) It provides a YAML dialect for building workflows that can be triggered by external events, via API, or manually, to automate tasks that need to stitch together different tools and services.
* TriggerMesh have integrated Tekton Pipelines into their TriggerMesh Cloud project and are working on a tool called Aktion to translate GitHub Actions into Tekton Pipelines.
* There are more, too! Check out the Tekton Friends repo for a longer list of projects and end users building on Tekton.
As exciting as this activity is, I think it’s important to note there’s still a lot of work to be done. There’s a distinct difference between two projects both using Tekton as a common upstream platform and achieving interoperability between them! It’s a big problem and it’s easy to get overwhelmed with the magnitude of the whole thing. One of my earliest lessons when I moved from SRE into product management was: focus first on solving the pain points which end users feel most acutely. That can be some combination of pervasiveness (what percent of the overall user base feels it?) and severity (how bad is each individual incident?) – ideally, fix the thing which is worst on both axes! From an end user’s standpoint, CD tools have a pretty steep learning curve with a bunch of pitfalls. A sampling of these severe-and-pervasive pitfalls I’ve heard from our users as we’ve been building Relay:
* How do I wrap my head around the terminology and technology so I can get started?
* How do I integrate the parts of the build/test/deploy toolchain my organization needs to continue using?
* How do I operate (upgrade, monitor, troubleshoot) the tool once it’s up and running?
Interoperability isn’t a cure-all, but there are definitely areas where it could work like a soothing balm on all of this pain. Industry-standard terminology or, at a minimum, an authoritative Rosetta Stone for CD could help. At the moment, there are still pockets of debate on whether the “D” stands for Deployment or Delivery! (It’s “Delivery”, folks – when you mean “Deployment” you have to spell it out.)
Going deeper, it’d be hugely helpful to help users integrate the tools they’re already using into a new framework. A wide ecosystem of steps that could be used by any of the containerized CD tools – not just those based on Tekton but, for example, Spinnaker and Keptn as well – would have a number of benefits. For end users, it would increase the amount of content available “out of the box”, meaning they would have less work to integrate the tools and services they need. Ideally, no end user should have to create a step from scratch because there’s a vast, easily discoverable library of things that accomplish the job they have. There’s also a benefit to maintainers of services and tools that end users want, like Kaniko, Gradle, and the cloud services, who have to build an integration with each execution framework themselves or rely on the community to do it. Building and maintaining one reusable implementation would reduce the maintenance burden and allow them to provide higher quality.
To put on my Tekton advocate hat for a moment, its well-defined container contract makes it easy to use general-purpose containers in your pipeline. If you want to take advantage of more specialized features the framework provides, the Tekton Catalog has a number of high-quality examples to build from. There are improvements on the way to aid the discoverability and reuse parts of the problem, such as the exciting new Tekton Hub donated by Red Hat.
The operability concerns are a real problem for CD pipeline tools, too. Although CD is usually associated with development, in many organizations the tool itself is considered a production service, because if there are problems committing, building, testing, and shipping code, the engineering organization isn’t delivering value. Troubleshooting byzantine failures in complex CI/CD pipelines is a specialized discipline requiring skills that span Quality Engineering, SRE, and Development. The more resiliently the CD tools are architected, and the more standard their interfaces for reporting availability and performance metrics, the easier that troubleshooting becomes.
Again, to address these from Tekton’s perspective, a huge benefit of running on Kubernetes is that the Tekton services that run in the cluster can take advantage of all the powerful k8s operability features. So fundamental capabilities that are highly valuable to operators and troubleshooters, like log aggregation, in-place upgrades, error reporting, and scale-out, all ride on top of the Kubernetes infrastructure. It’s not “for free” of course; nothing in distributed systems is ever truly “for free” and if anyone tries to tell you otherwise, the thing they’re selling you is probably *very* expensive. But it does mean that general-purpose Kubernetes skills and tooling go a long way towards operating Tekton at scale, rather than having to relearn or reimplement them at the application layer.
In conclusion, I’m excited that the interoperability conversation is well underway at the CDF. There’s a long way to go, but the amount of activity and progress in the space is very encouraging. If you’re interested in pitching in to discuss and solve these kinds of problems, please feel free to join the #sig-interoperability channel on the CDF Slack or check out the contribution information.
I’m also a Jenkins Ambassador and a DevOps Institute Ambassador. I’ve organized the Jenkins Area Meetup and Jenkins User Conference China for 3 years. It was an honor to be named Most Valuable Advocate of the Jenkins community and to become a Jenkins Ambassador in 2018.
My story with DevOps started from an email sent by my boss in 2014. He said, “make a study of DevOps.” And so it began.
I founded an internal community in our company where architects, developers, testers, and ops could meet together to understand and learn from each other.
And I didn’t forget the work assigned by my boss. I led an internal DevOps Guide from start to release to help all teams practice DevOps.
2017 was a memorable year for me. DevOpsDays Beijing 2017 lit up DevOps in China. Lots of companies shared their experience with DevOps, such as Alibaba, Tencent, Baidu, and Huawei.
In 2017, I also became a full-time member of the community. I’ve co-organized the local DevOps event named the DevOps International Summit (DOIS) and the Jenkins User Conference in Beijing, Shanghai and Shenzhen to share Agile, CI/CD, AIOps, and DevOps practices and experiences in China.
I’ve also joined the experts group contributing to the DevOps Capability Maturity Model organized by CAICT. Lots of companies can learn how to practice DevOps according to this model.
I have not focused only on China; I have also built a communication bridge with the global DevOps community and companies. For this, Kohsuke Kawaguchi and Alyssa Tong have really helped a lot.
Alan Shimel and Jayne Groll have also inspired me to introduce more experiences from China to the world and also from the world to China. So, I’m a DevOps Institute Ambassador right now. It is a great team helping to share DevOps with the world.