We have recently made changes to the underlying Screwdriver Architecture for build processing. Previously, the executor-queue was tightly-coupled to the SD API and worked by constantly polling for messages at specific intervals. Due to this design, the queue would block API requests. Furthermore, if the API crashed, scheduled jobs might not be added to the queue, causing cascading failures.
Hence, keeping the principles of isolation-of-concerns and abstraction in mind, we designed a more resilient REST-API-based queueing system: the Queue Service. This new service reads, writes and deletes messages from the queue after processing. It also encompasses the former capability of the queue-worker and acts as a scheduler.
The SD API and Queue Service communicate bidirectionally using signed JWT tokens sent via auth headers of each request.
As a cluster admin, to configure using the queue as an executor, you can deploy the queue-service as a REST API using a screwdriver.yaml and update configuration in SD API to point to the new service endpoint:
Enhancement: Can override Freeze Window to start a build.
Previously, users could not start builds during a freeze window unless they made changes to the freeze window setting in the screwdriver.yaml configuration. Now, you can start a build by entering a reason in the confirmation modal. This can be useful for users needing to push out an urgent patch or hotfix during a freeze window.
Feature: Build cache now supports local disk-based cache in addition to S3 cache.
Bugfix: Periodic build timeout check
Enhancement: Prevent re-enqueue of builds from same event.
In order to have these improvements, you will need these minimum versions:
UI – v1.0.479
API – v0.5.835
Store – v3.10.3
Launcher – v6.0.42
Queue-Worker – v2.9.0
Thanks to the following contributors for making this feature possible:
Screwdriver now has the ability to cache and restore files and directories from your builds to either s3 or disk-based storage. Rest all features related to the cache feature remains the same, only a new storage option is added. Please DO NOT USE this cache feature to store any SENSITIVE data or information.
The graph below is our Internal Screwdriver instance build-cache comparison between disk-based strategy vs aws s3.
Build cache – get cache – (disk strategy)
Build cache – get cache – (s3)
Build cache – set cache – (disk strategy)
Build cache – set cache – (s3)
Why disk-based strategy?
Based on the cache analysis, 1. The majority of time was spent pushing data from build to s3, 2. At times the cache push fails if the cache size is big (ex: >1gb). So, simplified the storage part by using a disk cache strategy and using filer/storage mount as a disk option. Each cluster will have its own filer/storage disk mount.
NOTE: When a cluster becomes unavailable and if the requested cache is not available in the new cluster, the cache will be rebuilt once as part of the build.
Max size limit per cache is configurable by Cluster admins.
Cluster admins are responsible to enforce retention policy.
Screwdriver cluster-admin has the ability to specify the cache storage strategy along with other options like compression, md5 check, cache max limit in MB
Screwdriver is an open-source build automation platform designed for Continuous Delivery. It is built (and used) by Yahoo. Don’t hesitate to reach out if you have questions or would like to contribute: http://docs.screwdriver.cd/about/support.
The Spinnaker community has grown significantly after launching as an open source project in 2015. The project maintainers increasingly look for ways to help the community better understand how Spinnaker is used, and to help contributors prioritize future improvements.
Today, feature development is guided by industry experts, community discussions, Special Interest Groups (SIGs), and events like the recently held Spinnaker Summit. In August 2019, the community published an RFC, which proposed the tooling that will enable everyone to make data-driven decisions based on product usage across all platforms. We encourage Spinnaker users to continue providing feedback, and to review and comment on the RFC.
Following on from this RFC, the Spinnaker 1.18 release includes an initial implementation of statistics collection capabilities that are used to collect generic deployment and usage information from Spinnaker installations around the world. Before going into the details, here are some important facts to know:
No personally identifying information (PII) is collected or logged.
All stats collection code is open source and can be found in the Spinnaker stats, Echo, and Kork repos found on GitHub.
Users can disable statistics collection at any time through a single Halyard command.
Community members that want to work with the underlying dataset and/or dashboard reports can request and receive full access.
This feature exists in the Spinnaker 1.18 release,but is disabled by default while we finalize testing of the backend and fine-tune report dashboards. The feature will be enabled by default in the Spinnaker 1.19 release (scheduled for March 2020).
All data will be stored in a Google BigQuery database, and report dashboards will be publicly available from the Community Stats page. Community members can request access to the collection data.
Data collected as part of this effort allows the entire community to better monitor the growth of Spinnaker, understand how Spinnaker is used “in the wild”, and prioritize feature development across a large community of Spinnaker contributors. Thank you for supporting Spinnaker and for your help in continuing to make Spinnaker better!
Tekton Pipelines, the core component of the Tekton project, is moving to beta status with the release of v0.11.0 this week. Tekton is an open source project creating a cloud-native framework you can use to configure and run continuous integration and continuous delivery (CI/CD) pipelines within a Kubernetes cluster.
Tekton development began as Knative Build before becoming a founding project of the CD Foundation under the Linux Foundation last year.
The Tekton project follows the Kubernetes deprecation policies. With Tekton Pipelines upgrading to beta, most Tekton Pipelines CRDs (Custom Resource Definition) are now at beta level. This means overall beta level stability can be relied on. Please note, Tekton Triggers, Tekton Dashboard, Tekton Pipelines CLI and other components are still alpha and may continue to evolve from release to release.
Tekton encourages all Tekton projects and users to migrate their integrations to the new apiVersion. Users of Tekton can see the migration guide on how to migrate from v1alpha1 to v1beta1.
Now is a great time to contribute. There are many areas where you can jump in. For example, the Tekton Task Catalog allows you to share and reuse the components that make up your Pipeline. You can set a Cluster scope, and make tasks available to all users in a namespace, or you can set a Namespace scope, and make it usable only within a specific namespace.
Continuous Delivery (CD) and Runbook Automation are standard means to deploy, operate and manage software artifacts across the software life cycle. Based on our analysis of many delivery pipeline implementations, we have seen that on average seven or more tools are included in these processes, e.g., version control, build management, issue tracking, testing, monitoring, deployment automation, artifact management, incident management, or team communication. Most often, these tools are “glued together” using custom, ad-hoc integrations in order to form a full end-to-end workflow. Unfortunately, these custom ad-hoc tool integrations also exist in Runbook Automation processes.
Problem: Point-to-Point Integrations are Hard to Scale and Maintain
Not only is this approach error-prone but maintenance and troubleshooting of these integrations in all its permutations is time-intensive too. There are several factors that prevent organizations from scaling this across multiple teams:
Number of tools: Although the great availability of different tools always allows having the appropriate tool in place, the numberof required integrations explodes.
Tight coupling: The tool integrations are usually implemented within the pipeline, which results in a tight coupling between the pipeline and the tool.
Copy-paste pipeline programming: A common approach we are frequently seeing is that a pipeline with a working tool integration is often used as a starting point for new pipelines. If now the API of a used tool changes, all pipelines have to catch up to stay compatible and to prevent vulnerabilities.
Let’s imagine an organization with hundreds of copy-paste pipelines, which all contain a hard-coded piece of code for triggering Hey load tests. Now this organization would like to switch from Hey to JMeter. Therefore, they would have to change all their pipelines. This is clearly not efficient!
In order to solve these challenges, we propose introducing interoperability interfaces, which allow abstract tooling in CD and Runbook Automation processes. These interfaces should trigger operations in a tool-agnostic way.
For example, a test interface could abstract different testing tools. This interface can then be used within a pipeline to trigger a test without knowing which tool is executing the actual test in the background.
These interoperability interfaces are important and this is confirmed by the fact that the Continuous Delivery Foundation has implemented a dedicated working group on Interoperability, as well as the open-source project Eiffel, which provides an event-based protocol enabling a technology-agnostic communication especially for Continuous Integration tasks.
Use Events as Interoperability Interfaces
By implementing these interoperability interfaces, we define a standardized set of events. These events are based on CloudEvents and allow us to describe event data in a common way.
The first goal of our standardization efforts is to define a common set of CD and runbook automation operations. We identified the following common operations (please let us know if we are missing important operations!):
Operations in CD processes: deployment, test, evaluation, release, rollback
Operations in Runbook Automation processes: problem analysis, execution of the remediation action, evaluation, and escalation/resolution notification
For each of these operations, an interface is required, which abstracts the tooling executing the operation. When using events, each interface can be modeled as a dedicated event type.
The second goal is to standardize the data within the event, which is needed by the tools in order to trigger the respective operation. For example, a deployment tool would need the information of the artifact to be deployed in the event. Therefore, the event can either contain the required resources (e.g. a Helm chart for k8s) or a URI to these resources.
We already defined a first set of events https://github.com/keptn/spec, which is specifically designed for Keptn — an open-source project implementing a control plane for continuous delivery and automated operations. We know that these events are currently too tailored for Keptn and single tools. So, please
Let us Work Together on Standardizing Interoperability Interfaces
In order to work on a standardized set of events, we would like to ask you to join us in Keptn Slack.
We can use the #keptn-spec channel in order to work on standardizing interoperability interfaces, which eventually are directly interpreted by tools and will make custom tool integrations obsolete!
Originally posted on the Armory blog, by Rosalind Benoit
“Let Google’s CloudBuild handle building, testing, and pushing the artifact to your repository. #WithSpinnaker, you can go as fast as you want, whenever you’re ready.”
Calling all infrastructure nerds, SREs, platforms engineers, and the like: if you’ve never seen Kelsey Hightower speak in person, add it to your bucket list. Last week, he gave a talk at Portland’s first Spinnaker meetup, hosted at New Relic by the amazing PDX DevOps GroundUp. I cackled and cried at the world’s most poignant ‘Ops standup’ routine. Of course, he thrilled the Armory tribe with his praise of Spinnaker’s “decoupling of high level primitives,” and I can share some key benefits that Kelsey highlighted:
Even with many different build systems, you can consolidate deployments #withSpinnaker. Each can notify Spinnaker to pick up new artifacts as they are ready.
Spinnaker’s application-centric approach helps achieve continuous delivery buy-in. It gives application owners the control they crave, within automated guardrails that serialize your software delivery culture.
Building manual judgements into heavy deployment automation is a “holy grail” for some. #WithSpinnaker, we can end the fallacy of “just check in code and everything goes to prod.” We can codify the steps in between as part of the pipeline.
Spinnaker uses the perfect integration point. It removes the brittleness of scripting out the transition between a ‘ready-to-rock’ artifact and an application running in production.
Kelsey’s words have profound impact. He did give some practical advice, like “Don’t run Spinnaker on the same cluster you’re deploying to,” and of course, keep separate build and deploy target environments. But the way Kelsey talked about culture struck a chord. We called the meetup, “Serializing culture into continuous delivery,” and in his story, Kelsey explained that culture is what you do: the actions you take as you work; your steps in approaching problems.
I’m reminded of working on a team struggling with an “agile transformation” through a series of long, circular discussions. I urged my team, “Scrum is just something that you do!” You go to standups, and do demos. You get better at pointing work over time. The ceremonies matter because you adapt by doing the work.
Kelsey says his doing approach starts with raising his hand and saying, “I would like to own that particular problem,” and then figuring it out as he goes. Really owning a problem requires jumping in to achieve a deep understanding of it. It’s living it, and sharing with others who have lived it. We can BE our culture by learning processes hands-on, digging into the business reasons behind constraints, and using that knowledge to take ownership. Hiding behind culture talk doesn’t cut it, since you have to do it before you can change it.
“The return on investment is totally worth it”
Another important way of doing: recognizing when you don’t know how to do it and need some help. Powerful open source projects like Kubernetes and Spinnaker can become incredibly complicated to implement in a way that faithfully serializes your culture. Responsible ownership means getting the help you need to execute.
I love how Kelsey juxtaposed the theatrics and hero mythology behind change management and outage “war rooms” with the stark truth of the business needs behind our vital services. As Kelsey shared his Ops origins story, I recalled my own – the rocket launch music that played in my head the first time I successfully restarted the java process for an LMS I held the pager for, contrasted with the sick feeling I got when reading the complaining tweets from university students who relied on the system and had their costly education disrupted by the outage. I knew the vast majority of our students worked full time and paid their own way, and that many had families to juggle as I do. This was the real story of our work. It drove home the importance of continuous improvement, and meant that our slow-moving software delivery culture frustrated the heck out of me.
Kelsey’s LOL simulation of the Word doc deployment guide at his first “real” job. Got a deployment horror story about a Word-copied command with an auto-replaced en-dash on a flag not triggered until after database modification scripts had already run? I do!
So what do you do if you’re Kelsey? You become an expert at serializing a company’s decisions around software delivery and telling them, as a quietly functioning story, with the best-in-class open source and Google tooling of the moment. He tells the story of his first automation journey: “So I started to serialize culture,” he says, when most of the IT department left him to fend for himself over the winter holidays. Without trying to refactor applications, he set to work codifying the software delivery practices he had come to understand through Ops. He automated processes, using tools his team felt comfortable with.
He said, “We never walked around talking about all of our automation tools,” and that’s not a secrecy move, it’s his awareness of cognitive dissonance and cognitive overload. Because he had created a system based on application owners expectations, their comfort zone, he didn’t need to talk about it! It just worked (better and more efficiently over time, of course), and fulfilled the business case. Like Agile, this approach limits the scope of what one has to wrap their brain around to be excellent. Like Spinnaker, it empowers developers to focus on what they do best.
Instead of talking about the transformation you need, start by starting. Then change will begin.
By Forest Jing, Jenkins Ambassador and JAM organizer in China
On February 29, 2020, the first CI/CD Meetup in China was successfully held online. The atmosphere of this online live streaming event was hot and welcomed. There were more than 5,000 people and 27,000 pageviews! Several CI/CD experts have shared the practices about CI, CD, and DevOps. Although affected by the COVID-19, but it could not stop everyone’s passion of learning.
CI/CD Meetup is a global community event hosted by the Continuous Delivery Foundation (CDF), which aims to build a CI/CD ecosystem and promote CI/CD related practices and open source projects. The CI/CD Meetup in China is co-organized by Jenkins Ambassador Shi Xuefeng, Lei Tao, and Jing Yun who are also organizers of Jenkins Area Meetup in China. And DevOps Times community and GreatOps community are co-organizer of the event. We hope we could introduce CI/CD to more Chinese IT companies to improve their IT performance.
Everyone likes the content and is curious to ask the lecturers: “As a programmer, why do you all have so luxuriant hairs?”
Details from the live broadcast content
Topic 1· CI/CD Practice of Large Mobile App
Shi Xuefeng, Engineering Efficiency Director of JD.COM,Jenkins Ambassador and Core author of DevOps Capability Maturity Model
First of all, Shi Xuefeng brought the wonderful topic of “Large Mobile App CI/CD.”
In the mobile era, mobile applications have become the main battlefield of business. In this activity, Xuefeng shared how is the CI/CD of a super large app is designed and implemented.
Topic 2· The implementation and practice of Agile && DevOps at CITIC Bank
Shi Lilong,Senior Expert, Software Development Center, CITIC Bank
Subsequently, Shi Lilong, a senior expert at the software development center of CITIC Bank, brought a wonderful sharing of “the implementation and practice of Agile and DevOps in CITIC Bank”.
Mr. Shi Lilong shared the overall promotion of CITIC Bank in Agile and DevOps, and the end-to-end tool chain of CITIC Bank.
Topic 3: How do large-scale financial and Internet companies conduct product library management?
Wang Qing,JFrog Chief Architect in China
Wang Qing, Chief Architect of JFrog China, brought a wonderful sharing of “How do large financial and Internet companies manage product libraries?”
Due to the large number of R & D personnel and large types of products delivered by large financial companies and Internet companies, the application dependent libraries and product libraries have become complicated and difficult to manage. After the implementation of many enterprise-level user product libraries, the advanced functions of the work-in-progress library solve the above problems and open up the second pulse of continuous delivery.
Topic 4: Watch out! 10 obstacles in DevOps Transformation
Shi Jingfeng, Senior DevOps expert in GreatOPS Community
Mr. Shi Jingfeng brought a wonderful sharing of “Watch out! 10 obstacles in DevOps Transformation.”
During the these days, many companies have started to work from home. Various obstacles appeared on the first day of WFH. The conference system was unstable, VPN connection was not available, remote desktops were queued, and the phone was busy. The implementation of DevOps seemed make all of these very easy . Jingfeng thinks that DevOps is like a journey, there are both beautiful attractions and obstacles. It is difficult to save yourself by not paying attention to the obstacles? How these pain points are addressed based on the DevOps Capability Maturity Model.
The last topic is a CI/CD expert question and answer part. All experts will answer the questions raised.
Finally, the last group photo of the experts, the CI/CD Meetup online salon was successfully held.
This event was co-sponsored by the CDF, DevOps Times community, and GreatOPS community. Thanks to the strong support of JFrog and Tencent Cloud Community.
The last story of the first CI/CD Meetup in China.
Shi Xuefeng (BC), Lei Tao and Forest Jing are the Jenkins Ambassador who are always organizing JAM in China. We all visited DevOps World Lisbon. At the event, we met Kohsuke Kawaguchi and Alyssa Tong. So we discussed to introduce CI/CD Meetup into China. It is a fantastic event.
From Dailymotion, a French video-sharing technology platform with over 300 million unique monthly users
At Dailymotion, we are hosting and delivering premium video content to users all around the world. We are building a large variety of software to power this service, from our player or website to our GraphQL API or ad-tech platform. Continuous Delivery is a central practice in our organization, allowing us to push new features quickly and in an iterative way.
We are early adopters of Kubernetes: we built our own hybrid platform, hosted both on-premise and on the cloud. And we heavily rely on Jenkins to power our “release platform”, which is responsible for building, testing, packaging and deploying all our software. Because we have hundreds of repositories, we are using Jenkins Shared Libraries to keep our pipeline files as small as possible. It is an important feature for us, ensuring both a low maintenance cost and a homogeneous experience for all developers – even though they are working on projects using different technology stacks. We even built Gazr, a convention for writing Makefiles with standard targets, which is the foundation for our Jenkins Pipelines.
In 2018, we migrated our ad-tech product to Kubernetes and took the opportunity to set up a Jenkins instance in our new cluster – or better yet move to a “cloud-native” alternative. Jenkins X was released just a few months before, and it seemed like a perfect match for us:
It is built on top and for Kubernetes.
At that time – in 2018 – it was using Jenkins to run the pipelines, which was good news given our experience with Jenkins.
It comes with features such as preview environments which are a real benefit for us.
And it uses the Gitops practice, which we found very interesting because we love version control, peer review, and automation.
While adopting Jenkins X we discovered that it is first a set of good practices derived from the best performing teams, and then a set of tools to implement these practices. If you try to adopt the tools without understanding the practices, you risk fighting against the tool because it won’t fit your practices. So you should start with the practices. Jenkins X is built on top of the practices described in the Accelerate book, such as micro-services and loosely-coupled architecture, trunk-based development, feature flags, backward compatibility, continuous integration, frequent and automated releases, continuous delivery, Gitops, … Understanding these practices and their benefits is the first step. After that, you will see the limitations of your current workflow and tools. This is when you can introduce Jenkins X, its workflow and set of tools.
We’ve been using Jenkins X since the beginning of 2019 to handle all the build and delivery of our ad-tech platform, with great benefits. The main one being the improved velocity: we used to release and deploy every two weeks, at the end of each sprint. Following the adoption of Jenkins X and its set of practices, we’re now releasing between 10 and 15 times per day and deploying to production between 5 and 10 times per day. According to the State of DevOps Report for 2019, our ad-tech team jumped from the medium performers’ group to somewhere between the high and elite performers’ groups.
But these benefits did not come for free. Adopting Jenkins X early meant that we had to customize it to bypass its initial limitations, such as the ability to deploy to multiple clusters. We’ve detailed our work in a recent blog post, and we received the “Most Innovative Jenkins X Implementation” Jenkins Community Award in 2019 for it. It’s important to note that most of the issues we found have been fixed or are being fixed. The Jenkins X team has been listening to the community feedback and is really focused on improving their product. The new Jenkins X Labs is a good example.
As our usage of Jenkins X grows, we’re hitting more and more the limits of the single Jenkins instance deployed as part of Jenkins X. In a platform where every component has been developed with a cloud-native mindset, Jenkins is the only one that has been forced into an environment for which it was not built. It is still a single point of failure, with a much higher maintenance cost than the other components – mainly due to the various plugins.
In 2019, the Jenkins X team started to replace Jenkins with a combination of Prow and Tekton. Prow (or Lighthouse) is the component which handles the incoming webhook events from GitHub, and what Jenkins X calls the “ChatOps”: all the interactions between GitHub and the CI/CD platform. Tekton is a pipeline execution engine. It is a cloud-native project built on top of Kubernetes, fully leveraging the API and capabilities of this platform. No single point of failure, no plugins compatibility nightmare – yet.
Since the beginning of 2020, we’ve started an internal project to upgrade our Jenkins X setup – by introducing Prow and Tekton. We saw immediate benefits:
Faster scheduling of pipelines “runners” pods – because all components are now Kubernetes-native components.
Simpler pipelines – thanks to both the Jenkins X Pipelines YAML syntax and the ability to easily decouple a complex pipeline in multiple small ones that are run concurrently.
Lower maintenance cost.
While replacing the pipeline engine of Jenkins X might seem like an implementation detail, in fact, it has a big impact on the developers. Everybody is used to see the Jenkins UI as the CI/CD UI – the main entry point, the way to manually restart pipelines executions, to access logs and test results. Sure, there is a new UI and a real API with an awesome CLI, but the new UI is not finished yet, and some people still prefer to use web browsers and terminals. Leaving the Jenkins Plugins ecosystem is also a difficult decision because some projects heavily rely on a few plugins. And finally, with the introduction of Prow (Lighthouse) the Github workflow is a bit different, with Pull Requests merges being done automatically, instead of people manually merging when all the reviews and automated checks are green.
So if 2019 was the year of Jenkins X at Dailymotion, 2020 will definitely be the year of Tekton: our main release platform – used by almost all our projects except the ad-tech ones – is still powered by Jenkins, and we feel more and more its limitations in a Kubernetes world. This is why we plan to replace all our Jenkins instances with Tekton, which was truly built for Kubernetes and will enable us to scale our Continuous Delivery practices.