Skip to main content
Category

Community

Dailymotion’s Continuous Delivery story with Jenkins, Jenkins X, and Tekton

By Blog, Community

From Dailymotion, a French video-sharing technology platform with over 300 million unique monthly users

At Dailymotion, we are hosting and delivering premium video content to users all around the world. We are building a large variety of software to power this service, from our player or website to our GraphQL API or ad-tech platform. Continuous Delivery is a central practice in our organization, allowing us to push new features quickly and in an iterative way.

We are early adopters of Kubernetes: we built our own hybrid platform, hosted both on-premise and on the cloud. And we heavily rely on Jenkins to power our “release platform”, which is responsible for building, testing, packaging and deploying all our software. Because we have hundreds of repositories, we are using Jenkins Shared Libraries to keep our pipeline files as small as possible. It is an important feature for us, ensuring both a low maintenance cost and a homogeneous experience for all developers – even though they are working on projects using different technology stacks. We even built Gazr, a convention for writing Makefiles with standard targets, which is the foundation for our Jenkins Pipelines.

In 2018, we migrated our ad-tech product to Kubernetes and took the opportunity to set up a Jenkins instance in our new cluster – or better yet move to a “cloud-native” alternative. Jenkins X was released just a few months before, and it seemed like a perfect match for us:

  • It is built on top and for Kubernetes.
  • At that time – in 2018 – it was using Jenkins to run the pipelines, which was good news given our experience with Jenkins.
  • It comes with features such as preview environments which are a real benefit for us.
  • And it uses the Gitops practice, which we found very interesting because we love version control, peer review, and automation.

While adopting Jenkins X we discovered that it is first a set of good practices derived from the best performing teams, and then a set of tools to implement these practices. If you try to adopt the tools without understanding the practices, you risk fighting against the tool because it won’t fit your practices. So you should start with the practices. Jenkins X is built on top of the practices described in the Accelerate book, such as micro-services and loosely-coupled architecture, trunk-based development, feature flags, backward compatibility, continuous integration, frequent and automated releases, continuous delivery, Gitops, … Understanding these practices and their benefits is the first step. After that, you will see the limitations of your current workflow and tools. This is when you can introduce Jenkins X, its workflow and set of tools.

We’ve been using Jenkins X since the beginning of 2019 to handle all the build and delivery of our ad-tech platform, with great benefits. The main one being the improved velocity: we used to release and deploy every two weeks, at the end of each sprint. Following the adoption of Jenkins X and its set of practices, we’re now releasing between 10 and 15 times per day and deploying to production between 5 and 10 times per day. According to the State of DevOps Report for 2019, our ad-tech team jumped from the medium performers’ group to somewhere between the high and elite performers’ groups.

But these benefits did not come for free. Adopting Jenkins X early meant that we had to customize it to bypass its initial limitations, such as the ability to deploy to multiple clusters. We’ve detailed our work in a recent blog post, and we received the “Most Innovative Jenkins X Implementation” Jenkins Community Award in 2019 for it. It’s important to note that most of the issues we found have been fixed or are being fixed. The Jenkins X team has been listening to the community feedback and is really focused on improving their product. The new Jenkins X Labs is a good example.

As our usage of Jenkins X grows, we’re hitting more and more the limits of the single Jenkins instance deployed as part of Jenkins X. In a platform where every component has been developed with a cloud-native mindset, Jenkins is the only one that has been forced into an environment for which it was not built. It is still a single point of failure, with a much higher maintenance cost than the other components – mainly due to the various plugins.

In 2019, the Jenkins X team started to replace Jenkins with a combination of Prow and Tekton. Prow (or Lighthouse) is the component which handles the incoming webhook events from GitHub, and what Jenkins X calls the “ChatOps”: all the interactions between GitHub and the CI/CD platform. Tekton is a pipeline execution engine. It is a cloud-native project built on top of Kubernetes, fully leveraging the API and capabilities of this platform. No single point of failure, no plugins compatibility nightmare – yet.

Since the beginning of 2020, we’ve started an internal project to upgrade our Jenkins X setup – by introducing Prow and Tekton. We saw immediate benefits:

  • Faster scheduling of pipelines “runners” pods – because all components are now Kubernetes-native components.
  • Simpler pipelines – thanks to both the Jenkins X Pipelines YAML syntax and the ability to easily decouple a complex pipeline in multiple small ones that are run concurrently.
  • Lower maintenance cost.

While replacing the pipeline engine of Jenkins X might seem like an implementation detail, in fact, it has a big impact on the developers. Everybody is used to see the Jenkins UI as the CI/CD UI – the main entry point, the way to manually restart pipelines executions, to access logs and test results. Sure, there is a new UI and a real API with an awesome CLI, but the new UI is not finished yet, and some people still prefer to use web browsers and terminals. Leaving the Jenkins Plugins ecosystem is also a difficult decision because some projects heavily rely on a few plugins. And finally, with the introduction of Prow (Lighthouse) the Github workflow is a bit different, with Pull Requests merges being done automatically, instead of people manually merging when all the reviews and automated checks are green.

So if 2019 was the year of Jenkins X at Dailymotion, 2020 will definitely be the year of Tekton: our main release platform – used by almost all our projects except the ad-tech ones – is still powered by Jenkins, and we feel more and more its limitations in a Kubernetes world. This is why we plan to replace all our Jenkins instances with Tekton, which was truly built for Kubernetes and will enable us to scale our Continuous Delivery practices.

Comparing a Monolithic Pipeline to a Microservice Pipeline

By Blog, Community

By Tracy Ragan, CEO of DeployHub, CD Foundation Board Member

A close up of a map

Description automatically generated

Microservice pipelines are different than traditional pipelines.  As the saying goes…

“The more things change; the more things stay the same.”  

As with every step in the software development evolutionary process, our basic software practices are changing with Kubernetes and microservices.  But the basic requirements of moving software from design to release remain the same. Their look may change, but all the steps are still there. In order to adapt to a new microservices architecture, DevOps Teams simply need to understand how our underlying pipeline practices need to shift and change shape.  

Understanding Why Microservice Pipelines are Different

The key to understanding microservices is to think ‘functions.’  With a microservice environment the concept of an ‘application’ goes away. It is replaced by a grouping of loosely coupled services connected via APIs at runtime, running inside of containers, nodes and pods. The microservices are reused across teams increasing the need for improved organization (Domain Driven Design), collaboration, communication and visibility. 

The biggest change in microservice pipeline is having a single microservice used by multiple application teams independently moving through the life cycle.  Again, one must stop thinking ‘application’ and think instead think ‘functions’ to fully appreciate the oncoming shift. And remember, multiple versions of a microservice could be running in your environments at the same time.  

Microservices are immutable. You don’t ‘copy over’ the old one, you deploy a new version.  When you deploy a microservice, you create a Kubernetes deployment YAML file that defines the Label and the version of the image. 

A screenshot of a cell phone

Description automatically generated

In the above example, our Label is dh-ms-general.  When a microservice Label is reused for a new container image, Kubernetes stops using the old image. But in some cases, a second Label may be used allowing both services to be running at the same time. This is controlled by the configuration of your ingresses.  Our new pipeline process must incorporate these new features of our modern architecture. 

Comparing Monolithic to Microservice Pipelines

What does your life cycle pipeline look like when we manage small functions vs. a monolithic applications running in a modern architecture?  Below is a comparison for each category and their potential shift for supporting a microservice pipeline.

Change Request

Monolithic:

Logging a user problem ticket, enhancement request or anomaly based on an application.

Microservices:

This process will remain relatively un-changed in a microservice pipeline. Users will continue to open tickets for bugs and enhancements. The difference will be sorting out which microservice needs the update, and which version of the microservice the ticket was opened against.  Because a microservice can be used by multiple applications, dependency management and impact analysis will become more critical for helping to determine where the issue lies.   

Version Control

Monolithic:

Tracking changes in source code content.  Branching and merging updates allowing multiple developers to work on a single file. 

Microservices:

While versioning your microservice source code will still be done, your source code will be smaller, 100-300 lines of code versus 1,000 – 3,000 lines of code. This impacts the need for branching and merging.  The concept of merging ‘back to the trunk’ is more of a monolithic concept, not a microservice concept. And how often will you branch code that is a few hundred lines long?

Artifact Repository

Monolithic:

Originally built around Maven, an artifact repository provides a central location for publishing jar files, node JS Packages, Java scripts packages, docker images, python modules.  At the point in time where you run your build your package manager (maven, NPM, PIP) will perform the dependency management for tracking transitive dependencies.  

Microservices:

Again, these tools supported monolithic builds and solved dependency management to resolve compile/link steps.  We move away from monolithic builds, but we still need to build our container and resolve our dependencies. These tools will help us build containers by determining the transitive dependencies need for the container to run. 

Builds

Monolithic:

Executes a serial process for calling compilers and linkers to translate source code into binaries (Jar, War, Ear, .Exe, .dlls, docker images).  Common languages that support the build logic includes Make, Ant, Maven, Meister, NPM, PIP, and Docker Build. The build calls on artifact repositories to perform dependency management based on what versions of libraries have been specified by the build script.    

Microservices:

For the most part, builds will look very different in a microservice pipeline.  A build of a microservice will involve creating a container image and resolving the dependencies needed for the container to run.  You can think of a container image to be our new binary. This will be a relatively simple step and not involve a monolithic compile/link of an entire application. It will only involve a single microservice.  Linking is done at runtime with the restful API call coded into the microservice itself. 

Software Configuration Management (SCM)

Monolithic:

The build process is the central tool for performing configuration management. Developers setup their build scripts (POM files) to define what versions of external libraries they want to include in the compile/link process.  The build performs configuration management by pulling code from version control based on a ‘trunk’ or ‘branch. A Software Bill of Material can be created to show all artifacts that were used to create the application. 

Microservices:

Much of what we use to do for configuring our application occurred at the software ‘build.’ But ‘builds’ as we know them go away in a microservice pipeline.  This is where we made very careful decisions about what versions of source code and libraries we would use to build a version of our monolithic application. For the most part, the version and build configuration shifts to runtime with microservices.   While the container image has a configuration, the broader picture of the configuration happens at run-time in the cluster via the APIs.

In addition, our SCM will begin to bring in the concept of Domain Driven Design where you are managing an architecture based on the microservice ‘problem space.’  New tooling will enter the market to help with managing your Domains, your logical view of your application and to track versions of applications to versions of services.  In general, SCM will become more challenging as we move away from resolving all dependencies at the compile/link step and must track more of it across the pipeline. 

Continuous Integration (CI)

Monolithic:

CI is the triggered process of pulling code and libraries from version control and executing a Build based on a defined ‘quiet time.’  This process improved development by ensuring that code changes were integrated as frequently as possible to prevent broken builds, thus the term continuous integration.  

Microservices:

Continuous Integration was originally adopted to keep us re-compiling and linking our code as frequently as possible in order to prevent the build from breaking.  The goal was to get to a clean ’10-minute build’ or less. With microservices, you are only building a single ‘function.’ This means that an integration build is no longer needed.  CI will eventually go away, but the process of managing a continuous delivery pipeline will remain important with a step that creates the container. 

Code Scanning

Monolithic:

Code scanners have evolved from looking at coding techniques for memory issues and bugs to scanning for open source library usage, licenses and security problems. 

Microservices:

Code scanners will continue to be important in a microservice pipeline but will shift to scanning the container image more than the source. Some will be used during the container build focusing on scanning for open source libraries and licensing while others will focus more on security issues with scanning done at runtime. 

Continuous Testing  

Monolithic:

Continuous testing was born out of test automation tooling.  These tools allow you to perform automated test on your entire application including timings for database transactions. The goal of these tools is to improve both the quality and speed of the testing efforts driven by your CD workflow.  

Microservices:

Testing will always be an important part of the life cycle process. The difference with microservices will be understanding impact and risk levels.  Testers will need to know what applications depend on a version of a microservice and what level of testing should be done across applications. Test automation tools will need to understand microservice relationships and impact. Testing will grow beyond testing a single application and instead will shift to testing service configurations in a cluster. 

Security

Monolithic:

Security solutions allow you to define or follow a specific set of standards. They include code scanning, container scanning and monitoring. This field has grown into the DevSecOps movement where more of the security activities are being driven by Continuous Delivery. 

Microservices:

Security solutions will shift further ‘left’ adding more scanning around the creation of containers.  As containers are deployed, security tools will begin to focus on vulnerabilities in the Kubernetes infrastructure as they relate to the content of the containers. 

Continuous Delivery Orchestration (CD)

Monolithic:

Continuous Delivery is the evolution of continuous integration triggering ‘build jobs’ or ‘workflows’ based on a software application.  It auto executes workflow processes between development, testing and production orchestrating external tools to get the job done. Continuous Delivery calls on all players in the lifecycle process to execute in the correct order and centralizes their logs. 

Microservices:

Let’s start with the first and most obvious difference between a microservice pipeline and a monolithic pipeline.  Because microservices are independently deployed, most organizations moving to a microservice architecture tell us they use a single pipeline workflow for each microservice.  Also, most companies tell us that they start with 6-10 microservices and grow to 20-30 microservices per traditional application.  This means you are going to have hundreds if not thousands of workflows.  CD tools will need to include the ability to template workflows allowing a fix in a shared template to be applied to all child workflows. Managing hundreds of individual workflows is not practical. In addition, plug-ins need to be containerized and decoupled from a version of the CD tool. And finally, look for actions to be event driven, with the ability for the CD engine to listen to multiple events, run events in parallel and process thousands of microservices through the pipeline. 

Continuous Deployments

Monolithic:

This is the process of moving artifacts (binaries, containers, scripts, etc.) to the physical runtime environments on a high frequency basis.  In addition, deployment tools track where an artifact was deployed along with audit information (who, where, what) providing core data for value stream management.  Continuous deployment is also referred to as Application Release Automation. 

Microservices:

The concept of deploying an entire application will simply go away. Instead, deployments will be a mix of tracking the Kubernetes deployment YAML file with the ability to manage the application’s configuration each time a new microservice is introduced to the cluster.  What will become important is the ability to track the ‘logical’ view of an application by associating which versions of the microservices make up an application.  This is a big shift. Deployment tools will begin generating the Kubernetes YAML file removing it from the developer’s to-do list.  Deployment tools will automate the tracking of versions of the microservice source to the container image to the cluster and associated applications to provide the required value stream reporting and management.  

Conclusion

As we shift from managing monolithic applications to microservices, we will need to create a new microservice pipeline.  From the need to manage hundreds of workflows in our CD pipeline, to the need for versioning microservices and their consuming application versions, much will be different.  While there are changes, the core competencies we have defined in traditional CD will remain important even if it is just a simple function that we are now pushing independently across the pipeline.

About the Author

A person who is smiling and looking at the camera

Description automatically generated

Tracy Ragan is CEO of DeployHub and serves on the Continuous Delivery Foundation Board. She is a microservice evangelist with expertise in software configuration management, builds and release. Tracy was a consultant to Wall Street firms on build and release management for 7 years prior to co-founding OpenMake Software in 1995. She was a founding member of the Eclipse organization and served on the board for 5 years.   She is a recognized leader and has been published in multiple industry publications as well as presenting to audiences at industry conferences. Tracy co-founded DeployHub in 2018 to serve the microservice development community.  

Descartes Labs – Implementation of Spinnaker Pipelines – The End of Waterfall

By Blog, Community

By Tracy Ragan, CEO DeployHub, CD Foundation Board Member

The New Mexico CI/CD CDF Meetup, hosted by DeployHub, enjoyed an amazing presentation by Louis Vernon, Site Reliability Engineer at Descartes Labs.  Louis showed in detail how Descartes Labs improved service levels to customers, dumped a waterfall release approach and simplified their GKE releases using Spinnaker, Istio and StackDriver. 

Louis’ presentation covers how the team at Descartes Lab implemented Spinnaker to push continuous deployments including the integration of Istio to route updates between Dev, Beta, Pre-Release, and Release, all running in the same cluster.

The use of both Istio and Spinnaker at Descartes Labs is a mature example of what can be done to build out a modern Kubernetes Pipeline.

While Descartes Labs still implements different ‘states’ of the pipeline, the release process uses a single cluster with Istio properly routing using virtual service names.

Louis explains how the team at Descartes Labs got to the point where they understood that a shift of this magnitude was essential for creating a stable environment for all users. 

In my humble opinion, moving away from separate Dev, Test and Prod clusters is the direction we will all be moving. 


The full recorded demo is here:

Louis also presented at the Spinnaker Summit 2019. You can download his presentation at:

https://github.com/louisvernon/SpinnakerSummit2019

Getting Serious About Open Source Security

By Blog, Community

A not so serious look at a very serious problem

Originally published on Medium by Dan Lorenc, (dlorenc@google.com,
twitter.com/lorenc_dan)

Photo by Matthew Henry on Unsplash

A Blast From the Past

2019 was a crazy time to be writing software. It’s hard to believe how careless we were as an industry. Everyone was just having fun slinging code. Companies were using whatever code they found laying around on NPM, Pip, or Maven Central. No one even looked at the code these package managers were downloading for them. We had no idea where these binaries came from or even who wrote most of this stuff.

And don’t even get me started on containers! There was no way to know what was inside most of them or what they did. Yet there we were, pulling them from Dockerhub, slapping some YAML on them, and running them as root in our Kubernetes clusters. Whoops, I just dated myself. Kubernetes was a primitive system written mostly in YAML and Bash that people used to interact with before Serverless came and saved us all.

Looking back, it’s shocking that the industry is still around! How we didn’t have to cough up every Bitcoin in the world to stop our databases from getting leaked or our servers from being blown up is beyond me. Thankfully, we realized how silly this all was, and we stopped using whatever code had the most Github stars and started using protection.


We’re Under Attack

No, really. Every time you pip installgo get, or mvn fetch something, you’re doing the equivalent of plugging a thumb drive you found on the sidewalk into your production server.

You’re taking code from someone you’ve never met and then running it with access to your most sensitive data. Hopefully, you at least know their email address or Github account from the commit, but there’s no way to know if this is accurate unless you’re checking PGP signatures. And let’s be honest, you’re probably not doing that.

This might sound like I’m just fear-mongering, but I promise I’m not. This is a real problem that everyone needs to be aware of. Attacks like this are called supply-chain attacks, and they are nothing new. Just last month, an active RCE vulnerability was found in an open source package on PyPi that was being used to steal SSH and GPG credentials.

There are lots of variations on this same play that make use of different social-engineering techniques in interesting ways. One attacker used a targeted version of this to steal cryptocurrency from a few specific websites. Another group performed a “long-con” where they actually produced and maintained a whole set of useful open source images on Dockerhub for years before slowly adding, you guessed it, crypto-mining.

The possibilities are endless, terrifying, and morbidly fascinating. And they’re happening more and more often. If reading about attacks like these is your kind of thing, the CNCF has started cataloging known instances of themSnyk also just published a post detailing how easy it is to inject code like this in most major languages — Github even hides these diffs in code review by default! Russ Cox has also been writing about this problem for a while.


Vision

OK, there’s a bit of hyperbole up there (Kubernetes doesn’t have that much bash in it), but open source is under attack, and it’s not OK. Some progress is being made in this area — GitHub and others are scanning repositories, binaries, and containers, but these tools all only work on known vulnerabilities. They have no mechanism to handle intentional, malicious ones before they are discovered, which are at least as dangerous.

The brutal fact is that there is no way to be confident about the code you find on most artifact repositories today. The service might be compromised and serve you a different package from the one the author uploaded. The maintainer’s credentials might have been compromised, allowing an attacker to upload something malicious. The compiler itself might have been hacked, or even the compiler that compiler used (PDF warning)! Or, the maintainer could have just snuck something in on purpose.

For any given open source package, we need to be able to confidently assert what code it’s comprised of, what toolchains and steps were used to produce the package, and who was responsible for each piece. This information needs to be made available publicly. A reliable, secure view of the supply-chain of every open source package will help make these attacks easier to prevent and easier to detect when they do happen. And the ability to tie each line of code and action back to a real individual will allow us to hold attackers accountable.


How Do We Get There?

We need to work as an industry to start securing open source software, piece by piece.

Artifact repositories need to support basic authentication best practices like 2FA, artifact signing, and strong password requirements. DockerHub, PyPi, and NPM support 2FA, but there’s no way to see if a maintainer of a package is using it. Most container registries don’t support signatures yet, though work is ongoing.

Software build systems need to make reproducible, hermetic builds possible and easy. Debian has started doing some great work here, but they’re basically alone. Every docker build gives you a new container digest. Tar and gzip throw timestamps everywhere. It’s possible to get reproducible builds in GoJava, and most other major languages, but it’s not necessarily easy. See the recently published whitepaper on how Google handles much of this internally for more information.

SCM providers need strong identity mechanisms so we can associate code back to authors confidently. Git commit logs can be easily forged, and signed commits are not in common use. Even with them, you still have no idea who is on the other end of a PR, only that the signature matches. This isn’t just an issue for security. It can also be a licensing nightmare if you don’t know the real author or license of code you’re accepting.

There is value in allowing developers to work anonymously, but there is also a cost. We need to balance this with systems that apply a higher level of scrutiny to anonymous code. We also need to allow other individuals to “vouch for” patches that they’ve examined, maybe similar to how Wikipedia handles anonymous edits.

And finally, all of this needs to be tied together in secure CI/CD systems and platforms that implement binary transparency for public packages. Putting the packaging steps in the hands and laptops of developers leaves way too large an attack surface. The ability to push a package that will run in prod is the same as having root in prod. By moving the build and upload steps into secure CI/CD systems, we can reduce the need to trust individuals.


OK, but What Can I Do Now?

First, start by securing your code as much as possible. Make sure you have copies of every dependency you’re using stored somewhere. Make sure you review all code you’re using, including OSS. Set up and mandate the use of 2FA across your organization. Publish, and actually check the signatures and digests of the software you’re using.

Log enough information in your build system so you can trace back every artifact to the sources. And every deployment to the artifacts. Once you’ve done all of this, you’ll be pretty far ahead of everyone else. You’re not completely safe, though.

That’s where we need to work together. If you’re interested in helping out, there are many ways to get involved, and I’m sure there are a lot of efforts going on. We’re just getting started on several initiatives inside the Continuous Delivery Foundation, like our new Security SIG. We’re also hoping to make it easier to build and use secure delivery pipelines inside the TektonCD open source project.

We would love your help, no matter your expertise! For example, I’m far from a security expert, but I’ve spent a lot of time working on developer tools and CI/CD systems. Feel free to reach out to me directly if you have any questions or want to get involved. I’m on Twitter and Github.

How do we measure the growth of CD Foundation projects and their communities?

By Blog, Community

Written by Tracy Miranda, CloudBees director of open source community and member of the CDF governing board

The CD Foundation (CDF) recently shared its 9 Strategic Goals. The second on the list is “Cultivate Growth of Projects.” This goal naturally leads us to ask ourselves the question: how do we measure the growth of our projects to know we are being successful?

There are many dimensions to open source projects. In order to sustain a project much more than code is required. The CDF helps with multiple essential services for project growth and sustenance. One of my favourite services is the CDF devstats site, which provides a wealth of data around the projects. CDF devstats, which is based on the CNCF devstats, gives indicators on community health and contributor statistics. 

Example from Tekton, one of many dashboards and data sets available:

Sometimes we have distorted views of how well projects are doing – this can be down to a few things such as hype or public sentiment around a project. Sometimes newer projects are viewed as doing better than older projects. It is important to have a sense of how well your project is doing. While there are lots of different ways to do it, one method I really like is looking at the number of individual developers contributing to a project. 

With CDF devstats I am able to take a snapshot of that data, and then see how the CDF projects stacked up against CNCF graduated projects. 

The chart here shows a visualization of average developer contributions to each project based on data from the past one year. There are many caveats with the data. E.g. Which repos are included for each project may not be strictly equivalent. But what I like about it is that at a high level it gives an indication of the size of project contributions and how projects compare relatively. 

I also like that it is all open – so you can verify the data and process  for yourself, plus do your own analysis. 

Kubernetes, as you might expect,  is a powerhouse of a project with thousands of contributors. But actually Jenkins stacks up nicely in comparison with a healthy number of contributions – which is all the more significant considering it is a 15 year old project. Sustaining and growing community contributions year-on-year for 15 years is an incredible achievement. The other CDF projects, Spinnaker, Jenkins X and Tekton, are much newer but also coming along quite nicely. See this repo for links to data. 

For me this is a nice snapshot to say we’re off to a good start here at CDF. Individual project growth will come down to each project’s community – but CDF will be working to provide key services and some of the less fun grunt stuff so project leaders can better focus on the important efforts of community building.