Originally posted on the Armory blog, by Rosalind Benoit
“Let Google’s CloudBuild handle building, testing, and pushing the artifact to your repository. #WithSpinnaker, you can go as fast as you want, whenever you’re ready.”
Calling all infrastructure nerds, SREs, platform engineers, and the like: if you’ve never seen Kelsey Hightower speak in person, add it to your bucket list. Last week, he gave a talk at Portland’s first Spinnaker meetup, hosted at New Relic by the amazing PDX DevOps GroundUp. I cackled and cried at the world’s most poignant ‘Ops standup’ routine. Of course, he thrilled the Armory tribe with his praise of Spinnaker’s “decoupling of high level primitives,” and I can share some key benefits that Kelsey highlighted:
Even with many different build systems, you can consolidate deployments #withSpinnaker. Each can notify Spinnaker to pick up new artifacts as they are ready.
Spinnaker’s application-centric approach helps achieve continuous delivery buy-in. It gives application owners the control they crave, within automated guardrails that serialize your software delivery culture.
Building manual judgements into heavy deployment automation is a “holy grail” for some. #WithSpinnaker, we can end the fallacy of “just check in code and everything goes to prod.” We can codify the steps in between as part of the pipeline.
Spinnaker uses the perfect integration point. It removes the brittleness of scripting out the transition between a ‘ready-to-rock’ artifact and an application running in production.
Kelsey’s words have profound impact. He did give some practical advice, like “Don’t run Spinnaker on the same cluster you’re deploying to,” and of course, keep separate build and deploy target environments. But the way Kelsey talked about culture struck a chord. We called the meetup, “Serializing culture into continuous delivery,” and in his story, Kelsey explained that culture is what you do: the actions you take as you work; your steps in approaching problems.
Yes, please!
I’m reminded of working on a team struggling with an “agile transformation” through a series of long, circular discussions. I urged my team, “Scrum is just something that you do!” You go to standups, and do demos. You get better at pointing work over time. The ceremonies matter because you adapt by doing the work.
Kelsey says his doing approach starts with raising his hand and saying, “I would like to own that particular problem,” and then figuring it out as he goes. Really owning a problem requires jumping in to achieve a deep understanding of it. It’s living it, and sharing with others who have lived it. We can BE our culture by learning processes hands-on, digging into the business reasons behind constraints, and using that knowledge to take ownership. Hiding behind culture talk doesn’t cut it, since you have to do it before you can change it.
“The return on investment is totally worth it”
Another important way of doing: recognizing when you don’t know how to do it and need some help. Powerful open source projects like Kubernetes and Spinnaker can become incredibly complicated to implement in a way that faithfully serializes your culture. Responsible ownership means getting the help you need to execute.
I love how Kelsey juxtaposed the theatrics and hero mythology behind change management and outage “war rooms” with the stark truth of the business needs behind our vital services. As Kelsey shared his Ops origins story, I recalled my own – the rocket launch music that played in my head the first time I successfully restarted the Java process for an LMS I held the pager for, contrasted with the sick feeling I got when reading the complaining tweets from university students who relied on the system and had their costly education disrupted by the outage. I knew the vast majority of our students worked full time and paid their own way, and that many had families to juggle as I do. This was the real story of our work. It drove home the importance of continuous improvement, and meant that our slow-moving software delivery culture frustrated the heck out of me.
Kelsey’s LOL simulation of the Word doc deployment guide at his first “real” job. Got a deployment horror story about a Word-copied command with an auto-replaced en-dash on a flag not triggered until after database modification scripts had already run? I do!
So what do you do if you’re Kelsey? You become an expert at serializing a company’s decisions around software delivery and telling them, as a quietly functioning story, with the best-in-class open source and Google tooling of the moment. He tells the story of his first automation journey: “So I started to serialize culture,” he says, when most of the IT department left him to fend for himself over the winter holidays. Without trying to refactor applications, he set to work codifying the software delivery practices he had come to understand through Ops. He automated processes, using tools his team felt comfortable with.
He said, “We never walked around talking about all of our automation tools,” and that’s not a secrecy move, it’s his awareness of cognitive dissonance and cognitive overload. Because he had created a system based on application owners’ expectations, their comfort zone, he didn’t need to talk about it! It just worked (better and more efficiently over time, of course), and fulfilled the business case. Like Agile, this approach limits the scope of what one has to wrap their brain around to be excellent. Like Spinnaker, it empowers developers to focus on what they do best.
Instead of talking about the transformation you need, start by starting. Then change will begin.
By Forest Jing, Jenkins Ambassador and JAM organizer in China
On February 29, 2020, the first CI/CD Meetup in China was held online. The live stream had a warm, lively atmosphere, drawing more than 5,000 attendees and 27,000 pageviews. Several CI/CD experts shared their practices in CI, CD, and DevOps. Even the COVID-19 outbreak could not dampen everyone’s passion for learning.
CI/CD Meetup is a global community event hosted by the Continuous Delivery Foundation (CDF), which aims to build a CI/CD ecosystem and promote CI/CD-related practices and open source projects. The CI/CD Meetup in China is co-organized by Jenkins Ambassadors Shi Xuefeng, Lei Tao, and Jing Yun, who are also organizers of the Jenkins Area Meetup in China, together with the DevOps Times community and the GreatOPS community. We hope to introduce CI/CD to more Chinese IT companies and help improve their IT performance.
Everyone enjoyed the content, and some were curious enough to ask the speakers: “As programmers, why do you all have such luxuriant hair?”
Details from the live broadcast content
Topic 1: CI/CD Practice of Large Mobile App
Shi Xuefeng, Engineering Efficiency Director of JD.COM, Jenkins Ambassador and core author of the DevOps Capability Maturity Model
Shi Xuefeng opened the event with the topic “Large Mobile App CI/CD.”
In the mobile era, mobile applications have become the main battlefield of business. In this session, Xuefeng shared how CI/CD for a very large app is designed and implemented.
Topic 2: The implementation and practice of Agile && DevOps at CITIC Bank
Shi Lilong, Senior Expert, Software Development Center, CITIC Bank
Next, Shi Lilong, a senior expert at the Software Development Center of CITIC Bank, presented “The implementation and practice of Agile and DevOps at CITIC Bank.”
Mr. Shi described how CITIC Bank has rolled out Agile and DevOps across the organization, and walked through the bank’s end-to-end toolchain.
Topic 3: How do large-scale financial and Internet companies conduct product library management?
Wang Qing, JFrog Chief Architect in China
Wang Qing, Chief Architect of JFrog China, presented “How do large financial and Internet companies manage product libraries?”
Large financial and Internet companies have many R&D staff and deliver many types of products, so their application dependency libraries and product libraries become complicated and difficult to manage. Drawing on product library implementations at many enterprise users, Wang Qing explained how the advanced features of a product library address these problems and clear a major bottleneck in continuous delivery.
Topic 4: Watch out! 10 obstacles in DevOps Transformation
Shi Jingfeng, Senior DevOps expert in GreatOPS Community
Mr. Shi Jingfeng presented “Watch out! 10 obstacles in DevOps Transformation.”
In recent weeks, many companies have started working from home, and various obstacles appeared on the very first day: the conferencing system was unstable, VPN connections were unavailable, remote desktops were queued, and phone lines were busy. Implementing DevOps can seem to make all of this easy, but Jingfeng sees DevOps as a journey with both beautiful attractions and obstacles, and ignoring the obstacles makes it hard to recover later. He explained how these pain points can be addressed using the DevOps Capability Maturity Model.
Experts Q&A
The final session was an expert Q&A in which all of the speakers answered the questions raised.
The event closed with a group photo of the experts, marking a successful first CI/CD Meetup online salon.
This event was co-sponsored by the CDF, the DevOps Times community, and the GreatOPS community. Thanks to JFrog and Tencent Cloud Community for their strong support.
One last story about the first CI/CD Meetup in China.
Shi Xuefeng (BC), Lei Tao, and Forest Jing are the Jenkins Ambassadors who regularly organize JAM in China. We all attended DevOps World in Lisbon, where we met Kohsuke Kawaguchi and Alyssa Tong and discussed bringing the CI/CD Meetup to China. It is a fantastic event.
From Dailymotion, a French video-sharing technology platform with over 300 million unique monthly users
At Dailymotion, we are hosting and delivering premium video content to users all around the world. We are building a large variety of software to power this service, from our player or website to our GraphQL API or ad-tech platform. Continuous Delivery is a central practice in our organization, allowing us to push new features quickly and in an iterative way.
We are early adopters of Kubernetes: we built our own hybrid platform, hosted both on-premise and on the cloud. And we heavily rely on Jenkins to power our “release platform”, which is responsible for building, testing, packaging and deploying all our software. Because we have hundreds of repositories, we are using Jenkins Shared Libraries to keep our pipeline files as small as possible. It is an important feature for us, ensuring both a low maintenance cost and a homogeneous experience for all developers – even though they are working on projects using different technology stacks. We even built Gazr, a convention for writing Makefiles with standard targets, which is the foundation for our Jenkins Pipelines.
In 2018, we migrated our ad-tech product to Kubernetes and took the opportunity to set up a Jenkins instance in our new cluster – or better yet move to a “cloud-native” alternative. Jenkins X was released just a few months before, and it seemed like a perfect match for us:
It is built on top of, and for, Kubernetes.
At that time – in 2018 – it was using Jenkins to run the pipelines, which was good news given our experience with Jenkins.
It comes with features such as preview environments which are a real benefit for us.
And it uses the Gitops practice, which we found very interesting because we love version control, peer review, and automation.
While adopting Jenkins X we discovered that it is first a set of good practices derived from the best performing teams, and then a set of tools to implement these practices. If you try to adopt the tools without understanding the practices, you risk fighting against the tool because it won’t fit your practices. So you should start with the practices. Jenkins X is built on top of the practices described in the Accelerate book, such as micro-services and loosely-coupled architecture, trunk-based development, feature flags, backward compatibility, continuous integration, frequent and automated releases, continuous delivery, Gitops, … Understanding these practices and their benefits is the first step. After that, you will see the limitations of your current workflow and tools. This is when you can introduce Jenkins X, its workflow and set of tools.
We’ve been using Jenkins X since the beginning of 2019 to handle all the build and delivery of our ad-tech platform, with great benefits. The main one being the improved velocity: we used to release and deploy every two weeks, at the end of each sprint. Following the adoption of Jenkins X and its set of practices, we’re now releasing between 10 and 15 times per day and deploying to production between 5 and 10 times per day. According to the State of DevOps Report for 2019, our ad-tech team jumped from the medium performers’ group to somewhere between the high and elite performers’ groups.
But these benefits did not come for free. Adopting Jenkins X early meant that we had to customize it to bypass its initial limitations, such as the ability to deploy to multiple clusters. We’ve detailed our work in a recent blog post, and we received the “Most Innovative Jenkins X Implementation” Jenkins Community Award in 2019 for it. It’s important to note that most of the issues we found have been fixed or are being fixed. The Jenkins X team has been listening to the community feedback and is really focused on improving their product. The new Jenkins X Labs is a good example.
As our usage of Jenkins X grows, we’re hitting more and more the limits of the single Jenkins instance deployed as part of Jenkins X. In a platform where every component has been developed with a cloud-native mindset, Jenkins is the only one that has been forced into an environment for which it was not built. It is still a single point of failure, with a much higher maintenance cost than the other components – mainly due to the various plugins.
In 2019, the Jenkins X team started to replace Jenkins with a combination of Prow and Tekton. Prow (or Lighthouse) is the component which handles the incoming webhook events from GitHub, and what Jenkins X calls the “ChatOps”: all the interactions between GitHub and the CI/CD platform. Tekton is a pipeline execution engine. It is a cloud-native project built on top of Kubernetes, fully leveraging the API and capabilities of this platform. No single point of failure, no plugins compatibility nightmare – yet.
Since the beginning of 2020, we’ve started an internal project to upgrade our Jenkins X setup – by introducing Prow and Tekton. We saw immediate benefits:
Faster scheduling of pipelines “runners” pods – because all components are now Kubernetes-native components.
Simpler pipelines – thanks to both the Jenkins X Pipelines YAML syntax and the ability to easily decouple a complex pipeline in multiple small ones that are run concurrently.
Lower maintenance cost.
While replacing the pipeline engine of Jenkins X might seem like an implementation detail, in fact, it has a big impact on the developers. Everybody is used to seeing the Jenkins UI as the CI/CD UI – the main entry point, the way to manually restart pipeline executions, to access logs and test results. Sure, there is a new UI and a real API with an awesome CLI, but the new UI is not finished yet, and some people still prefer to use web browsers and terminals. Leaving the Jenkins Plugins ecosystem is also a difficult decision because some projects heavily rely on a few plugins. And finally, with the introduction of Prow (Lighthouse) the GitHub workflow is a bit different, with Pull Request merges being done automatically, instead of people manually merging when all the reviews and automated checks are green.
So if 2019 was the year of Jenkins X at Dailymotion, 2020 will definitely be the year of Tekton: our main release platform – used by almost all our projects except the ad-tech ones – is still powered by Jenkins, and we feel more and more its limitations in a Kubernetes world. This is why we plan to replace all our Jenkins instances with Tekton, which was truly built for Kubernetes and will enable us to scale our Continuous Delivery practices.
By Tracy Ragan, CEO of DeployHub, CD Foundation Board Member
Microservice pipelines are different than traditional pipelines. As the saying goes…
“The more things change; the more things stay the same.”
As with every step in the software development evolutionary process, our basic software practices are changing with Kubernetes and microservices. But the basic requirements of moving software from design to release remain the same. Their look may change, but all the steps are still there. In order to adapt to a new microservices architecture, DevOps Teams simply need to understand how our underlying pipeline practices need to shift and change shape.
Understanding Why Microservice Pipelines are Different
The key to understanding microservices is to think ‘functions.’ With a microservice environment the concept of an ‘application’ goes away. It is replaced by a grouping of loosely coupled services connected via APIs at runtime, running inside of containers, nodes and pods. The microservices are reused across teams increasing the need for improved organization (Domain Driven Design), collaboration, communication and visibility.
The biggest change in a microservice pipeline is having a single microservice, used by multiple application teams, moving independently through the life cycle. Again, one must stop thinking ‘application’ and instead think ‘functions’ to fully appreciate the oncoming shift. And remember, multiple versions of a microservice could be running in your environments at the same time.
Microservices are immutable. You don’t ‘copy over’ the old one, you deploy a new version. When you deploy a microservice, you create a Kubernetes deployment YAML file that defines the Label and the version of the image.
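As a rough illustration of such a deployment definition, the sketch below builds the manifest as a Python dictionary and prints it as JSON, which kubectl accepts as well as YAML. Only the Label dh-ms-general comes from this article; the `app` label key, the registry path, and the version tag are assumptions made for illustration.

```python
import json


def deployment_manifest(label: str, image: str, replicas: int = 1) -> dict:
    """Build a minimal apps/v1 Deployment tying one label to one image version."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": label, "labels": {"app": label}},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels below.
            "selector": {"matchLabels": {"app": label}},
            "template": {
                "metadata": {"labels": {"app": label}},
                "spec": {"containers": [{"name": label, "image": image}]},
            },
        },
    }


# Hypothetical registry path and tag; only the label is taken from the article.
manifest = deployment_manifest(
    "dh-ms-general", "registry.example.com/dh-ms-general:1.2.0"
)

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

Deploying a new version means producing a new manifest with a different image tag rather than modifying the running container.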
In the above example, our Label is dh-ms-general. When a microservice Label is reused for a new container image, Kubernetes stops using the old image. But in some cases, a second Label may be used allowing both services to be running at the same time. This is controlled by the configuration of your ingresses. Our new pipeline process must incorporate these new features of our modern architecture.
Comparing Monolithic to Microservice Pipelines
What does your life cycle pipeline look like when we manage small functions vs. a monolithic application running in a modern architecture? Below is a comparison for each category and their potential shift for supporting a microservice pipeline.
Change Request
Monolithic:
Logging a user problem ticket, enhancement request or anomaly based on an application.
Microservices:
This process will remain relatively un-changed in a microservice pipeline. Users will continue to open tickets for bugs and enhancements. The difference will be sorting out which microservice needs the update, and which version of the microservice the ticket was opened against. Because a microservice can be used by multiple applications, dependency management and impact analysis will become more critical for helping to determine where the issue lies.
Version Control
Monolithic:
Tracking changes in source code content. Branching and merging updates allowing multiple developers to work on a single file.
Microservices:
While versioning your microservice source code will still be done, your source code will be smaller, 100-300 lines of code versus 1,000 – 3,000 lines of code. This impacts the need for branching and merging. The concept of merging ‘back to the trunk’ is more of a monolithic concept, not a microservice concept. And how often will you branch code that is a few hundred lines long?
Artifact Repository
Monolithic:
Originally built around Maven, an artifact repository provides a central location for publishing JAR files, Node.js packages, JavaScript packages, Docker images, and Python modules. When you run your build, your package manager (Maven, npm, pip) performs the dependency management, tracking transitive dependencies.
Microservices:
Again, these tools supported monolithic builds and solved dependency management to resolve compile/link steps. We move away from monolithic builds, but we still need to build our container and resolve our dependencies. These tools will help us build containers by determining the transitive dependencies needed for the container to run.
Builds
Monolithic:
Executes a serial process for calling compilers and linkers to translate source code into binaries (Jar, War, Ear, .Exe, .dlls, docker images). Common languages that support the build logic include Make, Ant, Maven, Meister, NPM, PIP, and Docker Build. The build calls on artifact repositories to perform dependency management based on what versions of libraries have been specified by the build script.
Microservices:
For the most part, builds will look very different in a microservice pipeline. A build of a microservice will involve creating a container image and resolving the dependencies needed for the container to run. You can think of a container image as our new binary. This will be a relatively simple step and will not involve a monolithic compile/link of an entire application. It will only involve a single microservice. Linking is done at runtime with the RESTful API call coded into the microservice itself.
Software Configuration Management (SCM)
Monolithic:
The build process is the central tool for performing configuration management. Developers set up their build scripts (POM files) to define what versions of external libraries they want to include in the compile/link process. The build performs configuration management by pulling code from version control based on a ‘trunk’ or ‘branch.’ A Software Bill of Materials can be created to show all artifacts that were used to create the application.
Microservices:
Much of what we used to do to configure our application occurred at the software ‘build.’ This is where we made very careful decisions about what versions of source code and libraries we would use to build a version of our monolithic application. But ‘builds’ as we know them go away in a microservice pipeline. For the most part, the version and build configuration shifts to runtime with microservices. While the container image has a configuration, the broader picture of the configuration happens at run-time in the cluster via the APIs.
In addition, our SCM will begin to bring in the concept of Domain Driven Design where you are managing an architecture based on the microservice ‘problem space.’ New tooling will enter the market to help with managing your Domains, your logical view of your application and to track versions of applications to versions of services. In general, SCM will become more challenging as we move away from resolving all dependencies at the compile/link step and must track more of it across the pipeline.
Continuous Integration (CI)
Monolithic:
CI is the triggered process of pulling code and libraries from version control and executing a Build based on a defined ‘quiet time.’ This process improved development by ensuring that code changes were integrated as frequently as possible to prevent broken builds, thus the term continuous integration.
Microservices:
Continuous Integration was originally adopted to keep us re-compiling and linking our code as frequently as possible in order to prevent the build from breaking. The goal was to get to a clean ’10-minute build’ or less. With microservices, you are only building a single ‘function.’ This means that an integration build is no longer needed. CI will eventually go away, but the process of managing a continuous delivery pipeline will remain important with a step that creates the container.
Code Scanning
Monolithic:
Code scanners have evolved from looking at coding techniques for memory issues and bugs to scanning for open source library usage, licenses and security problems.
Microservices:
Code scanners will continue to be important in a microservice pipeline but will shift to scanning the container image more than the source. Some will be used during the container build focusing on scanning for open source libraries and licensing while others will focus more on security issues with scanning done at runtime.
Continuous Testing
Monolithic:
Continuous testing was born out of test automation tooling. These tools allow you to perform automated tests on your entire application including timings for database transactions. The goal of these tools is to improve both the quality and speed of the testing efforts driven by your CD workflow.
Microservices:
Testing will always be an important part of the life cycle process. The difference with microservices will be understanding impact and risk levels. Testers will need to know what applications depend on a version of a microservice and what level of testing should be done across applications. Test automation tools will need to understand microservice relationships and impact. Testing will grow beyond testing a single application and instead will shift to testing service configurations in a cluster.
Security
Monolithic:
Security solutions allow you to define or follow a specific set of standards. They include code scanning, container scanning and monitoring. This field has grown into the DevSecOps movement where more of the security activities are being driven by Continuous Delivery.
Microservices:
Security solutions will shift further ‘left’ adding more scanning around the creation of containers. As containers are deployed, security tools will begin to focus on vulnerabilities in the Kubernetes infrastructure as they relate to the content of the containers.
Continuous Delivery Orchestration (CD)
Monolithic:
Continuous Delivery is the evolution of continuous integration triggering ‘build jobs’ or ‘workflows’ based on a software application. It auto executes workflow processes between development, testing and production orchestrating external tools to get the job done. Continuous Delivery calls on all players in the lifecycle process to execute in the correct order and centralizes their logs.
Microservices:
Let’s start with the first and most obvious difference between a microservice pipeline and a monolithic pipeline. Because microservices are independently deployed, most organizations moving to a microservice architecture tell us they use a single pipeline workflow for each microservice. Also, most companies tell us that they start with 6-10 microservices and grow to 20-30 microservices per traditional application. This means you are going to have hundreds if not thousands of workflows. CD tools will need to include the ability to template workflows allowing a fix in a shared template to be applied to all child workflows. Managing hundreds of individual workflows is not practical. In addition, plug-ins need to be containerized and decoupled from a version of the CD tool. And finally, look for actions to be event driven, with the ability for the CD engine to listen to multiple events, run events in parallel and process thousands of microservices through the pipeline.
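To make the idea of templated workflows concrete, here is a minimal sketch in which every microservice workflow is rendered from one shared template, so a fix to the template reaches all child workflows on the next render. The step names, service names, and the `approval` override are hypothetical and not taken from any particular CD tool.

```python
from copy import deepcopy
from typing import Dict, Optional

# Hypothetical shared template; real CD tools define their own schema.
TEMPLATE = {"steps": ["build-image", "scan-image", "deploy"]}


def render_workflow(template: dict, service: str,
                    overrides: Optional[dict] = None) -> dict:
    """Create one child workflow from the shared template plus per-service overrides."""
    workflow = deepcopy(template)  # children never mutate the template
    workflow["service"] = service
    workflow.update(overrides or {})
    return workflow


def render_all(template: dict, services: Dict[str, dict]) -> Dict[str, dict]:
    """Render a workflow for every microservice from the same template."""
    return {name: render_workflow(template, name, ov)
            for name, ov in services.items()}


# Hypothetical services; 'checkout' adds a manual approval gate.
SERVICES = {"cart": {}, "login": {}, "checkout": {"approval": "manual"}}

before = render_all(TEMPLATE, SERVICES)

# A single fix in the shared template: add a smoke test before deploy.
fixed_template = deepcopy(TEMPLATE)
fixed_template["steps"].insert(2, "smoke-test")
after = render_all(fixed_template, SERVICES)

if __name__ == "__main__":
    for name, workflow in after.items():
        print(name, workflow["steps"])
```

The design choice here is that child workflows store only their differences from the template, which is what keeps hundreds of workflows manageable.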
Continuous Deployments
Monolithic:
This is the process of moving artifacts (binaries, containers, scripts, etc.) to the physical runtime environments on a high frequency basis. In addition, deployment tools track where an artifact was deployed along with audit information (who, where, what) providing core data for value stream management. Continuous deployment is also referred to as Application Release Automation.
Microservices:
The concept of deploying an entire application will simply go away. Instead, deployments will be a mix of tracking the Kubernetes deployment YAML file with the ability to manage the application’s configuration each time a new microservice is introduced to the cluster. What will become important is the ability to track the ‘logical’ view of an application by associating which versions of the microservices make up an application. This is a big shift. Deployment tools will begin generating the Kubernetes YAML file removing it from the developer’s to-do list. Deployment tools will automate the tracking of versions of the microservice source to the container image to the cluster and associated applications to provide the required value stream reporting and management.
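The ‘logical’ view of an application described above can be pictured as a mapping from application versions to the microservice versions they consume. The sketch below inverts that mapping to answer the impact question: which application versions are affected by a given microservice version. All application names, service names, and version numbers are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: (application, version) -> {microservice: version}
APPLICATIONS = {
    ("storefront", "2.1"): {"cart": "1.4.0", "login": "3.0.1"},
    ("storefront", "2.2"): {"cart": "1.5.0", "login": "3.0.1"},
    ("mobile-api", "1.0"): {"login": "3.0.1", "search": "0.9.2"},
}


def consumers_by_service(applications: dict) -> dict:
    """Invert the logical view: (service, version) -> set of application versions."""
    index = defaultdict(set)
    for app_version, services in applications.items():
        for name, version in services.items():
            index[(name, version)].add(app_version)
    return index


def impacted_applications(applications: dict, service: str, version: str) -> list:
    """List the application versions that consume a given microservice version."""
    index = consumers_by_service(applications)
    return sorted(index.get((service, version), set()))


if __name__ == "__main__":
    print(impacted_applications(APPLICATIONS, "login", "3.0.1"))
    print(impacted_applications(APPLICATIONS, "cart", "1.5.0"))
```

The same index could help decide how much testing a microservice change needs, since it shows every consuming application at a glance.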
Conclusion
As we shift from managing monolithic applications to microservices, we will need to create a new microservice pipeline. From the need to manage hundreds of workflows in our CD pipeline, to the need for versioning microservices and their consuming application versions, much will be different. While there are changes, the core competencies we have defined in traditional CD will remain important even if it is just a simple function that we are now pushing independently across the pipeline.
About the Author
Tracy Ragan is CEO of DeployHub and serves on the Continuous Delivery Foundation Board. She is a microservice evangelist with expertise in software configuration management, builds and release. Tracy was a consultant to Wall Street firms on build and release management for 7 years prior to co-founding OpenMake Software in 1995. She was a founding member of the Eclipse organization and served on the board for 5 years. She is a recognized leader and has been published in multiple industry publications as well as presenting to audiences at industry conferences. Tracy co-founded DeployHub in 2018 to serve the microservice development community.
Forgotten AWS EC2 instances have made everyone’s pockets hurt (including Puppet!). Take it from us (relay.sh team) — if you don’t proactively clean up unused EC2 instances, cloud spending can quickly get out of control. However, it can be tedious to routinely check which EC2 instances are still in use, track down the old ones, and remove them. Luckily — we know how to automate these tasks!
Our mission is to free you to do what robots can’t.
This post walks you through de-provisioning unused EC2 instances by using AWS Lambda and CloudFormation to deploy an EC2 reaper that uses simple Tags to cut down on spending.
The AWS Reaper works by checking and enforcing tags that are set on the EC2 instances. All EC2 instances must be tagged with a lifetime or a termination_date. The termination_date defines a future date after which the EC2 instance will be terminated. Alternatively, the Reaper looks for a lifetime tag; if found, it calculates a new future date and adds that date as the termination_date tag for the EC2 instance.
First, let’s look at the reaper.py. The main reaper logic for handling instances is in the terminate_expired_instances function which lists instances and looks up the termination date tag for each instance:
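The tag lookup can be sketched roughly as follows. This is not the code from reaper.py; it is a minimal stand-in that works on plain dictionaries shaped like the instance records EC2 returns (a `Tags` list of `Key`/`Value` pairs). The lifetime format (a number followed by w, d, h, or m) and the ISO 8601 date format are assumptions, as are all helper names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed lifetime format: an integer followed by w, d, h or m (e.g. "3d").
LIFETIME_RE = re.compile(r"^(\d+)([wdhm])$")
UNITS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes"}


def get_tag(instance: dict, key: str) -> Optional[str]:
    """Return a tag value from an EC2-style instance dict, or None if absent."""
    for tag in instance.get("Tags", []):
        if tag.get("Key") == key:
            return tag.get("Value")
    return None


def lifetime_to_termination_date(lifetime: str, now: datetime) -> Optional[str]:
    """Convert a lifetime such as '3d' into an ISO 8601 termination date."""
    match = LIFETIME_RE.match(lifetime.strip().lower())
    if match is None:
        return None  # unparseable lifetime
    amount, unit = int(match.group(1)), UNITS[match.group(2)]
    return (now + timedelta(**{unit: amount})).isoformat()


def resolve_termination_date(instance: dict, now: datetime) -> Optional[str]:
    """Prefer an explicit termination_date; otherwise derive one from lifetime."""
    explicit = get_tag(instance, "termination_date")
    if explicit is not None:
        return explicit
    lifetime = get_tag(instance, "lifetime")
    return lifetime_to_termination_date(lifetime, now) if lifetime else None


if __name__ == "__main__":
    now = datetime(2020, 3, 1, tzinfo=timezone.utc)
    demo = {"InstanceId": "i-0123456789abcdef0",  # hypothetical ID
            "Tags": [{"Key": "lifetime", "Value": "3d"}]}
    print(resolve_termination_date(demo, now))
```

In a real reaper the derived date would then be written back to the instance as its termination_date tag.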
Improperly Tagged Instances
If we find an instance that doesn’t have a termination_date or we find the tag can’t be parsed, we stop it:
This enables us to stop the b̶l̶e̶e̶d̶i̶n̶g̶ billing while we contact the instance owner to see if it should still be kept around.
Expired Instances
For all instances we find that are expired, we destroy:
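Taken together, the two cases above amount to a three-way decision per instance. The following sketch shows that decision in isolation; it is an illustration rather than the Reaper's actual code, and it assumes termination_date is an ISO 8601 string. The function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_termination_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string; return None if it is missing or malformed."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    # Treat naive timestamps as UTC so naive and aware values are never compared.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def decide_action(instance: dict, now: datetime) -> str:
    """Return 'stop', 'terminate' or 'keep' for one EC2-style instance dict."""
    tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
    deadline = parse_termination_date(tags.get("termination_date"))
    if deadline is None:
        return "stop"        # improperly tagged: halt billing, ask the owner
    if deadline <= now:
        return "terminate"   # expired
    return "keep"


if __name__ == "__main__":
    now = datetime(2020, 3, 1, tzinfo=timezone.utc)
    expired = {"Tags": [{"Key": "termination_date",
                         "Value": "2020-02-01T00:00:00+00:00"}]}
    print(decide_action(expired, now))
```

With boto3, the resulting actions would typically map to the EC2 client's `stop_instances` and `terminate_instances` calls, each of which takes a list of instance IDs.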
Deploying the EC2 reaper
Now, we could just run this python script against different AWS regions and we’d already be better off than doing this manually. However, we would rather not spend time babysitting scripts at all. We’re going to deploy this into AWS using CloudFormation Stacks.
Deploying the AWS reaper has two parts:
deploy_to_s3.yaml AWS CloudFormation template that places the lambda zip resources in S3 buckets in every region so that the deploy_reaper template can read them for Reaper deployment.
deploy_reaper.yaml AWS CloudFormation template that installs the reaper: it creates the IAM role and deploys the Lambda function that performs the instance reaping.
deploy_to_s3 template
In order to use this template, you must first manually create an S3 bucket that contains the resources to copy across all regions. You will need to do this once per region; S3 resources can be read between accounts but not between regions for AWS Lambda. This only needs to be done one time for the administrative account.
Manually create an S3 bucket accessible from the administrative account. Zip up the two Python reaper files, reaper.py and slack_notifier.py, and place them in the bucket, naming them reaper.zip and slack_notifier.zip.
From the administrative account, create a new stack set and use the deploy_to_s3 template. An example CLI invocation would look like:
Deploy stack-set-instances for this stack set, one per region in the administrative account. Check the Amazon documentation for the most up-to-date region list. For example:
Once deployed, the EC2 Reaper will not reap anything unless the environment variable LIVEMODE is set to TRUE. It will only report what it would have done to Slack.
When the time comes to activate the Reaper, update the parameter value LIVEMODE to “TRUE” (the regex is case-insensitive).
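A dry-run switch of this kind might look like the following minimal sketch. The helper names live_mode_enabled and reap are hypothetical, and the sketch returns the message it would report rather than posting to Slack.

```python
import os
import re

# Case-insensitive match on the whole value, so "true", "True" and "TRUE"
# all enable live mode. Anything else, including an unset variable, does not.
_TRUE_RE = re.compile(r"true", re.IGNORECASE)


def live_mode_enabled(environ=os.environ):
    """Return True only when LIVEMODE is set to some casing of "true"."""
    value = environ.get("LIVEMODE", "").strip()
    return _TRUE_RE.fullmatch(value) is not None


def reap(instance_id, environ=os.environ):
    """Describe the action taken for one expired instance.

    In this sketch the message is returned instead of being sent to Slack,
    and no AWS call is made in either mode.
    """
    if live_mode_enabled(environ):
        return f"terminating {instance_id}"
    return f"[dry run] would terminate {instance_id}"
```

Defaulting to the dry-run path means a missing or mistyped variable can never cause a termination.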
Now you have learned how to control costs on AWS by reaping old EC2 instances. To learn more about our mission and product, sign up for our updates on relay.sh. Our mission is to free you of tedious cloud-native workflows with event-driven automation! For more content like this, please follow our medium page at https://medium.com/relay-sh.
I am so excited to engage with all of you members associated with the Continuous Delivery Foundation (CDF) as a newly appointed CDF Ambassador.
Let me introduce myself!
I currently work full-time as the Global Community Ambassador and Region Head for the APJ & MEA regions at DevOps Institute, the world’s largest and fastest-growing association of DevOps professionals, with a vibrant Humans of DevOps community located worldwide.
I am an evangelist for emerging best practices, with expertise in building large global, social, and online communities that connect the Humans of DevOps and Modern IT to advance Skills, Knowledge, Ideas & Learning (SKIL) through sharing and extensive collaboration. I am a frequent speaker at local and international conferences, the core organizer of Global SKILup Day, and Chief Evangelist of the Ambassador program at DevOps Institute. Some of my public presentations are available on YouTube for reference. I am passionate about engaging with community members and leaders across industries and domains worldwide.
In 2019, I traveled the world for speaking and organizing engagements with the community and partners in key regions such as the US and Europe, and ran an Asia-Pacific roadshow spanning India, Singapore, Indonesia, Australia & New Zealand. You can find glimpses of last year’s engagements, and more, in my LinkedIn updates.
I am excited to join the CDF as an Ambassador for a variety of reasons. The CDF’s core values, its open-governance and vendor-neutral model, and the guidance and resources it provides to foster collaboration and ultimately empower developers and teams to produce and release high-quality software make it an unprecedented and fantastic initiative.
Since I have been part of global communities across multiple domains and regions as a member, facilitator, committee member, and organizer, I am enthusiastic to contribute to and support the global community of the CD Foundation. As a CDF Ambassador, I would like to amplify the core values of the CD Foundation to reach wider audiences and networks while building an inclusive community of developers, vendors, industry partners, members, and end users, facilitating sustainable projects that are part of the broad and growing continuous delivery ecosystem.
I would love to engage with each of you across platforms, whether near you or at an upcoming conference, meetup, or online summit where I am participating, speaking, or organizing. There is so much to look forward to and so much more to share, learn, and advance together as the Humans of the CD Foundation. You can find me at some of the confirmed upcoming conferences and meetups below, and you can also connect with me on LinkedIn or Twitter using the details below.
By Jacqueline Salinas, CDF Director of Ecosystem & Community Development
Dear CDF Members –
Please welcome the first cohort of CDF Community Ambassadors (CDF CA)! You might be wondering what exactly a CDF Community Ambassador (CDF CA) is. Well, a CA is a passionate volunteer, representative of CDF, and Meetup super host & organizer. The vision of the CA program is to help grow the network of passionate CI/CD communities and connect them through various efforts that the CD Foundation is launching in 2020. The CD Foundation sees these CAs as the troops on the ground rallying the community together. They are stewards of CI/CD education & best practices for their local community, active open source project contributors, and leaders helping drive awareness of open source projects.
These 13 folks have stepped up and committed to helping grow awareness about the CD Foundation, as well as helping us launch new Meetup user groups in new locations as an effort to drive more events globally. These CAs will continue to deliver CI/CD education to their local community and, most importantly, help recruit new Meetup members. These volunteers are vital to the CI/CD community and to the CD Foundation! Help me give them a warm welcome to the CD Foundation community. To help you get to know our CDF Community Ambassadors better, look for their individual blogs coming in the next few weeks.
Prefer a more active role? Learn more about what it takes to become a CDF Community Ambassador. Here’s more about what the role of the CDF Community Ambassador entails:
o The Community Go-To Resource for People Interested in CDF
As a Community Ambassador, you will be an important resource to people interested in the CDF and its corresponding projects. We will provide you with training on how to best represent the CDF and provide discount codes for you to attend CDF-sponsored events.
o Help Local Users Learn More About CDF
As a Community Ambassador, you will organize and host a local CDF Users Group meetup. The CDF will provide resources to help you set up your meetup and ongoing support such as swag credits and reimbursements for costs associated with running a community event.
o Represent the Community Publicly
As a Community Ambassador, you will be a public-facing community representative. You can choose the way you are most comfortable representing CDF whether that’s through public speaking or written content such as blogs. We will work with you to find the best fit and provide you with resources to help you be successful as either a speaker, a writer, or both!
Originally posted on the Armory blog, by Rosalind Benoit
Guess what?! Our Hackathon is going fully online! “Spinnaker Gardening Days #CommunityHack” happens in one month, and we’re gearing up for an international open-source work-from-home extravaganza! Via Zoom, Slack, and Github, we’ll empower you to move the needle on continuous delivery projects. Teams will hack, newcomers will train, and champions will share Spinnaker secrets. Click here to register and get your free tickets for the hackathon, training track, lunchtime learnings, or all three.
Join other Spinnaker users and companies to learn and let your skills shine at this collaborative event. We’ll address open-source feature requests, extend the ecosystem, and have lots of fun. Thanks to our generous sponsor Salesforce, all logged-in participants will score prizes, premium swag, and lunch on us! Hack through the workday, or check out our noontime lightning talks. Visit the Spinnaker Gardening repository for the schedule and details.
The Armory Tribe celebrates the support of Salesforce and, in particular, Edgar Magana, a Spinnaker champion and Cloud Operations Architect. We recently sat down to discuss the Ops SIG, modeling and standardizing Spinnaker, and his ideas for hackathon projects. Read the full article here.
A relative newcomer to the Spinnaker community, but a veteran in matters of cloud computing, networking, and OSS projects like OpenStack, Edgar recently founded the Operations SIG (Special Interest Group). Just as he recognized that “the community needed a place to discuss how to operate Spinnaker better,” he also urges us to jump-start the Spinnaker community. He’s recommended improvements to the contributor experience, and persuaded Salesforce to sponsor this first-ever Spinnaker hackathon.
Of course, we touched on his most pressing open-source Spinnaker initiatives in our chat. Next up? Gather a team!
“We really want to come to the hackathon with goals, and to put extra motivation for folks to address them as a community,” Edgar says, explaining the sponsorship.
From the Salesforce and Ops SIG perspective, Edgar has two feature stories to focus on at the hackathon:
“Run any OSS source code scanning software against Spinnaker microservices, and you’ll find a number of vulnerabilities in the libraries that Spinnaker leverages. We’d like to minimize and solve those as much as possible.”
I’m pumped about this one because a) in many instances, this is a low-barrier-to-entry task that newer contributors can make a huge dent in, and b) every ops freak knows that fixing OSS dependencies is probably the most important security measure we touch.
“Cloud driver scalability is another key initiative in progress. The dynamic account system works, but performance can be improved drastically for those using a large system with 800-1000 Kubernetes accounts. There was a bugfix in 1.17, but it still takes lots of time for clouddriver to cache new accounts, and this means a long startup time.”
Edgar would like to see new accounts dynamically appended to the cache instead of triggering another cache of all accounts, and has been collaborating with Armory engineers on a solution. Another excellent project goal for Community Gardening!
Here on Armory’s Community team, we second Edgar’s suggestion to make Spinnaker more “beginner-friendly” and welcoming to new contributors. Our top goals for the first half of 2020 revolve around improving the contributor experience, from promoting issue triage in SIGs, to creating and organizing documentation around Spinnaker development environment, release cycle, and contribution guidelines so that newcomers know where to find answers and how to get started. Expect to see a contributor experience project from us at the hackathon!
In the meantime, the Plugin Framework for Spinnaker that Armory and Netflix are building is maturing fast. This work will make Spinnaker more welcoming to contributors in another way: it provides clear extension points in the codebase, along with an easy way to load extensions to a running Spinnaker instance. With the Spinnaker Gardening Days, we want to encourage you to build extensions. Moreover, we know that many teams using Spinnaker in production have already built custom tooling around it; we’re encouraging those teams to leverage the plugin framework to quickly share their work with the OSS community (sounds like a stellar hackathon project!). We’re better together, and with a widely adopted project like Spinnaker, you can feel sure that paying it forward will reap big dividends for you and your organization. Check out the Plugin Creators Guide and Plugin Users Guide to learn more!
Calling Edgar and all other incredible Spinnaker developers: it’s time to add your fantastic Spinnaker Gardening ideas to the Project Ideas Wiki, create a slack channel for your project, and start prepping for the most exciting online event of 2020! Don’t forget to register here and reserve your ticket : )
Last year on the 12th of March 2019, the Continuous Delivery Foundation was launched at the Open Source Leadership Summit. Community leaders from Spinnaker, Jenkins, Tekton and Jenkins X came together to kick off the CDF as the new home for open source collaboration in CI/CD.
Since then we have made a lot of progress – earlier this year we produced our first annual report that showcases our efforts from our first few months. We also produced the first CD Foundation Interactive Landscape to help clarify the tools needed to adopt a fully automated CD process.
We didn’t stop there! Our CI/CD meetups are now at 25,000+ members in 67 groups spread across 30 countries! There’s probably a CI/CD meetup near you. Come participate!
We also have Special Interest Groups (SIGs) in Interoperability, Security, and Machine Learning (MLOps) as ways for people to participate in specific areas of expertise or interest.
And we’ve had a wide array of new members and new projects join. Membership spans a broad range of industries, international markets, and sizes of organizations. New members in the past year include Japanese Global 500 IT services provider Fujitsu, Integration Platform-as-a-Service provider Boomi, DevOps platform Cycloid, the Association of DevOps Professionals, the DevOps Institute, Global commerce leader eBay, leading global financial services firm JPMorgan Chase, and Open Source components management company Whitesource.
These new General Members bring the membership total to 33 and join Premier Members Capital One, CircleCI, CloudBees, Fujitsu, Google, Huawei, IBM, JFrog, Netflix and Salesforce in working together to make continuous delivery tools and processes as accessible and reliable as possible and grow the overall ecosystem.
And just last month Screwdriver joined as our first incubation project. Screwdriver is a self-contained, pluggable service to help developers build, test, and continuously deliver software using the latest containerization technologies. Screwdriver was originally developed by Yahoo, now Verizon Media, as simplified interfacing for Jenkins. It was open sourced in 2016 and completely rebuilt to handle deployments at scale along with CI/CD goals.
Where are we headed? In our first year we have mapped out our 9 strategic objectives and our one year anniversary is a great way to round up how we are doing working towards them.
Drive Continuous Delivery Adoption – The CDF Interactive Landscape was one big initiative kicked off this year to help clarify the tools needed to adopt a fully automated CD process.
Champion Diversity & Inclusion – Initiatives in this space include diversity scholarships for our events and participation in Outreachy which have allowed us to start welcoming more voices into our communities.
Foster Community Relations – We have started soliciting priorities and working with many different communities. The Jenkins Area Meetups were contributed to CDF and expanded to CI/CD meetups and we also offer online training courses.
Grow the membership base – We are proud to have a membership of over 30 organizations which includes end user companies, vendors, start-ups, universities and institutes.
Create value for all members – We continue to listen to feedback from our individual and organization members. We held many events in 2019 including our popular mindshare cocktail hour as a way to stay close to the needs of our members.
Expand into emerging tech areas: One key area has been MLOps, marrying DevOps with machine learning through the efforts of our MLOps Special Interest Group.
We have had a lot of work done by our community. Thank you! And we have lots more fun on the way.
To keep up-to-date, sign up for our newsletter and join us in 2020 as we continue to grow and advance CI/CD in the industry!
Originally posted on the Spinnaker Community blog, by Rob Zienert, Sr Software Engineer @ Netflix
Long, long ago, in an internet that I barely remember, I wrote about monitoring Orca. I haven’t managed to take the time to write another post about a specific service — it’s a lot of work! Instead of going deep this time around, I want to paint with broader strokes: What are the key metrics we can track that help quickly answer the question, “Is Spinnaker healthy?”
Spinnaker is comprised of about a dozen open source services that may vary widely based on configuration, and as such, there’s no singular metric to rule them all. This makes the question, “Is Spinnaker healthy?” a particularly bothersome question since not all services are equally important. If Igor — the service that is responsible for monitoring CI/SCM systems — is unable to communicate with Jenkins, Spinnaker will be in a degraded state, but its core behavior is still healthy. Should Orca’s queue processing drop to zero, however, it’s time to have an elevated heart rate and quick remedy.
Service Metrics
The Service Level Indicators for our individual services can vary depending on configuration. For example, Clouddriver has cloud provider-specific metrics that should be tracked in addition to its core metrics. For the sake of this post’s length, I won’t be going into any cloud-specific metrics.
Universal Metrics
All Spinnaker services are RPC-based, and as such, the reliability of inbound and outbound requests is supremely important: If the services can’t talk to each other reliably, someone will be having a poor experience.
For each service, a controller.invocations metric is emitted, which is a PercentileTimer including the following tags:
status: The HTTP status code family, 2xx, 3xx, 4xx...
statusCode: The actual HTTP status code value, 204, 302, 429...
success: If the request is considered successful. There’s nuance here in the 4xx range, but 2xx and 3xx are definitely all successful, whereas 5xx definitely are not.
controller: The Spring Controller class that served this request
method: The Spring Controller method name, NOT the HTTP method
Similarly, each service also emits metrics for each RPC client that is configured via okhttp.requests. That is, Orca will have a variety of metrics for its Echo client, as well as its Clouddriver client. This metric has the following tags:
status: The HTTP status code family, 2xx, 3xx, 4xx...
statusCode: The actual HTTP status code value, 204, 302, 429...
success: If the request is considered successful
authenticated: Whether or not the request was authenticated or anonymous (if Fiat is disabled, this is always false)
requestHost: The DNS name of the client. Depending on your topology, some services may have more than one client to a particular service (like Igor to Jenkins, or Orca to Clouddriver shards).
Having SLOs (and consequently alerts) around failure rate (determined via the success tag) and latency for both inbound and outbound RPC requests is, in my mind, mandatory across all Spinnaker services.
As a real world example, the alert Netflix uses for Orca to all of its client services is:
So, for people who can’t read Atlas expressions, if we have more than 0.2 failing/unknown RPS to a specific service over 3 minutes, we’ll get an alert.
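To make the plain-English rule concrete, here is a small Python sketch of the same idea. It is not the Atlas expression Netflix uses. The sample shape, the name hosts_to_alert, and the reading of "over 3 minutes" as "every one-minute data point in the window exceeds the threshold" are assumptions made for illustration.

```python
from collections import defaultdict

THRESHOLD_RPS = 0.2   # failing/unknown requests per second, from the text
WINDOW_MINUTES = 3    # length of the evaluation window, from the text


def hosts_to_alert(samples, current_minute):
    """Return the request hosts whose failure rate breaches the threshold.

    samples is an iterable of (minute, request_host, success, count) tuples,
    where success is True, False, or None for unknown, and count is the
    number of requests seen in that minute. This shape is hypothetical.
    """
    # host -> minute -> number of failing or unknown requests
    failures = defaultdict(lambda: defaultdict(int))
    for minute, host, success, count in samples:
        if success is not True:
            failures[host][minute] += count

    window = range(current_minute - WINDOW_MINUTES + 1, current_minute + 1)
    return sorted(
        host
        for host, per_minute in failures.items()
        # Convert each per-minute count to requests per second.
        if all(per_minute.get(m, 0) / 60.0 > THRESHOLD_RPS for m in window)
    )
```

Grouping by request host mirrors the requestHost tag, so one noisy downstream service does not hide behind the healthy ones.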
Service-specific Metrics
Most of our services have an additional metric to judge operational health, but in/out RPC monitoring will go far if you’re just starting out.
Echo: echo.triggers.count tracks the number of CRON-triggered pipeline executions fired. This value should be pretty steady, so any significant deviation is an indicator of something going awry (or the addition/retirement of a customer integration). echo.pubsub.messagesProcessed is important if you have any PubSub triggers. Your mileage may vary, but Netflix can alert if any subscriptions drop to zero for more than a few minutes.
Orca: task.invocations.duration tracks how long individual queue tasks take to execute. While it is a Timer, for an SLA Metric, its count is what’s important. This metric’s value can vary widely, but if it drops to zero, it means Orca isn’t processing any new work, so Spinnaker is dead in the water from a core behavior perspective.
Clouddriver: Each cloud provider is going to emit its own metrics that can help determine health, but two universal ones I recommend tracking are related to its cache. cache.drift tracks cache freshness. You should group this by agent and region to be granular on exactly what cache collection is falling behind. How much lag is acceptable for your org is up to you, but don’t make it zero. executionCount tracks the number of caching agent executions, and combined with status, we can track how many specific caching agents are failing at any given time.
Igor: pollingMonitor.failed tracks the failure rate of CI/SCM monitor poll cycles. Any value above 0 is a bad place to be, but is often a result of downstream service availability issues such as Jenkins going offline for maintenance. pollingMonitor.itemsOverThreshold tracks a polling monitor circuit breaker. Any value over 0 is a bad time, because it means the breaker is open for a particular monitor and it requires manual intervention.
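For Clouddriver’s cache.drift, grouping by agent and region could look like the sketch below. The sample shape, the function name lagging_caches, and the agent names in the usage example are hypothetical. Drift values are compared in whatever unit the metric reports, and the threshold is yours to choose.

```python
from collections import defaultdict


def lagging_caches(samples, max_drift):
    """Return the worst drift per (agent, region) that exceeds max_drift.

    samples is an iterable of (agent, region, drift) tuples. max_drift must
    be above zero, since some lag is always expected.
    """
    if max_drift <= 0:
        raise ValueError("max_drift must be greater than zero")

    worst = defaultdict(float)
    for agent, region, drift in samples:
        key = (agent, region)
        worst[key] = max(worst[key], drift)

    # Keep only the cache collections that are falling behind.
    return {key: drift for key, drift in worst.items() if drift > max_drift}


if __name__ == "__main__":
    samples = [
        ("instance-agent", "us-east-1", 1200),
        ("instance-agent", "us-east-1", 300),
        ("lb-agent", "eu-west-1", 50),
    ]
    print(lagging_caches(samples, max_drift=1000))
```

Reporting the worst value per agent and region points directly at the cache collection that needs attention.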
Product SLAs at Netflix
We also track specific metrics as they pertain to some of our close internal customers. Some customers care most about latency reading our cloud cache, others have strict requirements in latency and reliability of ad-hoc pipeline executions.
In addition to tracking our own internal metrics for each customer, we also subscribe to our customers’ alerts against Spinnaker. If internal metrics don’t alert us of a problem before our customers are aware something is wrong, we at least don’t want to wait for our customers to tell us.
Continued Observability Improvements
Since Spinnaker is such a large, varied system, blog posts such as these are fine, but really are meant to get the wheels turning on what could be possible. It also highlights a problem with Spinnaker today: A lack of easily discoverable operational insights and knobs. No one should have to rely on a core contributor to distill information like this into a blog post!
There’s already been a start to improving automated service configuration property documentation, but something similar needs to be started for metrics and matching admin APIs as well. A contribution that documents metrics, their tags, purpose and related alerts would be of huge impact to the project and something I’d be happy to mentor on and/or jumpstart.
Of course, if you want to get involved in improving Spinnaker’s operational characteristics, there’s a Special Interest Group for that. We’d love to see you there!