
Intro to Jenkins Training Course Enrolls Over 5,000 in First Month


Linux Foundation Training and the Continuous Delivery Foundation launched a free training course on the edX platform, LFS167x – Introduction to Jenkins, on June 4. In its first month, the course has already enrolled over 5,000 students, making it one of the fastest-growing courses we have ever released. This is great news for helping to grow the Jenkins and overall DevOps communities.

The course covers the fundamentals of continuous integration/continuous delivery (CI/CD), and how they help transform the overall software delivery process. It is most useful for roles such as DevOps engineers, software developers and architects, and professionals focused on site reliability and quality assurance, though anyone involved in the software delivery process will benefit. It includes a detailed introduction to the Jenkins automation server, and also provides instructions on how to set up/use Jenkins for CI/CD workflows. 

Upon completion, enrollees will have a solid understanding of the role Jenkins plays in the software development lifecycle, how to install a Jenkins server, how to build software with it, how to manage third-party integrations/plugins, and how to scale and secure Jenkins. They will also get a glimpse of what they can do to further enhance their CI/CD skills.

Join the more than 5,000 individuals who have started improving their software delivery processes – enroll for free today!

Introducing Our Newest CDF Ambassador – Romnick Acabado


Hi CI/CD Fans,

I’m Romnick Acabado, a DevOps Leader and IT Manager at Lingaro Philippines. I use my strengths, passion, and purpose to explore, learn, and share DevOps practices that improve the lives of everyone involved in the flow from business idea to a high-quality user experience.

In short, I would like to help to improve the lives of people all over the world through modern IT.

This year marks the tenth year of my IT career. Learning DevOps practices became my priority once we started calling ourselves a “DevOps Team.”

Everything starts with awareness: you only know where your team stands, and can only validate your overall experience and challenges, once you are exposed to the external community.

When I was a student, I was very active in student societies that gave additional value to our community. Because of my doubts, I never imagined I could continue that in the corporate world.

In 2019, I worked a lot on improving my confidence, and I believe the actions I took helped me believe in myself, leverage my strengths (analytical, responsibility, relater, communication, and learning) and competencies, and work toward my potential. In the latter part of 2019, I found the Ambassador Program of the DevOps Institute and signed up. Forest Jing and Dheeraj Nayal were very accommodating in assisting me through the process. I have long been a fan of the 70-20-10 learning model, where about 20% of learning a skill comes from your relationships and connections with experts, so I thought it was about time to stop limiting my connections to my own company. Luckily, I was selected and joined the program.

I have focused on improving myself and continuously being an asset wherever I am engaged, and I am happy that the DevOps Institute’s Chief Ambassador has recognized it.

As I continue to build my branding and represent our organization in DevOps, I have embraced my role as one of the leaders in the Philippines to support and promote the DevOps movement.

I created a website where I share my exploration and DevOps journey. I also maintain Facebook and Twitter accounts.

You can also join my Meetup group DevOps SKIL Up PH, where I want to build a community of DevOps practitioners and leaders in our country to help each other through upskilling.

I’m joining the CDF as an ambassador because I believe that what the CDF and the DevOps Institute advocate intersect; I even see some of my co-ambassadors in both communities. I am confident that through the CDF, I will be able to learn more and share my knowledge about CI/CD tools, which are essential to DevOps design.

I also see these communities aligning with my vision, mission, and values of creativity, being solution-oriented, and collaboration.

Being a lifelong learner, I know that I will continue to learn from the experts in this community, and I will be able to give value as well through my time and expertise. It’s always fun to be surrounded by like-minded DevOps professionals.

Over the past months, I have joined summits and panel discussions in our DevOps communities, including the DevOps Summit in April 2020 and the DevOps India Summit. I’ll also be speaking at the upcoming Virtual DevOps Summit on November 2-3.

You can connect with me through my LinkedIn account for future collaboration on DevOps, CI/CD, or analytics solutions.

Good luck on your DevOps journey, and see you at future CDF events. It’s my pleasure to represent and be part of this great community! 😉

Introducing Our Newest CDF Ambassador – Steven Terrana


Heyyo!

My name is Steven Terrana. It’s great to be here! I’m currently a DevSecOps & Platforms Engineer at Booz Allen Hamilton.

My day-to-day largely consists of working with teams to implement large-scale CI/CD pipelines using Jenkins, implementing DevSecOps principles, and adopting all the buzzwords :).

Through experiencing all of the pains associated with large-scale pipeline development, I developed the Jenkins Templating Engine: a Jenkins plugin that lets users stop copying and pasting Jenkinsfiles by creating tool-agnostic pipeline templates that can be shared across teams, enabling organizational governance while optimizing for developer autonomy. If that sounds cool, you can check out the Jenkins Online Meetup.

You can probably find me somewhere in the Jenkins community. I help drive the Pipeline Authoring SIG and contribute to community plugins and pipeline documentation where I can.

I’m excited to be a part of an organization in CDF that’s helping to establish best practices, propel the adoption of continuous delivery tooling, and facilitate interoperability across emerging technologies to streamline software delivery.

Oh, yeah, and I have two cats and a turtle. Meet James Bond, GG, and Sheldon:

Follow me on Twitter @steven_terrana

A case for declarative configurations for ML training


Contributed by Benedikt Koller

Original article posted on May 17, 2020

No way around it: I am what you call an “Ops guy”. In my career I’ve admin’ed more servers than I’ve written code. Over twelve years in the industry have left their permanent mark on me. For the last two of those I’ve been exposed to a new beast – Machine Learning. My hustle is bringing Ops knowledge to ML. These are my thoughts on that.

Deploying software into production

Hundreds of thousands of companies deploy software into production every day. Every deployment mechanism has someone who built it. Whoever it was (The Ops Guy™, SRE teams, “DevOps engineers”), they all follow tried-and-true paradigms. After all, the goal is to ship code often, in repeatable and reliable ways. Let me give you a quick primer on two of those.

Infrastructure-as-code (IaC)

Infrastructure as code, or IaC, applies software engineering rules to infrastructure management. The goal is to avoid environment drift and to ensure idempotent operations. In plain words: read the infrastructure configuration and you’ll know exactly what the resulting environment looks like. You can rerun the provisioning without side effects, and your infrastructure has a predictable state. IaC allows for version-controlled evolution of infrastructures and quick provisioning of extra resources. It does so through declarative configurations.

Famous tools for this paradigm are Terraform and, to a large degree, Kubernetes itself.
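
To make the pattern concrete, here is a minimal, purely illustrative Python sketch of the declarative idea: a config describes the desired state, and an idempotent apply step converges the actual state toward it. The names (`desired_state`, `apply`, the server roles) are invented for illustration and are not the API of Terraform or Kubernetes.

```python
# Illustrative sketch only: a toy "desired state" config and an idempotent
# apply step, mimicking what IaC tools do at much larger scale.
desired_state = {
    "web": {"count": 3, "size": "small"},
    "worker": {"count": 2, "size": "large"},
}

current_servers = {"web": 1, "worker": 2}  # pretend this was read from the cloud provider

def apply(desired, current):
    """Converge the current state to the declared state; safe to rerun (idempotent)."""
    for role, spec in desired.items():
        diff = spec["count"] - current.get(role, 0)
        if diff > 0:
            print(f"provision {diff} x {spec['size']} {role} server(s)")
        elif diff < 0:
            print(f"decommission {-diff} {role} server(s)")
        else:
            print(f"{role}: already in desired state, nothing to do")

apply(desired_state, current_servers)
apply(desired_state, {"web": 3, "worker": 2})  # rerun against a converged state: no side effects
```

Rerunning the apply step against an environment that already matches the declaration is a no-op, which is exactly the idempotence property described above.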

Immutable infrastructure

In conjunction with IaC, immutable infrastructure ensures the provisioned state is maintained. Someone ssh’ed onto your server? It’s tainted – you have no guarantee that it is still in identical shape to the rest of your stack. Interaction between a provisioned infrastructure and new code happens only through automation. Infrastructure, e.g. a Kubernetes cluster, is never modified after it’s provisioned. Updates, fixes, and modifications are only possible through new deployments of your infrastructure.

Operational efficiency requires thorough automation and handling of ephemeral data. In return, immutable infrastructure eliminates config drift and snowflake-server woes entirely.

ML development

Developing machine learning models works in different ways. In a worst-case scenario, new models begin their “life” in a Jupyter Notebook on someone’s laptop. Code is not checked into git, there is no requirements file, and cells can be executed in any arbitrary order. Data exploration and preprocessing are intermingled. Training happens on that one shared VM with the NVIDIA K80, but someone messed with the CUDA drivers. Ah, and does anyone remember where I put those matplotlib screenshots that showed the AUROC and MSE?

Getting ML models into production reliably, repeatedly and fast remains a challenge, and large data sets become a multiplying factor. The solution? Learn from our Ops-brethren.

We can extract key learnings from the evolution of infrastructure management and software deployments:

  1. Automate processing and provisioning
  2. Version-control states and instructions
  3. Write declarative configs

How can we apply them to an ML training flow?

Fetching data

Automate fetching of data. Declaratively define the datasource, the subset of data to use and then persist the results. Repeated experiments on the same source and subset can use the cached results.

Thanks to automation, fetching data can be rerun at any time. The results are persisted, so data can be versioned. And by reading the input configuration everyone can clearly tell what went into the experiment.
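
As a sketch of what that could look like in code, the declared datasource and subset can double as the cache key. The config keys, the fetch_data() helper, and the local cache layout below are assumptions for illustration, not any particular framework’s API.

```python
# Minimal sketch of declaratively configured, cached data fetching.
import hashlib
import json
import pathlib

import pandas as pd

datasource_config = {
    "source": "s3://my-bucket/events.csv",   # hypothetical path
    "columns": ["user_id", "timestamp", "label"],
    "date_range": ["2020-01-01", "2020-03-31"],
}

def fetch_data(config, cache_dir="./data_cache"):
    """Fetch the declared subset once; repeated runs reuse the persisted result."""
    key = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:16]
    cache_file = pathlib.Path(cache_dir) / f"{key}.parquet"
    if cache_file.exists():
        return pd.read_parquet(cache_file)          # cached, versionable result
    df = pd.read_csv(config["source"], usecols=config["columns"])
    df = df[df["timestamp"].between(*config["date_range"])]
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    df.to_parquet(cache_file)                        # persist so the fetch is repeatable
    return df
```

Because the cache key is derived from the config itself, anyone reading the config can tell exactly which data went into an experiment, and reruns on the same declaration hit the cache.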

Splitting (and preprocessing data)

Splitting data can be standardized into functions.

  • Splitting happens on a quota, e.g. 70% into train and 30% into eval. Data might be sorted on an index, or data might be categorized.
  • Splitting happens based on features/columns. Data might be categorized, or data might be sorted on an index.
  • Data might require preprocessing / feature engineering (e.g. filling, standardization).
  • A wild mix of the above.

Given those, we can define an interface and invoke processing through parameters – and use a declarative config. Persist the results so future experiments can warm-start.

Implementing such interfaces makes automated processing possible. The resulting train/eval datasets are versionable, and my input config is the declarative authority on the resulting state of the input dataset.
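
A possible shape for such an interface, sketched in Python with an assumed config schema (the keys and function names are illustrative, not a specific library):

```python
# Sketch of a parameter-driven split interface fed by a declarative config.
import pandas as pd

split_config = {
    "mode": "quota",         # or "feature"
    "train_fraction": 0.7,
    "sort_by": "timestamp",  # optional index/feature to sort on before splitting
}

def split(df: pd.DataFrame, config: dict):
    """Split a dataset into train/eval according to the declared config."""
    if config.get("sort_by"):
        df = df.sort_values(config["sort_by"])
    if config["mode"] == "quota":
        cut = int(len(df) * config["train_fraction"])
        return df.iloc[:cut], df.iloc[cut:]
    if config["mode"] == "feature":
        mask = df[config["feature"]].isin(config["train_values"])
        return df[mask], df[~mask]
    raise ValueError(f"Unknown split mode: {config['mode']}")

# Usage: train_df, eval_df = split(fetch_data(datasource_config), split_config)
```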

Training


Standardizing models is hard. Higher-level abstractions like TensorFlow and Keras already provide comprehensive APIs, but complex architectures need custom code injection.

A declarative config will, at least, state which version-controlled code was used. Re-runs on the same input will deliver the same results, and re-runs on different inputs can be compared. Automation of training will yield a version-controllable artefact – the model – of a declared and therefore predictable shape.
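
A hedged sketch of what that could look like: a training config that records the code version, hyperparameters, and input data, plus a train step that persists both the model artefact and the config alongside it. The schema, the train() entry point, the commit hash, and the artifact layout are all assumptions for illustration.

```python
# Sketch: declarative training config + a train step that persists the artefact
# together with the config that produced it.
import json
import pathlib
import pickle

from sklearn.linear_model import LogisticRegression

training_config = {
    "code_version": "git:3f2a9c1",          # hypothetical commit of the model code
    "model": "logistic_regression",
    "params": {"C": 1.0, "max_iter": 200},
    "input_data": "data_cache/ab12cd34ef56.parquet",  # output of the fetch/split steps
}

def train(X, y, config, artifact_dir="./artifacts"):
    """Train the declared model and persist both the artefact and its config."""
    model = LogisticRegression(**config["params"]).fit(X, y)
    out = pathlib.Path(artifact_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)                          # version-controllable artefact
    (out / "training_config.json").write_text(json.dumps(config, indent=2))
    return model
```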

Evaluation

Surprisingly, this is the hardest part to fully automate. The dataset and the individual use case define the required evaluation metrics. However, we can stand on the shoulders of giants. Great tools like TensorBoard and the What-If Tool go a long way. Our automation just needs to allow enough flexibility that a) custom metrics for evaluation can be injected, and b) raw training results are exposed for custom evaluation.
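
One way to leave that flexibility open, sketched with an assumed interface (this is not TensorBoard’s or the What-If Tool’s API): the evaluation step accepts injected metric functions and also returns the raw scores.

```python
# Sketch: evaluation step with (a) injectable custom metrics and
# (b) raw predictions exposed for further custom analysis.
from sklearn.metrics import mean_squared_error, roc_auc_score

def evaluate(model, X_eval, y_eval, custom_metrics=None):
    """Compute standard metrics plus any injected ones; return raw outputs too."""
    scores = model.predict_proba(X_eval)[:, 1]
    results = {
        "auroc": roc_auc_score(y_eval, scores),
        "mse": mean_squared_error(y_eval, scores),
    }
    for name, fn in (custom_metrics or {}).items():
        results[name] = fn(y_eval, scores)             # use-case-specific metrics
    return results, scores                             # raw scores stay available

# Usage: inject a custom metric without touching the evaluation code itself, e.g.
# results, raw = evaluate(model, X_eval, y_eval,
#                         custom_metrics={"top_decile_lift": my_lift_fn})  # hypothetical metric
```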

Serving

Serving is caught between two worlds. It would be easy to claim that a trained model is a permanent artifact, much as you might claim that a Docker container acts as an artifact of software development. But we can borrow another learning from software developers – if you don’t understand where your code is run, you don’t understand your code.

Only by understanding how a model is served will an ML training flow ever be complete. For one, data is prone to change. A myriad of reasons might be the cause, but the result remains the same: models need to be retrained to account for data drift. In short, continuous training is required. Because our ML flow is declaratively configured, we can reuse that configuration, inject new data, and iterate on the new results.

For another, preprocessing might need to be embedded with your model. Automation lets us apply the same preprocessing steps used in training to live data, guaranteeing an identical shape of input data.
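
A minimal sketch of that idea, assuming scikit-learn-style components (the concrete steps are illustrative, not a prescribed stack): the preprocessing is declared once and travels with the model, so serving applies exactly the training-time transformations.

```python
# Sketch: embed training-time preprocessing with the model so live requests
# go through exactly the same transformations.
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Preprocessing declared once, reused verbatim for training and for serving.
serving_pipeline = Pipeline([
    ("fill", SimpleImputer(strategy="median")),    # same filling as in training
    ("scale", StandardScaler()),                   # same standardization as in training
    ("model", LogisticRegression()),
])

# Continuous training: rerun the same declarative flow on fresh data and
# redeploy the resulting artefact instead of mutating the one in production.
# serving_pipeline.fit(X_new, y_new)
# predictions = serving_pipeline.predict(live_requests)
```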

Why?

Outside academia, the performance of machine learning models is measured through impact – economically, or by increased efficiency. Only reliable and consistent results are true measures of the success of applied ML. We, as a new and still-growing part of software engineering, have to ensure that success. And the reproducibility of success hinges on the repeatability of the full ML development lifecycle.