MLOps Roadmap - CD Foundation

CDF Newsletter – June 2020 Article

Contributed by Terry Cox

We are pleased to announce the MLOps Roadmap. It sets out a picture of the current state of MLOps and provides a five-year roadmap of future customer needs. The aim is to support pre-competitive collaboration across the industry, with a view to improving the overall state of MLOps as a capability for all.

What is MLOps?

We consider MLOps to be the extension of the DevOps methodology to include Machine Learning and Data Science assets as first-class citizens within the software development life cycle.

Practical, real-world products need both conventional and Machine Learning (ML) components, so it is critical to be able to manage all of these components consistently as common assets within the scope of a given technology solution.

MLOps should be viewed as a practice for consistently managing the ML aspects of products in a way that is unified with all of the other technical and non-technical elements necessary to successfully commercialise those products with maximum potential for viability in the marketplace. This includes DataOps, too: Machine Learning without complete, consistent, semantically valid, correct, timely, and unbiased data is problematic and can lead to flawed solutions that exacerbate built-in biases.

At the present time, managing ML assets in production remains at a very early stage of maturity, with most organisations finding themselves forced to either construct bespoke solutions for deployment or constrain themselves to highly data-science-specific tools that treat ML components as uncontrolled data sets.

At this point in the development of the practice, much of ML and AI research and development activity has been driven by Data Science rather than Computer Science teams. With the creation of the MLOps Roadmap, we hope to be able to draw upon the lessons of the past seventy years of managing software assets in commercial environments to accelerate the viability of managing ML assets in real world products.

There is a significant gap between the effort required to create a viable proof of concept of a trained ML model on a Data Scientist’s laptop and the effort required to safely transition that asset into a commercial product in production environments. The lack of good process, experience and tooling to support that work means that the majority of ML experiments currently fail to make it into production.

Compounding this challenge, Machine Learning solutions tend to be decision-making systems rather than just data processing systems, and will therefore be held to much higher standards than those applied to the best-quality software delivery projects. The bar for quality and governance processes is therefore very high, in many cases representing legal compliance processes mandated by regional legislation.

To meet these challenges, we need to fully understand the requirements inherent in this domain and have a clear picture of the processes and tooling necessary to facilitate good governance and solid asset management of products leveraging ML techniques.

Drivers for MLOps

Many of the existing DevOps principles apply equally to ML problems, including:

Optimising the process of taking ML features into production by reducing Lead Time
Optimising the feedback loop between production and development for ML assets
Unifying the release cycle for technology assets
Enabling automated testing of ML assets
Reducing Mean Time To Restore for ML applications
Reducing Change Fail Percentage for ML applications
Reducing overheads of IT management through economies of scale
Managing risk by aligning ML deliveries to appropriate governance processes
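As a rough illustration of three of the delivery metrics named in the list above (Lead Time, Change Fail Percentage and Mean Time To Restore), the following Python sketch computes them from a list of deployment records. The `Deployment` record shape and its field names are assumptions made for this example; they are not part of the Roadmap, and a real pipeline would draw this data from its own release and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Deployment:
    """One release of an ML asset (hypothetical record shape for illustration)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # whether the release caused a production failure
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def mean_lead_time(deployments: List[Deployment]) -> timedelta:
    """Average time from commit to production."""
    total = sum((d.deployed_at - d.committed_at for d in deployments), timedelta())
    return total / len(deployments)


def change_fail_percentage(deployments: List[Deployment]) -> float:
    """Percentage of releases that caused a failure in production."""
    failures = sum(1 for d in deployments if d.failed)
    return 100.0 * failures / len(deployments)


def mean_time_to_restore(deployments: List[Deployment]) -> Optional[timedelta]:
    """Average time from a failed release to restoration, or None if there were no restored failures."""
    outages = [
        d.restored_at - d.deployed_at
        for d in deployments
        if d.failed and d.restored_at is not None
    ]
    if not outages:
        return None
    return sum(outages, timedelta()) / len(outages)
```

Tracking these figures separately for ML assets and for conventional components is one way to see whether the two are actually being released through a unified cycle.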

The MLOps problem space introduces some new challenges, however, such as:

Mitigating the risks associated with producing decision-making products
Incorporating ethical governance into management of ML assets
Enabling automated bias detection testing
Ensuring explainability of decisions
Ensuring fairness in decisions
Facilitating auditability of Training Data, Models and Test Sets
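To make the idea of automated bias detection testing from the list above more concrete, here is a minimal Python sketch of one possible check. It measures the gap in positive-prediction rates between groups (often called demographic parity difference), which is only one of many fairness measures and is not prescribed by the Roadmap. The function names and the `max_gap` threshold of 0.2 are hypothetical choices for this example.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def positive_rates(predictions: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of positive (1) predictions for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += 1 if pred == 1 else 0
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_gap(predictions: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


def passes_bias_check(predictions: Sequence[int], groups: Sequence[Hashable], max_gap: float = 0.2) -> bool:
    """True if the gap is within the (assumed, illustrative) threshold."""
    return demographic_parity_gap(predictions, groups) <= max_gap
```

A check like this could run as an ordinary automated test in the release pipeline, failing the build when a newly trained model exceeds the agreed threshold on a held-out test set.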

The scope of Machine Learning extends far beyond just moving a simple model from a Data Scientist’s laptop into the Cloud. Practical examples already include the need to be able to retrain models on a daily basis, utilising petabytes of training data, and then push these trained models to mobile phones, vehicles, machinery, wearables, and other highly specialised edge devices for real-time inference.

The Roadmap Process

The MLOps Roadmap is in the process of collating a clear picture of all the fundamental challenges associated with the effective delivery of AI-focused products. For each challenge, the Roadmap will identify the specific technology requirements needed to address it and will seek to propose potential solutions in each area. The intent is to provide an annual update with a five-year horizon, detailing the present capabilities in each challenge area and showing where future work is required to enable essential capabilities.

The intent is to facilitate open pre-competitive collaboration across the industry with a view to accelerating our shared capability to deliver high quality ML products and to enable us all to focus more of our efforts on the hard problems of creating true AI products in the coming years.

The Roadmap is managed within the CDF MLOps SIG, which is also home to a number of projects incubating implementations that address specific technical challenges identified within the Roadmap, including Kubeflow Pipelines on Tekton and the Jenkins X MLOps extensions.

How to Join the Collaborative Effort

The current working draft document can be found at MLOps Roadmap 2020 and we welcome pull requests from anyone wishing to contribute to the Roadmap. Please feel free to join the MLOps SIG channel within the CDF Slack community or drop into the US or APAC/Oceania meetings and lend a hand!

The first release will be published later this year, and there is still plenty to work on.

Further information can be found at https://github.com/cdfoundation/sig-mlops, or contact Terry Cox via Slack or the mailing list for assistance.