The need for robust machine learning applications has given rise to the field of Machine Learning Operations (MLOps). Pinning MLOps down to a simple definition is challenging; even today, experts answer the question, "What is MLOps?" in a wide variety of ways.
What is MLOps, and Why Does it Matter?
At phData, we define MLOps as the operationalization of machine learning models for the purpose of extracting business value. DevOps has revealed the huge business value that can be achieved with the rapid, automated deployment and monitoring of software.
MLOps, while still in its infancy, is on a similar trajectory to DevOps and is becoming increasingly important for extracting value from your data and machine learning models. But extracting business value from your machine learning models with MLOps is not easy.
What is the Goal of MLOps?
Gartner predicted that in 2020, 80 percent of AI projects would remain alchemy, run by wizards whose talents will not scale in the organization, and that only 20 percent of analytical insights would deliver business outcomes by 2022. Rackspace corroborated that claim in a January 2021 survey, finding that 80 percent of companies are still exploring or struggling to deploy ML models.
Why is it so hard to convert the insights discovered by data scientists into tangible value for the business? Let's assume a data scientist has discovered a model with tremendous business value. If that model is difficult to use, hard to understand, or computationally expensive, then extracting that business value may not be possible.
The goal of MLOps is to extract business value from data by efficiently operationalizing ML models at scale.
Many organizations are employing a new role of ML engineer to deliver MLOps success. While a data scientist may discover a model with business value, deploying that model into production requires an entirely different set of skills. An ML engineer builds ML pipelines that can reproduce the results of the models discovered by the data scientist automatically, inexpensively, reliably, and at scale.
At phData, we believe that the agile approach spearheaded by DevOps also applies to MLOps. ML projects, just like DevOps projects, provide the most value when they are delivered in short, iterative cycles in partnership with our customers.
If you’re looking for a more complete guide to ML model deployment, check out our Ultimate MLOps Guide.
4 Tell-Tale Signs You Need MLOps
At phData, we think of MLOps as an opinionated, automated assembly line for delivering ML models. The idea is to maximize automation, improve communication and observability, and ultimately get more reliable results.
Here are four tell-tale signs you need MLOps.
#1 - Lack of Consistency and Reusability
Does your research and experimentation feel like a black box that sometimes works and sometimes doesn’t? Or are you concerned that the original data scientist is the only person capable of maintaining your model? What happens if they get promoted or no longer work for your organization?
These are a few examples we've seen customers struggle with, and they make it impossible to have a consistent, repeatable process for developing models.
That’s because an ML model is a combination of many things: code (e.g. data prep scripts, training scripts, scripts to drive inference using the model), algorithm choices, hyperparameters, and data. With ML, data is a key part of the product, and understanding the path of exploration is important.
But more often than not, this path is captured in isolation, in personal notebooks, datasets stored on personal computers, or cloud instances. Given the high inertia of the data and the variety of other inputs into the process, individuals’ personal work habits often drive the storage and organization of everything.
This introduces new questions and problems to solve. Data scientists need to be able to share and expose their work efficiently. Models need to be made available to the business, where they ultimately deliver value.
PRO TIP: Standardized processes mean that data scientists will no longer need to focus energy on setting up code environments, structuring experiments, or other routine tasks that do not align with their areas of expertise.
One easy way to get started is to create a project template folder that you can duplicate each time a new project is started. It can contain a directory to hold data prep scripts, a directory for data exploration scripts, etc., so that new projects all share a common layout.
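For instance, the template could be captured in a small script so that every new project starts from the same skeleton. Here's a minimal sketch; the directory names are illustrative and should be adapted to your own conventions:

```python
# A minimal sketch of a script that scaffolds a standard project layout.
# The directory names here are illustrative placeholders.
from pathlib import Path

TEMPLATE_DIRS = [
    "data/raw",         # immutable source extracts
    "data/processed",   # outputs of data prep scripts
    "notebooks",        # exploratory analysis
    "src/prep",         # data preparation scripts
    "src/train",        # model training scripts
    "src/inference",    # scripts that drive inference
    "tests",            # unit tests for pipeline code
]

def scaffold(project_name: str) -> None:
    """Create a new project folder with the standard layout."""
    root = Path(project_name)
    for d in TEMPLATE_DIRS:
        (root / d).mkdir(parents=True, exist_ok=True)
    (root / "README.md").write_text(f"# {project_name}\n")

if __name__ == "__main__":
    scaffold("churn-model")  # hypothetical example project name
```

Even a lightweight convention like this means every project a teammate opens is immediately familiar.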
#2 - Need for Traceability and Governance
Most organizations have rules related to data. Most of the time this comes down to things like how it is stored, how to control access to the data, how long data must be kept, and when data must be destroyed. But what might not be so obvious is that, as a product of data, ML models need similar care.
We've worked with plenty of companies that weren't taking steps to ensure traceability. Because of this, every model their team built felt like an island. They spent a lot of time and energy tracking down how each one was built, especially when they needed to change a data source or field destination.
Without that background, these changes were risky because they were relying on nothing but their data scientist’s assurances that the model wasn’t built with sensitive or inappropriate data.
These are some additional challenges our customers weren’t prepared to solve before implementing an MLOps strategy:
- When data becomes irrelevant or is destroyed, what does this mean for the models trained from this data?
- Was sensitive data used to train a model? What does this mean for how the model can be used?
- A model is producing “strange” predictions. How can you discover the environment in which it was trained and the data used to do so?
If your organization is struggling to answer these key questions of traceability and governance, odds are you need a more automated approach that takes them into account.
PRO TIP: In addition to strong MLOps practices, tools for tracking the lineage of data and lifecycle of ML models will help you achieve this goal. Apache Atlas can be difficult to master but is very effective for data lineage tracking. MLflow provides a comprehensive framework for model and experiment tracking.
We’ve seen our customers use a wide variety of tools to get the job done, so you should look around to see what might be a good fit for your process.
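To make the tracking side concrete, here is roughly what recording a training run looks like with MLflow's tracking API. The experiment name, hyperparameters, metric value, and data path below are placeholder values, not a prescription:

```python
# A rough illustration of experiment tracking with MLflow.
# All names and values below are hypothetical placeholders.
import mlflow

mlflow.set_experiment("churn-model")

with mlflow.start_run():
    # Record the inputs that define this training run.
    mlflow.log_param("n_estimators", 200)
    mlflow.log_param("max_depth", 8)
    mlflow.log_param("training_data", "s3://bucket/churn/2021-01-05/")  # data version

    # ... train your model here ...

    # Record the results so runs can be compared and traced later.
    mlflow.log_metric("auc", 0.87)
    mlflow.set_tag("contains_pii", "false")  # governance flag
```

With runs logged this way, questions like "what data trained this model?" become a lookup rather than an investigation.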
#3 - Inadequate Reliability
When models are built from the ground up every time, deploying the tenth model will be just as time-consuming as deploying the first. This can also make it difficult to rely on the model’s availability enough to make it part of critical business processes.
And, it’s incredibly risky and time-consuming to retrain your model because of reliability issues. This is why it’s so important to design reliability into your models from the start.
In order to do so, you need to ensure you are meeting all of the typical expectations of business-critical software:
- Your model must be highly available. This is especially critical if the model is being used in real time.
- Periodic retraining of models must be automated.
- The data used in automated retraining must be automatically screened for quality and drift (see the sketch after this list). Don't assume that your data transformation pipelines will run flawlessly indefinitely, or that the source data will remain valid. One large financial services company learned this the hard way when field agents decided one day that the easiest way to flag a transaction for follow-up was to enter "-1" in the field used for total value. Without automated quality checks, problems like this can propagate all the way to the machine learning models and corrupt business processes.
- Newly-trained models must pass quality checks before they are made available for production use. Without automated testing, there's a risk that a simple development error can take down an otherwise functioning service or, worse, replace it with a model that makes erroneous predictions.
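As a concrete starting point, here is a minimal sketch of the kind of automated data screening described above, written with pandas. The column name, thresholds, and historical bounds are assumptions for illustration, not a definitive implementation:

```python
# A minimal sketch of an automated data quality gate that runs before
# retraining. Column names, thresholds, and bounds are assumptions.
import pandas as pd

def check_training_data(df: pd.DataFrame) -> None:
    """Raise if incoming training data fails basic quality checks."""
    # Guard against sentinel values sneaking into numeric fields,
    # like the "-1" total-value flag in the story above.
    if (df["total_value"] < 0).any():
        raise ValueError("total_value contains negative sentinel values")

    # Guard against unexpected gaps in required fields.
    null_rate = df["total_value"].isna().mean()
    if null_rate > 0.01:
        raise ValueError(f"total_value null rate too high: {null_rate:.2%}")

    # A crude drift check: compare the new mean against the range
    # observed historically (bounds here are made up).
    mean = df["total_value"].mean()
    if not (50.0 <= mean <= 500.0):
        raise ValueError(f"total_value mean {mean:.2f} outside expected range")
```

A gate like this would run as the first step of the retraining pipeline, so bad data halts the process before a model is ever trained.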
MLOps combines DevOps with data science know-how to ensure you are delivering high-quality, validated, highly-available data products for production use.
PRO TIP: In order to meet these expectations, you’ll need to have a robust infrastructure that allows for nodes to fail (such as Kubernetes or Amazon’s EKS), a job orchestration system that can schedule maintenance tasks (we’re big fans of Airflow), and you’ll need to borrow some techniques from DevOps (like CI/CD) to handle the automated quality checks and model tests.
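To make the orchestration piece concrete, here is a rough sketch of how a nightly retraining pipeline might be expressed as an Airflow DAG. The DAG name and the three task callables are hypothetical placeholders for your own data checks, training job, and promotion logic:

```python
# A sketch of a nightly retraining pipeline as an Airflow DAG.
# The task bodies are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def check_data_quality():
    ...  # screen the new training data for quality and drift

def retrain_model():
    ...  # rerun the training pipeline on the validated data

def validate_and_promote():
    ...  # run quality checks; promote the new model only if it passes

with DAG(
    dag_id="nightly_model_retrain",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    quality = PythonOperator(task_id="check_data_quality", python_callable=check_data_quality)
    train = PythonOperator(task_id="retrain_model", python_callable=retrain_model)
    promote = PythonOperator(task_id="validate_and_promote", python_callable=validate_and_promote)

    # Quality checks gate training; validation gates promotion.
    quality >> train >> promote
```

The important design choice isn't the tool, it's the ordering: no training without clean data, and no promotion without a passing validation step.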
#4 - Poor Observability
If it's not immediately clear how much a model is being used or what it's predicting, it can easily end up causing more work than it saves. Companies that can't determine how a specific prediction was made, or that need significant effort to test models and ensure they're still relevant, should look at standardizing processes with MLOps.
Production machine learning models must be monitored like any other software because it’s impossible to escape the dependence of your models on data. Not only should you be aware of the data used to train your models, you should also monitor the data presented to your model at runtime.
To get a better understanding of how observable your models are, ask yourself these questions:
- Is the form of your runtime data drifting in a way that causes your model to fail to produce useful predictions?
- Is the meaning of various features in your data changing over time? Will seasonal changes or demographic shifts make your predictions less relevant?
- Is your model producing predictions with lower confidence?
If you answered yes to any of these questions (or, even worse, you have no way of knowing whether this is occurring), you may not be monitoring your models as closely as you could be, and it's a good time to consider an MLOps framework for your business.
PRO TIP: A comprehensive solution is required to improve observability, but there are some simple ways that you can get started.
By assigning a UUID to all model requests and logging input and output data, you can begin to measure changes in the distributions of inputs and outputs. Then, to determine how much drift has occurred between the training data and the data being seen in production, build a classification model that takes an input record and classifies it as being from the training set or the production set. If a model can do this accurately, then you know there's something different between the two distributions.
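Here's a minimal sketch of that train-versus-production classifier using scikit-learn. It assumes train_df and prod_df are already prepared as numeric feature frames with matching columns; the model choice and cross-validation settings are illustrative:

```python
# A minimal sketch of the train-vs-production drift check described
# above. Assumes train_df and prod_df are numeric feature frames
# with identical columns.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drift_score(train_df: pd.DataFrame, prod_df: pd.DataFrame) -> float:
    """Return the AUC of a classifier separating training rows from production rows."""
    X = pd.concat([train_df, prod_df], ignore_index=True)
    # Label training rows 0 and production rows 1.
    y = np.concatenate([np.zeros(len(train_df)), np.ones(len(prod_df))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
```

An AUC near 0.5 means the classifier can't tell the two sets apart, which suggests little drift; an AUC approaching 1.0 is a strong signal that production data has diverged from what the model was trained on.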
Conclusion
When done right, MLOps enables a much more systematic and sophisticated approach to ML. If your organization is struggling to automate time-consuming steps, keep teams on the same page, and ensure consistency and transparency around model delivery, then it’s probably time to take the need for MLOps more seriously.
Now that you have an introduction to MLOps, take the next step by reading part two of our MLOps series titled, A Beginner’s Guide to MLOps: Deploying Machine Learning Into Production.