Machine Learning Lifecycle Maturity
Introduction
Companies large and small, across many industries, are attempting to build Machine Learning (ML) models to gain a competitive advantage. However, ML models are much harder to deploy to production consistently than traditional software. According to Algorithmia's report, 2020 State of Enterprise Machine Learning, only 22% of the surveyed companies had successfully deployed a model, and despite increased spending, 43% had difficulty scaling machine learning projects to their company’s needs.
When starting on their ML journeys, companies tend to focus first on building a data science team that develops sophisticated ML models to extract critical business insights from data. However, lasting business value and ROI come from taking the solution through the next stages: validating it, scaling it, deploying it to production, and monitoring and optimizing it after deployment. It is in these later stages of the lifecycle that most companies have difficulty.
Companies wishing to take advantage of the tremendous gains happening in the field of ML should encourage rapid adoption of MLOps, a set of principles and practices for a rapid ML lifecycle.
Machine Learning Maturity Journey
In traditional software engineering, a set of principles and practices collectively called DevOps allows rapid, repeatable, and reliable releases, and unites development, deployment, and operational monitoring. These practices include code versioning, incremental releases, test automation, continuous integration and delivery, and operational monitoring. DevOps has been employed successfully in SaaS businesses and has been considered a best practice for several years.
A similar set of practices for ML deployment would enable organizations to:
Quickly deploy and test new models.
Compare impacts of changes to models.
Reproduce and retrace important model-driven business decisions.
Rapidly roll back to earlier models if issues are found in a recent deployment.
Retrain models when new data sets become available.
Address questions surrounding model security, privacy, and bias.
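Several of the capabilities listed above (comparing models, retracing what a model was trained on, and rolling back) come down to keeping a versioned record of every model. The following is a minimal sketch of that idea, not a production design: the ModelRegistry class, its method names, and the in-memory storage are all hypothetical, and a real setup would use a persistent store or an existing registry tool.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ModelVersion:
    """Everything needed to retrace how one model version was produced."""
    version: int
    params: Dict[str, Any]
    data_fingerprint: str  # hash of the training data, for reproducibility
    metrics: Dict[str, float]


class ModelRegistry:
    """Hypothetical in-memory registry; illustrative only."""

    def __init__(self) -> None:
        self._versions: List[ModelVersion] = []
        self._history: List[int] = []  # deployed version numbers, newest last

    def register(self, params: Dict[str, Any], training_rows: List[Any],
                 metrics: Dict[str, float]) -> ModelVersion:
        # Fingerprint the training data so a decision can be traced back later.
        payload = json.dumps(training_rows, sort_keys=True).encode("utf-8")
        fingerprint = hashlib.sha256(payload).hexdigest()
        entry = ModelVersion(len(self._versions) + 1, dict(params),
                             fingerprint, dict(metrics))
        self._versions.append(entry)
        return entry

    def get(self, version: int) -> ModelVersion:
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"unknown model version: {version}")
        return self._versions[version - 1]

    def deploy(self, version: int) -> None:
        self.get(version)  # raises KeyError for an unknown version
        self._history.append(version)

    @property
    def live(self) -> Optional[ModelVersion]:
        return self.get(self._history[-1]) if self._history else None

    def rollback(self) -> ModelVersion:
        """Return to the previously deployed version."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier deployment to roll back to")
        self._history.pop()
        return self.get(self._history[-1])

    def compare(self, old: int, new: int, metric: str) -> float:
        """Change in a metric going from version `old` to version `new`."""
        return self.get(new).metrics[metric] - self.get(old).metrics[metric]
```

With a record like this, rolling back is a lookup rather than a rebuild, and the stored parameters and data fingerprint give an audit trail for model-driven decisions.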
An ML system can comprise a large and complex infrastructure, with cross-functional responsibilities spanning multiple teams, as shown in this diagram (Adapted from this Google paper: MLOps: Continuous delivery and automation pipelines in machine learning):
Organizations will have to evolve and mature the stages of their ML lifecycle iteratively, based on the nature of the business problem being addressed, automating progressively more of those stages, as in the progression shown below.
The situations applicable to each stage of the maturity journey are:
MLOps Level 0
This level applies to organizations starting on their ML journey that deploy models to production infrequently, perhaps only a few times a year. At this stage, these organizations typically have a small team of data scientists working individually on small, slowly changing datasets and manually deploying the models to production.
MLOps Level 1
This level applies to organizations with some experience in applying ML to business problems that need to train and deploy models more frequently, possibly monthly. At this level of maturity, the ML pipeline is automated, while the testing and deployment of the models can still be manual. In such organizations, data scientists focus on model development, while the automation is handled by a separate team.
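The Level 1 split between an automated pipeline and a manual deployment decision can be sketched as follows. This is an illustrative toy under stated assumptions: the step names (validate, train, evaluate), the one-parameter model, and the approve callback standing in for a human sign-off are all hypothetical and do not come from any particular tool.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]  # (feature, label)


def validate(rows: List[Row]) -> List[Row]:
    """Automated data check: refuse to train on an empty dataset."""
    if not rows:
        raise ValueError("no training data")
    return rows


def train(rows: List[Row]) -> float:
    """Fit y = w * x by least squares through the origin (toy model)."""
    denominator = sum(x * x for x, _ in rows)
    if denominator == 0:
        raise ValueError("all features are zero; cannot fit")
    return sum(x * y for x, y in rows) / denominator


def evaluate(weight: float, rows: List[Row]) -> float:
    """Mean squared error of the fitted weight on held-out rows."""
    return mean((y - weight * x) ** 2 for x, y in rows)


def run_pipeline(train_rows: List[Row], test_rows: List[Row]) -> Dict[str, float]:
    """The automated part: validate, train, and evaluate without manual steps."""
    weight = train(validate(train_rows))
    return {"weight": weight, "mse": evaluate(weight, validate(test_rows))}


def deploy_if_approved(candidate: Dict[str, float],
                       approve: Callable[[Dict[str, float]], bool]) -> bool:
    """The manual part: a person (modelled as a callback) decides on release."""
    return bool(approve(candidate))


candidate = run_pipeline([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)])
released = deploy_if_approved(candidate, lambda c: c["mse"] < 0.1)
```

Keeping the approval step as a separate function makes it straightforward to replace the human decision with an automated check later, which is the main change when moving to the next level.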
MLOps Level 2
This level becomes necessary for organizations dealing with web-scale datasets that change rapidly, daily if not hourly, because it allows continuous retraining and deployment of ML models to multiple nodes or servers. At Level 2 maturity, the ML lifecycle activities are spread across multiple teams responsible for data engineering, model experimentation and development, deployment, and monitoring.
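Continuous retraining needs some automated signal that the incoming data has changed enough to justify a new model. One simple possibility, sketched below, is to compare the mean of a feature in recent data against a reference sample. The function names, the choice of a mean-shift score, and the threshold of 0.5 are assumptions made for illustration; real systems often use richer drift tests.

```python
from statistics import mean, pstdev
from typing import Sequence


def drift_score(reference: Sequence[float], current: Sequence[float]) -> float:
    """Shift in the mean, measured in standard deviations of the reference."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    shift = abs(mean(current) - mean(reference))
    spread = pstdev(reference)
    if spread == 0:
        # A constant reference: any change at all counts as unbounded drift.
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def should_retrain(reference: Sequence[float], current: Sequence[float],
                   threshold: float = 0.5) -> bool:
    """Trigger retraining when drift exceeds an (arbitrarily chosen) threshold."""
    return drift_score(reference, current) > threshold


reference_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
print(should_retrain(reference_sample, [1.0, 2.0, 3.0, 4.0, 5.0]))
print(should_retrain(reference_sample, [5.0, 6.0, 7.0]))
```

A check like this would typically run on a schedule or on every new data batch, with a positive result starting the training pipeline automatically.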
Tools for MLOps
There is a wide choice of tools and technologies for enabling MLOps in enterprise and software companies. All major public cloud providers offer broad coverage of MLOps features, which may be sufficient for most organizations: Amazon SageMaker from AWS, Azure ML from Microsoft Azure, and Google AI Platform from Google Cloud.
There are numerous cloud-agnostic MLOps tools as well, ranging from fully free and open-source to subscription-based. A few of the well-known ones are Kubeflow, DVC, Apache Airflow, MLflow, and Seldon. Most organizations will need a combination of different tools for end-to-end lifecycle automation.
Conclusion
For organizations to advance their ML capabilities and enable agility, reproducibility, auditability, and maintainability of their ML models, it is becoming increasingly necessary to incorporate MLOps practices. Thinking in terms of MLOps maturity levels can help assess the current state of the practice and create a roadmap for alignment with the strategic objectives of the business.