How to build a state of the art Machine Learning platform in 2021?
Traditionally, machine learning has been a domain reserved for academics who have the mathematical skills to develop complex ML models but lack the software engineering skills required to put these models into production.
It may come as a surprise to young data scientists to know that only a tiny part of the code of many ML systems is actually dedicated to learning or prediction.
To build good machine learning products, it’s often a matter of doing machine learning like the great engineer that you are, rather than like the great machine learning expert that you are not.
But then, what are the essential components of a production machine learning system, depending on your organization’s degree of maturity?
A machine learning system is a workflow that can be broken down into six steps:
- Managing data
- Training the models
- Evaluating the models
- Deploying the models
- Making predictions
- Monitoring predictions
This workflow is agnostic to implementation, i.e. to the types of learning algorithms and deployment modes (online and offline prediction). It must be built with tools that are scalable, reliable, reproducible, easy to use, interoperable and automated.
Usually the maturity of an organization in machine learning is measured in terms of:
- Type of data (structured, unstructured, online)
- Degree of automation of the machine learning pipeline
- Size of models and teams
AI companies like Uber (Michelangelo), Google, Airbnb (Bighead) and Facebook (FBLearner Flow) all have platforms that solve the problems mentioned above: they let multiple teams develop and deploy huge models extremely fast, ingesting structured and unstructured data, online and offline.
But we don’t all have the same type of resources as these big players. Having said that, let’s see how we could set up our own ML industrialization system.
It should be thought of as an assembly of state-of-the-art, scalable and interoperable bricks offering standardized APIs, so as to avoid glue code as much as possible. Glue code is costly in the long term because it tends to freeze a system into the peculiarities of a specific package.
Phase 1: Set up foundations
In the first phase of the life cycle of a machine learning system, the important part is to get the training data into the learning system, create a service infrastructure to run the models and implement a measure of the quality of the predictions.
We have found that building and managing data pipelines is usually one of the most expensive components of a complete machine learning solution. A platform needs to provide standard tools for building data pipelines to generate feature and label data sets for training (and re-training) and feature data sets for prediction only.
These tools need to be deeply integrated into the data lake or enterprise data warehouses and enterprise online data server systems. They must be scalable and high-performance, with integrated monitoring of data flow and data quality.
They must also provide strong safeguards and controls to encourage and accustom users to adopt best practices (for example, easily ensuring that the same data generation/preparation process is used both at training and at prediction time).
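One simple way to get that guarantee is to route both paths through a single feature-building function. The sketch below is only an illustration: the field names (`amount`, `device`, `label`) and the function names are hypothetical and not part of any tool mentioned in this article.

```python
import math


def build_features(raw: dict) -> list[float]:
    """Single source of truth for feature preparation (hypothetical schema)."""
    # Log-scale the amount so that large values do not dominate.
    amount = math.log1p(max(float(raw.get("amount", 0.0)), 0.0))
    # Encode a categorical field as a 0/1 flag.
    is_mobile = 1.0 if raw.get("device") == "mobile" else 0.0
    return [amount, is_mobile]


def make_training_rows(records: list[dict]) -> list[tuple[list[float], int]]:
    """Training path: features plus the label."""
    return [(build_features(r), int(r["label"])) for r in records]


def make_prediction_row(request: dict) -> list[float]:
    """Prediction path: the same function, no label required."""
    return build_features(request)
```

Because neither path contains its own preparation logic, a change to `build_features` applies to training and prediction at the same time.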
To address these issues Kili was designed as a one-stop shop for storage, annotation, quality monitoring and consumption of unstructured data. The main features are the following:
- Scalable data storage infrastructure (S3 bucket for assets, PostgreSQL for metadata, all on Kubernetes)
- Data annotation/labelling interface
- Label Quality Management Workflow
- Versioning of annotations
- Centralization of annotation plans to standardize metadata across the organization
- Fine-grained management of access rights
- Powerful search engine through data and metadata
- API allowing import and export to standard data formats (to avoid glue code)
Serving the models
The most common way to deploy a trained model is to save it in the binary format of the tool of your choice, wrap it in a microservice (e.g. a Python Flask application) and use it for inference. This approach is known as “model as code”. It has several drawbacks: as the number of models increases, so does the number of microservices, and with it the number of failure points, the latency, etc., which makes the system really difficult to manage. Managed solutions have been introduced to mitigate this problem: they simplify the deployment process and provide tools for alpha releases and A/B testing. Model-serving tools include Seldon, Clipper and TensorFlow Serving.
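As a rough illustration of the microservice approach, here is a minimal sketch of the request handler such a service could call. It uses only the standard library: the "model" is a hypothetical pair of weights and a bias stored with `pickle`, and `handle_predict` is an assumed name for the function a Flask route (or any other HTTP framework) would delegate to.

```python
import json
import pickle

# "Training" side: serialize the fitted parameters to a binary blob.
# The weights and bias are made-up values standing in for a real trained model.
MODEL_BYTES = pickle.dumps({"weights": [0.5, -0.25], "bias": 0.1})

# Serving side: load the model once at startup and reuse it for every request.
MODEL = pickle.loads(MODEL_BYTES)


def predict(features: list[float]) -> int:
    """Linear score followed by a threshold at zero."""
    score = sum(w * x for w, x in zip(MODEL["weights"], features)) + MODEL["bias"]
    return int(score > 0)


def handle_predict(body: str) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for one prediction request."""
    try:
        features = [float(x) for x in json.loads(body)["features"]]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with a 'features' list"})
    if len(features) != len(MODEL["weights"]):
        return 400, json.dumps({"error": "wrong number of features"})
    return 200, json.dumps({"prediction": predict(features)})
```

Every model deployed this way needs its own copy of this wrapper, which is where the growing number of services and failure points comes from.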
Another, more recent approach is to standardize the format of the model so that it can be loaded programmatically from any programming language and does not need to be wrapped in a microservice. This is particularly useful for data processing workflows where latency and error handling are a problem. Since we call the model directly, we don’t have to worry about the control and error management that a remote call requires. This approach is known as “model as data”. TensorFlow has become a de facto standard here: its SavedModel format contains a complete TensorFlow program, including weights and computation. It does not require the original model-building code to run, which makes it useful for sharing.
When a model is trained and evaluated, historical data is always used. To ensure that a model keeps working in the future, it is essential to monitor its predictions, both to check that the data pipelines continue to send accurate data and to detect changes in the production environment that make the model less accurate.
This can be addressed by automatically logging, and optionally holding back, a percentage of the predictions made, and then later joining these predictions to the observed results (or labels) generated by the data pipeline.
With this information, we can generate continuous, real-time measurements of the model’s accuracy.
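The idea can be sketched in a few lines, assuming each prediction carries an identifier that the label pipeline also emits. The function names and the hash-based sampling rule are illustrative assumptions, not a description of how any particular platform does it.

```python
import zlib
from typing import Optional


def should_log(prediction_id: str, percent: int) -> bool:
    """Deterministically keep roughly `percent`% of predictions for monitoring."""
    return zlib.crc32(prediction_id.encode("utf-8")) % 100 < percent


def accuracy_on_joined(predictions: dict[str, int], labels: dict[str, int]) -> Optional[float]:
    """Join logged predictions to observed labels by id and score the matches."""
    matched = [(pred, labels[pid]) for pid, pred in predictions.items() if pid in labels]
    if not matched:
        return None  # no labels have arrived yet
    return sum(pred == label for pred, label in matched) / len(matched)
```

Running `accuracy_on_joined` on a sliding window of recent predictions gives a metric that can be tracked over time and alerted on.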
Prometheus is free, open-source software for monitoring and alerting. It records real-time metrics in a time-series database by scraping endpoints that applications expose over HTTP.
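To make this concrete, here is a small sketch of what a model service could return on its metrics endpoint, written in the Prometheus text exposition format. The metric names and the `model` label are hypothetical, and a real service would more likely use a Prometheus client library than build the text by hand.

```python
def render_metrics(accuracy: float, predictions_total: int, model: str) -> str:
    """Render two model metrics in the Prometheus text exposition format."""
    lines = [
        "# HELP model_accuracy Accuracy over recently labelled predictions.",
        "# TYPE model_accuracy gauge",
        f'model_accuracy{{model="{model}"}} {accuracy}',
        "# HELP model_predictions_total Predictions served since start.",
        "# TYPE model_predictions_total counter",
        f'model_predictions_total{{model="{model}"}} {predictions_total}',
    ]
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"
```

Prometheus would then be configured to scrape the URL that serves this text, and alert rules can be defined on `model_accuracy`.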
When it comes to unstructured data, Kili provides the additional components to monitor the prediction of a model directly on the unstructured asset.
Phase 2: Automate
In some use cases, the manual process of training, validation and deployment of ML models may be sufficient. This manual approach works if your team only manages a few ML models that are not re-trained or modified frequently. However, in practice, models often break down when deployed in production because they fail to adapt to changes in environmental dynamics or the data describing those dynamics.
To integrate an ML system into a production environment, you need to orchestrate the steps in your ML pipeline. In addition, you need to automate the execution of the pipeline for the continuous training of your models. To test new ideas and features, you need to adopt CI/CD practices in new pipeline implementations:
- Configure a continuous delivery system to frequently deploy new implementations across the entire ML pipeline.
- Automate the execution of the ML pipeline to re-train new models with new data and capture emerging trends.
Continuous integration / continuous delivery (CI/CD)
To quickly deploy new ML pipelines, you need to configure a CI/CD pipeline. This pipeline is responsible for the automatic deployment of new ML pipelines and components when new implementations are available and approved for different environments (development, testing, staging and production).
An ML platform must have end-to-end support to manage model deployment via the user interface or API. Ideally there will be two modes in which a model can be deployed:
- Online deployment. The model is deployed in an online prediction service cluster (containing several machines behind a load balancer) to which clients can send prediction requests.
- Offline deployment. The model is deployed in an offline container that can be run in a Spark job or integrated as a library in another service and invoked via API.
For this we can use Kubeflow, which has been specifically designed to simplify the design and deployment of ML applications by grouping all the steps of an ML process into a self-contained pipeline whose steps run as Docker containers on top of the Kubernetes orchestration layer. By relying heavily on Kubernetes, Kubeflow can provide a higher level of abstraction for ML pipelines, freeing up data scientists for higher value-added activities.
Continuous training (CT)
Most machine learning models operate in the complex and ever-changing environment of things in motion in the physical world. To keep our models accurate as this environment changes, our models must evolve with it. Teams need to re-train their models on a regular basis. A complete platform solution for this use case involves easily updatable model types, faster training and evaluation architecture and pipelines, automated model validation and deployment, and sophisticated monitoring and alerting systems.
The availability of new data is one of the triggers for ML model re-training. The availability of a new implementation of the ML pipeline (including the availability of a new model architecture, feature extraction, and new hyperparameters) is another important trigger for the re-execution of the ML pipeline.
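These two triggers can be expressed as a small decision function. This is only a sketch: the parameter names and the `min_new_rows` threshold are assumptions chosen for illustration.

```python
def retrain_reasons(
    new_rows: int,
    deployed_pipeline_version: str,
    latest_pipeline_version: str,
    min_new_rows: int = 1000,
) -> list[str]:
    """Return the reasons (possibly none) for re-running the ML pipeline."""
    reasons = []
    # Trigger 1: enough new data has arrived since the last training run.
    if new_rows >= min_new_rows:
        reasons.append("new_data")
    # Trigger 2: a new pipeline implementation (architecture, features,
    # hyperparameters) has been released since the deployed model was built.
    if latest_pipeline_version != deployed_pipeline_version:
        reasons.append("new_pipeline_implementation")
    return reasons
```

A scheduler can call this periodically and launch the pipeline whenever the returned list is not empty.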
To connect the different system components and implement your CI/CD and CT, you need an orchestrator. The orchestrator executes the pipeline as a sequence and automatically switches from one step to the next according to the defined conditions. For example, a defined condition can be running the model serving step after the model evaluation step only once the evaluation metrics reach predefined thresholds. Orchestrating the ML pipeline is useful in both the development and production phases:
- During the development phase, orchestration allows data scientists to run ML experiments in an automated manner instead of performing each step manually.
- During the production phase, orchestration allows the execution of the ML pipeline to be automated according to a schedule or certain trigger conditions.
Several tools come up when choosing an orchestrator, for example around TFX (TensorFlow Extended):
- Apache Beam is mainly used for distributed data processing in some TFX components. Apache Beam is therefore required whichever orchestrator you choose with TFX (even if you don’t use Apache Beam as the orchestrator).
- Airflow is a task management system. Tasks are the nodes of a DAG (directed acyclic graph), and Airflow makes sure that they are executed in the right order: a task does not start until the tasks it depends on are completed.
- Kubeflow is designed around the concept of a machine learning pipeline. An ML pipeline includes all the steps of a given data science workflow. It can start by obtaining data from a local or remote source such as Kili, apply a transformation to the data, load it into an ML model running in a notebook, and then launch the training of this model on a larger cluster, using one or more data sets.
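To show what an orchestrator does at its core, here is a toy sketch that runs steps in dependency order and only reaches deployment when evaluation passes a threshold. It relies on `graphlib` from the Python standard library (3.9+). The step functions, the tiny data set and the `min_accuracy` key are all made up for the example, and none of this reflects the API of Airflow or Kubeflow.

```python
from graphlib import TopologicalSorter


def ingest(ctx: dict) -> bool:
    ctx["data"] = [(0.0, 0), (1.0, 1), (2.0, 1)]  # (feature, label) pairs
    return True


def train(ctx: dict) -> bool:
    ctx["threshold"] = 0.5  # placeholder "model": predict 1 above the threshold
    return True


def evaluate(ctx: dict) -> bool:
    hits = sum(int(x > ctx["threshold"]) == y for x, y in ctx["data"])
    ctx["accuracy"] = hits / len(ctx["data"])
    # Gate: later steps only run if the metric reaches the required level.
    return ctx["accuracy"] >= ctx["min_accuracy"]


def deploy(ctx: dict) -> bool:
    ctx["deployed"] = True
    return True


STEPS = {"ingest": ingest, "train": train, "evaluate": evaluate, "deploy": deploy}
# Each step maps to the set of steps it depends on.
DEPENDENCIES = {"train": {"ingest"}, "evaluate": {"train"}, "deploy": {"evaluate"}}


def run_pipeline(steps: dict, dependencies: dict, ctx: dict) -> list[str]:
    """Run steps in dependency order; stop as soon as a step returns False."""
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        ok = steps[name](ctx)
        executed.append(name)
        if ok is False:
            break
    return executed
```

Calling `run_pipeline(STEPS, DEPENDENCIES, {"min_accuracy": 0.9})` runs all four steps, whereas an unreachable threshold stops the run after `evaluate`. Stopping the whole run on the first failed condition is a simplification that suits a linear pipeline. Real orchestrators track the state of each task and add scheduling, retries and distributed execution.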
Phase 3: Scale
As teams grow, they end up defining features differently, and feature documentation becomes hard to access. There is also a need to train and analyze increasingly complex models.
It becomes important to add new bricks to our Machine Learning system: feature store, model visualization and distributed learning.
A feature store allows teams to share, discover and use a highly curated set of features for their machine learning problems. On many projects we have found that many modeling problems use the same or similar features, and that it is very useful to let teams share features across their own projects and to let teams from different organizations share features with each other.
A feature store improves:
- Feature discovery and reuse: Teams can then use features developed by other teams, and as features are added to the store, it becomes easier and cheaper to build models.
- Quick access to features for training: Data scientists can focus more on modeling and less on feature engineering.
- Feature standardization: Teams are able to capture documentation, metadata, and metrics for features. This allows teams to communicate clearly about features.
- Consistency between training and serving, by unifying data ingestion from batch and streaming processing sources.
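The sketch below shows these ideas with a toy in-memory registry. It is purely illustrative: the class, its methods and the example features are hypothetical and do not reflect the API of Feast or any other feature store.

```python
from typing import Callable


class FeatureStore:
    """Toy in-memory feature registry (illustrative only)."""

    def __init__(self) -> None:
        self._features: dict[str, dict] = {}

    def register(self, name: str, fn: Callable[[dict], float], owner: str, description: str) -> None:
        """Publish a feature with its documentation so other teams can find it."""
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = {"fn": fn, "owner": owner, "description": description}

    def search(self, keyword: str) -> list[str]:
        """Discover features by name or description."""
        kw = keyword.lower()
        return sorted(
            name
            for name, meta in self._features.items()
            if kw in name.lower() or kw in meta["description"].lower()
        )

    def get_vector(self, names: list[str], entity: dict) -> list[float]:
        """Compute the requested features; used for both training and serving."""
        return [float(self._features[name]["fn"](entity)) for name in names]


store = FeatureStore()
store.register(
    "order_count_7d",
    lambda e: len(e["orders_7d"]),
    owner="growth-team",
    description="Number of orders in the last 7 days",
)
store.register(
    "avg_basket_7d",
    lambda e: sum(e["orders_7d"]) / max(len(e["orders_7d"]), 1),
    owner="pricing-team",
    description="Average order value over the last 7 days",
)
```

A team looking for order-related signals can call `store.search("order")` and reuse both features through `get_vector`, instead of redefining them in its own project.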
Understanding and debugging models is becoming increasingly important, especially for deep learning. As machine learning has evolved towards widespread adoption, the challenge is to build models that users can understand. This is most visible in high-risk areas such as healthcare, finance and the judiciary. Interpretability is also important in general applied machine learning issues such as model debugging, regulatory compliance and human-machine interaction.
Although we have taken some important first steps with tree-based model visualization tools, much more needs to be done to enable data scientists to understand, debug and tune their models and to give users confidence in the results.
InterpretML is an open-source Python package that exposes machine learning interpretability algorithms.
A growing number of machine learning systems are implementing deep learning technologies. Cases using deep learning typically process a larger amount of data, and require different hardware (i.e. GPUs). This motivates additional investments for closer integration with a cluster.
Data-parallel distributed training of deep neural networks (DNNs), using multiple GPUs on multiple machines, is often the right answer to this problem.
The main challenge of distributed DNN training is that the gradients computed during back-propagation on multiple GPUs must be all-reduced (averaged) in a synchronized step before they are applied to update the model weights on multiple GPUs across multiple nodes. The synchronized all-reduce algorithm must be highly efficient, otherwise any training speed-up achieved by data-parallel distributed training would be lost to the inefficiency of the synchronized all-reduce step.
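The synchronized averaging step can be simulated on plain Python lists. This sketch uses a naive all-reduce that averages the gradients of every worker in one place. It illustrates the semantics only and is not the optimized communication pattern that libraries such as Horovod implement. All names are hypothetical.

```python
def allreduce_mean(worker_grads: list[list[float]]) -> list[float]:
    """Average the gradient vectors of all workers, component by component."""
    n = len(worker_grads)
    return [sum(component) / n for component in zip(*worker_grads)]


def sgd_step(weights: list[float], grad: list[float], lr: float) -> list[float]:
    """Plain gradient descent update."""
    return [w - lr * g for w, g in zip(weights, grad)]


def synchronous_step(
    replicas: list[list[float]], worker_grads: list[list[float]], lr: float
) -> list[list[float]]:
    """One data-parallel step: every replica applies the same averaged gradient."""
    avg = allreduce_mean(worker_grads)
    return [sgd_step(weights, avg, lr) for weights in replicas]
```

Since every replica applies the same averaged gradient, the copies of the model stay identical after each step. No worker can update its weights until all gradients have been exchanged, which is why the efficiency of this step dominates the overall speed-up.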
Uber’s open-source Horovod library has been developed to meet these challenges:
- Horovod offers a choice of highly efficient synchronized all-reduce algorithms that scale with a growing number of GPUs and nodes.
- The Horovod library is based on the communication primitives of the NVIDIA Collective Communications Library (NCCL), which exploit knowledge of the NVIDIA GPU topology.
- Horovod is supported by many machine learning frameworks, including TensorFlow.
To summarize, here are the bricks that we believe are essential to build a state of the art ML platform. In order:
- A training data platform (e.g. Kili)
- A platform for hosting model containers (e.g. Kubernetes)
- A platform for monitoring predictions (e.g. Kili and Prometheus)
- A CI/CD and CT platform (e.g. Kubeflow)
- An orchestrator (e.g. Kubeflow or Airflow)
- A feature store (e.g. Feast)
- A model visualization framework (e.g. InterpretML)
- A learning distribution framework (e.g. Horovod)
These are just a few of the things you need to worry about when building a production ML system. Others include logging and monitoring the status of the various services. There are many tools, such as Istio, which can be used to secure and monitor your system.