A cross-industry approach to shorten time-to-value for AI/ML use-case deployment (part 3/3)
Abstract (repeated from part 2/3)
In this digital age of hyper-competition and collaboration, AI/machine-learning-enabled use cases hold the key to continuous and disruptive innovation in the post-coronavirus 'new normal' business environment. Every industry faces the same obstacle course. After developing an initial experimental use case, an enterprise needs an agile platform to deploy the model into its production environment quickly. Data scientists and analysts often spend a great deal of time gathering data and preparing it for their advanced analytics activities, so there is a ‘need for speed’ and ‘efficiency’ in ML operational deployment. Today, cloud environments such as GCP and Databricks make it possible to deploy machine-learning-enabled use cases in a matter of hours. This document outlines one automated approach that enterprises can use to deploy machine-learning-enabled use cases on the rapid-deployment GCP and Databricks platforms.
Part 3 of 3:
In the previous two blogs, we covered data onboarding and ingestion, followed by feature engineering, model building, and model export. In this final part, we cover the remaining steps and the conclusion.
C. Prediction Model Deployment
Once a model has been created and tested with a toolkit of choice, it should be deployed to production efficiently. There are several ways to do this, such as writing custom scripts in a language of choice (Python, Scala, Java, etc.) or using a framework or platform such as Databricks MLflow, AWS ML, Google Cloud ML, TensorFlow, or H2O.
Figure 1: ML use cases leveraging current data domains (for training, modeling, and commercial DevOps)
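To make the 'custom script' option concrete, here is a minimal Python sketch of batch scoring. It assumes a hypothetical model artifact: a logistic-regression model exported as JSON with an intercept and per-feature weights (the feature names `tenure_months` and `num_products` are also illustrative). Frameworks such as MLflow, TensorFlow, or H2O use their own serialization formats, so treat this as an illustration of the pattern rather than any framework's API.

```python
import json
import math
from typing import Dict, List

# Hypothetical exported model: logistic regression stored as JSON.
# Real frameworks (MLflow, TensorFlow, H2O, ...) use their own formats.
MODEL_ARTIFACT = """
{
  "intercept": -1.0,
  "weights": {"tenure_months": 0.05, "num_products": 0.4}
}
"""


def load_model(artifact: str) -> Dict:
    """Parse the JSON model artifact into a dict."""
    return json.loads(artifact)


def predict_proba(model: Dict, record: Dict[str, float]) -> float:
    """Logistic score for one record; features missing from the record count as 0."""
    z = model["intercept"] + sum(
        weight * record.get(name, 0.0) for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_batch(
    model: Dict, records: List[Dict[str, float]], threshold: float = 0.5
) -> List[Dict[str, float]]:
    """Score a batch, adding a rounded probability and a 0/1 label to each record."""
    scored = []
    for record in records:
        p = predict_proba(model, record)
        scored.append({**record, "score": round(p, 4), "label": int(p >= threshold)})
    return scored


if __name__ == "__main__":
    model = load_model(MODEL_ARTIFACT)
    batch = [
        {"tenure_months": 0, "num_products": 0},
        {"tenure_months": 40, "num_products": 2},
    ]
    for row in score_batch(model, batch):
        print(row)
```

Keeping loading, scoring, and labeling in separate functions makes it straightforward to call the same `predict_proba` from a batch job or from an online endpoint.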
A key post-deployment concern is concept/model drift and model decay (Jason Brownlee's Machine Learning Mastery site covers these topics in more depth). To keep model output and accuracy in line with the original business goals, drift must be monitored and addressed continuously. The business goals themselves should also be revalidated periodically, so that model decay is caught by repeatedly checking the model against them. In general, platforms with built-in data science functionality are preferable to platforms that need additional setup or configuration for data science integration. It is therefore advisable to confirm that a platform offers some form of data science integration before standardizing on it across various organizations.
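One common way to monitor input drift, used here as an illustration rather than as the authors' prescribed method, is the Population Stability Index (PSI). It compares the distribution of a feature at training time with its distribution in production as the sum over bins of (current share minus reference share) times ln(current share / reference share). The sketch below assumes equal-width bins over the training data; the 0.1 and 0.25 thresholds are widely used rules of thumb, not universal constants, and should be tuned to your business goals.

```python
import bisect
import math
from typing import List, Sequence


def interior_edges(reference: Sequence[float], n_bins: int = 10) -> List[float]:
    """Equal-width bin edges over the reference (training) data, interior edges only."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # avoid zero-width bins for a constant feature
    return [lo + i * width for i in range(1, n_bins)]


def bin_fractions(values: Sequence[float], edges: List[float], eps: float = 1e-6) -> List[float]:
    """Share of values per bin; values outside the reference range land in the end bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    # Floor empty bins at eps so the logarithm below is always defined.
    return [max(c / len(values), eps) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], n_bins: int = 10) -> float:
    """Population Stability Index: sum over bins of (cur - ref) * ln(cur / ref)."""
    edges = interior_edges(reference, n_bins)
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(current, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def drift_status(score: float) -> str:
    """Rule-of-thumb interpretation (assumed thresholds); tune them to your use case."""
    if score < 0.1:
        return "stable"
    if score < 0.25:
        return "moderate drift"
    return "significant drift"
```

In practice, a check like this would run per feature on a schedule, with an alert or a retraining trigger whenever the status leaves "stable".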
Flexible model configuration is important so that data engineers and data scientists have enough ‘play’ area to experiment with various parameters, including the model's feature engineering.
Figure 2: Example of an in-line custom transformation flow for ML use cases in Infoworks.io DataFoundry
Of course, there are plenty of features and parameters that a machine learning toolkit can leverage, and the field is continuously evolving. It is hard for any single platform to include all the functionality of every machine learning toolkit.
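One way to provide that ‘play’ area is to keep model parameters and the feature list in an external configuration rather than hard-coding them. The sketch below assumes a hypothetical `ModelConfig` with decision-tree-style parameters; the field names and defaults are illustrative only. The loader overlays JSON overrides on the defaults and rejects unknown keys, so a typo in an experiment's configuration fails fast instead of silently falling back to a default.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ModelConfig:
    """Hypothetical externalized configuration for a decision-tree experiment."""
    model_type: str = "decision_tree"
    max_depth: int = 5
    min_samples_leaf: int = 20
    features: List[str] = field(default_factory=lambda: ["tenure_months", "num_products"])

    def validate(self) -> None:
        """Reject obviously invalid settings before any training job starts."""
        if self.max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        if self.min_samples_leaf < 1:
            raise ValueError("min_samples_leaf must be at least 1")
        if not self.features:
            raise ValueError("at least one feature is required")


def load_config(text: str) -> ModelConfig:
    """Overlay JSON overrides on the defaults, failing fast on unknown keys."""
    overrides = json.loads(text)
    known = {f.name for f in fields(ModelConfig)}
    unknown = set(overrides) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    config = ModelConfig(**overrides)
    config.validate()
    return config
```

For example, `load_config('{"max_depth": 8}')` changes only the tree depth for one experiment while every other parameter keeps its default.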
Another consideration is the reusability of the developed code across the various parts of the platform. For example, if you have engineered a few features for a decision tree that performs customer segmentation for marketing, the same code should also support customer segmentation for the collections department.
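As a sketch of that kind of reuse, the function below computes recency, frequency, and monetary (RFM) features, a common input for segmentation, from generic `(customer_id, event_date, amount)` records. The record layout and the choice of RFM features are assumptions made for illustration; the point is that marketing can feed it purchase transactions while collections feeds it payment records, with no change to the code.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple

# Hypothetical record layout shared by both departments: (customer_id, event_date, amount).
Record = Tuple[str, date, float]


def rfm_features(records: Iterable[Record], as_of: date) -> Dict[str, Dict[str, float]]:
    """Recency (days since last event), frequency (event count), monetary (total amount)."""
    last_seen: Dict[str, date] = {}
    frequency: Dict[str, int] = defaultdict(int)
    monetary: Dict[str, float] = defaultdict(float)
    for customer_id, event_date, amount in records:
        if customer_id not in last_seen or event_date > last_seen[customer_id]:
            last_seen[customer_id] = event_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (as_of - last_seen[cid]).days,
            "frequency": frequency[cid],
            "monetary": round(monetary[cid], 2),
        }
        for cid in last_seen
    }


# Marketing: purchase transactions.
purchases = [
    ("c1", date(2024, 1, 1), 50.0),
    ("c1", date(2024, 3, 1), 25.0),
    ("c2", date(2024, 2, 15), 10.0),
]
# Collections: payment records with the same layout, fed to the same function.
payments = [("a9", date(2024, 3, 30), 100.0)]

marketing_features = rfm_features(purchases, as_of=date(2024, 3, 31))
collections_features = rfm_features(payments, as_of=date(2024, 3, 31))
```

The decision tree downstream can differ per department; only the feature code is shared.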
To learn more about ML-Ops vs. DevOps, refer to the Google Cloud team's article 'MLOps: Continuous delivery and automation pipelines in machine learning', which describes: 'Continuous delivery of models: An ML pipeline in production continuously delivers prediction services to new models that are trained on new data. The model deployment step, which serves the trained and validated model as a prediction service for online predictions, is automated.'
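An automated deployment step usually includes a validation gate, so that a newly trained model replaces the production model only when it is demonstrably better. The sketch below is a hypothetical gate, not the Google Cloud implementation: it assumes AUC on a held-out validation set as the metric, and the `min_auc` floor and `min_gain` margin are illustrative values you would derive from your business goals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    """A trained model version and its metric on a held-out validation set."""
    version: str
    auc: float


def should_promote(
    candidate: ModelVersion,
    production: Optional[ModelVersion],
    min_auc: float = 0.70,
    min_gain: float = 0.005,
) -> bool:
    """Promote only if the candidate clears an absolute floor and beats production by min_gain."""
    if candidate.auc < min_auc:
        return False
    if production is None:  # first deployment: the floor alone decides
        return True
    return candidate.auc >= production.auc + min_gain


def deploy_step(
    candidate: ModelVersion, production: Optional[ModelVersion]
) -> Optional[ModelVersion]:
    """Return the version that should serve online predictions after this step."""
    return candidate if should_promote(candidate, production) else production
```

Requiring a minimum gain, rather than any improvement, avoids churning production models over differences that are within normal evaluation noise.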
Figure 3 (from the Google Cloud article): Example of ML pipeline automation for continuous training (with CI/CD)
The following diagram shows the 'maturity' stages of the automated ML CI/CD pipeline for your consideration as you progress on your ML-Ops journey:
Figure 4 (from the GCP knowledge base): Stages of the CI/CD automated ML pipeline
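At the more mature stages, the Google Cloud article describes training pipelines that are triggered automatically, for example on a schedule, when new training data becomes available, or when the data distribution changes significantly. The following is a minimal sketch of such a trigger policy. The thresholds are hypothetical, and the `psi` input is assumed to come from a separate drift monitor (for example, a Population Stability Index score per feature).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class TriggerPolicy:
    """Hypothetical thresholds for automatically re-running the training pipeline."""
    max_model_age: timedelta = timedelta(days=30)
    min_new_rows: int = 100_000
    max_psi: float = 0.25


def retrain_reasons(
    policy: TriggerPolicy,
    last_trained: datetime,
    now: datetime,
    new_rows: int,
    psi: float,
) -> List[str]:
    """List every trigger that fires; an empty list means no retraining is needed."""
    reasons = []
    if now - last_trained >= policy.max_model_age:
        reasons.append("schedule")
    if new_rows >= policy.min_new_rows:
        reasons.append("new_data")
    if psi > policy.max_psi:
        reasons.append("drift")
    return reasons
```

Returning the list of reasons, rather than a single boolean, leaves an audit trail of why each continuous-training run was started.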
Note: Some platforms, such as Infoworks’ DataFoundry, allow you to insert ML logic in-line with their custom data transformation flows. Once the model is finalized, this lets you deploy it in a more scalable, commercial-grade form, and it can become part of your overarching DevOps and ML-Ops methodology to support CI/CD and continuous training (CT).
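The general idea of in-line ML logic can be sketched without any particular product: the trained model is wrapped as just another step in a chain of row transformations. The Python below is a generic illustration and does not use Infoworks DataFoundry's API; the column names and the `toy_model` rule are hypothetical stand-ins for real transformations and a real trained model.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Step = Callable[[Row], Row]


def fill_missing(row: Row) -> Row:
    """Ordinary cleansing step: replace missing values with 0.0."""
    return {k: (0.0 if v is None else v) for k, v in row.items()}


def add_balance_per_product(row: Row) -> Row:
    """Ordinary feature step (hypothetical columns): balance divided by product count."""
    products = row["num_products"]
    return {**row, "balance_per_product": row["balance"] / products if products else 0.0}


def make_scoring_step(predict: Callable[[Row], float], output_col: str = "score") -> Step:
    """Wrap a model's predict function so it runs in-line like any other step."""
    def score(row: Row) -> Row:
        return {**row, output_col: predict(row)}
    return score


def run_flow(rows: Iterable[Row], steps: List[Step]) -> Iterator[Row]:
    """Apply each step to each row, in order."""
    for row in rows:
        for step in steps:
            row = step(row)
        yield row


def toy_model(row: Row) -> float:
    """Stand-in for a trained model: a hypothetical threshold rule."""
    return 1.0 if row["balance_per_product"] > 1000 else 0.0


flow = [fill_missing, add_balance_per_product, make_scoring_step(toy_model)]
```

Deploying a new model version then means replacing only the scoring step, while the surrounding transformations stay unchanged.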
Conclusion
As this article shows, data scientists and data engineers have several considerations to weigh when creating machine learning and artificial intelligence pipelines in an agile manner using the many features available on advanced platforms. Teams can either leverage their platform's built-in functionality or extend it with connectors and custom transformations to speed up the process.
Because we have worked closely with Infoworks products, we recommend using Infoworks ‘Data Onboarding’ capabilities to ingest the data, so that it is available to data scientists for building models more efficiently and in a fail-fast-fail-cheaply (FFFC) manner.
Details of onboarding (the ingestion process and its parameters) are available on the Infoworks documentation website, and similar details can be found in the Databricks and GCP communities and on related automation tool sites. The savings come from reducing manual ingestion work, both initial and incremental, combined with easier deployment of advanced machine learning algorithms. The time savings alone should be substantial and can help build a strong business case in terms of Total Cost of Ownership (TCO) for commercial AI/ML operations. With your enterprise's business priorities and timeline in mind, the holistic platform should stand out in terms of flexibility, ease of use, and time-to-value across many data science use cases.
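Automating incremental ingestion commonly relies on a watermark: each run loads only the rows whose change timestamp is later than the highest timestamp already ingested. The sketch below illustrates that generic technique, not Infoworks' implementation; the `updated_at` column name is an assumption, and an initial load is simply a run that starts from the earliest possible watermark.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def incremental_batch(
    source_rows: List[Row], watermark: datetime, ts_col: str = "updated_at"
) -> Tuple[List[Row], datetime]:
    """Return rows changed after the watermark, plus the advanced watermark."""
    new_rows = [r for r in source_rows if r[ts_col] > watermark]
    new_watermark = max((r[ts_col] for r in new_rows), default=watermark)
    return new_rows, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2024, 5, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 5, 1, 10, 0)},
]

# Initial load: start from the earliest possible watermark.
batch, wm = incremental_batch(source, datetime.min)
# Later runs pass the stored watermark and pick up only newer rows.
```

Because the comparison is strictly greater-than, rows that arrive late with a timestamp at or before the stored watermark would be skipped; production pipelines usually add an overlap window or change data capture to handle that case.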
Photo by Razvan Chisu on Unsplash
----
Authors: VirooPax B. Mirji and Ganesh Walavalkar
P.S. These are personal views and should not be taken as representing any specific company or ecosystem of partners.
-------
Additional reading on Medium: Ashok Chilakapati's article on concept drift and model decay.
On Google Cloud: MLOps: Continuous delivery and automation pipelines in machine learning