APC Tech Blog

This is the technical blog of AP Communications Co., Ltd.

Building a Real-Time Model Monitoring Pipeline on Databricks

This is Johann from the Global Engineering Department of the GLB Division. This article summarizes a session at Data + AI Summit 2023 (DAIS), based on reports from Mr. Kanemaru, who attended on site.

In this blog post, we discuss the presentation titled "Building a Real-Time Model Monitoring Pipeline on Databricks." In it, Alan Gubkin, CEO and co-founder of Aporia, and Anindya Saha, an MML engineer, emphasize the importance of model monitoring and present methods for building better AI products and machine learning models. The target audience includes data scientists, machine learning engineers, and data engineers. This post covers the session in a single part. Let's dive into the content of the presentation!

Aporia's ML Platform and Delta Database Model Demo

First, Alan Gubkin, CEO of Aporia, demonstrated the company's ML platform and explained various aspects of it. He also discussed how to build a real-time water monitoring pipeline based on the Delta database model.

Overview of Aporia's ML Platform

Aporia's ML platform has the following features:

  1. Easy-to-use functionality for data preprocessing and feature engineering
  2. Ability to easily apply various machine learning algorithms
  3. Efficient model evaluation and selection capabilities
  4. Smooth model deployment and operation functionality

Delta Database Model and Real-Time Water Monitoring Pipeline

The Delta database model has the following features:

  1. Real-time data addition and updates
  2. Easy data version management
  3. High performance and scalability

In the real-time water monitoring pipeline, the Delta database model is used in the following steps:

  1. Collection and preprocessing of water quality data
  2. Application of machine learning models and predictions
  3. Visualization of prediction results and alert notifications

Building such a real-time water monitoring pipeline makes it possible to detect anomalies in water quality quickly and take appropriate measures.
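
The three steps above can be sketched in plain Python. This is only an illustrative outline, not the pipeline shown in the session: the `Reading` schema, the function names, and the sample values are all hypothetical, and a simple z-score on pH stands in for a trained machine learning model.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Reading:
    """One water-quality sample (hypothetical schema)."""
    sensor_id: str
    ph: float
    turbidity_ntu: float


def preprocess(raw_rows):
    """Step 1: drop rows with missing values and convert the rest to Reading objects."""
    readings = []
    for row in raw_rows:
        if row.get("ph") is None or row.get("turbidity_ntu") is None:
            continue
        readings.append(
            Reading(row["sensor_id"], float(row["ph"]), float(row["turbidity_ntu"]))
        )
    return readings


def fit_baseline(readings):
    """Stand-in for model training: learn the mean and standard deviation of pH."""
    values = [r.ph for r in readings]
    return mean(values), stdev(values)


def predict_anomaly(reading, baseline, z_threshold=3.0):
    """Step 2: flag a reading whose pH z-score exceeds the threshold."""
    mu, sigma = baseline
    if sigma == 0:
        return reading.ph != mu
    return abs(reading.ph - mu) / sigma > z_threshold


def alerts_for(readings, baseline):
    """Step 3: turn flagged readings into alert messages."""
    return [
        f"ALERT {r.sensor_id}: pH {r.ph:.1f} is outside the expected range"
        for r in readings
        if predict_anomaly(r, baseline)
    ]


history = preprocess([
    {"sensor_id": "s1", "ph": 7.0, "turbidity_ntu": 1.0},
    {"sensor_id": "s1", "ph": 7.1, "turbidity_ntu": 1.1},
    {"sensor_id": "s1", "ph": 6.9, "turbidity_ntu": 0.9},
    {"sensor_id": "s1", "ph": None, "turbidity_ntu": 1.0},  # dropped by preprocess
])
baseline = fit_baseline(history)
incoming = preprocess([
    {"sensor_id": "s1", "ph": 7.05, "turbidity_ntu": 1.0},
    {"sensor_id": "s1", "ph": 9.5, "turbidity_ntu": 1.2},
])
for message in alerts_for(incoming, baseline):
    print(message)
```

In a real deployment the baseline would be replaced by an actual model and the alert strings by a notification channel; the sketch only shows how the three stages hand data to each other.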

Latest Concepts, Features, and Services

The following concepts, features, and services are currently attracting attention in the machine learning and data analysis fields:

  1. Automated Machine Learning (AutoML): Technology that automates the construction and selection of machine learning models
  2. Model Monitoring: Technology that monitors the performance of machine learning models and changes in their data in real time
  3. MLOps: Practices for efficiently developing and operating machine learning models

Using these concepts, features, and services makes it possible to build better AI products and machine learning models.

Importance of Model Monitoring and How to Verify It

Model monitoring is essential for maintaining the quality of machine learning models. Let's discuss why it matters and how to check that a model remains accurate, unbiased, and free of hallucinations.

Importance of Model Monitoring

Model monitoring is important for the following reasons:

  1. To maintain model performance
  2. To respond to data changes
  3. To detect model bias and hallucinations
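
The second reason, responding to data changes, is often made concrete with a drift statistic. The sketch below uses the Population Stability Index (PSI), which is our choice for illustration and was not named in the session; the function name, bin count, and sample data are assumptions.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two numeric samples.

    Bin edges are taken from the range of `expected`; a small epsilon
    keeps empty bins from producing log(0) or division by zero.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0
    eps = 1e-6

    def fractions(values):
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


training = [i / 100 for i in range(100)]   # feature values seen at training time
shifted = [v + 0.5 for v in training]      # the same feature after a shift in production

print(psi(training, training))  # identical distributions give 0.0
print(psi(training, shifted))   # a large value signals drift
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, but these cut-offs are conventions rather than guarantees and should be tuned per feature.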

How to Verify Model Monitoring

When performing model monitoring, the following steps are important:

  1. Save the model's input and output
  2. Calculate metrics in the inference table
  3. Select performance evaluation metrics for the model
  4. Regularly check the model's performance

Basic Model Monitoring Pipeline

The basic model monitoring pipeline involves saving the model's input and output and calculating metrics in the inference table. This helps maintain model performance and respond to data changes.

Saving the Model's Input and Output

Saving the model's input and output has the following advantages:

  1. It provides the data needed to evaluate model performance
  2. It provides information useful for improving the model
  3. It makes model problems easier to identify
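
As a minimal sketch of what saving inputs and outputs can look like, the example below logs each model call to an in-memory SQLite table. SQLite is used only so the example runs anywhere; on Databricks the equivalent table would typically be a Delta table. The table name, columns, and helper functions are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE inference_log (
        request_id TEXT PRIMARY KEY,
        logged_at  TEXT NOT NULL,
        features   TEXT NOT NULL,   -- JSON-encoded model input
        prediction REAL NOT NULL,   -- model output
        label      REAL             -- ground truth, filled in later if it arrives
    )
    """
)


def log_inference(request_id, features, prediction):
    """Append one model call (input and output) to the inference table."""
    conn.execute(
        "INSERT INTO inference_log (request_id, logged_at, features, prediction) "
        "VALUES (?, ?, ?, ?)",
        (request_id, datetime.now(timezone.utc).isoformat(), json.dumps(features), prediction),
    )


def attach_label(request_id, label):
    """Record the ground-truth label once it becomes known."""
    conn.execute(
        "UPDATE inference_log SET label = ? WHERE request_id = ?",
        (label, request_id),
    )


log_inference("req-1", {"age": 41, "plan": "pro"}, 0.83)
log_inference("req-2", {"age": 23, "plan": "free"}, 0.12)
attach_label("req-1", 1.0)

rows = conn.execute(
    "SELECT request_id, prediction, label FROM inference_log ORDER BY request_id"
).fetchall()
print(rows)
```

Keeping a request identifier on every row is the design choice that matters here: it lets labels that arrive hours or days later be joined back to the prediction they belong to.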

Calculating Metrics in the Inference Table

Calculating metrics in the inference table has the following advantages:

  1. Model performance can be quantitatively evaluated
  2. It becomes easier to identify model improvement points
  3. Model problems can be detected early
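
To illustrate how metrics can be computed from an inference table, the sketch below groups logged rows into hourly windows and reports accuracy per window. The row layout, timestamps, and function name are assumptions made for the example, and accuracy is just one possible metric.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical inference-table rows: (timestamp, predicted class, true class).
rows = [
    ("2023-06-28T10:05:00", 1, 1),
    ("2023-06-28T10:20:00", 0, 0),
    ("2023-06-28T10:45:00", 1, 0),
    ("2023-06-28T11:10:00", 1, 1),
    ("2023-06-28T11:30:00", 0, None),  # label not available yet
]


def hourly_metrics(rows):
    """Return {hour: {"count", "labeled", "accuracy"}} for each hourly window."""
    buckets = defaultdict(list)
    for ts, predicted, actual in rows:
        hour = datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")
        buckets[hour].append((predicted, actual))

    metrics = {}
    for hour, pairs in sorted(buckets.items()):
        labeled = [(p, a) for p, a in pairs if a is not None]
        correct = sum(1 for p, a in labeled if p == a)
        metrics[hour] = {
            "count": len(pairs),
            "labeled": len(labeled),
            # Accuracy is undefined until at least one label has arrived.
            "accuracy": correct / len(labeled) if labeled else None,
        }
    return metrics


for hour, values in hourly_metrics(rows).items():
    print(hour, values)
```

Reporting the number of labeled rows next to the accuracy helps a reader judge how much weight a given window deserves.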

Building a Real-Time Model Monitoring Pipeline Using Databricks

By utilizing Databricks, it is possible to build a pipeline for real-time model monitoring. This helps maintain model performance and quickly respond to data changes.

Advantages of Databricks

Using Databricks has the following advantages:

  1. Real-time model monitoring is possible
  2. Efficient processing of large amounts of data
  3. Flexible scaling

How to Build a Real-Time Model Monitoring Pipeline

The process of building a real-time model monitoring pipeline using Databricks is as follows:

  1. Save the model's input and output on Databricks
  2. Create an inference table and calculate metrics
  3. Check the model's performance in real time
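
The third step, checking performance in real time, can be sketched as a rolling-window monitor that raises an alert when accuracy drops. This is plain Python rather than Databricks code: on Databricks the same logic would more likely run as a streaming job over the inference table. The class name, window size, and threshold are hypothetical.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window=100, min_samples=5, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # True for a correct prediction
        self.min_samples = min_samples
        self.threshold = threshold

    def observe(self, predicted, actual):
        """Record one labeled prediction and return an alert string or None."""
        self.outcomes.append(predicted == actual)
        if len(self.outcomes) < self.min_samples:
            return None  # not enough data to judge yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        if accuracy < self.threshold:
            return f"accuracy {accuracy:.2f} fell below {self.threshold:.2f}"
        return None


monitor = RollingAccuracyMonitor(window=5, min_samples=5, threshold=0.8)
stream = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0), (1, 0), (0, 1)]
alerts = [a for a in (monitor.observe(p, y) for p, y in stream) if a]
print(alerts)
```

The `min_samples` guard avoids alerting on the first few predictions, when a single mistake would otherwise dominate the accuracy estimate.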

Model monitoring is an essential element for maintaining the quality of machine learning models. By using Databricks to perform real-time model monitoring, it is possible to maintain model performance and quickly respond to data changes.

Summary

This presentation explained the importance of model monitoring and how to use the insights gained from it to improve AI products and machine learning models. Model monitoring is an essential element for maintaining and improving the performance of machine learning models. By considering factors such as data drift and metric calculations, and by using Databricks to build a real-time model monitoring pipeline, better AI products and machine learning models can be realized.

Conclusion

This content is based on reports from members participating on site in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!