Introduction
I'm Sasaki from the Global Engineering Department of the GLB Division. This article summarizes a session based on the report from Kanamaru, who is attending Data + AI SUMMIT 2023 (DAIS) on site.
This time, I'll cover the talk "Real-Time ML in Marketplace at Lyft", which explains why real-time machine learning infrastructure matters, how to build it, and which real-time use cases it serves, such as supply and demand forecasting and fraud detection. The target audience includes data scientists, ML engineers, data engineers, and business leaders.
Importance of real-time machine learning infrastructure
Real-time machine learning infrastructure is critical for supply and demand forecasting and fraud detection. The infrastructure consists of three core elements: features, model training, and model inference (serving). Real-time infrastructure focuses on generating features in real time.
Building blocks of a real-time machine learning infrastructure
A real-time machine learning infrastructure consists of three core elements:
- Features
- Model training
- Model inference or serving
In real-time infrastructure in particular, the focus is on generating features in real time. This is what enables real-time use cases such as supply and demand forecasting and fraud detection.
Use cases for real-time machine learning infrastructure
Real-time machine learning infrastructure is leveraged for use cases such as:
- Supply and demand forecasting
- Fraud detection
With features generated in real time, supply and demand forecasting makes it possible to provide services at the right time. Fraud detection likewise identifies fraudulent transactions in real time to minimize damage.
How to build a real-time infrastructure
The talk detailed how to build a real-time machine learning infrastructure and explained that two pipelines are required: an ingestion pipeline and an aggregation pipeline.
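To make the two-pipeline idea concrete, here is a minimal sketch of an ingestion step followed by an aggregation step. It is not Lyft's implementation: the `Event` fields, the event kinds, the 60-second tumbling window, and the function names `ingest` and `aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_SECONDS = 60  # assumed tumbling-window size, not from the talk


@dataclass(frozen=True)
class Event:
    """A hypothetical marketplace event; the field names are illustrative."""
    kind: str         # e.g. "ride_request" (demand) or "driver_available" (supply)
    region: str
    timestamp: float  # seconds since some reference point


def ingest(raw_records):
    """Ingestion step: parse raw dicts into Events, dropping malformed ones."""
    events = []
    for record in raw_records:
        try:
            events.append(Event(record["kind"], record["region"], float(record["ts"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip records with missing or non-numeric fields
    return events


def aggregate(events, window_seconds=WINDOW_SECONDS):
    """Aggregation step: count events per (region, window start) and kind."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        window_start = int(event.timestamp // window_seconds) * window_seconds
        counts[(event.region, window_start)][event.kind] += 1
    return {key: dict(kinds) for key, kinds in counts.items()}


raw = [
    {"kind": "ride_request", "region": "downtown", "ts": 5},
    {"kind": "ride_request", "region": "downtown", "ts": 42},
    {"kind": "driver_available", "region": "downtown", "ts": 50},
    {"kind": "ride_request", "region": "downtown", "ts": 61},
    {"kind": "ride_request", "region": "airport"},  # malformed: no timestamp
]
features = aggregate(ingest(raw))
```

In a production system, each step would typically be a streaming job reading from a message stream rather than a function over a list, but the split of responsibilities is the same: ingestion validates and normalizes events, and aggregation turns them into windowed features.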
Model Training Pipeline
A specialized model training pipeline was introduced for models that require real-time training. Smart triggers start training once the data dependencies are satisfied. The talk also emphasized that the model inference pipeline works in real time as well.
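The talk report does not describe how the smart triggers are implemented, so the following is only a sketch of the general idea: a trigger that starts training exactly once, as soon as every required dataset has been marked ready. The class name `TrainingTrigger`, its method `mark_ready`, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingTrigger:
    """Fires a training job once all required datasets have been marked ready."""
    required: frozenset
    ready: set = field(default_factory=set)
    fired: bool = False

    def mark_ready(self, dataset, train_fn):
        """Record that `dataset` has landed; call `train_fn` once when all are ready.

        Returns True only on the call that actually starts training.
        """
        if dataset in self.required:
            self.ready.add(dataset)
        if not self.fired and self.ready >= self.required:
            self.fired = True
            train_fn()
            return True
        return False


runs = []  # stands in for real training jobs
trigger = TrainingTrigger(required=frozenset({"ride_features", "labels"}))
trigger.mark_ready("ride_features", lambda: runs.append("train"))  # still waiting
trigger.mark_ready("labels", lambda: runs.append("train"))         # starts training
trigger.mark_ready("labels", lambda: runs.append("train"))         # ignored: already fired
```

The `fired` flag keeps a late or repeated notification from starting a second training run for the same set of dependencies.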
The Importance of Synchronous Model Inference
Synchronous model inference was described as important for use cases such as deciding which offer to give to a passenger. DAGs (directed acyclic graphs) are a key component of the pricing models and are used to achieve more accurate and efficient dynamic pricing.
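As an illustration of how a DAG can drive synchronous pricing inference, the sketch below runs a few dependent steps in topological order using Python's standard `graphlib` module. The step names and the pricing formula are invented for this example and are not Lyft's actual pricing models.

```python
from graphlib import TopologicalSorter

# Hypothetical pricing steps; each step maps to the steps it depends on.
DEPENDENCIES = {
    "demand_forecast": set(),
    "supply_forecast": set(),
    "surge_multiplier": {"demand_forecast", "supply_forecast"},
    "final_price": {"surge_multiplier"},
}

# Toy stand-ins for models; each reads earlier results from a shared dict.
STEPS = {
    "demand_forecast": lambda r: r["requests"],
    "supply_forecast": lambda r: r["drivers"],
    "surge_multiplier": lambda r: max(
        1.0, r["demand_forecast"] / max(r["supply_forecast"], 1)
    ),
    "final_price": lambda r: round(r["base_fare"] * r["surge_multiplier"], 2),
}


def run_pricing(inputs):
    """Run every step synchronously, in an order that respects the DAG."""
    results = dict(inputs)
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        results[name] = STEPS[name](results)
    return results["final_price"]


price = run_pricing({"requests": 30, "drivers": 20, "base_fare": 10.0})
```

Because the caller waits for `run_pricing` to return, the offer shown to a passenger can depend on the freshest result of every upstream step, which is the point of synchronous inference in this kind of use case.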
Components of the ecosystem
The real-time infrastructure ecosystem includes the following elements:
- Kinesis source
- Feature pipeline
- Model execution pipeline
- Downstream services
- Data visualization layer
These elements work together to enable real-time machine learning.
Real-time configuration management and DevEx
Real-time configuration management was described as important for some use cases, and DevEx (developer experience) as a key factor in building the infrastructure. An advantage of YAML configuration files is that ML engineers and data scientists can create them easily.
Through this talk, I deepened my understanding of how to build a real-time machine learning infrastructure and why it matters. We will continue to keep you up to date on the latest concepts, features, and services.
Conclusion
This content is based on reports from members attending DAIS sessions on site. During DAIS, articles related to the sessions will be posted on the special site below, so please take a look.
https://www.ap-com.co.jp/data_ai_summit-2023/
Thank you for your continued support!
Translated by Johann