APC Technical Blog

This is the technical blog of AP Communications Co., Ltd.

Learn How to Reliably Monitor Your Data and Model Quality in the Lakehouse

Introduction

This is Chen from the Lakehouse Department of the GLB Business Department. I watched the Data + AI Summit 2023 (DAIS) webcast session "Learn How to Reliably Monitor Your Data and Model Quality in the Lakehouse", and this article summarizes its contents.

The session was presented by Databricks' Alkis Polyzotis (Technical Lead, Machine Learning) and Kasey Uhlenhuth (Staff Product Manager). The talk introduced Lakehouse Monitoring, which was developed to solve the problem of data engineering and data science teams using different monitoring tools. This new platform aims to enable a more efficient and effective machine learning (ML) lifecycle by unifying data quality monitoring.

Enabling an Efficient ML Lifecycle with Integrated Data Quality

The purpose of Lakehouse Monitoring is to integrate data quality into the platform to achieve the following effects:

  • Improved communication between data engineering and data science teams

  • Improved accuracy of machine learning models through better data quality

  • Improved development efficiency through early detection of and response to data quality issues

These streamline the entire machine learning lifecycle and help companies create value faster.

Specific Features of Lakehouse Monitoring

Lakehouse Monitoring offers the following features:

  • Integrated data quality: both data engineering and data science teams can share the same data quality standards

  • Data quality visualization: see data quality issues at a glance on your dashboard

  • Data quality alerts: stakeholders can be notified when data quality falls below thresholds

These features enable early detection of data quality issues and a rapid response; a minimal sketch of what such checks can look like follows below.
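
The session itself did not walk through code, but to make the idea concrete, here is a small sketch of the kind of table-level quality checks and threshold-based alerts that such a platform automates. It is written in plain PySpark rather than the Lakehouse Monitoring API, and the table and column names (`main.sales.orders`, `customer_id`, `order_id`) are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: compute simple quality metrics for one table and
# flag any metric that crosses a threshold. Lakehouse Monitoring automates
# this kind of check; this sketch only illustrates the idea.
spark = SparkSession.builder.getOrCreate()

df = spark.read.table("main.sales.orders")  # assumed table name

total_rows = df.count()
metrics = {
    "row_count": total_rows,
    # Fraction of rows with a missing customer_id (assumed column)
    "null_customer_id_rate": df.filter(F.col("customer_id").isNull()).count() / max(total_rows, 1),
    # Number of duplicated order_id values (assumed column)
    "duplicate_order_ids": total_rows - df.select("order_id").distinct().count(),
}

# Threshold-based "alert": in practice this would notify stakeholders
# (e-mail, Slack, etc.) rather than just print.
THRESHOLDS = {"null_customer_id_rate": 0.01, "duplicate_order_ids": 0}
for name, value in metrics.items():
    limit = THRESHOLDS.get(name)
    if limit is not None and value > limit:
        print(f"ALERT: {name}={value} exceeds threshold {limit}")
    else:
        print(f"OK: {name}={value}")
```

In Lakehouse Monitoring the equivalent metrics are computed for you and surfaced on a dashboard, so both teams look at the same numbers instead of maintaining separate ad-hoc scripts.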

Leveraging the Latest Concepts and Features

Lakehouse Monitoring leverages the latest concepts and features. Examples include:

  • Automatic data quality assessment: data quality can be assessed automatically using machine learning algorithms

  • Data quality improvement suggestions: when a data quality problem is found, the platform suggests a likely cause and improvement measures

These capabilities make responding to data quality issues more efficient; a brief sketch of what automatic drift assessment can look like is shown below.
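
Again as an illustration only (this is not the session's code or the Lakehouse Monitoring implementation), automatic assessments of this kind are commonly built on statistical comparisons between a baseline window and the current data. The sketch below flags possible drift in a single numeric feature with a two-sample Kolmogorov-Smirnov test; the synthetic arrays stand in for a baseline month and the most recent batch.

```python
import numpy as np
from scipy.stats import ks_2samp

# Illustrative drift check: compare a baseline sample of a numeric feature
# against the most recent batch. A small p-value suggests the two
# distributions differ, i.e. the feature may have drifted.
rng = np.random.default_rng(seed=0)
baseline = rng.normal(loc=100.0, scale=15.0, size=5_000)  # e.g. last month's order amounts
current = rng.normal(loc=110.0, scale=15.0, size=5_000)   # e.g. this week's order amounts

statistic, p_value = ks_2samp(baseline, current)

ALPHA = 0.01  # significance level for flagging drift
if p_value < ALPHA:
    print(f"Possible drift detected (KS statistic={statistic:.3f}, p={p_value:.2e})")
else:
    print(f"No significant drift (KS statistic={statistic:.3f}, p={p_value:.2e})")
```

A real system would run such checks per column and per time window and attach context (which table, which column, which slice) to every flag, which is the part a suggestion feature can turn into a proposed cause and fix.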

Summary

Lakehouse Monitoring is a platform developed to solve the problem of data engineering and data science teams using different monitoring tools. By integrating data quality, it streamlines the entire machine learning lifecycle and enables companies to create value faster. It also leverages the latest concepts and features, which should make responding to data quality issues more efficient.

Conclusion

This article is based on reports from members participating in DAIS sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!