APC Tech Blog

This is the technical blog of AP Communications Co., Ltd.


A Technical Deep Dive into Unity Catalog's Practitioner Playbook Part 2/3

Leveraging Unity Catalog: A New Approach to Data and AI Governance

This is Johann from the Global Engineering Department of the GLB Division.

In this post, we look at a presentation on data and AI governance titled "A Technical Deep Dive into Unity Catalog's Practitioner Playbook," delivered by Ishan Pappa, an experienced product leader at Databricks, and Ifigeneia Derekli, a data technology expert. The speakers explained in detail how to govern data and AI with Unity Catalog.

This is the second post in a three-part series. The previous post covered how Unity Catalog serves as a governance layer for data and AI, its role within the Lakehouse platform, and its relationship with cloud providers. In this post, we take a closer look at how data is registered in Unity Catalog, its security features, and its data search and lineage capabilities.

Metastore and Unity Catalog

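Unity Catalog organizes everything it governs under a metastore, the top-level container for metadata. The metastore records the catalogs, schemas, tables, and other objects registered in it, together with the permissions granted on them, and each table is addressed by a three-level name of the form "catalog.schema.table". The table data itself stays in cloud storage. Understanding how Unity Catalog and the metastore relate makes data management more efficient.

- Metastore: the top-level container that stores metadata about tables and other objects, and the permissions on them
- Unity Catalog: the governance layer that uses the metastore to govern data and AI assets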
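To make the hierarchy concrete, here is a minimal Python sketch that models a metastore as nested catalogs and schemas and registers tables by their three-level names. The names Metastore, register_table, and parse_full_name are hypothetical; this is a toy in-memory model, not the Unity Catalog API, and it ignores quoted names that contain dots.

```python
from dataclasses import dataclass, field


def parse_full_name(full_name: str) -> tuple:
    """Split a three-level name 'catalog.schema.table' into its parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return tuple(parts)


@dataclass
class Metastore:
    """Toy metastore: holds metadata only (names), never the table data itself."""
    name: str
    # catalog name -> schema name -> set of table names
    catalogs: dict = field(default_factory=dict)

    def register_table(self, full_name: str) -> None:
        catalog, schema, table = parse_full_name(full_name)
        self.catalogs.setdefault(catalog, {}).setdefault(schema, set()).add(table)

    def list_tables(self, catalog: str, schema: str) -> list:
        return sorted(self.catalogs.get(catalog, {}).get(schema, set()))


if __name__ == "__main__":
    ms = Metastore("example_metastore")  # hypothetical metastore name
    ms.register_table("sales.raw.orders")
    ms.register_table("sales.raw.customers")
    print(ms.list_tables("sales", "raw"))  # ['customers', 'orders']
```

Keeping the name parsing in one function means every object is validated against the same three-level rule before it is registered.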

Characteristics and Advantages of Unity Catalog Volumes

Volumes in Unity Catalog are designed for non-tabular data, such as images, CSV or JSON files, and other unstructured files. They let you govern files in many formats centrally, alongside tables, which makes data management more flexible.

- Volumes: a Unity Catalog object for non-tabular data sets
- Characteristics and advantages: centralized management of data in various formats
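Databricks exposes the files in a volume under paths of the form /Volumes/<catalog>/<schema>/<volume>/<path>, so the same three-level naming used for tables also identifies where a file is governed. The helpers below (volume_path and parse_volume_path) are a hypothetical, stdlib-only sketch of that path convention; they do not touch any real volume.

```python
from pathlib import PurePosixPath


def volume_path(catalog: str, schema: str, volume: str, *relative: str) -> str:
    """Build a file path following the /Volumes/<catalog>/<schema>/<volume>/... layout."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid object name: {name!r}")
    for part in relative:
        # Reject absolute parts and '..' so the path cannot escape the volume.
        if part.startswith("/") or ".." in PurePosixPath(part).parts:
            raise ValueError(f"relative part must stay inside the volume: {part!r}")
    return str(PurePosixPath("/Volumes", catalog, schema, volume, *relative))


def parse_volume_path(path: str) -> dict:
    """Recover the governing catalog, schema and volume from a volume path."""
    parts = PurePosixPath(path).parts  # ('/', 'Volumes', catalog, schema, volume, ...)
    if len(parts) < 5 or parts[:2] != ("/", "Volumes"):
        raise ValueError(f"not a volume path: {path!r}")
    return {"catalog": parts[2], "schema": parts[3], "volume": parts[4],
            "file": "/".join(parts[5:])}


if __name__ == "__main__":
    p = volume_path("sales", "raw", "landing", "2024", "orders.csv")
    print(p)  # /Volumes/sales/raw/landing/2024/orders.csv
    print(parse_volume_path(p)["volume"])  # landing
```

Because the catalog and schema are part of every file path, a file's location alone tells you which Unity Catalog objects govern access to it.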

Access Control Features of Data Management Systems

The presenters also explained Unity Catalog's access control features in detail. These features keep data secure while granting the right access to the users who need it.

- Access control: keeping data secure and granting appropriate access rights

Using Unity Catalog makes data and AI governance more efficient, and understanding and properly using these features improves both the efficiency and the security of data management.
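In Unity Catalog, reading a table requires USE CATALOG on its catalog, USE SCHEMA on its schema, and SELECT on the table, and privileges granted on a catalog or schema are inherited by the objects inside it. The Python sketch below is a hypothetical toy model of just that rule (the function can_select and the grant triples are made up for illustration). It shows why a single grant at the schema level can cover many tables.

```python
# Toy model of Unity Catalog-style privilege checks (illustrative only; it
# ignores groups, ownership and other privileges such as MODIFY).
# A grant is a (principal, privilege, securable) triple, where the securable is
# "catalog", "catalog.schema" or "catalog.schema.table".


def can_select(grants: set, principal: str, table: str) -> bool:
    """Return True if `principal` may read `table` under the toy rules."""
    catalog, schema, _ = table.split(".")
    schema_fqn = f"{catalog}.{schema}"

    def granted(privilege: str, *securables: str) -> bool:
        # Privileges granted on a parent object are inherited by its children.
        return any((principal, privilege, s) in grants for s in securables)

    return (
        granted("USE CATALOG", catalog)
        and granted("USE SCHEMA", catalog, schema_fqn)
        and granted("SELECT", catalog, schema_fqn, table)
    )


if __name__ == "__main__":
    grants = {
        ("analysts", "USE CATALOG", "sales"),
        ("analysts", "USE SCHEMA", "sales.raw"),
        ("analysts", "SELECT", "sales.raw"),  # inherited by every table in sales.raw
    }
    print(can_select(grants, "analysts", "sales.raw.orders"))  # True
    print(can_select(grants, "analysts", "sales.gold.kpis"))   # False (no USE SCHEMA)
```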

Data and AI Governance with Unity Catalog

Unity Catalog provides powerful lineage, search, and audit capabilities for data management. Understanding their characteristics and advantages lets you get the most out of its data search and lineage features.

Building CI/CD Pipelines with Access Control

First, let's discuss the advantages of building CI/CD pipelines with Unity Catalog. CI/CD, short for Continuous Integration and Continuous Delivery, refers to automating the entire process from development to deployment. Using Unity Catalog offers the following advantages:

1. Accurate permissions and access control
2. Consistent data
3. Data updates and changes reflected in real time

These features make data management easier and improve data reliability.
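One common way to get accurate, consistent permissions out of a CI/CD pipeline (an assumption here, not a pattern taken from the talk) is to give each environment its own catalog and have the pipeline generate the same grants for each. The Python sketch below only builds SQL strings. The environment-to-catalog mapping, the function grant_statements, and the group names are hypothetical. The statements follow the Databricks SQL GRANT syntax as we understand it, so check them against the official GRANT reference before running them.

```python
# Hypothetical CI/CD helper: the same pipeline deploys to several environments,
# each backed by its own catalog, and applies identical grants everywhere so
# permissions stay consistent. Catalog and group names are made up.

ENV_CATALOGS = {"dev": "dev_sales", "staging": "stg_sales", "prod": "prod_sales"}


def grant_statements(env: str, reader_groups: list) -> list:
    """Generate GRANT statements giving each group read access to the env's catalog."""
    if env not in ENV_CATALOGS:
        raise ValueError(f"unknown environment: {env!r}")
    catalog = ENV_CATALOGS[env]
    return [
        f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {catalog} TO `{group}`"
        for group in sorted(set(reader_groups))  # dedupe for a stable, repeatable plan
    ]


if __name__ == "__main__":
    for stmt in grant_statements("prod", ["analysts", "data-scientists"]):
        print(stmt)
```

Generating the grants from one definition, rather than applying them by hand per environment, keeps dev, staging, and prod from drifting apart.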

Enhanced Data Management

Next, let's look at how Unity Catalog's lineage, search, and audit capabilities strengthen data management. Adopting Unity Catalog offers the following advantages:

1. A clear view of data lineage
2. Fast search for the data you need
3. Auditing of data usage

These features further strengthen data management and significantly improve how efficiently data is used. By leveraging Unity Catalog, data and AI governance can be performed efficiently, so make the most of its data search and lineage features.
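Unity Catalog captures lineage automatically, so you never build this graph yourself. Still, a small model helps show what a lineage question such as "where did this table come from?" amounts to. The Python sketch below is a hypothetical toy (the function upstream and the table names are made up) that stores each table's direct sources and walks them breadth-first to list every upstream table.

```python
from collections import deque


def upstream(lineage: dict, table: str) -> list:
    """Return every transitive upstream source of `table`, in breadth-first order."""
    seen, order = {table}, []
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


if __name__ == "__main__":
    # Hypothetical lineage: table -> tables it was derived from.
    lineage = {
        "sales.gold.revenue": ["sales.silver.orders_clean"],
        "sales.silver.orders_clean": ["sales.raw.orders", "sales.raw.customers"],
    }
    print(upstream(lineage, "sales.gold.revenue"))
    # ['sales.silver.orders_clean', 'sales.raw.orders', 'sales.raw.customers']
```

Reversing the edges turns the same traversal into impact analysis: listing every downstream table affected by a change to a source.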

New Possibilities for Data Sharing: Delta Sharing

The presentation also highlighted the potential of open data sharing. It introduced Delta Sharing and noted that companies such as Oracle, Dell, and Cloudflare (with its R2 storage) have already adopted this technology.

What is Delta Sharing?

Delta Sharing is an open protocol for securely sharing live data across organizations and platforms. It lets data owners share data efficiently with data users while keeping it consistent, because recipients read the provider's current data instead of a separately maintained copy.
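On the recipient side, the open Delta Sharing protocol uses a small JSON profile file (with fields such as shareCredentialsVersion, endpoint, and bearerToken) and addresses a shared table as <profile-file>#<share>.<schema>.<table>. The official delta-sharing Python connector accepts table URLs in that form, for example in delta_sharing.load_as_pandas. The stdlib-only sketch below just parses those two pieces to show the addressing scheme. The helper names, endpoint, and token are placeholders, and it does not contact a sharing server.

```python
import json


def parse_table_url(url: str) -> tuple:
    """Split '<profile-path>#<share>.<schema>.<table>' into its four parts."""
    profile_path, sep, coords = url.rpartition("#")
    if not sep or not profile_path:
        raise ValueError(f"missing '#' separator: {url!r}")
    parts = coords.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected share.schema.table, got {coords!r}")
    share, schema, table = parts
    return profile_path, share, schema, table


def load_profile(text: str) -> dict:
    """Check that a profile has the minimal fields needed to reach a sharing server."""
    profile = json.loads(text)
    for key in ("shareCredentialsVersion", "endpoint", "bearerToken"):
        if key not in profile:
            raise ValueError(f"profile is missing {key!r}")
    return profile


if __name__ == "__main__":
    profile = load_profile(
        '{"shareCredentialsVersion": 1,'
        ' "endpoint": "https://sharing.example.com/delta-sharing/",'
        ' "bearerToken": "<token>"}'
    )
    print(profile["endpoint"])  # https://sharing.example.com/delta-sharing/
    print(parse_table_url("config.share#retail.sales.orders"))
    # ('config.share', 'retail', 'sales', 'orders')
```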

Companies Adopting Delta Sharing

Major companies such as Oracle, Dell, and Cloudflare (R2) have already adopted this technology. They use Delta Sharing to share data efficiently while keeping it consistent, which improves business efficiency.

Benefits of Delta Sharing

The biggest benefit of Delta Sharing is the ability to share data efficiently while maintaining data consistency. This prevents discrepancies and misunderstandings between data owners and users, which makes the data more reliable.

Furthermore, Delta Sharing lets data owners control precisely which data, and how much of it, is shared with each recipient, keeping data secure.
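As one example of that control, a provider can share a table while limiting it to certain partitions; as far as we know, Databricks expresses this with a PARTITION clause when a table is added to a share (ALTER SHARE ... ADD TABLE ... PARTITION (...)). The Python sketch below is a hypothetical toy model of the effect only: the provider fixes a partition filter, and the recipient sees only the matching rows. The class SharedTable, the table name, and the data are made up, and this is not how a sharing server is implemented.

```python
from dataclasses import dataclass


@dataclass
class SharedTable:
    """Toy shared table: only rows matching every partition value are visible."""
    name: str
    partition_filter: dict  # column -> allowed value, e.g. {"country": "JP"}

    def visible_rows(self, rows: list) -> list:
        return [
            row for row in rows
            if all(row.get(col) == value for col, value in self.partition_filter.items())
        ]


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "country": "JP", "amount": 120},
        {"order_id": 2, "country": "US", "amount": 80},
        {"order_id": 3, "country": "JP", "amount": 45},
    ]
    shared = SharedTable("retail.sales.orders", {"country": "JP"})
    print([r["order_id"] for r in shared.visible_rows(rows)])  # [1, 3]
```

With an empty filter every row is visible, which corresponds to sharing the whole table.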

Summary

The presentation on data and AI governance with Unity Catalog introduced Delta Sharing, an approach to data sharing that enables efficient sharing while maintaining data consistency and that many companies have already adopted. With the new possibilities it opens for data sharing and synchronization, Delta Sharing is likely to keep attracting attention.

In the next post, we will take a detailed look at upgrading to Unity Catalog and migrating from other systems such as the Hive metastore. Stay tuned!

Conclusion

This content is based on reports from members who attended Data + AI Summit (DAIS) sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!