APC Tech Blog


This is the technical blog of AP Communications Co., Ltd.

What’s New in Unity Catalog—with Live Demos


Dive into the latest advances in data and AI governance with the Unity Catalog product team. Unity Catalog is built for organizations that have adopted the Databricks Data Intelligence Platform. It is presented as the only solution that offers unified governance for both data and AI.

1. Unity Catalog: Tracing its History and Major Features

The session speakers were Paul and Mert, members of the Databricks product team who were heavily involved in building Unity Catalog. They walked through the features developed over the past year, as well as those planned for release in the near future.

The session opened with a look back at how data architectures across the industry have grown and evolved, and at how each stage affected the practice of governance.

Early data warehouses had real appeal: robust SQL support, the ability to run historical analysis, and strong support for structured data.

The Unity Catalog team studied the data architectures the industry has used since then and analyzed how much each one helped or hindered governance. The team is using these insights to improve the Databricks product portfolio and to deliver more value to its users.

New Governance Capabilities and Object Types in Unity Catalog

This part of the session focused on the governance capabilities and object types recently added to Unity Catalog. Unity Catalog provides unified governance for both data and AI and is integrated into the Databricks Data Intelligence Platform.

Improved Governance Functions and Evolving Object Types

Since last year, Unity Catalog has offered a growing set of distinct object types that support a wide range of architectures. These include:

  1. External Locations: Manage access rights to cloud storage.
  2. Foreign Catalogs: Part of Lakehouse Federation, these bring external systems such as Postgres, Snowflake, and Redshift within the scope of Unity Catalog.
  3. Other securable objects: Tables, functions, models, volumes, and more. Access rights can be managed individually for each one.
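
Here is a minimal Python sketch of how an external location can govern access to cloud storage. It assumes a simplified model in which each external location is a storage URL prefix. A request for a path is checked against the grants on the location that covers that path. The location names, bucket URLs, and groups are hypothetical. READ FILES and WRITE FILES mirror Unity Catalog's privilege names for external locations, but the check itself is a toy and not how Unity Catalog enforces them.

```python
from typing import Optional

# Hypothetical external locations: name -> cloud storage URL prefix (non-overlapping).
EXTERNAL_LOCATIONS = {
    "raw_landing": "s3://acme-raw/landing/",
    "finance_exports": "s3://acme-finance/exports/",
}

# Hypothetical grants per external location: name -> principal -> privileges.
LOCATION_GRANTS = {
    "raw_landing": {"data_engineers": {"READ FILES", "WRITE FILES"}},
    "finance_exports": {"finance_team": {"READ FILES"}},
}


def location_for(path: str) -> Optional[str]:
    """Return the external location whose URL prefix covers `path`, or None."""
    for name, prefix in EXTERNAL_LOCATIONS.items():
        if path.startswith(prefix):
            return name
    return None


def can_access(principal: str, privilege: str, path: str) -> bool:
    """Check a privilege on a storage path via the external location that covers it."""
    location = location_for(path)
    if location is None:
        return False  # no external location covers this path, so deny by default
    return privilege in LOCATION_GRANTS.get(location, {}).get(principal, set())
```

For example, can_access("finance_team", "READ FILES", "s3://acme-finance/exports/q1.csv") returns True. The same call with "WRITE FILES" returns False. Grants live on the location rather than on individual files, so one grant covers everything under that prefix.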

Access rights on each object are managed with standard ANSI SQL GRANT statements, for example granting SELECT permission on a particular table. Statements such as "GRANT SELECT ON your_table TO Bert", or granting EXECUTE on a model, are enforced consistently regardless of which platform Bert uses to access the data.
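
As a sketch of this GRANT pattern, the helper below renders GRANT statements in Databricks SQL syntax. The function name, the table and function names, and the principal bert@example.com are hypothetical. The privilege list is deliberately limited to a few examples and does not cover the full set Unity Catalog supports.

```python
# Privileges this sketch accepts per securable type (a small subset, for illustration).
ALLOWED_PRIVILEGES = {
    "TABLE": {"SELECT", "MODIFY"},
    "FUNCTION": {"EXECUTE"},
}


def grant_statement(privilege: str, securable_type: str, name: str, principal: str) -> str:
    """Render a GRANT statement; the principal is backtick-quoted, as user emails need."""
    privilege, securable_type = privilege.upper(), securable_type.upper()
    if privilege not in ALLOWED_PRIVILEGES.get(securable_type, set()):
        raise ValueError(f"{privilege} is not allowed on {securable_type} in this sketch")
    return f"GRANT {privilege} ON {securable_type} {name} TO `{principal}`"


print(grant_statement("select", "table", "main.sales.orders", "bert@example.com"))
# GRANT SELECT ON TABLE main.sales.orders TO `bert@example.com`
```

In a Databricks notebook, you could pass the resulting string to spark.sql(...) to execute it. The sketch only builds the text. Enforcement happens in Unity Catalog, which is why it applies no matter which tool issues the query.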

Building on this foundation, Unity Catalog has added a wide array of new governance features and object types. Every object type is governed through the same permission model. As a result, Unity Catalog can manage governance for diverse data assets and AI models alike, which is central to an effective data governance strategy.

Whether you are new to Unity Catalog or already an experienced user, these latest governance features and object types are worth adopting for better data and AI management.

The next section takes a closer look at other new features introduced in the session. Stay tuned!

Efficient Data Migration with Unity Catalog and Hive Metastore Federation

This section takes a detailed look at two Unity Catalog features, Hive Metastore Federation and access control. Both aim to make it easy to migrate data to Unity Catalog.

Integration with Hive Metastore

Connecting Unity Catalog to a Hive Metastore is straightforward. You register the Hive Metastore in Unity Catalog as a federated catalog (a federated Hive Metastore), which lets Unity Catalog crawl the entire Hive Metastore. As a result, all of the Hive Metastore's assets automatically become available in Unity Catalog.
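
The effect of federation on naming can be sketched in a few lines of Python. The sketch assumes a hypothetical federated catalog called hms_fed. Each two-level Hive name, database.table, shows up under the federated catalog as a three-level Unity Catalog name. This models only the naming. The actual setup goes through Lakehouse Federation (a connection plus a foreign catalog) and is not reproduced here.

```python
# Hypothetical snapshot of a Hive Metastore: database -> table names.
HIVE_METASTORE = {
    "default": ["customers", "orders"],
    "marketing": ["campaigns"],
}


def federated_names(hms: dict[str, list[str]], catalog: str) -> list[str]:
    """Three-level names a federated catalog would expose for every Hive table."""
    return sorted(
        f"{catalog}.{database}.{table}"
        for database, tables in hms.items()
        for table in tables
    )


for name in federated_names(HIVE_METASTORE, "hms_fed"):
    print(name)
```

This prints hms_fed.default.customers, hms_fed.default.orders, and hms_fed.marketing.campaigns. Every Hive table becomes addressable inside Unity Catalog without first being copied.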

Unleashing the Power of Access Control

Once the Hive Metastore assets are available, you can apply Unity Catalog access control lists (ACLs) to them. This gives you one unified access control model and simplifies data management.
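
Unity Catalog privileges are inherited downward, so a grant on a catalog or schema also applies to the tables inside it. The toy ACL store below sketches that idea for a federated catalog. The class, catalog, and group names are hypothetical. The sketch also ignores details that real queries involve, such as the USE CATALOG and USE SCHEMA privileges.

```python
from collections import defaultdict


class AclStore:
    """Toy ACL store: privileges granted on a catalog or schema flow down to its tables."""

    def __init__(self) -> None:
        # securable name -> principal -> set of privileges
        self._grants: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        self._grants[securable][principal].add(privilege)

    def has_privilege(self, principal: str, privilege: str, table: str) -> bool:
        parts = table.split(".")  # e.g. ["hms_fed", "marketing", "campaigns"]
        # Check the table itself, then its schema, then its catalog.
        for depth in range(len(parts), 0, -1):
            securable = ".".join(parts[:depth])
            if privilege in self._grants.get(securable, {}).get(principal, set()):
                return True
        return False
```

For example, after acl.grant("SELECT", "hms_fed.marketing", "analysts"), the "analysts" group can read hms_fed.marketing.campaigns but not hms_fed.default.orders. A single grant at the schema level covers every federated table in that schema.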

Enhanced Data Migration with Hive Metastore Federation

One of the most exciting uses of Hive Metastore Federation is as a data migration tool. Data stored in Hive is accessible immediately, which dramatically reduces the effort required for migration.
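
One way to picture why immediate access helps migration: tables can move one at a time while every table stays queryable. Each table is reachable either at its new Unity Catalog location or through the federated catalog. The sketch below assumes a hypothetical mapping that a migration team maintains by hand. The names hms_fed, MIGRATED, resolve, and remaining are illustrative, and none of this is a Databricks tool.

```python
# Hypothetical name of the federated Hive Metastore catalog.
FEDERATED_CATALOG = "hms_fed"

# Hypothetical migration map: Hive "database.table" -> new Unity Catalog table.
MIGRATED = {
    "default.orders": "main.sales.orders",
}


def resolve(hive_table: str, migrated: dict[str, str] = MIGRATED) -> str:
    """Use the migrated table if it exists; otherwise read through the federated catalog."""
    return migrated.get(hive_table, f"{FEDERATED_CATALOG}.{hive_table}")


def remaining(hive_tables: list[str], migrated: dict[str, str] = MIGRATED) -> list[str]:
    """Hive tables still served through federation, i.e. not migrated yet."""
    return [t for t in hive_tables if t not in migrated]
```

Here resolve("default.orders") returns "main.sales.orders" and resolve("default.customers") returns "hms_fed.default.customers". Consumers keep working throughout the migration, and remaining(...) shows what is left to move.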

Ensuring Interoperability

A second use case of Hive Metastore Federation is interoperability: it makes exchanging data between different data sources easier.

Together, Hive Metastore Federation and access control make Unity Catalog a powerful tool for strengthening data governance and migrating more efficiently. They bring the broader goal of centralized data management within reach.

Progress in Advanced Access Control and Cloud Integration with Unity Catalog

The Unity Catalog product team also highlighted the latest advances in data and AI governance. Unity Catalog was introduced as a groundbreaking solution that provides unified governance and is natively integrated with the Databricks Data Intelligence Platform.

This part of the session focused on advanced access control and cloud integration. The speakers shared specific examples of using Unity Catalog together with attribute-based access control (ABAC).

ABAC and Unity Catalog

One notable feature of Unity Catalog is its ability to detect personally identifiable information (PII) across the lakehouse. The session introduced three specific cases where this capability works in combination with ABAC.
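
The session did not describe how Unity Catalog detects PII. The sketch below therefore substitutes a simple, plainly heuristic technique. It samples each column's values and tags the column when most samples match a regular expression. The patterns, the 0.8 threshold, and the function name are assumptions for illustration only.

```python
import re

# Hypothetical detectors: tag -> pattern that a whole sampled value must match.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\- ]{7,}\d"),
}


def detect_pii(samples: dict[str, list[str]], threshold: float = 0.8) -> dict[str, str]:
    """Tag a column when at least `threshold` of its sampled values match one detector."""
    tags: dict[str, str] = {}
    for column, values in samples.items():
        if not values:
            continue  # nothing sampled, nothing to classify
        for tag, pattern in PII_PATTERNS.items():
            hits = sum(1 for value in values if pattern.fullmatch(value))
            if hits / len(values) >= threshold:
                tags[column] = tag
                break  # first matching detector wins
    return tags
```

The resulting tags are what a tag-driven access rule can act on. Regex heuristics like these miss many kinds of PII and produce false positives, so treat the sketch as an illustration of column tagging rather than a detector.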

The first example is a typical ABAC case: the masking rule considers every column but applies only to those tagged as PII. Traditionally, this kind of masking had to be configured separately for each table, which does not scale.

With Unity Catalog, you can instead create a single rule that applies across all tables. That rule masks every column labeled as PII, protecting many tables at once.
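
The difference between per-table masking and a single tag-driven rule can be sketched in Python. The Table class, the "pii" tag, the pii_readers exemption group, and the "***" mask are all hypothetical. The sketch models the idea of attaching policy to tags rather than to tables. It is not Databricks's ABAC policy syntax.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    rows: list[dict[str, str]]
    column_tags: dict[str, set[str]] = field(default_factory=dict)


def apply_pii_rule(
    table: Table, user_groups: set[str], exempt_group: str = "pii_readers"
) -> list[dict[str, str]]:
    """One rule for every table: mask columns tagged 'pii' unless the user is exempt."""
    if exempt_group in user_groups:
        return [dict(row) for row in table.rows]
    pii_columns = {col for col, tags in table.column_tags.items() if "pii" in tags}
    return [
        {col: ("***" if col in pii_columns else value) for col, value in row.items()}
        for row in table.rows
    ]
```

The rule looks only at tags. Tagging a column as PII in any table is therefore enough to protect it, and no per-table masking logic is needed.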

The session showed how Unity Catalog is evolving in data and AI governance and shared the latest updates on advanced access control and cloud integration. These developments are well worth keeping an eye on.

About the special site during DAIS

This year, we have set up a special site to report on session content and the atmosphere at the DAIS venue. We plan to update the blog every day during DAIS, so please take a look.