This is Johann from the Global Engineering Department of the GLB Division. In this article, I summarize a session from Data + AI Summit 2023 (DAIS), based on reports from Mr. Ichimura, who attended the event.
This post covers the presentation "Increasing Trust in Your Data: Enabling a Data Governance Program on Databricks Using Unity Catalog and ML-Driven MDM." Its theme is improving data trustworthiness by implementing a data governance program, and it stresses why data governance matters. The presentation is aimed at data engineers and data analysts interested in data governance, business owners and managers who want their data to be more trustworthy, and company representatives considering introducing data governance. This blog is divided into two parts; this is Part 1. Now, let's dive into the content of the presentation.
The Importance of Data Governance and Vision
Data governance is essential for companies that want to manage their data properly and get the most value from it. Drawing on the presentation, which describes Comcast's digital transformation journey built on data and analytics, this section explains why data governance matters and the vision behind it.
Vision: Helping Advertisers Reach Potential Customers
Comcast's vision is to use data and audience targeting to help advertisers reach potential customers. Advertisers can then build more effective advertising strategies, and consumers see ads that are relevant to them. To realize this vision, the following elements are essential:
Data state: The data must be accurate and up-to-date.
Origin system: It is important to know where the data came from and which processes generated and transformed it along the way.
Ownership: The owner of the data must be clear.
Responsibility: There must be clear responsibility for data quality and security.
General use cases: It is important to understand how the data as a whole should be used and for what purposes.
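The session did not show code for this, but as a rough sketch, the five elements above map naturally onto a per-dataset metadata record that can be checked automatically. The class, its field names, and the one-day freshness threshold below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class DatasetGovernanceRecord:
    """Hypothetical metadata record covering the five governance elements."""
    name: str
    last_updated: datetime   # data state: when the data was last refreshed
    source_system: str       # origin system the data came from
    owner: str               # clearly identified owner of the data
    steward: str             # who is responsible for quality and security
    use_cases: list[str] = field(default_factory=list)  # documented uses

    def gaps(self, max_age: timedelta, now: datetime) -> list[str]:
        """Return unmet governance elements; an empty list means none found."""
        problems = []
        if now - self.last_updated > max_age:
            problems.append("stale")
        if not self.source_system:
            problems.append("unknown source system")
        if not self.owner:
            problems.append("no owner")
        if not self.steward:
            problems.append("no responsible steward")
        if not self.use_cases:
            problems.append("no documented use cases")
        return problems


now = datetime(2023, 7, 1, tzinfo=timezone.utc)
record = DatasetGovernanceRecord(
    name="ad_audience_segments",          # hypothetical dataset name
    last_updated=now - timedelta(days=3),
    source_system="crm",
    owner="",                             # owner not yet assigned
    steward="data-quality-team",
    use_cases=["advertiser targeting"],
)
print(record.gaps(max_age=timedelta(days=1), now=now))  # ['stale', 'no owner']
```

Keeping the check as a plain function of the record makes it easy to run across a whole catalog and report which datasets are missing an owner or have gone stale.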
Realizing a Data Governance Program
To implement its data governance program, Comcast uses Databricks, an integrated data platform widely used for data engineering, data science, and machine learning. On Databricks, Comcast combines Unity Catalog with ML-Driven MDM (Master Data Management) to put the program into practice.
Unity Catalog is the Databricks service for governing data centrally. With Unity Catalog, information such as data origin (lineage), ownership, and responsibility can be managed in one place, which makes the data more trustworthy.
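As a minimal sketch of recording ownership and access in Unity Catalog, the Python helper below builds standard Unity Catalog SQL statements (`ALTER TABLE ... OWNER TO`, `COMMENT ON TABLE`, `GRANT SELECT ON TABLE`). The helper function and the catalog, schema, table, and group names are hypothetical; the session did not show this code. Lineage is not set by hand: Unity Catalog captures it automatically from queries run on Unity Catalog-enabled compute.

```python
def unity_catalog_governance_sql(
    table: str, owner: str, description: str, readers: list[str]
) -> list[str]:
    """Build Unity Catalog SQL that records an owner, a description, and
    read access for one table. All names passed in are hypothetical."""
    # Escape single quotes so the description is a valid SQL string literal.
    safe_description = description.replace("'", "\\'")
    statements = [
        # Make ownership explicit: the owner principal manages the table.
        f"ALTER TABLE {table} OWNER TO `{owner}`",
        # Store a human-readable description as table metadata.
        f"COMMENT ON TABLE {table} IS '{safe_description}'",
    ]
    # Grant read access to each consuming group.
    for group in readers:
        statements.append(f"GRANT SELECT ON TABLE {table} TO `{group}`")
    return statements


stmts = unity_catalog_governance_sql(
    table="main.advertising.audience_segments",  # catalog.schema.table
    owner="ad-data-owners",
    description="Audience segments used for advertiser targeting",
    readers=["ad-analysts"],
)
for s in stmts:
    # In a Databricks notebook on Unity Catalog-enabled compute, each
    # statement could be executed with spark.sql(s) instead of printed.
    print(s)
```

Generating the statements from one place keeps ownership and access rules reviewable in version control. In practice, readers also need `USE CATALOG` and `USE SCHEMA` privileges on the parent catalog and schema before `SELECT` on the table takes effect.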
ML-Driven MDM is an approach to master data management that uses machine learning. It improves data quality and consistency, and because machine learning can automate processes such as data cleansing and record matching, data governance becomes more efficient.
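Part 1 of the session does not show Comcast's matching models, so the following is only a minimal sketch of the general idea, assuming simple customer records with hypothetical `name`, `email`, and `zip` fields. It cleanses the name field, computes pairwise similarity features, and fits a tiny logistic-regression matcher on hand-labeled pairs; logistic regression is a stand-in chosen for brevity, not the presenters' method.

```python
import math
import re
from difflib import SequenceMatcher


def cleanse(text: str) -> str:
    """Normalize a free-text field: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def features(a: dict, b: dict) -> list[float]:
    """Pairwise similarity features for two records (hypothetical fields)."""
    name_sim = SequenceMatcher(None, cleanse(a["name"]), cleanse(b["name"])).ratio()
    same_email = float(a["email"].strip().lower() == b["email"].strip().lower())
    same_zip = float(a["zip"].strip() == b["zip"].strip())
    return [name_sim, same_email, same_zip]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_matcher(pairs, labels, lr=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent (1 = same entity)."""
    X = [features(a, b) for a, b in pairs]
    w, bias, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + bias) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b / n
    return w, bias


def match_probability(model, a: dict, b: dict) -> float:
    """Estimated probability that records a and b describe the same customer."""
    w, bias = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(a, b))) + bias)


def rec(name, email, zip_code):
    return {"name": name, "email": email, "zip": zip_code}


# Tiny hand-labeled training set (hypothetical data, for illustration only).
jane = rec("Jane Doe", "jane@example.com", "19103")
robert = rec("Robert Smith", "rsmith@example.com", "10001")
maria = rec("Maria Garcia", "maria.g@example.com", "60601")
pairs = [
    (jane, rec("JANE DOE.", "Jane@Example.com", "19103")),
    (robert, rec("Robert  Smith", "rsmith@example.com", "10001")),
    (maria, rec("Maria Garcia-Lopez", "mgarcia@example.net", "60601")),
    (jane, rec("John Park", "jpark@example.com", "94105")),
    (robert, rec("Alice Wong", "awong@example.com", "10001")),
    (maria, rec("Kevin Lee", "klee@example.com", "30301")),
]
labels = [1, 1, 1, 0, 0, 0]
model = train_matcher(pairs, labels)

new_a = rec("Jane  Doe!", "JANE@example.com ", "19103")
print(round(match_probability(model, jane, new_a), 3))
```

Real MDM pipelines add blocking (so that not every pair of records is compared), richer features, and far larger labeled sets, but the loop is the same: cleanse, compare, and let a trained model decide which records refer to the same entity.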
In summary, data governance lets companies manage their data properly and get the most value from it, and Comcast's digital transformation journey, built on data and analytics, shows why it matters. By implementing Unity Catalog and ML-Driven MDM on Databricks, Comcast is building a data governance program that makes its data more trustworthy. In Part 2, we will look at concrete examples of Comcast's data governance on the Databricks Lakehouse architecture and explain data matching techniques. Stay tuned!
This content is based on reports from members who attended DAIS sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.
Translated by Johann
Thank you for your continued support!