APC Tech Blog


This is the technical blog of AP Communications Co., Ltd.

Databricks on Databricks Path to Unified Governance: Scaling Governance with Company Growth



The speaker for this session was Romet, who serves as head of sales in America. He opened with one of the earliest governance challenges: different leaders had access to different datasets, which led to discrepancies and inconsistent numbers. Transparent, consistent data is a prerequisite for effective governance.

How data is managed and distributed across departments is a crucial governance challenge that often surfaces early. The path to unified governance begins with resolving these early discrepancies.

The speaker underscored how hard it is to cope with divergent views of the data and the confusion they cause. When separate departments each maintain their own version of the 'truth', making unified, data-driven decisions across the organization becomes difficult. Without a coherent data governance strategy, such conflicts tend to persist and hinder effective decision-making.

Scaling Governance with Company Growth

As companies grow, so must their governance structures evolve and adapt to increased complexity. During the session "Databricks on Databricks Path to Unified Governance," a particular focus was placed on the challenges and strategies related to scaling governance as organizations expand.

From Scratch to Startup

Most companies start out in what might be called a 'scrappy startup' stage. Governance is minimal, and the organization is small enough that mutual trust among all members suffices. Data management at this stage is simple, and the data often fits on a single laptop. A sophisticated governance framework seems unnecessary during this period.

Increasing Growth and Complexity

As headcount increases and operations expand, once-straightforward systems begin to strain under heavier demands. Data volume soars, and managing it becomes significantly more complex. This is the phase where scaling the governance framework becomes imperative. There may be nostalgia for the simplicity of the startup days, but the added complexity is itself a sign of successful growth.

This growth in scale brings several challenges, such as enabling effective collaboration among a growing number of team members and managing access to an expanding pool of datasets. Building robust governance frameworks and planning carefully for scalability are vital at this stage.
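To see why access management in particular stops scaling, the rough arithmetic below compares how many grants are needed under three strategies. This is a back-of-the-envelope sketch, not something shown in the session. The user and table counts are the figures quoted for Databricks' own Lakehouse, while the group and schema counts are hypothetical values chosen for illustration. The per-user figure is a worst-case upper bound, because in practice not every user needs every table.

```python
# Hypothetical back-of-the-envelope model: how many grant statements are
# needed to give principals read access to tables under three strategies.


def per_user_table_grants(users: int, tables: int) -> int:
    # Worst case: every user gets SELECT on every table, one grant each.
    return users * tables


def per_group_table_grants(groups: int, tables: int) -> int:
    # Users are organized into groups; each group is granted each table.
    return groups * tables


def per_group_schema_grants(groups: int, schemas: int) -> int:
    # Groups are granted at the schema level; tables in a schema inherit it.
    return groups * schemas


users, tables = 9_000, 125_000  # figures cited in the session
groups, schemas = 50, 2_000     # hypothetical values for illustration

print(f"{per_user_table_grants(users, tables):,}")      # 1,125,000,000
print(f"{per_group_table_grants(groups, tables):,}")    # 6,250,000
print(f"{per_group_schema_grants(groups, schemas):,}")  # 100,000
```

The drop of several orders of magnitude comes from two changes: granting to groups instead of individuals, and granting at a higher level of the hierarchy so that new tables need no new grants.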

Insights from Databricks experts in the session traced how governance evolves as a company grows from a small startup into a larger organization. Attendees learned about the challenges specific to each phase and practical ways to adjust their governance as their companies grow.

Adapting systems and procedures as the organization grows is crucial, and the strategies discussed are intended to help attendees sustain growth in their own enterprises.

Implementing Unity Catalog for Effective Governance

In the special session "Databricks on Databricks Path to Unified Governance", the speaker shared insights into how Databricks implemented Unity Catalog to achieve unified governance across its own expansive Lakehouse. That Lakehouse is a massive data ecosystem: it contains 125,000 tables, serves almost 9,000 users, and runs 17,000 jobs. These are used by departments ranging from product development, marketing, and finance to less traditional data consumers such as legal and human resources.

One striking example was the facilities management team using data and AI to forecast and analyze office capacity in preparation for the return to the office. The example highlighted how essential data analytics has become to every facet of Databricks’ operations, and how broadly data-driven decision-making applies.

Unity Catalog plays a pivotal role in organizing this data and controlling access to it, so that every department can use the datasets relevant to it without compromising governance standards. The session's overview of how users interact with data resources through Unity Catalog pointed the way toward effective, sophisticated data management.
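As a concrete picture of what "controlling access" looks like in Unity Catalog, the sketch below builds the SQL needed to let a group read a single table. Unity Catalog addresses tables with a three-level `catalog.schema.table` name. Reading a table requires `USE CATALOG` on its catalog, `USE SCHEMA` on its schema, and `SELECT` on the table itself. The helper `read_access_grants`, the `TableRef` class, and the catalog, table, and group names are all hypothetical, made up for illustration. The helper only builds statement strings and does not execute anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """A table addressed by Unity Catalog's three-level namespace."""
    catalog: str
    schema: str
    table: str

    def full_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.table}"


def read_access_grants(ref: TableRef, group: str) -> list[str]:
    """Build (but do not run) the SQL statements that let `group` read one table.

    Reading a table requires USE CATALOG on its catalog and USE SCHEMA on
    its schema in addition to SELECT on the table itself.
    """
    principal = f"`{group}`"  # backticks quote names containing '-' and similar
    return [
        f"GRANT USE CATALOG ON CATALOG {ref.catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {ref.catalog}.{ref.schema} TO {principal}",
        f"GRANT SELECT ON TABLE {ref.full_name()} TO {principal}",
    ]


# Hypothetical names, loosely echoing the office-capacity example.
office = TableRef("main", "facilities", "office_capacity")
for stmt in read_access_grants(office, "facilities-analysts"):
    print(stmt)
```

The resulting statements could be run from a Databricks notebook or SQL warehouse by someone with the right to grant on those objects. Granting to a group rather than to individual users is what keeps this manageable as headcount grows.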

This part of the session not only laid out the scale of Databricks’ Lakehouse but also underscored Unity Catalog’s comprehensive governance capabilities. Attendees were encouraged to attend related sessions, in particular one on Thursday, to learn more about how Databricks uses data and AI across its business functions.

These insights are valuable for organizations that aim to build robust data governance frameworks into their own operations.

Balancing Platform Engineering and Data Practitioner Needs

A pivotal theme when using Databricks is balancing the needs of platform engineers and data practitioners. Platform engineers build and maintain the infrastructure without focusing much on the specifics of the data. Data practitioners, by contrast, focus primarily on the data and care little about the underlying storage infrastructure or its complexities. Their main concern is that the data is well-managed and readily accessible.

This is where Databricks’ Unity Catalog plays a crucial role. Unity Catalog significantly reduces the burden of dealing with the underlying infrastructure, such as the "plumbing" and other technical nitty-gritty. This frees teams to focus on access management and on governance essentials such as security, privacy, and regulatory compliance.
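Part of what keeps access management tractable is that Unity Catalog privileges are hierarchical. A privilege granted on a catalog or schema is inherited by the objects inside it, and reading a table also requires `USE CATALOG` and `USE SCHEMA` on its parents. The toy model below illustrates those two rules under stated assumptions. It is a hypothetical sketch, not Unity Catalog's implementation, and it ignores ownership, group membership, admin roles, and fine-grained controls such as row filters. The class name and the catalog, schema, and principal names are made up.

```python
class ToyGrantStore:
    """Toy model of hierarchical privilege checks; not Unity Catalog's code.

    Securables are dotted names: "catalog", "catalog.schema",
    "catalog.schema.table". A privilege granted on a parent is treated as
    inherited by everything beneath it.
    """

    def __init__(self) -> None:
        # securable name -> principal -> set of privileges
        self._grants: dict[str, dict[str, set[str]]] = {}

    def grant(self, privilege: str, securable: str, principal: str) -> None:
        self._grants.setdefault(securable, {}).setdefault(principal, set()).add(privilege)

    def _holds(self, privilege: str, securable: str, principal: str) -> bool:
        # True if granted on `securable` itself or on any ancestor of it.
        parts = securable.split(".")
        for depth in range(1, len(parts) + 1):
            name = ".".join(parts[:depth])
            if privilege in self._grants.get(name, {}).get(principal, set()):
                return True
        return False

    def can_select(self, principal: str, table: str) -> bool:
        # Reading requires access to the parents plus SELECT (possibly inherited).
        catalog, schema, _ = table.split(".")
        return (
            self._holds("USE CATALOG", catalog, principal)
            and self._holds("USE SCHEMA", f"{catalog}.{schema}", principal)
            and self._holds("SELECT", table, principal)
        )


store = ToyGrantStore()
store.grant("USE CATALOG", "main", "finance")
store.grant("USE SCHEMA", "main.finance", "finance")
store.grant("SELECT", "main.finance", "finance")  # schema-level grant

print(store.can_select("finance", "main.finance.revenue"))    # True
print(store.can_select("finance", "main.hr.salaries"))        # False
print(store.can_select("marketing", "main.finance.revenue"))  # False
```

Because `SELECT` was granted on the schema, any table later added under `main.finance` is readable by the `finance` group with no extra grant. This is the property that lets a data team add tables without filing access requests with the platform team.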

By providing functionality that meets the needs of both platform engineers and data practitioners, Databricks lets each role concentrate on its own responsibilities. Platform engineers spend less time on day-to-day infrastructure upkeep, while data practitioners can devote themselves to deeper analysis and use of the data.

This cooperative approach strengthens unified governance, enabling organizations to extract maximum value from their data assets while maintaining robust compliance and security. Bringing the platform-engineering and data-handling roles together on a single platform results in a more capable, better-governed data management structure in the cloud.

About our DAIS special site

This year, we have set up a special site reporting on session content and the atmosphere at DAIS! We plan to update the blog every day during DAIS, so please take a look.