APC Tech Blog


The technical blog of AP Communications Co., Ltd. (株式会社エーピーコミュニケーションズ).

Implementing the Lakehouse, from BI to AI

Evolution Toward an Integrated Architecture

Initially, ABN Amro's data team deployed a centralized platform to facilitate the development of domain-specific data applications. Unfortunately, the outcomes did not match the initial expectations, leading to a shift in architecture. Despite adopting a distributed approach, the organization continued to maintain a central data lake, representing an interesting deviation from the standard data mesh ideology. Here's how the architecture evolved:

  1. Initial Setup: The core infrastructure at ABN Amro initially revolved around a central data distribution platform. This framework processed and validated the integrity of all corporate 'golden sources,' or authoritative data sets, ensuring that such data was accessible only to authorized personnel.

  2. Development of Domain Data Applications: Each domain then leveraged the central platform to build its own data applications, aiming to facilitate data usage across different segments of the organization.

  3. Integration of New Golden Sources: Whenever a new 'golden source' was developed, it was routed back to the central platform, validated, and then disseminated across the broader enterprise.
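The validate-then-distribute flow described above can be sketched as a minimal, hypothetical pipeline. All names here (`GoldenSource`, `CentralDistributionPlatform`, the validation rule) are illustrative assumptions, not details of ABN Amro's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class GoldenSource:
    """A hypothetical authoritative data set registered on the central platform."""
    name: str
    owner_domain: str
    records: list = field(default_factory=list)

class CentralDistributionPlatform:
    """Toy model of the central platform: validate, register, then distribute."""

    def __init__(self):
        self.registry = {}     # validated golden sources, keyed by name
        self.subscribers = {}  # source name -> set of authorized consuming domains

    def validate(self, source: GoldenSource) -> bool:
        # Placeholder integrity check: a golden source must be non-empty
        # and have a named owning domain.
        return bool(source.records) and bool(source.owner_domain)

    def register(self, source: GoldenSource) -> None:
        # New golden sources are routed back here and validated before release.
        if not self.validate(source):
            raise ValueError(f"{source.name} failed validation")
        self.registry[source.name] = source

    def subscribe(self, source_name: str, domain: str) -> None:
        self.subscribers.setdefault(source_name, set()).add(domain)

    def distribute(self, source_name: str) -> dict:
        """Disseminate a validated source to every authorized consuming domain."""
        source = self.registry[source_name]
        return {d: source.records for d in self.subscribers.get(source_name, set())}

platform = CentralDistributionPlatform()
platform.register(GoldenSource("customers", "retail", [{"id": 1}]))
platform.subscribe("customers", "risk")
deliveries = platform.distribute("customers")
```

The key property the sketch captures is that distribution only happens after central validation, and only to domains that were explicitly subscribed.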

This reflection on ABN Amro's shift towards a blend of centralization and decentralization offers an insightful perspective on the practicalities of data architecture in large corporations, revealing a nuanced approach necessary for effective data governance and utilization.

Standardization and Governance: ABN Amro's Data Architecture Strategy

In the session "Implementing a Lakehouse from BI to AI," ABN Amro's data team extensively discussed their integrated domain-driven data and AI platform. Particularly interesting was the focus on 'Standardization and Governance,' highlighting the interaction between corporate structure and control of data architecture.

During the presentation, the concept of Conway's Law was explored, suggesting that system design reflects the communication structure of the organization that creates it. Thus, without a consistent team structure, achieving successful architectural outcomes is challenging.

ABN Amro strategically navigates between centralization and decentralization to bolster their data architecture. The 'O4D Model' is crucial to their strategy, categorizing responsibilities into three clear types of work, clarifying roles in data management and platform development, and ensuring accountability.

This model enables ABN Amro to implement robust standardization and governance protocols across its data operations, maintaining high standards of data security and quality. Additionally, this structure supports a seamless transition and integration from traditional business intelligence frameworks to advanced artificial intelligence applications.

The experience and practices shared by ABN Amro provide valuable insights for other organizations looking to refine their data strategies. Highlighting the importance of organizational structure and effective governance can greatly aid in integrating innovative technologies and realizing data-driven outcomes.

Centralized Management and Data Sharing

In the integrated domain-driven data and AI platform discussed by ABN Amro's data team, each unit maintains its own research group and provides services tailored to its specific objectives. This model operates entirely under the bank's control, setting pioneering standards in regulatory compliance.

The structure includes four main components: a 'Data Ingestion Unit,' an 'Application Unit,' an 'Orchestration Unit,' and a 'Storage Unit.' Focusing on the 'Storage Unit': it is backed by Unity Catalog and was shown in blue on the architecture diagram to emphasize its data focus, with yellow highlighting security. The unit is seamlessly integrated into the corporate database, ensuring robust protection of stored data.
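Unity Catalog organizes data in a three-level namespace (catalog.schema.table) and gates access through explicit grants. The sketch below is a toy in-memory model of that pattern, not the real Unity Catalog API; the class, table, and principal names are all hypothetical:

```python
class MiniCatalog:
    """Toy model of a Unity Catalog-style governed namespace:
    three-level table names plus explicit privilege grants."""

    def __init__(self):
        self.tables = set()
        self.grants = {}  # (principal, table fqn) -> set of privileges

    def create_table(self, fqn: str) -> None:
        # Unpacking enforces the three-level catalog.schema.table naming.
        catalog, schema, table = fqn.split(".")
        self.tables.add(fqn)

    def grant(self, privilege: str, fqn: str, principal: str) -> None:
        self.grants.setdefault((principal, fqn), set()).add(privilege)

    def can(self, principal: str, privilege: str, fqn: str) -> bool:
        # Default-deny: access exists only if explicitly granted.
        return privilege in self.grants.get((principal, fqn), set())

uc = MiniCatalog()
uc.create_table("corporate.finance.transactions")
uc.grant("SELECT", "corporate.finance.transactions", "risk_team")
```

The design choice worth noting is default-deny: a principal with no grant simply has no access, which is the centralized-governance property the Storage Unit relies on.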

Centralized management and data sharing not only protect information but also enable efficient access across the organization. This robust foundation facilitates innovative and data-driven decision-making. By assigning specialized functions to each unit, the system ensures flexibility and scalability, significantly enhancing the overall corporate data strategy.

Compliance and Orchestration

In the complex terrain of financial risk, stringent regulations often demand swift compliance, which can encourage a 'quick and rough' methodology that seemingly contradicts stated policies. Nonetheless, each area retains its own responsibilities. To manage this, responsibilities are split between the data engineering and data governance segments; both are accountable for surfacing discrepancies at the governance level and explaining them to the Chief Data Officer (CDO).

This distribution of responsibilities raises a critical question: "How can we rapidly meet regulatory requirements while maintaining strict governance?" This issue was the focus of discussion during the examination of operational differences between development stages and production stages. Moreover, the choice of Databricks for orchestration was extensively debated.

Choosing Databricks was highlighted due to its excellent scalability and comprehensive integration capabilities with various data sources, contrasting with more traditional options. This choice underscores a strategic decision to enhance operational efficiency and adaptability.
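An orchestrated workflow of this kind boils down to tasks with declared dependencies that must run in order. The sketch below models that with Python's standard-library `graphlib`; the task names and dependency edges are illustrative assumptions in the spirit of a multi-task Databricks job, not the team's actual pipeline:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "ingest_golden_source": set(),
    "validate_quality": {"ingest_golden_source"},
    "publish_to_domains": {"validate_quality"},
    "refresh_bi_dashboards": {"publish_to_domains"},
    "train_ml_model": {"publish_to_domains"},
}

# A valid execution order: every task appears after all of its dependencies.
run_order = list(TopologicalSorter(workflow).static_order())
```

Note that BI (`refresh_bi_dashboards`) and AI (`train_ml_model`) branch from the same published data, which mirrors the "from BI to AI on one platform" theme of the session.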

This segment details practical strategies implemented to delicately balance the urgency of regulatory compliance with the need for skilled orchestration. By delegating specific responsibilities to each area, the approach of having each maintain clear governance tasks effectively simplifies the management complexity of extensive data environments.

In summary, this session's segment clarified how a planned lakehouse architecture using tools like Databricks serves as a robust framework supporting the transition from business intelligence to artificial intelligence. This progressive data management model exemplifies how financial institutions can navigate the complexities of compliance and orchestration with agility and precision.

About the special site during DAIS

This year, we have set up a special site to report on session content and the on-site atmosphere at DAIS! We plan to update the blog daily throughout DAIS, so please take a look.