This is the English version of an article contributed to JDMC (Japan Data Management Consortium).
My name is Ichimura from A.P. Communications. Starting in 2023, I look forward to being an active member of JDMC (Japan Data Management Consortium). Our company, A.P. Communications, is an SIer (System Integrator) with expertise in the IT infrastructure field. In the realm of data management, we are committed to the democratization of data and AI under the theme "MODERN DATA + AI STACK COMPANY."
By combining our strengths in the IT infrastructure field with data management, we have seen a significant increase in opportunities to provide comprehensive support. In this article, I would like to discuss the architecture of data management based on my previous work experience, focusing on "Data Mesh."
◇History of Data Mesh 1: The Birth of a Centralized Data Foundation
Firstly, let me explain the origin of the "Data Mesh Architecture". Before the concept of Data Mesh, common data foundations such as data lakes and data warehouses (DWH) were created to address challenges in data management, as outlined below:
Hindrance to AI Strategy due to Data Silos:
Each department collected and stored data separately for their specific tasks and systems, resulting in underutilized data. Even when multiple departments within an organization dealt with similar data, data silos occurred because data was scattered and stored separately. This lack of centralized data visibility hindered the development of integrated data and AI strategies.
Lack of Data Lineage (Data History):
The lineage, which includes information about when data was acquired, who processed it, and whether the data is accurate, was often missing. This absence of lineage made it difficult to determine the accuracy of analysis results.
Data Accumulation Becomes the Goal:
Data collection (hoarding data) became the primary task, and data analysis was often not pursued. When attempts were made to move toward data analysis and preparation, internal organizational barriers and silos impeded progress, leading to repeated delays.
Difficulty in Selecting Service Products:
Selecting service products such as data engineering and data science tools proved challenging due to the wide variety of options available.
The above-mentioned challenges led to the creation of centralized data foundations such as data lakes and data warehouses (DWH), which aimed to address these issues and transform data management for more effective data and AI strategies.
◇History of Data Mesh 2: Challenges with Centralized Data Foundations
While the creation of data lakes and data warehouses (DWH) addressed some of the previously mentioned challenges, it introduced new issues as businesses increasingly embraced big data and saw a growth in business users:
Increased Complexity of Big Data Management:
With the growth of data sources and data users, the complexity of managing large-scale data increased. Managing data structures and system operations became more challenging.
Limitations in Scalability:
Responding to the rapid expansion of data became a challenge, as systems struggled to keep up with the sheer volume of data.
The demand for real-time data and the growth of stream processing led to a surge in update operations, which required higher processing performance. Systems often could not keep up with these flexible data processing demands.
Lack of Data Governance Leadership:
Many companies lacked a designated data governance officer responsible for overseeing data governance across the organization. Alternatively, data governance was left in the hands of data engineers, which sometimes led to misuse or abuse of data.
Stagnation in In-House Data Engineering and Science:
A shortage of in-house data engineers and scientists hindered the ability to develop and implement in-house solutions. Businesses desired to explore data management, AI, and machine learning, but they struggled to find the right partners with a shared perspective for collaborative progress.
These challenges prompted the development of the Data Mesh approach to address the evolving data management landscape and its demands.
◇Birth of the "Data Mesh Architecture"
To address the challenges associated with centralized data foundations, the Data Mesh Architecture was conceived. "Data Mesh" is a concept that can provide hints for solving the aforementioned issues by efficiently and effectively managing large-scale data. It may serve as an idea for project execution and the transformation of organizational management systems. Let's start with the principles of Data Mesh.
①Domain Ownership:
Instead of traditional centralized management, each domain owns its data, manages it autonomously, and takes responsibility for it. A domain refers to an organizational group at the level of business units, such as departments or teams. A data owner is designated for each domain, and data is exchanged through domain-to-domain communication. However, system-wide design, company-wide system policies, and integrated governance still require centralized control to avoid uncontrollable risks.
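As a rough illustration of domain ownership, the following minimal Python sketch models two domains, each with a designated data owner, and a domain-to-domain exchange. The `Domain` class, its methods, and the sample names ("sales", "marketing", "orders") are hypothetical and only serve to illustrate the idea; they do not come from any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A business unit (department or team) that owns its own data."""
    name: str
    data_owner: str  # person accountable for this domain's data (hypothetical ID)
    datasets: dict = field(default_factory=dict)

    def publish(self, dataset_name, rows):
        """The owning domain is the only writer of its own datasets."""
        self.datasets[dataset_name] = list(rows)

    def share(self, dataset_name):
        """Domain-to-domain exchange: hand out a copy of the list, not the original."""
        return list(self.datasets[dataset_name])


sales = Domain(name="sales", data_owner="sales-data-owner")
marketing = Domain(name="marketing", data_owner="marketing-data-owner")

sales.publish("orders", [{"order_id": 1, "amount": 1200}])

# Marketing consumes sales data through the owning domain.
orders_for_marketing = sales.share("orders")
orders_for_marketing.append({"order_id": 99, "amount": 0})  # local change only
```

Because `share` returns a copy of the list, the consuming domain can extend what it receives without altering the list held by the owning domain, which keeps responsibility for the original with its owner.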
②Data as a Product:
Building on the definition in principle 1, this principle involves managing data within each domain as a "product." Data in each domain passes through the stages of a data pipeline, for example one that follows the "Medallion Architecture," and becomes valuable data that includes code and insights, rather than remaining raw data. This data should be recognized as an accessible "product" with value, one that can also be rolled out horizontally across the entire organization.
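The Medallion Architecture is usually described as three layers: bronze (raw), silver (cleaned), and gold (business-ready). The sketch below, in plain Python, shows how raw records might be refined into a small data product. The function names, the field names ("region", "amount"), and the descriptor keys are assumptions made for illustration; a real pipeline would normally run on a data platform rather than on in-memory lists.

```python
def to_silver(bronze_rows):
    """Bronze -> Silver: drop incomplete records and normalize types."""
    silver = []
    for row in bronze_rows:
        if row.get("amount") is None or not row.get("region"):
            continue  # skip records with a missing amount or region
        silver.append({
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver_rows):
    """Silver -> Gold: aggregate into a business-level view (total per region)."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


bronze = [
    {"region": " Tokyo ", "amount": "100"},
    {"region": "tokyo", "amount": 50},
    {"region": "Osaka", "amount": None},  # incomplete, removed in silver
    {"region": "", "amount": 30},         # missing region, removed in silver
]

# The gold output is packaged with minimal metadata so other domains can use it.
data_product = {
    "name": "sales_by_region",
    "owner_domain": "sales",
    "data": to_gold(to_silver(bronze)),
}
```

The point of the final dictionary is that the gold-layer result is published together with its name and owning domain, so it can be treated as a product rather than as an anonymous table.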
③Self-Serve Data Platform:
When implementing the Data Mesh concept, numerous factors need consideration, such as cloud infrastructure, data analytics services, and operational monitoring. Because so many individual elements must be considered, some organizations propose implementing a "Data & AI Platform" that encompasses all functions for handling Big Data, including data warehousing, data science, BI, and AI.
④Federated Computational Governance:
This principle is necessary to achieve the objectives of principles 1, 2, and 3. For each domain to manage its data autonomously while still enabling cross-organizational expansion, it's essential to establish company-wide policies, architecture design, and organizational arrangements (such as securing the required personnel). More detailed explanations regarding organizational structure will be provided in the following section.
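One common reading of "computational" governance is that company-wide policies are expressed as code and checked automatically against every domain's data products. The following is a minimal sketch under that assumption. The required fields, the allowed classification values, and the sample product descriptors are all hypothetical.

```python
# Company-wide rules, defined once and applied to every domain (assumed values).
REQUIRED_FIELDS = ("name", "owner_domain", "data_owner", "classification")
ALLOWED_CLASSIFICATIONS = {"public", "internal", "confidential"}


def check_policies(product):
    """Return a list of policy violations for one data product descriptor."""
    violations = []
    for field_name in REQUIRED_FIELDS:
        if not product.get(field_name):
            violations.append(f"missing {field_name}")
    classification = product.get("classification")
    if classification and classification not in ALLOWED_CLASSIFICATIONS:
        violations.append(f"unknown classification: {classification}")
    return violations


products = [
    {"name": "sales_by_region", "owner_domain": "sales",
     "data_owner": "sales-data-owner", "classification": "internal"},
    {"name": "campaign_results", "owner_domain": "marketing",
     "classification": "secret"},  # no data owner, unknown classification
]

# One report for the whole organization, keyed by data product name.
report = {p["name"]: check_policies(p) for p in products}
```

In this sketch the domains remain free to build their own products, while the shared checks give the central governance function a uniform view of which products comply with company-wide policy.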
These principles of Data Mesh aim to provide a holistic approach to data management that can address the challenges associated with centralized data foundations and the evolving landscape of data management.
◇Organizational Structure (Reform)
Up to this point, architecture and system development have been explained in the context of the Data Mesh principles. However, we believe that, beyond architecture and system development, what is even more crucial for the success of data management is the organization's structure and a sustained passion for achieving its goals.
It's essential for a data management support company to continually possess the enthusiasm and execution capacity required for the achievement of the overall objectives of the end-user enterprises. Building a "one-team" relationship with a focus on customer-centric support is crucial.
As organizations grow, units can become strongly siloed, and their focus can shift toward "contributing to company revenue." This misalignment in objectives can negatively impact a company's data strategy.
In the context of Data Mesh, it's recommended to organize business units at the level of data domains, with a data owner designated for each domain. Each domain maintains its workspace, sources, and data.
In essence, the unit of data and the unit of the organization are aligned, with the idea that each domain autonomously and efficiently manages its data. If you consider it risky to restructure the entire organization from the start, it may be preferable to move forward gradually and steadily, step by step.
During this process, each domain can generate unique insights and, when necessary, provide them to other domains as data products.
Additionally, as the architecture changes, it's advisable to transition from a waterfall development approach to an agile one on a per-use-case basis. This creates a sense of urgency in development, ultimately leading to reduced overall effort and to cost savings in both data management and cloud usage.
Further benefits, such as the effective management of notebooks, MLOps, model and data product management, and AI strategy, are likely to become more evident in the next steps. Furthermore, cross-organizational structures that were once considered irregular in existing organizations may become the norm. In my personal opinion, this leads to increased adaptability and execution capabilities in cross-organizational projects and new ventures. If you introduce only the concept of Data Mesh while maintaining the existing organizational structure, there may be a disconnect between the architectural concept and the organizational structure, increasing the likelihood of failure.
Managing Data Mesh and coordinating stakeholders is a process that requires patience, and many organizations may revert to their original structure (concept) partway through, for example during a proof of concept (PoC). However, for the success of data management, organizational reform is essential.
◇A.P. Communications' Approach
In the future, it's certain that data management and AI will become synergistic. This is because business strategies will increasingly emphasize visualization, prediction, and decision-making through AI. While human decision-making remains important, improving the maturity of data and AI and utilizing them can significantly enhance the precision of business strategies, leading to a competitive advantage for companies.
Our company believes that data and organizational structure should be the keys to a business AI strategy, and we are committed to providing comprehensive support from the perspective of enterprise customers.
Furthermore, combining LLMs (Large Language Models) with LLMOps will become a mandatory requirement in the future, so there will be a growing emphasis on collaboration between business and technology.
I hope you will enjoy this blog. Thank you for your support!