Introduction
This is Johann from the Global Engineering Department of the GLB Division.
Today, we are sharing a report on a presentation about Project Aqueduct. The presentation discussed how to simplify building connectors using the Delta kernel and the Delta protocol. Its theme and purpose were to achieve a Lakehouse format that works with multiple engines and languages. The target audience includes data engineers, data architects, and data scientists. This blog consists of a single part. Now, let's dive into the content of the presentation!
Simplifying Connector Construction with Delta Kernel
The session focused on how the Delta kernel and the Delta protocol make connectors simpler to build. The goal is a Lakehouse format that works with multiple engines and languages.
Challenges of Delta Protocol with Multiple Implementations
The Delta protocol is a protocol for efficient data reading, writing, and management. However, it has multiple independent implementations, which leads to the following challenges:
Low compatibility between implementations
New features and fixes are not immediately reflected in every implementation
Limited support for different engines and languages
To address these challenges, Project Aqueduct was proposed.
Project Aqueduct Initiatives
Project Aqueduct aims to simplify the construction of connectors using the Delta kernel and Delta protocol. Specifically, the following initiatives are being undertaken:
Development of Delta kernel: Extract the common parts of the Delta protocol and make them available for multiple engines and languages
Connector construction: Use the Delta kernel to easily build connectors for each engine and language
Community support: Provide documentation and sample code to help developers easily develop and maintain connectors
Together, these initiatives are expected to deliver a Lakehouse format that works with multiple engines and languages, along with efficient data reading, writing, and management.
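To make the split between shared kernel logic and per-engine connectors more concrete, here is a minimal sketch in Python. It is not the actual Delta kernel API: the names `Engine`, `list_commits`, `read_commit`, `live_files`, and `InMemoryEngine`, as well as the table path, are hypothetical and chosen only for illustration. The commit contents are modeled loosely on the add and remove file actions recorded in the Delta transaction log.

```python
from typing import Iterable, Protocol


class Engine(Protocol):
    """Hypothetical interface a connector implements (not the real Delta kernel API)."""

    def list_commits(self, table_path: str) -> list[int]: ...
    def read_commit(self, table_path: str, version: int) -> Iterable[dict]: ...


def live_files(engine: Engine, table_path: str) -> set[str]:
    """Shared 'kernel' logic: replay commits in order to find the live data files."""
    files: set[str] = set()
    for version in sorted(engine.list_commits(table_path)):
        for action in engine.read_commit(table_path, version):
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


class InMemoryEngine:
    """Toy connector: serves commits from a dict instead of real storage."""

    def __init__(self, commits: dict[int, list[dict]]):
        self._commits = commits

    def list_commits(self, table_path: str) -> list[int]:
        return list(self._commits)

    def read_commit(self, table_path: str, version: int) -> Iterable[dict]:
        return self._commits[version]


engine = InMemoryEngine({
    0: [{"add": {"path": "part-0.parquet"}}],
    1: [{"add": {"path": "part-1.parquet"}}, {"remove": {"path": "part-0.parquet"}}],
})
print(sorted(live_files(engine, "/tables/events")))
```

In this sketch the protocol logic lives only in `live_files`. A connector for another engine would supply its own `Engine` implementation and reuse that function unchanged, which is the kind of sharing the initiatives above aim for.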
Latest Concepts and Features
Project Aqueduct continuously incorporates the latest concepts and features. For example, the following features have been proposed:
Schema evolution: Handle old and new data simultaneously even when the data schema changes
Version control: Manage data versions and easily retrieve specific versions of data
Transactions: Treat data updates and deletions as transactions to maintain consistency
By implementing these features, data handling is expected to become more flexible and efficient. Overall, Project Aqueduct aims to simplify the construction of connectors using the Delta kernel and Delta protocol, and to achieve a Lakehouse format that works with multiple engines and languages while continuing to incorporate the latest concepts and features.
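The three features listed above can be sketched with a toy transaction log in Python. This is a simplified illustration, not how Delta stores data: the `columns` and `rows` fields are invented for the example (a real log references data files and a schema definition rather than embedding rows), and `snapshot` and `read_rows` are hypothetical helper names.

```python
def snapshot(commits: list[list[dict]], version: int) -> dict:
    """Replay commits 0..version and return the schema and live files."""
    if not 0 <= version < len(commits):
        raise ValueError(f"unknown version {version}")
    schema: list[str] = []
    files: dict[str, list[dict]] = {}
    for actions in commits[: version + 1]:
        # Each commit is applied as a whole, so readers never see half of it.
        for action in actions:
            if "metaData" in action:
                schema = action["metaData"]["columns"]
            elif "add" in action:
                files[action["add"]["path"]] = action["add"]["rows"]
            elif "remove" in action:
                files.pop(action["remove"]["path"], None)
    return {"schema": schema, "files": files}


def read_rows(snap: dict) -> list[dict]:
    """Project every row onto the snapshot schema; missing columns become None."""
    return [
        {col: row.get(col) for col in snap["schema"]}
        for path in sorted(snap["files"])
        for row in snap["files"][path]
    ]


commits = [
    # Version 0: create the table with one column and one file.
    [{"metaData": {"columns": ["id"]}},
     {"add": {"path": "a", "rows": [{"id": 1}]}}],
    # Version 1: the schema gains a column; the old file is left as-is.
    [{"metaData": {"columns": ["id", "city"]}},
     {"add": {"path": "b", "rows": [{"id": 2, "city": "Oslo"}]}}],
    # Version 2: one commit swaps file "a" for a rewritten file "c".
    [{"remove": {"path": "a"}},
     {"add": {"path": "c", "rows": [{"id": 1, "city": "Lima"}]}}],
]

print(read_rows(snapshot(commits, 0)))
print(read_rows(snapshot(commits, 1)))
print(read_rows(snapshot(commits, 2)))
```

Reading version 1 returns the old row with `city` set to `None` next to the new row, which illustrates schema evolution. Passing a different `version` retrieves an earlier state of the table, and version 2 shows an update expressed as a single commit that removes one file and adds another.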
Summary
In this presentation, we learned about the simplification of building connectors using the Delta kernel and Delta protocol. Through the initiatives of Project Aqueduct, a Lakehouse format that works with multiple engines and languages is expected to be achieved, and efficient data reading, writing, and management are anticipated. The latest concepts and features are also being incorporated continuously, making it an exciting topic for those working in the fields of data engineering and data science. Let's continue to keep an eye on the progress of Project Aqueduct!
Conclusion
This content is based on reports from members participating on site in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.
Translated by Johann
Thank you for your continued support!