APC Tech Blog

The technical blog of AP Communications Co., Ltd.

Project Aqueduct: Simplify Making Delta Lake Tributaries

Introduction

This is Johann from the Global Engineering Department of the GLB Division.

Today, I will share a report on a presentation about Project Aqueduct. The presentation discussed how the Delta kernel and the Delta protocol simplify building connectors, with the goal of a Lakehouse format that works with multiple engines and languages. It is aimed at data engineers, data architects, and data scientists.

This article covers the presentation in a single part. Now, let's dive into the content!

Simplifying Connector Construction with Delta Kernel

The presentation covered how the Delta kernel and the Delta protocol can simplify building connectors. The goal is a Lakehouse format that works with multiple engines and languages.

Challenges of Delta Protocol with Multiple Implementations

The Delta protocol specifies how table data is read, written, and managed efficiently. However, the protocol has multiple independent implementations, which leads to the following challenges:

  1. Low compatibility between implementations

  2. New features and fixes are not immediately reflected

  3. Limited support for different engines and languages

To address these challenges, Project Aqueduct was proposed.

Project Aqueduct Initiatives

Project Aqueduct aims to simplify the construction of connectors using the Delta kernel and Delta protocol. Specifically, the following initiatives are being undertaken:

  1. Development of Delta kernel: Extract the common parts of the Delta protocol and make them available for multiple engines and languages

  2. Connector construction: Use the Delta kernel to easily build connectors for each engine and language

  3. Community support: Provide documentation and sample code to help developers easily develop and maintain connectors

Together, these initiatives are expected to deliver a Lakehouse format that works with multiple engines and languages, with efficient data reading, writing, and management.
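The split between shared kernel logic and per-engine connectors can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: `Engine`, `live_files`, and `InMemoryEngine` are hypothetical names rather than the actual Delta kernel API, and the log entries are reduced to bare `add` and `remove` actions carrying only a `path`. The point is only that the log-replay logic is written once, while each engine supplies a small set of I/O hooks.

```python
from __future__ import annotations

import json
from typing import Protocol


class Engine(Protocol):
    """Hypothetical hooks that a host engine supplies to the shared kernel."""

    def list_log_files(self, table_path: str) -> list[str]: ...

    def read_text(self, path: str) -> str: ...


def live_files(engine: Engine, table_path: str) -> list[str]:
    """Shared 'kernel' logic: replay commits in order, return the live data files."""
    files: set[str] = set()
    # Zero-padded commit file names sort in version order.
    for log_file in sorted(engine.list_log_files(table_path)):
        for line in engine.read_text(log_file).splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return sorted(files)


class InMemoryEngine:
    """Toy connector: an 'engine' backed by a dict instead of real storage."""

    def __init__(self, blobs: dict[str, str]) -> None:
        self.blobs = blobs

    def list_log_files(self, table_path: str) -> list[str]:
        prefix = f"{table_path}/_delta_log/"
        return [p for p in self.blobs if p.startswith(prefix) and p.endswith(".json")]

    def read_text(self, path: str) -> str:
        return self.blobs[path]


# Two simplified commits: version 0 adds two files, version 1 swaps one out.
blobs = {
    "tbl/_delta_log/00000000000000000000.json": "\n".join([
        json.dumps({"add": {"path": "part-0.parquet"}}),
        json.dumps({"add": {"path": "part-1.parquet"}}),
    ]),
    "tbl/_delta_log/00000000000000000001.json": "\n".join([
        json.dumps({"remove": {"path": "part-0.parquet"}}),
        json.dumps({"add": {"path": "part-2.parquet"}}),
    ]),
}

print(live_files(InMemoryEngine(blobs), "tbl"))  # ['part-1.parquet', 'part-2.parquet']
```

In this sketch, supporting another engine would mean writing another small class with the same two methods, while `live_files` stays unchanged.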

Latest Concepts and Features

Project Aqueduct continuously incorporates the latest concepts and features. For example, the following features have been proposed:

  • Schema evolution: Handle old and new data simultaneously even when the data schema changes

  • Version control: Manage data versions and easily retrieve specific versions of data

  • Transactions: Treat data updates and deletions as transactions to maintain consistency
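As a rough illustration of the three features listed above, the following Python sketch models a table as an in-memory list of commits. It is a toy under stated assumptions, not the real Delta format or any Delta kernel API: `ToyTable`, `commit`, and `read` are hypothetical names, rows are plain dicts, and a "transaction" is reduced to a check that no other writer committed since the version the writer read.

```python
from __future__ import annotations

from typing import Optional


class ToyTable:
    """In-memory stand-in for a versioned table log (not the real Delta format)."""

    def __init__(self) -> None:
        self.commits: list[dict] = []  # index in this list == version number

    def commit(self, read_version: int, schema: list[str], added_rows: list[dict]) -> int:
        """Append a commit, or fail if another writer committed first."""
        latest = len(self.commits) - 1
        if read_version != latest:
            raise RuntimeError(f"conflict: read v{read_version}, latest is v{latest}")
        self.commits.append({"schema": schema, "rows": added_rows})
        return latest + 1

    def read(self, version: Optional[int] = None) -> list[dict]:
        """Return all rows as of `version`, projected onto that version's schema."""
        if version is None:
            version = len(self.commits) - 1
        schema = self.commits[version]["schema"]
        rows = []
        for c in self.commits[: version + 1]:
            for row in c["rows"]:
                # Columns missing from older rows are filled with None.
                rows.append({col: row.get(col) for col in schema})
        return rows


table = ToyTable()
v0 = table.commit(-1, ["id"], [{"id": 1}])                           # first commit
v1 = table.commit(v0, ["id", "city"], [{"id": 2, "city": "Tokyo"}])  # schema gains a column

print(table.read(0))  # [{'id': 1}]
print(table.read())   # [{'id': 1, 'city': None}, {'id': 2, 'city': 'Tokyo'}]

try:
    # A writer that still believes v0 is the latest version is rejected.
    table.commit(v0, ["id", "city"], [{"id": 3, "city": "Osaka"}])
except RuntimeError as err:
    print(err)  # conflict: read v0, latest is v1
```

Reading version 0 returns the data as it was before the schema changed, while reading the latest version returns old and new rows together under the newer schema.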

By implementing these features, data handling is expected to become more flexible and efficient.

In summary, Project Aqueduct aims to simplify the construction of connectors using the Delta kernel and Delta protocol, and to achieve a Lakehouse format that works with multiple engines and languages. This is expected to make data reading, writing, and management more efficient and to let new concepts and features reach every connector.

Summary

In this presentation, we learned how the Delta kernel and the Delta protocol simplify building connectors. Through the initiatives of Project Aqueduct, a Lakehouse format that works with multiple engines and languages is expected to be achieved, along with efficient data reading, writing, and management. New concepts and features are also being incorporated continuously, which makes this an exciting topic for anyone working in data engineering or data science.

Let's continue to keep an eye on the progress of Project Aqueduct!

Conclusion

This content is based on reports from members participating on site in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!