APC Tech Blog

This is the technical blog of AP Communications Co., Ltd.

Build Your Data Lakehouse with a Modern Data Stack on Databricks

Introduction

This is Abe from the Lakehouse Department of the GLB Division. This article summarizes the content of a session, based on a report from Mr. Kanamaru, who is attending Data + AI SUMMIT 2023 (DAIS) on site.

Articles about the sessions at DAIS are collected on the special site below.

https://www.ap-com.co.jp/data_ai_summit-2023/

Build a Data Lakehouse with Lakehouse Technology

This time, I will cover “Build Your Data Lakehouse with a Modern Data Stack on Databricks”. The talk featured Databricks' Head of Evangelism, Ari Kaplan, and Demo Lead, Pearl Luber. They described a modern data stack for building a data lakehouse, leveraging Databricks and lakehouse technology.

The session explained how to build a data lakehouse using Databricks' lakehouse technology, aimed at engineers interested in data & AI, corporate data architects interested in building a data lakehouse, and business users involved in data analysis and machine learning.

Overview of Databricks and lakehouse technology

First, a brief introduction to Databricks and lakehouse technology. Databricks is committed to open source technology and co-develops tools like Apache Spark, MLflow, and Delta Lake. Leveraging these technologies provides a modern data stack for building data lakehouses.

A data lakehouse is a new data management architecture that combines the functionality of a data lake and a data warehouse. Data lakes can store and process large amounts of data, while data warehouses enable fast querying and analysis. By providing both of these capabilities, a data lakehouse offers scalability, flexibility, and performance.
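As a rough illustration of how one copy of the data can serve both workloads, here is a minimal PySpark sketch written for a Databricks notebook, where `spark` is predefined and Delta Lake is the default table format. The storage path, table name, and column names are hypothetical examples, not taken from the session.

```python
# Runs in a Databricks notebook, where `spark` (a SparkSession) is predefined.
# The path, table name, and `event_time` column are hypothetical examples.

# "Data lake" side: land large volumes of raw, semi-structured files cheaply
# as a Delta table on object storage.
raw_events = spark.read.json("/mnt/landing/events/")
raw_events.write.format("delta").mode("append").saveAsTable("events_bronze")

# "Data warehouse" side: fast, SQL-style analytics over the very same table.
spark.sql("""
    SELECT date(event_time) AS event_date, count(*) AS events
    FROM events_bronze
    GROUP BY date(event_time)
    ORDER BY event_date
""").show()
```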

Demo: Building a data lakehouse powered by Databricks

Pearl Luber then gave a demo showing how Databricks can be used to build a data lakehouse. The demo walked through the following steps (rough code sketches follow the summary below):

  1. Create a Databricks workspace
  2. Data import and preprocessing
  3. Data integration leveraging Delta Lake
  4. Analyzing data using Apache Spark
  5. Machine learning model management using MLflow

The demo showed that efficient data management and analysis are possible by leveraging Databricks and lakehouse technology.
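To make steps 2 and 3 above a little more concrete, here is a minimal sketch of importing, preprocessing, and integrating data into a Delta table, written for a Databricks notebook. The paths, table name, and columns are assumptions for illustration, not the actual demo code.

```python
from delta.tables import DeltaTable
from pyspark.sql import functions as F

# Runs in a Databricks notebook, where `spark` is predefined and Delta Lake is
# built in. Paths, table names, and column names are hypothetical examples.

# Step 2: import raw CSV files and apply light preprocessing.
orders = (spark.read
          .option("header", True)
          .csv("/mnt/landing/orders/*.csv")
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .withColumn("amount", F.col("amount").cast("double"))
          .dropDuplicates(["order_id"]))

# Step 3: integrate into a Delta table with an idempotent upsert (MERGE),
# so reruns update existing rows instead of duplicating them.
if not spark.catalog.tableExists("orders_silver"):
    orders.write.format("delta").saveAsTable("orders_silver")
else:
    target = DeltaTable.forName(spark, "orders_silver")
    (target.alias("t")
     .merge(orders.alias("s"), "t.order_id = s.order_id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())
```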
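Steps 4 and 5 could then look roughly like this: aggregate the Delta table with Spark, pull a small result into pandas, and track a simple model with MLflow. The feature columns and the model itself are illustrative assumptions rather than what was shown on stage.

```python
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Runs in a Databricks notebook; table and column names are hypothetical.

# Step 4: analyze the integrated data with Apache Spark.
features = spark.sql("""
    SELECT customer_id,
           count(*)    AS order_count,
           avg(amount) AS avg_amount,
           sum(amount) AS total_amount
    FROM orders_silver
    GROUP BY customer_id
""")
pdf = features.toPandas()  # small aggregate, safe to bring to the driver

# Step 5: manage a simple model with MLflow (parameters, metrics, artifact).
X = pdf[["order_count", "avg_amount"]]
y = pdf["total_amount"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="total-amount-baseline"):
    model = LinearRegression().fit(X_train, y_train)
    mae = mean_absolute_error(y_test, model.predict(X_test))
    mlflow.log_param("features", ",".join(X.columns))
    mlflow.log_metric("mae", mae)
    mlflow.sklearn.log_model(model, "model")
```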

Build a data lakehouse with Databricks products and solutions

Databricks offers products and solutions such as Databricks SQL, Serverless SQL, Delta Live Tables, and RP files for data engineering workflows, live data streaming, machine learning operations, and more. By leveraging these products and solutions, you can build a data lakehouse that fits your specific needs.

Conclusion

This content is based on reports from members participating on site in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your support!