APC Tech Blog

This is the technical blog of AP Communications Co., Ltd. (APC).


What's New in Databricks SQL -- With Live Demos

Introduction

I'm Chen from the Lakehouse Department of the GLB Division. Based on a report from Mr. Nagasato, who is attending Data + AI Summit 2023 (DAIS2023) in San Francisco, I will give an overview of the session "What's New in Databricks SQL -- With Live Demos".

This talk introduces a platform for data ingestion, governance, transformation, and exploration built on Databricks SQL. The presenter is Mr. Can Efeoglu, a Staff Product Manager at Databricks. The session was aimed at engineers interested in data and AI, engineers involved in data analysis and data engineering, and people at companies considering adopting a data platform. Here, I will focus on the points most relevant to our readers: how to ingest data, how to set up a streaming table, how to ingest data from Firebase, and the integration of the unified navigation bar with Data Explorer.

Configuring data ingestion and streaming tables

Databricks SQL streamlines data ingestion, governance, transformation, and exploration so that data can be ingested in real time. The ingestion methods introduced were partner integrations, streaming tables, and incremental ingestion from object stores and streaming sources. The session also demonstrated how to set up a streaming table in Databricks SQL and how to bring in data from Firebase using Partner Connect.

Data ingestion method

Data ingestion is the process of bringing data into the platform, and Databricks SQL provides the following methods:

  1. Partner integration: Use Partner Connect to ingest data from external data sources such as Firebase
  2. Streaming table: Set up a streaming table in Databricks SQL and ingest data in real time
  3. Object Store: Ingest data from object stores such as Amazon S3 and Azure Blob Storage
  4. Streaming Sources: Perform incremental ingestion from streaming sources such as Kafka or Kinesis
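As a concrete illustration of method 3 (ingesting from an object store), incremental loads can be expressed with the `COPY INTO` command, which only picks up files that have not been ingested yet. This is a minimal sketch; the table name and bucket path are hypothetical placeholders, not from the session:

```sql
-- Create an empty target table whose schema will be inferred on load.
CREATE TABLE IF NOT EXISTS raw_events;

-- Idempotent, incremental load: already-ingested files are skipped on re-run.
COPY INTO raw_events
FROM 's3://my-bucket/landing/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

Because `COPY INTO` tracks which files it has already loaded, the same statement can be scheduled repeatedly without producing duplicate rows.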

How to set up a streaming table

Databricks SQL allows you to ingest data in real time by setting up a streaming table. The demo showed how to set up a streaming table with the following steps:

  1. Go to your Databricks SQL workspace and click on Create Table.
  2. Select the Stream tab and select a streaming source.
  3. Enter the source details and click Create Table to create the streaming table.

This creates a streaming table that is populated with data in real time.
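The UI steps above also have a plain-SQL equivalent: a streaming table can be declared directly with `CREATE STREAMING TABLE`. The sketch below assumes JSON files landing in a hypothetical cloud storage path; the table name and path are placeholders:

```sql
-- Declare a streaming table that incrementally ingests new JSON files
-- as they arrive in the (hypothetical) storage location.
CREATE OR REFRESH STREAMING TABLE clickstream_raw
AS SELECT *
FROM STREAM read_files(
  's3://my-bucket/clickstream/',
  format => 'json'
);
```

Each refresh processes only the files that arrived since the previous run, which is what gives the table its real-time character.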

Ingesting data from Firebase with Partner Connect

Partner Connect allows you to ingest data from external data sources such as Firebase. The demo showed how to import data from Firebase using the following steps.

  1. Go to your Databricks SQL workspace and click on the Data tab.
  2. Click Add Data and select Partner Connect.
  3. Select Firebase, enter the required information, and click Connect.

This will pull the data from Firebase and make it available in Databricks SQL.

Data manipulation with data explorer and unified navigation bar

The demo showed how to browse SQL tables and schemas and schedule jobs in Data Explorer. A new "sql" task type was also introduced, which integrates queries, alerts, dashboards, and files into jobs.

Data manipulation in Data Explorer

Data Explorer allows you to:

  1. Browse SQL tables and schemas
  2. Schedule jobs
  3. Run queries

This makes the work of data analysis and data processing more efficient.

Task management with the unified navigation bar

The unified navigation bar lets you centrally manage the following items using the new "sql" task type.

  1. Query
  2. Alert
  3. Dashboard
  4. File

This makes the work of data analysis and data processing more efficient.

Leveraging Materialized Views and Notebooks

The talk also covered how to use materialized views and other tools within the PDC tool. The presenter also highlighted how using notebooks on top of SQL Warehouse simplifies user workflows.

Leveraging Materialized Views

Materialized views have the following advantages:

  1. Improved query execution speed
  2. Guaranteed data integrity
  3. Easier data visualization

This makes the work of data analysis and data processing more efficient.
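As a sketch of benefit 1 (faster query execution), an aggregation can be precomputed as a materialized view so that dashboards read the stored result instead of rescanning the base table on every query. The `sales` table and its columns below are hypothetical examples, not from the session:

```sql
-- Precompute a daily aggregate; the stored result is refreshed
-- rather than recomputed from scratch on each dashboard query.
CREATE MATERIALIZED VIEW daily_sales
AS SELECT
  order_date,
  SUM(amount) AS total_amount
FROM sales
GROUP BY order_date;
```

Queries against `daily_sales` then return the precomputed totals, which is where the speed and consistency benefits listed above come from.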

Use of notebook

Using Notebooks on top of SQL Warehouse provides the following benefits:

  1. Simplified workflow
  2. Improved code reusability
  3. Easier data visualization

This makes the work of data analysis and data processing more efficient.

Summary

The Databricks SQL platform for data ingestion, governance, transformation, and exploration streamlines data analysis and processing tasks through Data Explorer, the unified navigation bar, materialized views, and notebooks. I look forward to seeing how Databricks SQL evolves from here.

Conclusion

This article is based on reports from members attending DAIS sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!