APC Tech Blog

This is the technical blog of AP Communications Co., Ltd. (APC).


What's New in Databricks SQL -- With Live Demos

Introduction

I'm Chen from the Lakehouse Department of the GLB Division. Based on a report from Mr. Nagasato, who is attending Data + AI Summit 2023 (DAIS2023) in San Francisco, I will give an overview of the session "What's New in Databricks SQL -- With Live Demos".

This talk introduces a platform for data ingestion, governance, transformation, and exploration built on Databricks SQL. The presenter is Mr. Can Efeoglu, a Staff Product Manager at Databricks. The session was aimed at engineers interested in data and AI, engineers involved in data analysis and data engineering, and people at companies considering adopting a data platform. Here, I will focus on the points most relevant to our readers: how to ingest data, how to set up a streaming table, how to ingest data from Firebase, and the integration of the unified navigation bar with Data Explorer.

Configuring data ingestion and streaming tables

Databricks SQL streamlines data ingestion, governance, transformation, and exploration so that data can be ingested in real time. The ingestion methods introduced were partner integrations, streaming tables, and incremental ingestion from object stores and streaming sources. The session also demonstrated how to set up a streaming table in Databricks SQL and how to bring in data from Firebase using Partner Connect.

Data ingestion method

Data ingestion is the process of bringing data into the platform, and Databricks SQL provides the following methods:

  1. Partner integration: Use Partner Connect to ingest data from external data sources such as Firebase
  2. Streaming table: Set up a streaming table in Databricks SQL and ingest data in real time
  3. Object Store: Ingest data from object stores such as Amazon S3 and Azure Blob Storage
  4. Streaming Sources: Perform incremental ingestion from streaming sources such as Kafka or Kinesis
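As a concrete illustration of method 3 (ingesting from an object store), incremental loads can be expressed with the `COPY INTO` command, which only picks up files that have not been ingested yet. This is a minimal sketch; the table name and bucket path are hypothetical placeholders, not from the session:

```sql
-- Create an empty target table whose schema will be inferred on load.
CREATE TABLE IF NOT EXISTS raw_events;

-- Idempotent, incremental load: already-ingested files are skipped on re-run.
COPY INTO raw_events
FROM 's3://my-bucket/landing/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS ('inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```

Because `COPY INTO` tracks which files it has already loaded, the same statement can be scheduled repeatedly without producing duplicate rows.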

How to set up a streaming table

Databricks SQL allows you to ingest data in real time by setting up a streaming table. The demo showed how to set up a streaming table with the following steps:

  1. Go to your Databricks SQL workspace and click on Create Table.
  2. Select the Stream tab and select a streaming source.
  3. Enter the source details and click Create Table to create the streaming table.

This creates a streaming table that is populated with data in real time.
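The UI steps above also have a plain-SQL equivalent: a streaming table can be declared directly with `CREATE STREAMING TABLE`. The sketch below assumes JSON files landing in a hypothetical cloud storage path; the table name and path are placeholders:

```sql
-- Declare a streaming table that incrementally ingests new JSON files
-- as they arrive in the (hypothetical) storage location.
CREATE OR REFRESH STREAMING TABLE clickstream_raw
AS SELECT *
FROM STREAM read_files(
  's3://my-bucket/clickstream/',
  format => 'json'
);
```

Each refresh processes only the files that arrived since the previous run, which is what gives the table its real-time character.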

Ingesting data from Firebase with Partner Connect

Partner Connect allows you to ingest data from external data sources such as Firebase. The demo showed how to import data from Firebase using the following steps.

  1. Go to your Databricks SQL workspace and click on the Data tab.
  2. Click Add Data and select Partner Connect.
  3. Select Firebase, enter the required information, and click Connect.

This will pull the data from Firebase and make it available in Databricks SQL.

Data manipulation with data explorer and unified navigation bar

The demo showed how to browse SQL tables and schemas and schedule jobs in Data Explorer. A new "sql" task type was also introduced, which integrates queries, alerts, dashboards, and files into jobs.

Data manipulation in Data Explorer

Data Explorer allows you to:

  1. Browse SQL tables and schemas
  2. Schedule jobs
  3. Run queries

This makes the work of data analysis and data processing more efficient.

Task management with the unified navigation bar

The unified navigation bar lets you centrally manage the following items using the new "sql" task type.

  1. Query
  2. Alert
  3. Dashboard
  4. File

This makes the work of data analysis and data processing more efficient.

Leveraging Materialized Views and Notebooks

The talk also covered how to use materialized views and other tools within the PDC tool. The presenter also highlighted how using notebooks on top of SQL Warehouse simplifies user workflows.

Leveraging Materialized Views

Materialized views have the following advantages:

  1. Improved query execution speed
  2. Guaranteed data integrity
  3. Easier data visualization

This makes the work of data analysis and data processing more efficient.
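As a sketch of benefit 1 (faster query execution), an aggregation can be precomputed as a materialized view so that dashboards read the stored result instead of rescanning the base table on every query. The `sales` table and its columns below are hypothetical examples, not from the session:

```sql
-- Precompute a daily aggregate; the stored result is refreshed
-- rather than recomputed from scratch on each dashboard query.
CREATE MATERIALIZED VIEW daily_sales
AS SELECT
  order_date,
  SUM(amount) AS total_amount
FROM sales
GROUP BY order_date;
```

Queries against `daily_sales` then return the precomputed totals, which is where the speed and consistency benefits listed above come from.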

Use of notebook

Using Notebooks on top of SQL Warehouse provides the following benefits:

  1. Simplified workflow
  2. Improved code reusability
  3. Easier data visualization

This makes the work of data analysis and data processing more efficient.

Summary

The Databricks SQL platform for data ingestion, governance, transformation, and exploration streamlines data analysis and processing tasks through Data Explorer, the unified navigation bar, materialized views, and notebooks. I look forward to seeing how Databricks SQL evolves from here.

Conclusion

This article is based on reports from members attending DAIS sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!