APC Tech Blog

The technical blog of AP Communications Co., Ltd.

Databricks Connect Powered by Spark Connect: Develop and Debug Spark From Any Developer Tool

Introduction

This is Johann from the Global Engineering Department of the GLB Division. This article summarizes the session based on a report from Mr. Kanemaru, who attended Data + AI Summit 2023 (DAIS).

This post covers the session "Databricks Connect Powered by Spark Connect: Develop and Debug Spark From Any Developer Tool." In it, Stefania, a product manager at Databricks, and co-presenter Martin Grund explain how developers can build, debug, and integrate Spark. The target audience includes data engineers, data scientists, and data analysts. This is a single-part post. Let's dive into the content of the session!

Introducing Databricks Connect and its Usage

Databricks Connect is a tool that lets developers connect to Databricks from anywhere, so they can develop workloads close to the cluster and test them under realistic conditions. However, the original Databricks Connect and the Spark architecture had limitations that made it difficult to interact with Spark from languages outside the JVM.

Features of Databricks Connect

Databricks Connect has the following features:

  1. Developers can connect to Databricks from anywhere

  2. Workloads can be tested under realistic conditions

  3. Workloads can be developed close to the cluster

This allows developers to build, debug, and integrate Spark in their preferred development environment.

Issues with Databricks Connect and Spark Architecture

However, the original Databricks Connect and the Spark architecture had the following problems:

  1. Difficulty in interacting with Spark from languages outside the JVM

  2. Complex data conversion between languages

  3. Potential performance degradation

Databricks Connect Powered by Spark Connect was developed to solve these issues.

Overview of Databricks Connect Powered by Spark Connect

Databricks Connect Powered by Spark Connect has the following features:

  1. Build, debug, and integrate Spark from any development tool

  2. Simplify data conversion between languages

  3. Improve performance

This allows developers to use Spark more efficiently.

Improvements by Spark Connect and the New Version of Databricks Connect

The session then detailed the improvements brought by Spark Connect and the new version of Databricks Connect that is built on it.

The Emergence of Spark Connect and its Significance

With the introduction of Spark Connect, the Spark architecture was decoupled into a client and a server, with a well-defined protocol between them. This resulted in the following benefits:

  1. A better experience for developers as the client and server architecture is separated

  2. Developers can use their familiar development tools to build, debug, and integrate Spark

  3. Improved Spark performance, enabling more efficient data processing
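
To make the client and server split more concrete, here is a toy sketch in Python. It is not the Spark Connect protocol itself: real clients send query plans encoded as protocol buffers over gRPC, while this sketch uses JSON, and the names `LazyFrame` and `serve` are hypothetical and exist only for illustration. The point is simply that the client records operations as data, and the server, which owns the data, executes them.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LazyFrame:
    """Client side: records operations as a plan instead of running them."""
    ops: list = field(default_factory=list)

    def filter_gt(self, column: str, value: float) -> "LazyFrame":
        step = {"op": "filter_gt", "column": column, "value": value}
        return LazyFrame(self.ops + [step])

    def select(self, *columns: str) -> "LazyFrame":
        step = {"op": "select", "columns": list(columns)}
        return LazyFrame(self.ops + [step])

    def to_request(self) -> str:
        # Assumption for the sketch: JSON stands in for the real wire format.
        return json.dumps({"plan": self.ops})


def serve(request: str, table: list) -> list:
    """Server side: decode the plan and run it against data the server owns."""
    rows = table
    for step in json.loads(request)["plan"]:
        if step["op"] == "filter_gt":
            rows = [r for r in rows if r[step["column"]] > step["value"]]
        elif step["op"] == "select":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
        else:
            raise ValueError(f"unknown op: {step['op']}")
    return rows


if __name__ == "__main__":
    table = [
        {"name": "a", "score": 1},
        {"name": "b", "score": 5},
        {"name": "c", "score": 9},
    ]
    # The client only builds and serializes the plan; no data lives here.
    request = LazyFrame().filter_gt("score", 3).select("name").to_request()
    print(serve(request, table))  # [{'name': 'b'}, {'name': 'c'}]
```

Because the client only needs to produce a serialized plan, it can in principle be written in any language that can speak the protocol, which is the property the session emphasized.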

Features of the New Version of Databricks Connect

Databricks Connect is now built on Spark Connect and inherits its separated client and server architecture, which gives developers a better experience. The new version has the following features:

  1. Developers can use Spark from any development tool, building, debugging, and integrating with their familiar tools

  2. The separated client and server architecture allows developers to write code in a local environment and execute it in a remote environment

  3. Debugging becomes easier, as developers can debug while checking the real-time execution status of their code
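
As a minimal sketch of "write locally, execute remotely", the following assumes the `databricks-connect` Python package (version 13 or later) is installed and that connection details are supplied through the environment variables `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and `DATABRICKS_CLUSTER_ID`. The helper names `missing_settings`, `get_session`, and `main` are made up for this example, and none of this code was shown in the session.

```python
import os

# Settings this sketch expects; the values themselves are never hard-coded.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "DATABRICKS_CLUSTER_ID")


def missing_settings(env=None):
    """Return the names of required settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def get_session(env=None):
    """Create a remote Spark session, failing early if settings are missing."""
    env = os.environ if env is None else env
    missing = missing_settings(env)
    if missing:
        raise RuntimeError("Set these variables first: " + ", ".join(missing))
    # Imported lazily so the checks above work without the package installed.
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.remote(
        host=env["DATABRICKS_HOST"],
        token=env["DATABRICKS_TOKEN"],
        cluster_id=env["DATABRICKS_CLUSTER_ID"],
    ).getOrCreate()


def main():
    spark = get_session()
    # The DataFrame is defined in the local process ...
    evens = spark.range(10).filter("id % 2 = 0")
    # ... and the work runs on the remote cluster when an action is called.
    print(evens.count())
```

Since the script is an ordinary local Python process, breakpoints set in an IDE stop on the client side while the Spark work itself runs on the cluster. Call `main()` once the three variables are set.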

How to Embed Spark in Applications Including TypeScript

The session explained how to embed Spark in applications, including ones written in TypeScript. It covered the new version of Databricks Connect in detail, in particular the separated client and server architecture and the ability to use the client in IDEs and data applications.

The New Version of Databricks Connect

Databricks Connect is a tool that simplifies data processing using Apache Spark. The new version has the following features:

  1. Separated client and server architecture

  2. The client can be used in IDEs and data applications

  3. Supports applications including TypeScript

This allows developers to build, debug, and integrate Spark in their preferred development environment.

Separation of Client and Server Architecture

The new version of Databricks Connect features a separated client and server architecture, providing the following benefits:

  1. Easier client-side development

  2. Efficient resource management on the server side

  3. Optimized communication between client and server

Developers can focus on client-side development, while Databricks Connect automatically handles server-side resource management and communication optimization.

Using the Client in IDEs and Data Applications

The new version of Databricks Connect allows the client to be used in IDEs and data applications, offering the following benefits:

  1. Developers can use Spark in their familiar development environment

  2. Easier integration between data applications and Spark

  3. Efficient debugging and testing

Developers can build, debug, and integrate Spark in their preferred development environment, and connect it smoothly with data applications.
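
For a data application, the client side can be as small as a connection string. The sketch below builds one in the `sc://` format used by Spark Connect, with the `token` and `x-databricks-cluster-id` parameters that Databricks documents for its clusters. The function names and all placeholder values are assumptions for illustration, and `connect` needs PySpark 3.4 or later with the Connect extras installed.

```python
def spark_connect_url(host: str, token: str, cluster_id: str, port: int = 443) -> str:
    """Build a Spark Connect connection string for a Databricks cluster.

    General shape: sc://<host>:<port>/;key=value;key=value
    """
    params = {
        "token": token,
        "x-databricks-cluster-id": cluster_id,
    }
    param_str = ";".join(f"{key}={value}" for key, value in params.items())
    return f"sc://{host}:{port}/;{param_str}"


def connect(url: str):
    """Open a remote session from inside an application."""
    # Imported lazily so the URL helper stays free of dependencies.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()


if __name__ == "__main__":
    # Placeholder values only; replace them with real workspace details.
    url = spark_connect_url(
        host="example.cloud.databricks.com",
        token="dapi-XXXX",
        cluster_id="0123-456789-abcdefgh",
    )
    print(url)
```

Keeping the URL construction separate from the session creation makes the configuration easy to unit test without a cluster, and the application only calls `connect(url)` at startup.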

Embedding Spark in Applications Including TypeScript

The new version of Databricks Connect supports applications that include TypeScript, providing the following benefits:

  1. Developers can take advantage of TypeScript's type safety

  2. Compatibility with JavaScript is maintained

  3. Modern front-end development is supported

Developers can easily embed Spark in their TypeScript-based applications.

Summary

The session covered the introduction and usage of Databricks Connect, the improvements brought by Spark Connect, the new version of Databricks Connect, and how to embed Spark in applications including TypeScript. With this information, developers can use Spark more efficiently and make data processing and analysis more efficient. We look forward to following the evolution of these technologies.

Conclusion

This content is based on reports from members attending DAIS sessions on site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!