APC Tech Blog

This is the technical blog of AP Communications Co., Ltd.


Ray on Apache Spark™ Part 2

Introduction

This is Johann from the Global Engineering Department of the GLB Division. This article summarizes the content of the session, based on the report by Gibo, who attended Data + AI Summit 2023 (DAIS) on site.

Today, I'd like to tell you about a recent talk, "Ray on Apache Spark™". In this talk, Jeremy took the stage and introduced Ray, an open source unified distributed framework. The theme and purpose of the talk is to simplify the scaling of AI and Python applications and make it accessible to everyone, even without expertise in distributed systems. The intended audience is engineers interested in data & AI, people who lack distributed-systems expertise but want to scale AI and Python applications, and people interested in open source distributed frameworks.

This blog consists of two parts, and this article is part two. Part 1 introduced Ray as an open source unified distributed framework that simplifies scaling Python applications and makes them accessible to anyone without distributed systems expertise. In Part 2, we will explain how to run Ray on Databricks and look at application examples of Ray on Apache Spark™.

How to run Ray on Databricks

Ray on Databricks simplifies scaling AI and Python applications and makes them accessible to anyone without distributed systems expertise. Below, we explain in detail how to run Ray on Databricks.

1. Verify that Ray initialized and started successfully

First, I will explain how to run Ray on Databricks. Ray can be initialized and started by following the steps below.

  1. Go to your Databricks workspace and create a new notebook.

  2. Select Python as your notebook language and import Ray.

  3. Initialize and start Ray.

By following these steps, Ray starts normally and distributed processing becomes possible. You can also check Ray's status to confirm that startup was successful.

2. The Ray dashboard and its resource fine-tuning capabilities

Now let's talk about the Ray dashboard. The Ray dashboard is a tool for checking Ray's execution status and resource usage. It provides the following features:

  • Check job execution status

  • Check resource usage

  • Show errors and warnings

It was also mentioned that resources can be fine-tuned via the Ray dashboard. For example, you can make the following adjustments:

  • Change job priority

  • Change resource allocation

  • Pause and resume jobs

By using these features, you can operate Ray efficiently and simplify the scaling of AI and Python applications.

Summary

Ray on Databricks simplifies scaling AI and Python applications and makes them accessible to anyone without distributed systems expertise. After confirming that Ray initialized and started successfully, you can use the Ray dashboard to fine-tune your resources. This enables efficient distributed processing and accelerates the development of AI and Python applications. In the next article, we will discuss applications of Ray on Apache Spark™. Stay tuned to see how Ray can be used in various fields such as Monte Carlo simulations and hyperparameter tuning of machine learning models!

Conclusion

This content is based on reports from members participating on site in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!