APC Tech Blog

The technical blog of AP Communications Co., Ltd.

Data + AI Summit Keynote, Thursday

Preface

An incredible program awaits us, but first and foremost, we want to express our appreciation to all our partners. Without the partners, this program could not have been realized. We want to thank the GSIs, hyperscalers, and all the ISVs shown in this image. Please make sure to visit the expo hall to check out their programs.

Today, we have a truly amazing lineup scheduled. We will be hearing presentations from the Texas Rangers, the creator of DuckDB, and the originator of the Spark project. Following this, we will hear about UC and from Ryan Blue, one of the original creators of the Apache Iceberg project. After that, we have presentations from The Positive, Paris’s Art Studio, and lastly, a presentation from Professor Yejin Choi at UW.

We have several presentations planned today. Before going any further, let’s briefly reflect on what happened yesterday…

What we can gain from this series of presentations is deep insight into cutting-edge technology, concrete examples of its use, and its impact on our lives and work. We hope the discussions and lessons here will serve as guidelines towards the results you strive to achieve.

Small Language Models (SLM) by Professor Yejin Choi

The Magic of SLM: Making the Impossible Possible

One important lesson from this session is the old adage that big progress comes from big leaps. Asked how a startup can build on a foundation model, Professor Choi described a path that has been fraught with difficulty. Her passion, however, lies in surmounting these obstacles, and that effort produces meaningful outcomes. Interestingly enough, the topic arose in an unexpected place: the startup scene in India.

'Cooking' SLM: An Eco-friendly Approach

Next, she turned to making the seemingly impossible possible or, to put it precisely, the eco-friendly and efficient way to 'cook' Small Language Models (SLM). While an SLM might pale in size and quality compared to other models, she is determined to take on the challenge of proving their efficacy.

While many are focusing on large-scale models, Professor Choi has successfully shone a light on small-scale models such as GPT-2 as well.

Rekindling the Popularity of GPT-2

In reality, GPT-2 tends to be overlooked, even neglected. However, this status quo appears to be changing, with the startup founded by Professor Choi beginning to discover the utility of GPT-2 in synthesizing solutions.

This session is a treasure trove for learning how to tackle seemingly difficult challenges. It shows that a large part of technological evolution is perseverance and the pursuit of seemingly unattainable goals, something Professor Choi consistently demonstrates.

On the Evolutions in Data Formats and Integrations

Today, I attended an insightful session by Databricks on their 'lakehouse' architecture. The theme of the session was "Advancements in Data Formats and Integration," with a primary focus on Delta Lake and the Unified Data Analytics Platform (UDAP).

The session opened with the revelation that the key to small language models is, as you may have guessed, data. This fascinating discussion walked through various aspects of a data intelligence platform, with a focus on Delta Lake and UDAP.

As a slight digression, let's talk about Tabular, recently acquired by Databricks. What we all really want to hear about is Apache Iceberg, from one of its original creators, Ryan Blue. We are looking forward to Ryan's upcoming stage appearance.

Understanding the advancements in data formats and integration is imperative for the advancement of data science and AI. Enter Delta Lake and UDAP, the themes of our keynote. Delta Lake is an open-source storage layer designed to enhance data quality and performance, making large-scale data processing convenient. The Unified Data Analytics Platform (UDAP), by contrast, aims to integrate data into a single format, thereby simplifying its understanding and usage.

As data and AI continue to work in tandem and evolve, advancements in data formats and integration not only expand our knowledge but also drive influential solutions. The recent addition of Tabular is a testament to Databricks' continuous effort to lead innovation in this domain.

To keep up with the rapid innovation in this area, look forward to further insights on data and AI trends from the thought leaders at Databricks.

Exploring Unity Catalog and Delta Sharing at Data+AI Summit

At the Data+AI Summit keynote, Databricks made a significant splash by introducing new and improved features for Unity Catalog and Delta Sharing. The announcement underscores the potential of open-source projects and facilitates collaboration between developers and data teams.

Unity Catalog

Unity Catalog is an innovative approach to data governance, and its launch as a part of the Delta Lake community is an exciting and promising development. It promises to streamline the process of data management and sharing, serving as a testament to how a well-coordinated system can markedly improve a data team's productivity and time efficiency.

Delta Sharing

Equally important is Delta Sharing, a tool for simplifying data sharing and team collaboration. It promises improved access to various datasets, which should enhance the productivity and efficiency of data teams.

Both Unity Catalog and Delta Sharing potentially bring enormous benefits to data management, sharing, and open data collaboration. Still, closely following Databricks’ recent announcements and seeing how these functions work will be crucial to fully understand their range of benefits.

These recent additions serve as significant demonstrations of Databricks' commitment to advanced data science and AI solutions. It will be exciting to see how these new technologies take shape and contribute to the overall trajectory of the data and AI industry.

Sports Applications of Apache Spark

While many technologies were discussed in the keynote, our real excitement lies in their practical applications. To illuminate this aspect, the Data and AI Summit welcomed Alexander Booth of the Texas Rangers. His appearance at the summit is something we had been eagerly awaiting.

Case Study: Applying Apache Spark in Sports

Booth recalls, "When the Texas Rangers clinched the World Series in 2023, it was a key landmark for our baseball organization. We went from the league's basement to a historic first World Series victory just a few years later, and we felt a sense of pride thanks to the relentless efforts of our players and coaches."

Moreover, this victory resonated strongly in the local community, as evidenced by the World Series parade attracting over half a million people. For Booth, who has been a Rangers fan all his life, this was a dream come true.

This case study provides a concrete example of how technology can directly change people's lives and bolster the ability of organizations to achieve their goals. Leveraging the power of data and AI for the benefit of an organization can yield results beyond what one might expect.

Conclusion

In this session, we got a tangible sense of how real-life applications of Apache Spark in sports can leverage the power of data and AI to contribute to an organization's success. We are looking forward to seeing an increasing number of industries utilizing these technologies.

This article was translated from Japanese into English for an international, English-speaking audience, with the aim of preserving the original context.

About the special site during DAIS

This year, we have prepared a special site to report on session contents and on-site updates from DAIS! We plan to update the blog every day during DAIS, so please take a look.

www.ap-com.co.jp