APC Tech Blog

This is the technical blog of AP Communications Co., Ltd.

Embracing the Future of Data Engineering: The Serverless, Real-Time Lakehouse in Action

Introduction

This is May from the GLB Division Lakehouse Department.

This post reports on the session "Embracing the Future of Data Engineering: The Serverless, Real-Time Lakehouse in Action." The session was given by Frank Munz, who specializes in Data & AI and works as a technical marketing engineer.

The purpose of this session is to introduce the future of data engineering powered by serverless and streaming data. The target audience is technologists interested in data engineering, engineers interested in serverless and streaming data, and data scientists involved in big data processing.

Smartphone data analysis demo with Databricks

Frank showed how to use Databricks to reproduce a problem that Google had solved, using the analysis of smartphone data as the example. Databricks is a data engineering platform that uses big data and AI to collect, analyze, and visualize data efficiently.

Advantages of serverless technology and AWS Lambda

Serverless technology is a key enabler of the future of data engineering. The advantages of serverless technology are:

  1. Simplified infrastructure management: No need to manage infrastructure such as servers and networks, allowing developers to focus on application development.
  2. Scalability: The serverless architecture automatically scales according to the number of requests, so it can handle sudden traffic spikes.
  3. Reduced costs: With serverless, you pay only for the resources you actually use rather than for idle capacity.
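
The pay-per-use point in the list above can be made concrete with a little arithmetic. The following is a minimal sketch, not a pricing calculator: the function names, the workload, and every rate in it are hypothetical values chosen only for illustration and do not reflect any provider's actual prices.

```python
def pay_per_use_cost(invocations: int, seconds_per_invocation: float,
                     rate_per_second: float) -> float:
    """Cost when billed only for the compute time actually consumed."""
    return invocations * seconds_per_invocation * rate_per_second


def always_on_cost(hours: float, rate_per_hour: float) -> float:
    """Cost of a server that is billed for every hour, busy or idle."""
    return hours * rate_per_hour


# Hypothetical workload and hypothetical prices (assumptions, not real rates).
serverless = pay_per_use_cost(invocations=100_000,
                              seconds_per_invocation=0.5,
                              rate_per_second=0.00002)
fixed = always_on_cost(hours=720, rate_per_hour=0.05)
print(f"pay-per-use: {serverless:.2f}, always-on: {fixed:.2f}")
```

With a bursty, mostly idle workload like this one, pay-per-use comes out far cheaper. The comparison can flip when the workload keeps the server busy most of the time, so the saving depends on the usage pattern.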

AWS Lambda is a representative serverless service and has the following features:

  • Event-driven: Lambda functions are triggered by events from AWS services such as Amazon S3 and DynamoDB.
  • Language support: Supports multiple programming languages such as Python, Node.js, Java, and Go.
  • Short-running: Lambda is designed for short executions (a single invocation can run for at most 15 minutes), so it is not suitable for long-running processes.
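
To illustrate the event-driven model described above, here is a minimal sketch of a Python Lambda handler that reacts to an S3 notification. It was not shown in the session. The bucket name, the object key, and the trimmed-down sample event are hypothetical; a real S3 event carries many more fields than the ones read here.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Collect the bucket and key of every S3 object named in the event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"count": len(objects), "objects": objects}


# Local invocation with a hand-written, trimmed-down event (hypothetical names).
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/sensor+data.json"},
            }
        }
    ]
}
print(lambda_handler(sample_event, None))
```

The handler does a small amount of work per event and returns quickly, which fits the short-running constraint. Heavier processing would normally be handed off to another service.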

Leveraging serverless and streaming data

Combining serverless technology with streaming data enables real-time data analysis and processing. This enables the following use cases:

  • Real-time dashboards: Streaming data is visualized as it arrives, so you can follow the current state of your business.
  • Alert notifications: You are notified automatically when data that meets certain conditions appears in the stream.
  • Data pre-processing: Streaming data is cleaned and shaped in real time into formats suitable for subsequent analytical processing.
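
The pre-processing and alerting use cases above can be sketched in plain Python with generators, which handle one event at a time as a stream would deliver them. This is an illustrative sketch only, not the pipeline from the session: the field names `device_id` and `temperature_c`, the threshold, and the sample events are all hypothetical.

```python
from typing import Dict, Iterable, Iterator


def preprocess(raw_events: Iterable[Dict]) -> Iterator[Dict]:
    """Drop incomplete records and normalize the rest as they arrive."""
    for event in raw_events:
        if "device_id" not in event or "temperature_c" not in event:
            continue  # skip records missing required fields
        yield {
            "device_id": str(event["device_id"]).strip().lower(),
            "temperature_c": float(event["temperature_c"]),
        }


def alerts(events: Iterable[Dict], threshold_c: float) -> Iterator[str]:
    """Emit a message for each event whose reading exceeds the threshold."""
    for event in events:
        if event["temperature_c"] > threshold_c:
            yield f"ALERT {event['device_id']}: {event['temperature_c']:.1f} C"


# Hypothetical smartphone readings standing in for a live stream.
stream = [
    {"device_id": " Phone-A ", "temperature_c": "41.5"},
    {"device_id": "phone-b"},  # incomplete record, dropped in preprocess
    {"device_id": "phone-c", "temperature_c": 36.0},
]
messages = list(alerts(preprocess(stream), threshold_c=40.0))
print(messages)
```

Because both stages are generators, each event flows through cleaning and alerting without the whole stream being held in memory. Streaming engines apply the same idea at much larger scale.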

The following technologies and services can be used to realize such real-time data processing.

  • Apache Kafka: A distributed streaming platform capable of processing large amounts of data in real time.
  • Amazon Kinesis: A streaming data platform provided by AWS that enables the collection, processing, and analysis of real-time data.
  • Apache Flink: An open source framework for streaming data processing that enables fast and scalable data processing.
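
Stream processors such as those listed above commonly group events into fixed time windows before aggregating them, for example to feed a real-time dashboard. The following plain-Python sketch shows the idea of a tumbling (fixed, non-overlapping) window count. It is not the API of Kafka, Kinesis, or Flink, and the function name, timestamps, and keys are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: float
) -> Dict[Tuple[float, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Every timestamp belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical (timestamp in seconds, device key) pairs.
events = [(0.5, "a"), (3.0, "a"), (9.9, "b"), (10.0, "a"), (12.4, "b")]
window_counts = tumbling_window_counts(events, window_seconds=10)
print(window_counts)
```

A real engine would also deal with late or out-of-order events and emit each window's result once the window closes. This sketch simply counts over a finite list.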

Combining serverless technology with streaming data can make data engineering more efficient and bring results closer to real time. These technologies and services are likely to become increasingly important for the future of data engineering.

Summary

This talk introduced a future of data engineering built on serverless technology and streaming data. These technologies are expected to make data engineering more efficient, improve real-time performance, and create new value as they evolve. We will continue to follow their evolution.

Conclusion

This content is based on reports from members on site participating in DAIS sessions. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.

Translated by Johann

www.ap-com.co.jp

Thank you for your continued support!