APC Tech Blog


This is the technical blog of AP Communications Co., Ltd.

Building Your First GenAI App using Databricks, MosaicML and MLRun


Welcome to a session full of insights on building GenAI applications. Led by Yaron, the co-founder of Iguazio, which was acquired by McKinsey, and Bruce Philp, a tech fellow at McKinsey & Company, this session explores the concepts, applications, and scaling of GenAI.

Yaron brings a wealth of experience, including roles as VP and CTO at various companies, such as Mellanox, which was acquired by NVIDIA. He has also written a book on MLOps for O'Reilly, underscoring his deep understanding of the subject. His background reflects the modern challenge of applying AI across the business.

This session aims to harness the power of GenAI, present practical ways to apply it, and show how to spread newly acquired knowledge across the organization. Expect a detailed overview of AI development and enterprise-wide implementation using state-of-the-art tools and procedures.

Joining Yaron as the second speaker is Bruce Philp from QuantumBlack, whose team has been integrated into McKinsey. Bruce and his team play a central role in helping clients scale AI within their organizations.

What can you expect from this session? A comprehensive understanding of GenAI, and practical exposure to scaling AI projects using GenAI, Databricks, MosaicML, and MLRun. Join us as we uncover new lessons, explore cutting-edge tools, and create strategies to fully leverage the foundational potential of AI.

Building Your First GenAI Application: Understanding Architecture and the Application Pipeline

In the world of AI and data processing, Generative AI (GenAI) sits at the cutting edge. To get the most out of GenAI, you need to understand its mechanics and integrate it successfully across the organization. In this article, we focus on building a GenAI application and its application pipeline using Databricks, MosaicML, and MLRun, an open-source MLOps orchestration framework.

Defining the GenAI Architecture

The GenAI reference architecture consists of four primary components that apply across the range of GenAI solutions. This article outlines the following stages in sequence:

Starting from Data

Everything starts from data. It's crucial to understand how you handle data and how you bring it up to the required quality level. Depending on the type of problem, you may need to fine-tune your model. Even if you're using a pre-trained model, data quality remains a key element of this journey.

Validation and Input Testing

The next stage is about ensuring that the model behaves as expected for the input it receives. This means understanding how data is processed and deciding which tests and validations are needed to maintain quality.
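As a rough illustration of this stage, the sketch below checks prompts before they reach a model and then measures how often the model's responses contain an expected phrase. It is a minimal example under stated assumptions, not the approach shown in the session: `validate_input`, `run_behavior_tests`, `BehaviorCase`, the length limit, and the echoing `stub_model` are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BehaviorCase:
    prompt: str
    must_contain: str  # substring the response is expected to include


def validate_input(prompt: str, max_chars: int = 2000) -> List[str]:
    """Return a list of problems found in the prompt (empty list means valid)."""
    problems = []
    if not prompt.strip():
        problems.append("empty prompt")
    if len(prompt) > max_chars:
        problems.append("prompt too long")
    return problems


def run_behavior_tests(model: Callable[[str], str], cases: List[BehaviorCase]) -> float:
    """Return the fraction of cases whose response contains the expected text."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        if validate_input(case.prompt):
            continue  # invalid inputs never reach the model and count as failures
        response = model(case.prompt)
        if case.must_contain.lower() in response.lower():
            passed += 1
    return passed / len(cases)


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption); it simply echoes the prompt."""
    return f"You asked: {prompt}"


cases = [
    BehaviorCase(prompt="What is MLRun?", must_contain="mlrun"),
    BehaviorCase(prompt="   ", must_contain="anything"),
]
pass_rate = run_behavior_tests(stub_model, cases)
print(f"pass rate: {pass_rate:.2f}")
```

In practice the stub would be replaced by a call to the deployed model, and substring matching would likely give way to richer checks, but the structure of validating inputs first and scoring behavior second stays the same.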

Deploying the Model

Finally, the resulting model needs to be deployed to the appropriate destination. This includes test and validation procedures, along with measures that ensure the deployed model meets the required quality.
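One way to connect testing to deployment is a simple quality gate that blocks promotion unless every required metric meets its threshold. The sketch below is only an illustration: the function `ready_to_deploy`, the metric names, and the threshold values are assumptions made for this example and do not come from the session.

```python
from typing import Dict


def ready_to_deploy(metrics: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Return True only if every required metric meets its minimum threshold."""
    for name, minimum in thresholds.items():
        # A missing metric is treated as a failure rather than silently skipped.
        if metrics.get(name, float("-inf")) < minimum:
            return False
    return True


# Hypothetical thresholds and candidate results, for illustration only.
thresholds = {"behavior_pass_rate": 0.9, "input_validation_rate": 0.99}
candidate = {"behavior_pass_rate": 0.95, "input_validation_rate": 0.97}

if ready_to_deploy(candidate, thresholds):
    print("promote model to serving")
else:
    print("block deployment and report failing checks")
```

Treating a missing metric as a failure is a deliberate choice here: a check that never ran should not be able to let a model through.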

Taken together, all of these stages aim to ensure and improve data quality. That sums up the overall GenAI architecture and application pipeline. Keep these steps in mind, and keep the importance of data quality front and center when building GenAI applications.

In the next article, we'll provide a step-by-step guide to handling data and model deployment.

A Deep Dive into Data Engineering and Testing

From my experience, many clients take a series of documents, upload them to a vector store, and later realize that they are not getting the quality, accuracy, or results they expected from the uploaded data. This underscores the importance of data engineering.

For structured data, transforms such as filtering and grouping are relatively straightforward. Text-based information is different: you need transforms suited to NLP in order to clean, remediate, and index the information effectively before storing it in the vector store.
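To make the clean, index, and store steps concrete, here is a small self-contained sketch. It is not the pipeline presented in the session: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and `InMemoryVectorStore`, `clean_text`, and `chunk_text` are hypothetical helpers written for this example rather than part of Databricks, MosaicML, or MLRun.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def clean_text(raw: str) -> str:
    """Strip HTML-like tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", no_tags).strip()


def chunk_text(text: str, max_words: int = 50) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Dict[str, float]:
    """Toy embedding: a unit-length bag-of-words vector (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


class InMemoryVectorStore:
    """Hypothetical minimal store; a real deployment would use a vector database."""

    def __init__(self) -> None:
        self.items: List[Tuple[str, Dict[str, float]]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 1) -> List[str]:
        q = embed(query)
        # Dot product of unit vectors, i.e. cosine similarity.
        scored = [
            (sum(q.get(t, 0.0) * w for t, w in vec.items()), chunk)
            for chunk, vec in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:top_k]]


raw_docs = [
    "<p>MLRun   orchestrates ML pipelines.</p>",
    "<div>Vector stores index   document chunks for retrieval.</div>",
]
store = InMemoryVectorStore()
for doc in raw_docs:
    for chunk in chunk_text(clean_text(doc)):
        store.add(chunk)

print(store.search("How are document chunks indexed?"))
```

The point of the example is the order of operations: documents are cleaned and chunked before anything is indexed, so that leftover markup and stray whitespace never end up in the store and degrade retrieval quality.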

Well-managed pipelines and clearly assigned roles are essential for smooth delivery. The session covered everything from building GenAI applications to scaling enterprise-level GenAI use cases, with a detailed discussion of how to use Databricks, MosaicML, and the open-source MLOps orchestration framework MLRun.

Data engineering capability is a fundamental prerequisite for building AI applications. By selecting the right tools and learning how to use them, you can produce high-quality results efficiently. Through this session, I deepened my understanding of how these tools should be used and how they can drive the use of AI in business.

About the special site during DAIS

This year, we have prepared a special site to report on the session contents and the on-site atmosphere at DAIS! We plan to update the blog every day during DAIS, so please take a look.