APC Tech Blog


This is the technical blog of AP Communications Co., Ltd.

Building Production RAG Over Complex Documents

Building and Enhancing the RAG Pipeline

Retrieval-Augmented Generation (RAG) underpins a wide range of applications. This section introduces its basic principle: a user submits a task or query, and the system draws on a knowledge base to retrieve an answer. How RAG is applied depends on the type and specificity of the question.

The tasks and queries users submit range from specific questions such as "What is X at a specific point in time?" to complex problems that require extensive planning and task decomposition, typically across several sub-problems. A crucial aspect of RAG is providing an interface that handles all of these kinds of tasks and questions efficiently.

These concepts form the foundation for creating a "knowledge assistant". Regardless of how specific or general the question is, by implementing the appropriate methodology, users can gain knowledge more efficiently and confidently.

The following sections examine the specifics of building and enhancing a RAG pipeline, and how the technology can be used to solve complex problems and improve usability.

Advanced Techniques and Tools in Building RAG on Complex Documents

Large Language Models (LLMs) are revolutionizing how users search for, interact with, and generate new content. Recently, a stack of tools around Retrieval-Augmented Generation (RAG) has emerged, enabling users to build applications.

However, a research assistant that handles more general contexts than simple search and question answering may need more than RAG alone. So how can such an assistant be built?

The session offered two focus areas as solutions. The first is data quality improvement, covering the techniques applied at each step of parsing, capturing, and processing unstructured data. This large volume of unstructured data contains a wealth of information, and improving its quality is vital to unlocking its full potential.

Improving the quality of unstructured data matters at every step of parsing, capturing, and processing, and the data needs careful handling throughout. Understanding this impact, selecting suitable storage systems, and managing the data accordingly make it possible to build a more sophisticated RAG system.
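As one illustration of the processing step, here is a minimal Python sketch that normalizes whitespace in extracted text and splits it into overlapping chunks before indexing. The function names `clean_text` and `chunk_text` and the default sizes are assumptions made for this example. They were not presented in the session, and a production pipeline would likely use a dedicated document parser.

```python
import re


def clean_text(raw: str) -> str:
    """Collapse runs of whitespace left over from extraction into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks; neighbouring chunks share `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap is a hedge against splitting a sentence or fact across a chunk boundary. Suitable values depend on the documents and the retriever.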

Deep Dive into Advanced Active Systems and Continuous Improvement

LLMs are fundamentally changing how users search for, manipulate, and create information. The potential of Retrieval-Augmented Generation (RAG) is particularly appealing.

RAG is supported by a recently emerged stack of toolkits that give users valuable tools for developing new applications.

Understanding RAG

Essentially, RAG integrates LLMs with surrounding systems such as user interfaces and automatic generation interfaces. By combining document retrieval with generation, RAG lets users extract the information they need with high precision.
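The combination of retrieval and generation can be sketched in a few lines of Python. This is a toy under stated assumptions, not the pipeline shown in the session. Bag-of-words cosine similarity stands in for a real embedding model, and `generate` is a hypothetical callable that stands in for an LLM client. All function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable


def _vector(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    q = _vector(query)
    ranked = sorted(documents, key=lambda d: _cosine(q, _vector(d)), reverse=True)
    return ranked[:top_k]


def answer(query: str, documents: list[str],
           generate: Callable[[str], str], top_k: int = 2) -> str:
    """Retrieve context, build a prompt, and pass it to the (hypothetical) LLM callable."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

Passing `generate` in as a parameter keeps the sketch independent of any particular model provider. In practice it would wrap whichever LLM API the application uses.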

Exploring Advanced Active Systems

RAG fosters the development of advanced active systems. These systems are built with YAML-based configuration schemas, which let users fine-tune RAG and adapt it to specific requirements.

Such a system organizes and condenses information scattered across documents into a digestible form, which makes both information retrieval and content generation more effective.
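To make the idea of a configuration schema concrete, the Python sketch below validates a settings mapping of the kind that parsing a YAML file would produce. Every field name here (`chunk_size`, `chunk_overlap`, `top_k`) is a hypothetical choice for illustration. The session's actual schema is not described in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical pipeline settings; none of these field names come from the session."""
    chunk_size: int = 200
    chunk_overlap: int = 50
    top_k: int = 3

    def __post_init__(self) -> None:
        # Reject values that would make chunking or retrieval meaningless.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


def load_config(raw: dict) -> RagConfig:
    """Build a validated config from a mapping, e.g. the result of parsing a YAML file."""
    unknown = set(raw) - set(RagConfig.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return RagConfig(**raw)
```

Rejecting unknown keys catches typos in the configuration file early, before they silently fall back to defaults.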

Emphasizing Continuous Improvement

A defining feature of these advanced active systems is their emphasis on continuous improvement. Because system development is anything but static, it is essential to keep pace with the latest technologies, keep systems up to date, and keep pushing beyond current functionality.

Using methods that range from basic data analysis to more advanced techniques, the system evolves continuously to meet user demands and generate new content efficiently and effectively. Understanding and leveraging the capabilities of RAG is key to this process.


In this post, we looked at applications of Large Language Models, at building active systems, and at the importance of continuous improvement in those systems. We paid particular attention to the potential of RAG and how to harness it in practice.

This knowledge serves as a guideline for exploring and creating new content and for interacting with users. As new technologies continue to emerge, the boundaries of what is possible keep expanding. We will keep following the latest developments and sharing what we find.

About the special site during DAIS

This year, we have prepared a special site to report on session contents and the atmosphere on site at DAIS. We plan to update the blog every day during DAIS, so please take a look.