APC Tech Blog


The technical blog of AP Communications Co., Ltd.

LLMs in Production: Fine-Tuning, Scaling, and Evaluation


In today's highly competitive market, businesses increasingly rely on Large Language Models (LLMs) to enhance various aspects of their operations. This session focused on fine-tuning LLMs to meet specific business needs and applications, providing valuable insights for those involved in the commercial deployment of these models.

The ability to fine-tune LLMs can transform how businesses interact with information and automate processes. The four steps below outline how to address challenges and make strategic adjustments when fine-tuning LLMs for a specific business environment.

1. Problem Consideration

Effective tuning begins with a thorough understanding of the business challenge, identifying points where LLMs can be most advantageous. This step involves defining the model’s tasks and outlining the expected outcomes.

2. Dataset Generation

The key to fine-tuning lies in the careful preparation and structuring of datasets. The quality and scale of data not only facilitate training but also significantly impact model performance.
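As a rough illustration of what preparing and structuring a dataset can look like, the sketch below cleans raw question/answer pairs, drops empty and duplicate rows, and splits the result into training and validation sets serialized as JSON Lines. The field names (`prompt`, `completion`), the 80/20 split, the sample rows, and the helper names `build_dataset` and `to_jsonl` are assumptions made for this example, not details from the session.

```python
import json
import random


def build_dataset(raw_pairs, validation_ratio=0.2, seed=0):
    """Clean (question, answer) pairs and split them into train/validation lists."""
    seen = set()
    records = []
    for question, answer in raw_pairs:
        prompt = " ".join(question.split())        # collapse repeated whitespace
        completion = " ".join(answer.split())
        if not prompt or not completion:
            continue                                # drop incomplete rows
        key = prompt.lower()
        if key in seen:
            continue                                # drop duplicate prompts
        seen.add(key)
        records.append({"prompt": prompt, "completion": completion})

    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(records)
    n_val = int(len(records) * validation_ratio)
    return records[n_val:], records[:n_val]


def to_jsonl(records):
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical raw data, including one duplicate and one incomplete row.
raw = [
    ("How do I reset my password?", "Open Settings and choose Reset password."),
    ("how do I reset   my password?", "Duplicate of the first question."),
    ("What is the refund policy?", "Refunds are available within 30 days."),
    ("", "An answer without a question."),
    ("Where can I find invoices?", "Invoices are under Billing."),
    ("How do I add a teammate?", "Use the Members page to send an invite."),
    ("Can I export my data?", "Yes, use Export in the Account menu."),
]
train, validation = build_dataset(raw)
print(len(train), len(validation))
print(to_jsonl(validation))
```

Holding out a validation split at this stage gives the later evaluation step examples that the model never saw during training.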

3. Model Fine-tuning

Fine-tuning is then aligned with business objectives by adjusting hyperparameters and refining the training approach, with the aim of meeting or exceeding performance expectations.

4. Evaluation

It is crucial to ascertain whether the fine-tuned model meets performance standards. This evaluation phase involves verifying effectiveness and identifying opportunities for further improvement.

Navigating these steps enables companies to fully leverage the capabilities of LLMs, fostering optimization and innovation across functions. Detailed discussions and case studies within this session illuminate pathways to mastering LLM fine-tuning for enhanced business outcomes.

Evaluation and Deployment of LLMs

The session “LLMs in Production: Fine-Tuning, Scaling, and Evaluation” also featured discussions on how LLMs are evaluated and deployed in business settings, which is the focus of this part of the article.

Key Considerations for Evaluating LLMs

When introducing LLMs in business settings, the first step is to assess the quality of their outputs. The ability to solve specific problems and respond with the necessary accuracy serves as a primary criterion for evaluating these models' performance.

Another critical aspect is latency, which refers to the response time experienced by users. Even if a model delivers high accuracy, excessive latency can adversely affect the user experience. Addressing this factor is crucial during the system design phase to ensure the practical usability of the model.
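A minimal sketch of measuring both criteria together might look like the following. It assumes the model is reachable through a plain Python callable (`model_fn`), uses exact-match accuracy as a stand-in for output quality, and reports median and 95th-percentile latency. The `fake_model` stub and the test cases are hypothetical and only exist so the example runs on its own.

```python
import math
import statistics
import time


def evaluate(model_fn, test_cases):
    """Return exact-match accuracy and latency statistics for model_fn."""
    latencies = []
    correct = 0
    for prompt, expected in test_cases:
        start = time.perf_counter()
        answer = model_fn(prompt)
        latencies.append(time.perf_counter() - start)
        # Case-insensitive exact match; real projects often need a softer metric.
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    latencies.sort()
    # Nearest-rank 95th percentile.
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "accuracy": correct / len(test_cases),
        "median_latency_s": statistics.median(latencies),
        "p95_latency_s": latencies[p95_index],
    }


def fake_model(prompt):
    """Hypothetical stand-in for a real LLM call."""
    canned = {"2 + 2": "4", "capital of France": "Paris", "largest planet": "Saturn"}
    return canned.get(prompt, "unknown")


cases = [
    ("2 + 2", "4"),
    ("capital of France", "paris"),
    ("largest planet", "Jupiter"),
    ("color of the sky", "blue"),
]
report = evaluate(fake_model, cases)
print(report)
```

Tracking a tail percentile next to the median matters because a small share of slow responses can dominate how users perceive the system.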

Efficient Deployment Method: Retrieval-Augmented Generation (RAG)

To effectively manage performance and latency, Retrieval-Augmented Generation (RAG) was recommended as a strategic approach. This method stores information related to various issues in a database and injects relevant data into prompts based on queries. Implementing RAG allows for the generation of informed responses with fewer prompts, significantly reducing the need for extensive fine-tuning.

The RAG approach can reduce latency while maintaining the model's efficacy, so users receive prompt and accurate responses. Such systems can improve customer satisfaction without compromising service quality.
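The retrieve-then-inject flow can be sketched in a few lines. In this simplified example an in-memory list stands in for the database, and bag-of-words cosine similarity stands in for the embedding search a production system would more likely use. The knowledge-base entries, the prompt template, and all function names are assumptions for illustration, not the setup presented in the session.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=1):
    """Return the top_k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:top_k]


def build_prompt(query, documents, top_k=1):
    """Inject the retrieved entries into the prompt sent to the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, top_k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical knowledge base; a real system would query a database here.
KNOWLEDGE_BASE = [
    "Build failures caused by a stale cache are fixed by clearing the pipeline cache.",
    "Access to a private repository requires a request to the workspace administrator.",
    "Deployment rollbacks are triggered from the release page of the project.",
]

query = "How do I fix a build failure from a stale cache?"
prompt = build_prompt(query, KNOWLEDGE_BASE)
print(prompt)
```

Because only the most relevant entries are added to the prompt, the prompt stays short, which is one way this design can keep response times down.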

Internal Evaluation and Data Enrichment

The session was particularly insightful on internal evaluation and data enrichment. It covered strategies for evaluating models internally and for enriching training datasets. The key practices discussed were:

  1. Initial Deployment to Internal Customers: It is crucial to first deploy LLM technology within a controlled internal environment. This strategy involves delivering the technology to a selected group of internal users and assessing its practical applications and effects before public release. This phase helps identify and rectify issues, ensuring the LLM is robust enough for external deployment.

  2. Data Collection from Customer Queries: An essential step in internal evaluation is the collection and analysis of data. Storing the prompts and outputs generated by the LLM in response to internal customer queries in a database allows for continuous measurement of the model’s performance. This data not only serves as a benchmark for improvement but also provides valuable insights for further training and refining the model.

  3. Creation and Testing of Variants: To enhance the adaptability and effectiveness of LLMs, creating and testing different model variants is recommended. These variants should be designed to address different types of queries and issues. Evaluating the performance of each variant facilitates the development of more versatile and responsive models.
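The logging and variant-comparison practices above can be sketched with Python's built-in `sqlite3` module. The table layout, the variant names, and the `helpful` feedback flag are assumptions added for this example; the session only described storing prompts and outputs in a database.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the interaction log. ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS interactions (
               id INTEGER PRIMARY KEY,
               variant TEXT NOT NULL,
               prompt TEXT NOT NULL,
               output TEXT NOT NULL,
               helpful INTEGER NOT NULL
           )"""
    )
    return conn


def log_interaction(conn, variant, prompt, output, helpful):
    """Store one prompt/output pair; helpful is hypothetical internal-user feedback."""
    conn.execute(
        "INSERT INTO interactions (variant, prompt, output, helpful) VALUES (?, ?, ?, ?)",
        (variant, prompt, output, int(helpful)),
    )
    conn.commit()


def helpful_rate_by_variant(conn):
    """Compare model variants by the share of answers users marked as helpful."""
    rows = conn.execute(
        "SELECT variant, AVG(helpful), COUNT(*) FROM interactions "
        "GROUP BY variant ORDER BY variant"
    ).fetchall()
    return {variant: {"helpful_rate": rate, "samples": n} for variant, rate, n in rows}


conn = open_log()
log_interaction(conn, "baseline", "How do I roll back?", "Not sure.", helpful=False)
log_interaction(conn, "baseline", "Where are invoices?", "Under Billing.", helpful=True)
log_interaction(conn, "with_retrieval", "How do I roll back?", "Use the release page.", helpful=True)
log_interaction(conn, "with_retrieval", "Where are invoices?", "Under Billing.", helpful=True)
summary = helpful_rate_by_variant(conn)
print(summary)
```

The same table can later be exported as training or evaluation data, which is how logging internal queries feeds back into dataset enrichment.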

By combining internal product testing with data enrichment, companies can mitigate risks and improve the practicality and accuracy of their LLMs. By the time the product is released to the public, it has already been exercised on real queries and is more likely to deliver a good user experience. Continuous internal evaluation and dataset enrichment are therefore essential for minimizing the risks of deploying LLMs in real-world applications and for making full use of their capabilities.

Robustness and Privacy Enhancement in Operational LLMs

Deploying LLMs in business environments requires a strict focus on customer data security and privacy, because business applications often handle sensitive data. The session discussed a wide range of techniques for keeping LLM operations secure and private.

Commitment to Customer Data Protection

Businesses bear a fundamental responsibility for the security of customer data used within LLM applications. The session thoroughly explained in-house protocols for safe data handling from creation to management, ensuring compliance with privacy standards.

Layered Data Protection Strategy

Effective data management when using LLMs involves implementing multi-layered security measures. Tier 1 data (e.g., information about Atlassian developers) can be shared with other developers under managed conditions. In contrast, Tier 2 data (including customer code-related information) requires a strict approval process from a dedicated privacy team.
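One way to picture such a tiered policy is as a small gate that every dataset request passes through before the data reaches an LLM pipeline. The sketch below is an assumption-laden simplification: the `Tier` enum, the `DataRequest` fields, the dataset names, and the single `privacy_approved` flag are hypothetical and do not describe the actual process presented in the session.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    TIER_1 = 1  # e.g., information about internal developers
    TIER_2 = 2  # e.g., customer code-related information


@dataclass(frozen=True)
class DataRequest:
    dataset: str
    tier: Tier
    privacy_approved: bool = False  # set only after privacy-team review (assumed flag)


def can_use_for_llm(request):
    """Tier 1 is allowed under managed conditions; Tier 2 needs explicit approval."""
    if request.tier == Tier.TIER_1:
        return True
    return request.privacy_approved


requests = [
    DataRequest("developer_profiles", Tier.TIER_1),
    DataRequest("customer_code_snippets", Tier.TIER_2),
    DataRequest("customer_code_snippets", Tier.TIER_2, privacy_approved=True),
]
decisions = [can_use_for_llm(r) for r in requests]
print(decisions)
```

Making denial the default for the higher tier means a missing approval blocks access instead of silently allowing it.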

Integration of Developers and Privacy Teams

A collaborative approach between developers and privacy teams is vital for ensuring robust data privacy and security. The session provided detailed descriptions of these cooperative frameworks and showcased successful examples of secure data operations in LLM deployment.


While maintaining privacy and security in LLM operations involves complexities, these challenges can be addressed through systematic internal processes and effective teamwork between developers and privacy teams. Dedicated efforts to foster these collaborative environments substantially enhance an organization's ability to deploy LLMs securely.

Throughout the session, several methods for securely implementing LLMs were discussed, providing valuable insights for organizations aiming to integrate these systems while maintaining privacy and robust data security. This knowledge helps organizations establish a scalable foundation that not only enhances existing operational frameworks but also prepares them for future developments in LLM technology. Through careful planning and execution, LLMs can become a crucial tool for achieving business goals.

About the special site during DAIS

This year, we have prepared a special site to report on session content and on the atmosphere at the DAIS venue. We plan to update the blog every day during DAIS, so please take a look.