LabCorp Data Platform Journey: From Selection to Go-Live in Six Months

Introduction

This is Johann from the Global Engineering Department of the GLB Division. This article summarizes a session from Data + AI Summit 2023 (DAIS), based on a report from Mr. Ichimura, who attended the event.

The session is titled "LabCorp Data Platform Journey: From Selection to Go-Live in Six Months." The speakers are Mohan Kolli, Director of Enterprise Analytics Platform at LabCorp; Sree, Platform Architect and Technical Manager for Data Lake and Data Warehousing; and Sreekanth Ratakonda, SVP of Sales and Delivery at MSR Cosmos, the implementation partner.

The presentation walks through LabCorp's migration from Hadoop to a next-generation platform and aims to show how LabCorp is building a scalable architecture to support its customers and meet future demands. The target audience includes technical professionals interested in Data & AI, business owners interested in building data platforms, and engineers who operate data platforms. Now, let's go through the content of the presentation in order.

LabCorp's Challenge: Data Platform for the World's Largest Reference Laboratory

LabCorp is the world's largest reference laboratory, with over 80,000 employees working in more than 100 countries. The session covered its migration from Hadoop to a next-generation platform, with the aim of building a serverless architecture that can meet the company's needs for the next 10 years.

Migration from Hadoop to the Next-Generation Platform

LabCorp set the following requirements for its migration from Hadoop to the next-generation platform:

Scalable architecture: The platform must scale to meet future demands.
Serverless: A serverless architecture reduces the burden of managing and operating infrastructure.
Data integration: Data from different sources must be integrated and managed centrally.

To meet these requirements, LabCorp chose Databricks. Databricks is a data platform based on Apache Spark, supporting large-scale data processing and machine learning.

Achieving Go-Live in Six Months

By adopting Databricks, LabCorp was able to achieve the following results:

Fast data processing: Data processing became significantly faster than on the previous Hadoop environment.
Seamless data integration: Integrating data between Databricks and other data sources became easier.
Flexible scaling: Databricks' cloud-based architecture made resource scaling easier.

With these results, LabCorp was able to take Databricks into production in just six months.

Conclusion

Through its migration from Hadoop to Databricks, LabCorp built a scalable, serverless architecture. This will enable LabCorp to meet future data demands and support its customers. I will continue to follow the latest news and case studies in the Data & AI field and share them in easy-to-understand articles for English readers. Stay tuned for the next article!