## The Future Brought by the Democratization of Data and AI - Data + AI Summit Keynote Day 1
I'm Johann from the Global Engineering department of GLB. I have written this article from reports by Mr. Ichimura, who is attending Data + AI Summit 2023 (DAIS) on-site, and it summarizes the session content. This article covers the Day 1 Data + AI Summit Keynote. The keynote's theme was the democratization of data and AI: making it possible for anyone in an organization to ask questions about its data. The vision shared was that within the next 5 to 10 years, every company will become a data and AI company. Each will use its own machine learning models, generative AI models, and LLMs to improve its competitiveness.
### Latest Concepts, Features, and Services
The summit covered many of the latest concepts, features, and services in the field of data and AI. This section gives an overview of each announcement and its key features.
#### [New Announcement] Introduction to LakehouseIQ and its Features
##### Data Usage Collection and Model Building
LakehouseIQ is a knowledge engine that collects information on how data is used within an organization and builds models from that usage. This makes data usage visible and promotes effective use of data.
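The keynote did not describe LakehouseIQ's internals. As a rough illustration of what "collecting information on how data is used" can look like, the following Python sketch aggregates a hypothetical query log into per-table popularity and most frequent users. The log format and the `usage_profile` function are assumptions for illustration, not part of LakehouseIQ.

```python
from collections import Counter, defaultdict

# Hypothetical (user, table) pairs extracted from executed queries.
QUERY_LOG = [
    ("alice", "finance.arr_monthly"),
    ("bob", "finance.arr_monthly"),
    ("alice", "finance.arr_monthly"),
    ("carol", "hr.headcount"),
]


def usage_profile(log, top_n: int = 2) -> dict:
    """Aggregate a query log into per-table query counts and most frequent users."""
    per_table = defaultdict(Counter)
    for user, table in log:
        per_table[table][user] += 1
    return {
        table: {
            "query_count": sum(users.values()),
            "top_users": [user for user, _ in users.most_common(top_n)],
        }
        for table, users in per_table.items()
    }


print(usage_profile(QUERY_LOG))
```

Signals like these (how often a table is queried and by whom) are the kind of usage information that can later feed search ranking.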
##### Development of an Enhanced Search Version
LakehouseIQ powers an enhanced version of search that recognizes internal terminology and surfaces signals such as popularity, frequent users, last update time, and upstream quality issues. This makes it easier to find data within an organization and to use it efficiently.
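To make ranking by such signals concrete, here is a minimal sketch. It scores tables by term overlap with a query, after expanding internal terms through a glossary. It then adjusts the score for popularity, number of users, freshness, and upstream quality issues. The glossary, the weights, and the `TableMeta` fields are all illustrative assumptions. The keynote did not disclose how LakehouseIQ actually ranks results.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical glossary mapping internal jargon to the words used in table metadata.
GLOSSARY = {"arr": "annual recurring revenue"}
STOPWORDS = {"a", "an", "by", "for", "of", "the"}


@dataclass
class TableMeta:
    name: str
    description: str
    query_count_30d: int     # popularity signal
    distinct_users_30d: int  # "frequent users" signal
    last_updated: datetime   # freshness signal
    upstream_issue: bool     # an upstream quality problem has been flagged


def tokens(text: str) -> set:
    """Lowercase word tokens with common stop words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def expand(query: str) -> set:
    """Add the glossary expansion of every internal term found in the query."""
    terms = tokens(query)
    for term in list(terms):
        if term in GLOSSARY:
            terms |= tokens(GLOSSARY[term])
    return terms


def score(table: TableMeta, terms: set, now: datetime) -> float:
    relevance = len(terms & tokens(table.name + " " + table.description))
    if relevance == 0:
        return 0.0
    freshness = 1.0 / (1 + (now - table.last_updated).days)
    s = (10 * relevance + 0.01 * table.query_count_30d
         + 0.1 * table.distinct_users_30d + freshness)
    return s * 0.5 if table.upstream_issue else s  # demote tables with known issues


def search(tables, query: str, now: datetime) -> list:
    terms = expand(query)
    scored = [(score(t, terms, now), t.name) for t in tables]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]


now = datetime(2023, 6, 28, tzinfo=timezone.utc)
tables = [
    TableMeta("finance.arr_monthly", "annual recurring revenue by month", 500, 40, now - timedelta(days=1), False),
    TableMeta("sandbox.revenue_copy", "copy of annual recurring revenue", 20, 2, now - timedelta(days=90), True),
    TableMeta("hr.headcount", "employee headcount by team", 300, 30, now, False),
]
print(search(tables, "ARR by month", now))  # ['finance.arr_monthly', 'sandbox.revenue_copy']
```

Without the glossary entry, the query term "ARR" would not match the description of either revenue table. The stale, rarely used copy with a flagged upstream issue still ranks below the maintained table.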
LakehouseIQ's features were demonstrated during the keynote. The following image shows an example of the assistant feature, which is expected to be released. A question is entered, and the answer and SQL query are generated and executed automatically. With LakehouseIQ turned off, the question cannot be interpreted correctly, and the generated SQL query returns NULL.

With LakehouseIQ turned on, by contrast, the assistant identifies the tables and columns that correspond to the question. It then generates an appropriate query and returns a result that matches the question's intent.
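The on/off contrast in the demo depends on whether the assistant can map the words in a question to concrete tables and columns. The following toy Python sketch illustrates that idea with a hand-written term map and an in-memory SQLite database. `TERM_MAP`, the table names, and the fallback `SELECT NULL` query are hypothetical stand-ins, not how LakehouseIQ generates SQL.

```python
import re
import sqlite3

# Hypothetical company knowledge: internal term -> (table, aggregate expression).
TERM_MAP = {
    "arr": ("finance_subscriptions", "SUM(annual_amount)"),
    "active customers": ("crm_accounts", "COUNT(DISTINCT account_id)"),
}


def generate_sql(question: str, use_context: bool) -> str:
    """Return SQL for the question; without context, no term can be resolved."""
    if use_context:
        q = question.lower()
        for term, (table, expr) in TERM_MAP.items():
            if re.search(rf"\b{re.escape(term)}\b", q):
                return f"SELECT {expr} AS answer FROM {table}"
    return "SELECT NULL AS answer"  # stand-in for a query that cannot answer the question


def answer(conn: sqlite3.Connection, question: str, use_context: bool):
    return conn.execute(generate_sql(question, use_context)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE finance_subscriptions (account_id TEXT, annual_amount REAL)")
conn.executemany("INSERT INTO finance_subscriptions VALUES (?, ?)", [("a1", 1200.0), ("a2", 800.0)])
conn.execute("CREATE TABLE crm_accounts (account_id TEXT)")
conn.executemany("INSERT INTO crm_accounts VALUES (?)", [("a1",), ("a2",), ("a1",)])

print(answer(conn, "What is our ARR?", use_context=False))  # None
print(answer(conn, "What is our ARR?", use_context=True))   # 2000.0
```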
Databricks also announced plans to provide APIs for integration with external tools. The image shows an example of using the LakehouseIQ API as an agent in LangChain.
##### Tools to Support Data Exploration and Analysis
LakehouseIQ supports data exploration and analysis. It also helps clarify the relationships between a company's datasets and the people who use them, so that more value can be drawn from the data. The image example shows that it correctly interprets and answers questions that include company-specific terms.

In summary, LakehouseIQ learns from how data is used in an organization. It powers a search experience that understands internal terminology and supports data exploration and analysis. This helps organizations shape their future by leveraging data and AI.
#### Lakehouse AI Platform to Support Generative AI
Next came the announcement of updates to Databricks' Lakehouse AI platform. The platform now supports generative AI, and new components have been introduced.
##### Generative AI-Compatible Lakehouse AI Platform Configuration
Databricks' Lakehouse AI platform consists of three core components:

1. Datasets: preparing data for machine learning
2. Models: finding and tuning effective models
3. Applications: deploying and releasing applications

Together, these components help users find, tune, and customize language models for end-to-end applications. The keynote announced updates to each of them.
##### [New Announcement] Preparing Data for Machine Learning
* Vector Search: Embeds document-style data into a vector space so that language models can use it, and quickly retrieves relevant documents.
* Feature Serving: Supports online serving of structured data, making features derived from structured data available to applications in real time.

Combining Vector Search and Feature Serving in a chatbot makes it possible to answer customer questions accurately and to offer personalized support. This speeds up problem-solving and information delivery for customers, improving the quality of customer support.
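The chatbot pattern described above can be sketched in a few lines. The sketch retrieves the most similar document by cosine similarity and then merges it with real-time customer features into a prompt. This is a toy illustration under stated assumptions, not the Databricks Vector Search or Feature Serving API. The "embedding" is a bag-of-words count vector standing in for a learned embedding model, and `DOCS`, `FEATURES`, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

DOCS = {
    "doc-returns": "You can return an item within 30 days for a full refund.",
    "doc-shipping": "Standard shipping takes 3 to 5 business days.",
    "doc-warranty": "Electronics carry a one year limited warranty.",
}
# Hypothetical online feature table keyed by customer ID (the Feature Serving side).
FEATURES = {"c42": {"tier": "gold", "open_orders": 2}}


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector. A real system would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}  # the "vector index"


def retrieve(question: str, k: int = 1) -> list:
    """Return the IDs of the k documents most similar to the question."""
    q = embed(question)
    return sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)[:k]


def build_prompt(customer_id: str, question: str) -> str:
    """Combine retrieved documents and real-time customer features into an LLM prompt."""
    docs = [DOCS[d] for d in retrieve(question)]
    features = FEATURES.get(customer_id, {})
    return f"Customer: {features}\nContext: {docs}\nQuestion: {question}"


print(build_prompt("c42", "How many days do I have to return an item?"))
```

The retrieved document grounds the answer in policy text, while the customer features (here a hypothetical loyalty tier and open-order count) let the model personalize its reply.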
##### [New Announcement] Finding and Tuning Effective Models
* Curated AI Models: Provides proprietary models optimized for various use cases, such as customer support. Embedded support is also provided through collaboration with Databricks.
* AutoML for LLM training: Automates the training of large language models (LLMs), streamlining model training and helping to achieve optimal performance.
* MLflow Evaluation: Uses the MLflow Evaluation API to compare custom models with common models and select the one that gives the most appropriate answers to questions.
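In MLflow, evaluation is run through `mlflow.evaluate`. The sketch below does not use MLflow. It only illustrates the underlying idea: score several candidate models on the same evaluation set and keep the best one. The stand-in models, the dataset, and the exact-match metric are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical evaluation set of (question, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("What is the capital of France?", "paris"),
    ("How many days are in a week?", "7"),
    ("What color is a clear daytime sky?", "blue"),
]

# Stand-in "models": a custom model and a generic baseline.
CUSTOM_ANSWERS = {
    "What is the capital of France?": "Paris",
    "How many days are in a week?": "7",
    "What color is a clear daytime sky?": "Blue",
}


def custom_model(question: str) -> str:
    return CUSTOM_ANSWERS.get(question, "")


def baseline_model(question: str) -> str:
    return "Paris" if "France" in question else "I don't know"


def exact_match(model: Callable[[str], str], dataset: List[Tuple[str, str]]) -> float:
    """Fraction of questions whose normalized answer equals the expected answer."""
    hits = sum(model(q).strip().lower() == expected for q, expected in dataset)
    return hits / len(dataset)


def pick_best(models: Dict[str, Callable[[str], str]], dataset) -> Tuple[str, Dict[str, float]]:
    """Score every candidate on the same data and return the best name plus all scores."""
    scores = {name: exact_match(fn, dataset) for name, fn in models.items()}
    return max(scores, key=scores.get), scores


best, scores = pick_best({"custom": custom_model, "baseline": baseline_model}, EVAL_SET)
print(best, scores)
```

Real LLM evaluation usually needs softer metrics than exact match (for example, similarity or model-graded scores), but the comparison loop has the same shape.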
##### [New Announcement] Deploying and Releasing Applications
* MLflow AI Gateway: Enables centralized management of AI use cases. It manages credentials, rate limits, caching, and more, and supports model deployment and A/B testing.
* Model Serving optimized for LLMs: Offers model serving optimized for LLMs. Recently released models can be deployed to Databricks model endpoints with GPU support, providing high-performance, low-latency model serving.
* Lakehouse Monitoring: Supports quality monitoring of data and AI applications.
  * Automatic generation of metrics: Automatically generates data quality metrics and visualizes them on a dashboard. Monitors metrics such as latency and toxicity.
  * Data capture and analysis: Captures the model's input requests and output responses in inference tables and makes them available for debugging and interactive querying.
  * PII detection support: Detects sensitive personal information (PII) in customer questions and applies appropriate safeguards to prevent it from leaking.
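The following Python sketch shows how several of these concerns can be layered around a single model call. It covers gateway-style rate limiting and caching, PII masking, and an inference table that logs requests, responses, and latency. `ToyGateway`, its parameters, and the two PII regular expressions are hypothetical and far simpler than the real products. The regexes only catch e-mail addresses and US-style phone numbers.

```python
import re
import time
from collections import deque

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def mask_pii(text: str) -> str:
    """Replace e-mail addresses and US-style phone numbers with placeholders."""
    return PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", text))


class ToyGateway:
    """Hypothetical single entry point in front of a model."""

    def __init__(self, model, max_calls: int, per_seconds: float, clock=time.monotonic):
        self.model = model
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.clock = clock
        self.calls = deque()      # timestamps of recent model calls (sliding window)
        self.cache = {}           # masked prompt -> response
        self.inference_log = []   # (request, response, latency in seconds) rows

    def query(self, prompt: str) -> str:
        prompt = mask_pii(prompt)          # mask PII before caching, calling, or logging
        if prompt in self.cache:
            return self.cache[prompt]      # cache hits do not count against the limit
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.per_seconds:
            self.calls.popleft()           # drop calls that have left the window
        if len(self.calls) >= self.max_calls:
            raise RuntimeError("rate limit exceeded")
        self.calls.append(now)
        start = self.clock()
        response = self.model(prompt)
        latency = self.clock() - start
        self.cache[prompt] = response
        self.inference_log.append((prompt, response, latency))
        return response


gateway = ToyGateway(lambda p: f"echo: {p}", max_calls=2, per_seconds=60)
print(gateway.query("My email is jane@example.com"))  # echo: My email is [EMAIL]
```

Masking happens before anything else, so raw identifiers never reach the model, the cache, or the inference log. The clock is injectable so the rate-limit behavior can be tested deterministically.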
#### New Features of Unity Catalog
##### Easy Access with Unity Catalog
Unity Catalog provides easy access to AI models, metadata, and the data used to train them. This allows data scientists and engineers to efficiently handle data and advance AI model development.
##### [New Announcement] Access to External Systems with Lakehouse Federation
With the Lakehouse Federation feature, Unity Catalog lets you access external systems such as MySQL, Postgres, and Snowflake from within Data Explorer. This makes it easier to work with data across different sources without migrating it first, enabling more flexible data analysis.
##### Monitoring Assets and Users in the Governance Portal
The Unity Catalog governance portal provides a high-level health check of the entire data estate for monitoring and governing all assets and users. This promotes the use of data while maintaining data quality and security.

Unity Catalog offers very useful features for leveraging data and AI. Its easy access to models and data, access to external systems, and governance portal accelerate data use. This will be a significant advantage for companies on their way to becoming data and AI companies.
#### [New Announcement] Introduction to Lakehouse Collaboration Platform
Lakehouse Collaboration Platform is Databricks' platform for collaborating on data and AI. It offers a comprehensive toolset for secure collaboration, letting users share data and AI assets across platforms and clouds. This lets organizations easily obtain data and quickly get value from it. Lakehouse Collaboration Platform consists of the following main components:
##### Components of Lakehouse Collaboration Platform
* Delta Sharing: Lets data providers easily share live datasets without creating copies of the data. It serves as the foundation for sharing data in services such as Databricks Marketplace and Lakehouse Apps.
* Unity Catalog: Strengthens data governance, security, and privacy, ensuring that data can be trusted. Unity Catalog is also integrated with knowledge engines such as LakehouseIQ. This gives a better understanding of what the data means and enables more intelligent assistance.
* Databricks Marketplace: An open marketplace for data and AI products and services. Users can share and monetize datasets, AI models, notebooks, and more, and consumers can quickly evaluate data products. Databricks Marketplace became generally available (GA) with this announcement.
* [New Announcement] Lakehouse Apps: A new way to build, deploy, and manage applications on the Databricks platform. Startups and software vendors can offer pre-built applications that solve critical use cases and reach potential customers. With Lakehouse Apps, data always stays within the customer's Databricks instance, so lengthy review processes are not required. App developers can use their preferred language and platform.
* [New Announcement] Databricks Clean Rooms: Provides an environment where parties can share existing data and run workloads on it in any language while preserving data privacy. Partner integrations and solutions extend the Clean Rooms experience and integrate seamlessly with Databricks. With Clean Rooms, the Lakehouse is evolving from simple data sharing toward secure collaborative computation.

These services work together as part of Databricks' Lakehouse Collaboration Platform and give organizations a foundation for using data and AI effectively. Delta Sharing and Unity Catalog make data easy to share and manage. Databricks Marketplace and Lakehouse Apps provide a platform for offering data and AI products and services. Databricks Clean Rooms provide a secure environment for sharing and working on data while protecting its privacy.
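As a rough illustration of the clean-room idea of computing on shared data without exposing raw records, consider the following sketch. Two parties pseudonymize their customer e-mails with an agreed salt. A single approved query then returns only the size of the overlap, and it is suppressed when the group is too small. The salt, the threshold, and the function names are assumptions. This is not how Databricks Clean Rooms is implemented, and a salted hash is pseudonymization rather than strong anonymization.

```python
import hashlib


def pseudonymize(email: str, salt: str) -> str:
    """Salted SHA-256 of a normalized e-mail so raw addresses never leave their owner."""
    return hashlib.sha256((salt + email.strip().lower()).encode("utf-8")).hexdigest()


def overlap_count(ids_a, ids_b, min_group_size: int = 3):
    """The single approved query: size of the overlap, or None if too small to share safely."""
    n = len(set(ids_a) & set(ids_b))
    return n if n >= min_group_size else None


SALT = "agreed-secret"  # hypothetical value agreed on by both parties
retailer = [pseudonymize(e, SALT) for e in ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]]
advertiser = [pseudonymize(e, SALT) for e in ["A@x.com", "b@x.com", "c@x.com", "z@x.com"]]
print(overlap_count(retailer, advertiser))  # 3
```

Normalizing case before hashing lets "A@x.com" and "a@x.com" match. The minimum group size keeps the query from revealing whether any single individual is in both lists.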
### Conclusion
In this blog post, we covered the Day 1 keynote of Databricks' Data + AI Summit 2023 and introduced the latest concepts, features, and services announced there. The field of data and AI evolves daily, and more events and announcements are expected. We will also cover the latest news from Day 2, so stay tuned!
This content is based on reports from members attending DAIS sessions on-site. During the DAIS period, articles related to the sessions will be posted on the special site below, so please take a look.
Translated by Johann
Thank you for your continued support!