Introduction
This is May from the Lakehouse Department of the GLB Division.
KX Systems built KDB.AI on its time series database kdb+ and released it in September 2023. KDB.AI is a knowledge-based vector database and search engine that uses real-time data to provide advanced search, recommendations, and personalization for AI applications.
In this article, we walk through the sentiment analysis example from the KDB.AI sample code introduced in the previous article.
Introduction to Sentiment Analysis
Sentiment analysis is the process of analyzing text to determine whether the emotional tone of a message is positive, negative, or neutral. The purpose of this sample is to extract sentiment from Disneyland Resort reviews to gain a deeper understanding of the customer experience. It uses natural language processing (NLP) to evaluate the sentiment expressed in each review. KDB.AI lets you store not only the reviews themselves but also their sentiment labels as metadata.
The process flow is as follows.
Load review data
Perform sentiment analysis on reviews
Vectorize and embed reviews
Save embeddings in KDB.AI
Search for reviews similar to your target review
Delete KDB.AI table
The sample data comes from Kaggle. The dataset includes 42,000 reviews posted by travelers on TripAdvisor about three Disneyland branches: Paris, California, and Hong Kong. www.kaggle.com
Sample Code Practice
Let's upload the sample code and data to a Databricks Workspace and try it out. kdb.ai
1. Load review data
After extracting the data from the zip file, read the CSV file and view its contents.
CPU and memory speed affect processing time when performing sentiment analysis. If you run this example on a CPU, limit the number of rows to improve performance. If you run on GPU, you can use the full dataset.
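As a minimal sketch of the row limit described above, the helper below reads at most `max_rows` rows from a CSV file using only the standard library. The function name `load_reviews`, its parameters, and the default encoding are assumptions for illustration and are not taken from the sample notebook.

```python
import csv
from itertools import islice


def load_reviews(path, max_rows=None, encoding="utf-8"):
    """Read review rows from a CSV file, stopping after max_rows rows.

    max_rows=None reads the whole file (reasonable on a GPU cluster);
    a small value keeps CPU-only runs short.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)  # the first line is used as the header
        return list(islice(reader, max_rows))
```

Each row comes back as a dictionary keyed by column name. If you load the file with pandas instead, `pd.read_csv(path, nrows=1000)` applies the same kind of cap.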
2. Perform sentiment analysis on reviews
Import the key components for sentiment analysis.
AutoModelForSequenceClassification: A transformer model fine-tuned for sequence classification tasks such as sentiment analysis. It can be used to handle various NLP tasks.
AutoTokenizer: Preprocesses (encodes) the text data you feed into the model.
MODEL: A model fine-tuned for sentiment classification tasks. This example uses a RoBERTa model from Hugging Face.
Configure a pipeline to easily perform sentiment analysis on text data.
Perform sentiment analysis on the entire data.
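The pipeline setup and the labeling pass could look roughly like the sketch below. `build_classifier` and `label_reviews` are hypothetical helper names, the batch size is arbitrary, and the article does not say which RoBERTa checkpoint the sample loads, so the model name is left as a parameter. The `transformers` import is done inside the function and assumes the package is installed.

```python
def build_classifier(model_name):
    """Build a Hugging Face sentiment pipeline for the given model name."""
    # Imported lazily so label_reviews can be tried without transformers.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        pipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    return pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)


def label_reviews(texts, classifier, batch_size=32):
    """Return one (label, score) pair per review text.

    texts is a list of strings. classifier is any callable that takes a
    list of strings and returns a list of {"label": ..., "score": ...}
    dicts, as a transformers text-classification pipeline does.
    """
    labelled = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # truncation=True keeps long reviews within the model's input limit.
        for result in classifier(batch, truncation=True):
            labelled.append((result["label"], result["score"]))
    return labelled
```

One publicly available checkpoint you could pass to `build_classifier` is `cardiffnlp/twitter-roberta-base-sentiment-latest`; whether the sample uses that exact model is an assumption.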
Visualizing the results shows how sentiment varies across the three branches and lets you compare it with the ratings. Reviews labeled positive all have ratings of 4 or 5. Negative reviews have lower ratings, with the California branch receiving the lowest ratings of the three.
3. Vectorize and embed reviews
Use a Sentence Transformer model to embed the review text before saving it to KDB.AI.
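A minimal sketch of the embedding step follows. `load_encoder` and `embed_reviews` are hypothetical names, and `all-MiniLM-L6-v2` (a 384-dimensional model) is only an assumed default, since the article does not name the model used. The `sentence_transformers` import assumes that package is installed.

```python
def load_encoder(model_name="all-MiniLM-L6-v2"):
    """Load a Sentence Transformer model (model name is an assumption)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name)


def embed_reviews(texts, encoder):
    """Encode review texts and return plain Python lists of floats.

    encoder is any object with an encode(list_of_strings) method that
    returns one vector per input, as SentenceTransformer does.
    """
    vectors = encoder.encode(list(texts))
    # tolist() handles NumPy arrays; other sequences are copied as lists.
    return [v.tolist() if hasattr(v, "tolist") else list(v) for v in vectors]
```

The length of each returned vector is the dimensionality you need later when defining the vector index.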
4. Save embeddings in KDB.AI
Connect a KDB.AI session using your API key and endpoint. Refer to the article below for how to create an API key. techblog.ap-com.co.jp
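One way to keep the API key out of the notebook is to read it from the environment, as in this sketch. The environment variable names `KDBAI_ENDPOINT` and `KDBAI_API_KEY` and both helper names are assumptions; `connect` also assumes the `kdbai_client` package is installed.

```python
import os


def read_kdbai_credentials(env=None):
    """Return (endpoint, api_key) from environment-style variables."""
    env = os.environ if env is None else env
    names = ("KDBAI_ENDPOINT", "KDBAI_API_KEY")  # assumed variable names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["KDBAI_ENDPOINT"], env["KDBAI_API_KEY"]


def connect(env=None):
    """Open a KDB.AI session with the credentials found in env."""
    import kdbai_client as kdbai  # imported lazily; assumed to be installed

    endpoint, api_key = read_kdbai_credentials(env)
    return kdbai.Session(api_key=api_key, endpoint=endpoint)
```

Failing early with a clear message is easier to debug than an authentication error raised later from inside the client.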
After connecting to KDB.AI, create a schema. To save the data as vectors, define the columns "Branch, Label, Score, Rating, Review_Text, embeddings". Creating the index requires the following parameters:
- The previously set window size
- Distance metric
- Index type
Distance metrics measure the similarity between vectors. KDB.AI supports Euclidean distance (L2), dot product (IP), and cosine similarity (CS). This sample code uses CS.
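To make the three metrics concrete, here is a small worked example in plain Python. It only illustrates the formulas on two toy vectors and is not how KDB.AI computes them internally.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar, sensitive to vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: ignores vector length."""
    norms = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norms


a = [1.0, 0.0]
b = [3.0, 4.0]
print(l2_distance(a, b))        # sqrt(4 + 16), about 4.472
print(inner_product(a, b))      # 1*3 + 0*4 = 3.0
print(cosine_similarity(a, b))  # 3 / (1 * 5) = 0.6
```

Because cosine similarity depends only on direction, it is a common choice for text embeddings, where vector length carries little meaning.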
In KDB.AI, Flat, IVF, IVFPQ, and HNSW can be used as indexes. In this sample code, HNSW is selected.
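The columns and index parameters described above can be collected into a schema dictionary, sketched below. The key names (`columns`, `pytype`, `vectorIndex`, `dims`, `metric`, `type`) follow the dictionary layout used by early `kdbai_client` releases as far as I recall, and the column types are guesses; treat all of them as assumptions and check the documentation for your client version.

```python
def build_schema(dims, metric="CS", index_type="hnsw"):
    """Build a table schema with five metadata columns and one vector column.

    dims is the length of each embedding vector. The key names and
    column types are assumptions, not confirmed by the article.
    """
    return {
        "columns": [
            {"name": "Branch", "pytype": "str"},
            {"name": "Label", "pytype": "str"},
            {"name": "Score", "pytype": "float64"},
            {"name": "Rating", "pytype": "int64"},
            {"name": "Review_Text", "pytype": "str"},
            {
                "name": "embeddings",
                "vectorIndex": {
                    "dims": dims,
                    "metric": metric,
                    "type": index_type,
                },
            },
        ]
    }
```

Keeping the three index parameters as function arguments makes it easy to try a different metric or index type without rewriting the column list.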
Create a table with the configured schema and save the embedded data in the table.
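The article does not say how the sample inserts the rows. With tens of thousands of reviews, one common approach is to insert in batches, sketched here. `iter_batches`, `insert_in_batches`, and the batch size of 2000 are hypothetical, and the code only assumes that the table object exposes an `insert()` method.

```python
def iter_batches(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def insert_in_batches(table, rows, batch_size=2000):
    """Insert rows into table batch by batch and return the row count."""
    inserted = 0
    for batch in iter_batches(rows, batch_size):
        table.insert(batch)  # assumed: the table object exposes insert()
        inserted += len(batch)
    return inserted
```

Slicing with `rows[start:start + batch_size]` works for Python lists and also selects rows by position on a pandas DataFrame.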
5. Search for reviews similar to your review
Create a function that performs a search and returns the results so that the retrieved data can be analyzed further.
Let's run a query with the search term "Are customers satisfied with the food at the park?" and limit it to 10 results.
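To show what such a query returns, the sketch below ranks stored vectors against a query vector by cosine similarity and keeps the top `n`. This is a plain exhaustive search used as a stand-in for illustration; KDB.AI answers the query through its index instead. The function name and the toy vectors are hypothetical, and in practice the query text must first be embedded with the same model as the reviews.

```python
import math


def top_n_similar(query_vector, rows, n=10):
    """Return the n (score, text) pairs most similar to query_vector.

    rows is a list of (review_text, vector) pairs. Every row is scored,
    so this is exhaustive search, not an approximate index lookup.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    scored = [(cosine(query_vector, vector), text) for text, vector in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


reviews = [
    ("The food was great", [1.0, 0.0]),
    ("The queues were long", [0.0, 1.0]),
    ("Tasty snacks everywhere", [0.9, 0.1]),
]
for score, text in top_n_similar([1.0, 0.0], reviews, n=2):
    print(round(score, 3), text)
```

The two food-related toy reviews rank first because their vectors point in nearly the same direction as the query vector.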
We will also run a query limited to 25 results and count the sentiment labels per branch.
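Counting sentiment labels per branch can be sketched with the standard library as follows. The function name is hypothetical, and the rows are assumed to be dictionaries carrying the "Branch" and "Label" columns from the schema.

```python
from collections import Counter


def count_sentiment_by_branch(results):
    """Count how often each (Branch, Label) pair occurs in the results."""
    return Counter((row["Branch"], row["Label"]) for row in results)


sample = [
    {"Branch": "Paris", "Label": "positive"},
    {"Branch": "Paris", "Label": "negative"},
    {"Branch": "Paris", "Label": "positive"},
    {"Branch": "HongKong", "Label": "positive"},
]
for (branch, label), count in sorted(count_sentiment_by_branch(sample).items()):
    print(branch, label, count)
```

If the search results are in a pandas DataFrame, `df.groupby(["Branch", "Label"]).size()` produces the same counts.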
Create a function for visualization.
Using the search term "Are customers satisfied with the food at the park?", limit the query to 50 results and view them in a graph.
6. Delete KDB.AI table
It is a best practice to delete KDB.AI tables when you are finished using them.
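One way to make sure the cleanup always happens, even if an earlier step fails, is a small context manager like the sketch below. `dropping` is a hypothetical helper, and it only assumes that the table object exposes a `drop()` method.

```python
from contextlib import contextmanager


@contextmanager
def dropping(table):
    """Yield the table, then drop it when the with-block ends."""
    try:
        yield table
    finally:
        table.drop()  # assumed: the table object exposes drop()
```

Used as `with dropping(table) as t: ...`, the table is removed whether the block finishes normally or raises an exception.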
Summary
In this post, I ran the sentiment analysis sample from KDB.AI on Databricks. Chatbot history, customer service call records, customer complaint emails, return and refund comments, surveys, and similar text can all be analyzed quickly with KDB.AI and used to improve customer satisfaction.
Thank you for reading to the end, and thank you for your continued support!
We provide a wide range of support, from the introduction of data analysis platforms using Databricks to support for in-house production. If you are interested, please feel free to contact us.
We are also looking for people to work with us! We look forward to hearing from anyone interested in APC.
Translated by Johann