This project is a Q&A tool designed to extract actionable insights from a large dataset of Google Store reviews for a music streaming application, such as Spotify. The tool leverages natural language processing and vectorized databases to provide insightful responses to management queries.
- Introduction
- Objectives
- Dataset Overview
- Features
- Setup and Installation
- Objectives 1
- Objectives 2
- Objectives 3
- Usage
- Screenshots
- Acknowledgments
This project addresses the challenge of extracting insights from 3.4 million unstructured Google Store reviews. The management of a music streaming application requires insights into what users like, dislike, compare, and suggest about the application. This tool aims to provide an efficient way to extract this information using AI and vectorized storage.
The primary objectives of this project are:
- Data Preprocessing and Vectorized Database Creation: Preprocess the Google Store reviews dataset and create a vectorized database for efficient information retrieval.
- RAG Chain Creation: Develop a Retrieval-Augmented Generation (RAG) chain to retrieve relevant information based on management's queries.
- Build a Chatbot UI: Design and implement a user-friendly chatbot interface using Streamlit to allow easy interaction with the Q&A tool.
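To make the retrieval step of the RAG chain concrete, here is a minimal sketch of "retrieve, then prompt". It is not the project's implementation: the word-count `embed` function is a toy stand-in for the OpenAI embedding model, the in-memory list stands in for the Chroma database, and all function names (`embed`, `cosine`, `retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, reviews: list[str], k: int = 2) -> list[str]:
    """Return the k reviews most similar to the query."""
    q = embed(query)
    ranked = sorted(reviews, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the chat model."""
    joined = "\n".join(f"- {review}" for review in context)
    return (
        "Answer the question using only the reviews below.\n"
        f"Reviews:\n{joined}\n"
        f"Question: {query}"
    )
```

In a real chain, the retrieved reviews would come from the vector database and the prompt would be passed to the chat model; the structure (embed the query, rank stored reviews, build a grounded prompt) stays the same.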
The dataset contains Google Store reviews of a music streaming application. It includes:
- Review ID: Unique identifier for each review.
- Pseudo Author ID: Anonymized identifier for the author.
- Author Name: Name of the reviewer (anonymized).
- Review Text: Content of the review.
- Review Rating: Numeric rating provided by the user.
- Review Likes: Number of likes the review received.
- App Version: Version of the application reviewed.
- Review Timestamp: Date and time of the review.
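As an illustration of how these fields might be loaded and cleaned before embedding, the sketch below reads rows from a CSV file, normalizes whitespace in the review text, and drops very short reviews. The column names (`review_id`, `review_text`, `review_rating`, `review_likes`, `app_version`, `review_timestamp`) and the `min_words` threshold are assumptions made for this example; check them against the actual file before use.

```python
import csv
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Review:
    review_id: str
    text: str
    rating: int
    likes: int
    app_version: str
    timestamp: str


def clean_text(text: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def load_reviews(csv_lines: Iterable[str], min_words: int = 3) -> list[Review]:
    """Parse CSV rows into Review objects, skipping reviews that are too short."""
    reviews = []
    for row in csv.DictReader(csv_lines):
        text = clean_text(row.get("review_text") or "")
        if len(text.split()) < min_words:
            continue  # too short to carry useful signal (assumed threshold)
        reviews.append(
            Review(
                review_id=row["review_id"],
                text=text,
                rating=int(row["review_rating"]),
                likes=int(row.get("review_likes") or 0),  # empty likes -> 0
                app_version=row.get("app_version") or "",
                timestamp=row["review_timestamp"],
            )
        )
    return reviews
```

An open file object can be passed directly, for example `load_reviews(open("reviews.csv", newline="", encoding="utf-8"))`, where `reviews.csv` is a placeholder file name.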
The dataset is available from either of the following:
- Dataset: Download here
- Dataset source: Kaggle - Spotify Google Store Reviews
The tool provides the following features:
- Question Answering: Answers questions based on user reviews of the music streaming app.
- Insights on Competitors: Provides comparisons with other music streaming platforms.
- User-Friendly Interface: Streamlit-based UI with chat history and sample queries.
- Interactive Typing Animation: Simulates typing for a conversational experience.
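The typing animation can be approximated with a generator that yields an answer word by word with a short pause. This is only a sketch of the idea; the function name `stream_words` and the default delay are assumptions, not taken from the project code.

```python
import time
from typing import Iterator


def stream_words(answer: str, delay: float = 0.03) -> Iterator[str]:
    """Yield the answer one word at a time, pausing briefly between words."""
    for word in answer.split(" "):
        yield word + " "
        time.sleep(delay)
```

In Streamlit, one option is to pass such a generator to `st.write_stream`, which renders the chunks as they arrive; a placeholder updated in a loop would work as well.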
This project utilizes the following frameworks and libraries:
- Python 3.12.7
- [CUDA](https://developer.nvidia.com/cuda-toolkit) (optional, for GPU support)
- OpenAI 1.7.2
  - Embedding model: `text-embedding-ada-002`
  - Chat model: `gpt-4o-mini`
- Chroma 0.4.22 (Vector Database)
- LangChain 0.1.0
- Streamlit 1.40.0
- Clone the repository:

      git clone https://github.com/yourusername/qna-chatbot.git
      cd qna-chatbot
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables by creating a `.env` file in the root directory and adding your OpenAI API key:

      OPENAI_API_KEY=your_openai_api_key
- (Optional) Verify CUDA support:

      import torch
      print("CUDA available:", torch.cuda.is_available())
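For the environment variable step above, the following sketch shows one way the key could be read at startup using only the standard library. Many projects use the `python-dotenv` package instead; which approach this repository takes is not stated here, and the helper names `load_env_file` and `get_openai_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file; returns {} if it is missing."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message avoids a less obvious authentication error on the first API call.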