Python script for text classification and analysis of e-commerce sales data.
This repository contains a Python script that processes textual descriptions of products from an e-commerce dataset and categorizes them into predefined categories using a Naive Bayes classifier. Additionally, the script provides various analysis and visualization methods to explore the dataset, including plotting category distribution, analyzing top customers, and visualizing sales by country and month.
Text Classification: Utilizes a Naive Bayes classifier to categorize product descriptions.
Natural Language Processing (NLP): Preprocesses text data by tokenizing, lemmatizing, and filtering out invalid words.
Analysis and Visualization: Plots the category distribution, analyzes top customers, and visualizes sales by country and month.
Error Handling: Handles file loading errors and unexpected errors during execution.
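The classification feature can be illustrated with a minimal sketch. It assumes a bag-of-words representation (scikit-learn's `CountVectorizer`) feeding a multinomial Naive Bayes model; the script's actual vectorizer and Naive Bayes variant may differ. The descriptions and category names below are hypothetical toy data, and the NLTK preprocessing step is left out to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training data; the real script reads
# 'Description' and 'Category' values from a CSV file.
train_descriptions = [
    "red ceramic mug",
    "blue ceramic mug",
    "white coffee mug",
    "wooden toy train",
    "plastic toy car",
    "soft toy bear",
]
train_categories = ["Kitchen", "Kitchen", "Kitchen", "Toys", "Toys", "Toys"]

# Assumption: word counts as features, multinomial Naive Bayes as the model.
model = make_pipeline(CountVectorizer(lowercase=True), MultinomialNB())
model.fit(train_descriptions, train_categories)

# Classify descriptions the model has not seen before.
new_descriptions = ["green ceramic mug", "toy car"]
predictions = model.predict(new_descriptions)

for description, category in zip(new_descriptions, predictions):
    print(f"{description} -> {category}")
```

Wrapping the vectorizer and classifier in one pipeline keeps the vocabulary learned during training tied to the model, so new descriptions are transformed the same way at prediction time.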
- Ensure Python and required libraries are installed.
- Clone this repository to your local machine.
- Prepare the training dataset in CSV format with 'Description' and 'Category' columns.
- Run the script (text_classifier.py), passing the paths to the input files as arguments.
- Inspect the output files, such as predicted_categories.json and predicted_categories.csv.
- Analyze the results and visualizations generated by the script.
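The data-handling side of the steps above might look like the following sketch. It uses only the standard library. The function names `load_training_rows` and `write_predictions` are hypothetical, and the layout of the output files (one record per description with `Description` and `Category` fields) is an assumption, not necessarily what text_classifier.py writes.

```python
import csv
import json

REQUIRED_COLUMNS = {"Description", "Category"}


def load_training_rows(path):
    """Return (description, category) pairs from a training CSV file."""
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            # Fail early if the expected columns are absent.
            missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
            if missing:
                raise ValueError(f"{path} is missing columns: {sorted(missing)}")
            return [(row["Description"], row["Category"]) for row in reader]
    except FileNotFoundError:
        # Report a file loading error with a readable message.
        raise SystemExit(f"Training file not found: {path}")


def write_predictions(predictions, json_path, csv_path):
    """Write prediction records to a JSON file and a CSV file.

    `predictions` is a list of dicts with 'Description' and 'Category' keys
    (an assumed layout for illustration).
    """
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(predictions, handle, indent=2)
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["Description", "Category"])
        writer.writeheader()
        writer.writerows(predictions)
```

For example, `load_training_rows("train.csv")` returns the labeled pairs used for training, and `write_predictions(records, "predicted_categories.json", "predicted_categories.csv")` saves the results under the output file names mentioned above.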
- scikit-learn: Library for machine learning in Python.
- NLTK: Toolkit for natural language processing.
- Matplotlib: Visualization library in Python.
MOUNIKA DUMMU
MANEESH SETTIPETA
VIKRAM SAMUDRALA