- Introduction
- Data Description
- Descriptive Analysis
- Data Preprocessing
- Model Training
- Model Evaluation
- Hyperparameter Optimization
- Predictions
- Conclusions
- Folder Structure
- Resources
## Introduction

This project aims to develop a machine learning model that classifies the grape variety of a wine (class A, B, or C). The model is trained on a dataset from Kaggle.
## Data Description

The dataset contains 534 wine samples, each described by 13 features:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
The target variable is the wine class (A, B, or C).
## Descriptive Analysis

Descriptive analysis of the data was performed to understand its distribution and correlations. Distribution plots, histograms, and correlation matrices were generated.
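
A minimal sketch of how these plots could be produced is shown below; the `data/wine.csv` path and the `class` column name are assumptions rather than confirmed project details.

```python
# Hedged sketch of the descriptive analysis; file path and column name are assumptions.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("data/wine.csv")
features = df.drop(columns=["class"])  # keep only the 13 numeric features

# Histograms showing the distribution of each feature
features.hist(bins=20, figsize=(14, 10))
plt.tight_layout()
plt.show()

# Correlation matrix rendered as a heatmap
sns.heatmap(features.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```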
## Data Preprocessing

Variables were normalized using standard scaling. Principal Component Analysis (PCA) was applied to reduce dimensionality and eliminate multicollinearity.
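
A sketch of this preprocessing step could look as follows; the 95% explained-variance threshold for PCA is an illustrative assumption.

```python
# Hedged preprocessing sketch: standard scaling followed by PCA.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

df = pd.read_csv("data/wine.csv")   # path assumed
X = df.drop(columns=["class"])      # target column name assumed

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # zero mean, unit variance per feature

pca = PCA(n_components=0.95)        # keep enough components to explain 95% of the variance (assumed)
X_pca = pca.fit_transform(X_scaled)
print(X_pca.shape, pca.explained_variance_ratio_.sum())
```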
## Model Training

Three machine learning models were trained (a sketch of the training step follows the list):
- XGBoost
- SVM
- Random Forest
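
The sketch below shows one way the three models could be trained; each classifier is wrapped in a pipeline with the scaling and PCA described above, and the split ratio and hyperparameters are illustrative assumptions rather than the project's actual settings.

```python
# Hedged training sketch: each classifier is trained inside a scaling + PCA pipeline.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

df = pd.read_csv("data/wine.csv")            # path assumed
X = df.drop(columns=["class"])               # target column name assumed
encoder = LabelEncoder()
y = encoder.fit_transform(df["class"])       # A/B/C -> 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

classifiers = {
    "xgboost": XGBClassifier(eval_metric="mlogloss"),
    "svm": SVC(probability=True),            # probability=True enables multiclass ROC-AUC
    "random_forest": RandomForestClassifier(random_state=42),
}
models = {
    name: make_pipeline(StandardScaler(), PCA(n_components=0.95), clf).fit(X_train, y_train)
    for name, clf in classifiers.items()
}
```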
## Model Evaluation

The models were evaluated using the following metrics (a scoring sketch follows the list):
- F1-score
- AUC-ROC multiclass
- Precision
- Recall
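
A sketch of how these metrics could be computed for one fitted model is shown below; macro averaging and the one-vs-rest AUC-ROC strategy are assumptions.

```python
# Hedged evaluation sketch; expects a fitted classifier that exposes predict_proba.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

def evaluate(model, X_test, y_test):
    """Compute the four metrics used in this project for one fitted model."""
    y_pred = model.predict(X_test)
    y_proba = model.predict_proba(X_test)  # class probabilities, needed for multiclass ROC-AUC
    return {
        "f1": f1_score(y_test, y_pred, average="macro"),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "auc_roc": roc_auc_score(y_test, y_proba, multi_class="ovr"),
    }
```

For example, `evaluate(models["xgboost"], X_test, y_test)` would score the XGBoost pipeline from the training sketch.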
## Hyperparameter Optimization

Model hyperparameters were optimized using the Optuna library.
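
A minimal Optuna sketch for the XGBoost model is shown below; the search space, number of trials, and the `f1_macro` objective are assumptions, and `X_train`/`y_train` are taken from the training sketch above.

```python
# Hedged Optuna sketch: maximize cross-validated macro F1 for XGBoost.
import optuna
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 2, 8),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
    }
    model = XGBClassifier(eval_metric="mlogloss", **params)
    # X_train, y_train are assumed to come from the training sketch above
    return cross_val_score(model, X_train, y_train, cv=5, scoring="f1_macro").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```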
## Predictions

Predictions of the grape variety were generated for new wine samples, with feature values given in the order listed under Data Description:
```python
new_data = [
    [13.72, 1.43, 2.5, 16.7, 108, 3.4, 3.67, 0.19, 2.04, 6.8, 0.89, 2.87, 1285],
    [12.37, 0.94, 1.36, 10.6, 88, 1.98, 0.57, 0.28, 0.42, 1.95, 1.05, 1.82, 520],
]
```
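
A sketch of how predictions might be produced for these rows, assuming the fitted `models` dictionary and the `encoder` from the training sketch are in scope:

```python
# Hedged prediction sketch: the pipeline applies scaling and PCA before classifying.
import numpy as np

X_new = np.array(new_data)
pred = models["xgboost"].predict(X_new)
print(encoder.inverse_transform(pred))  # map integer predictions back to the A/B/C labels
```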
## Conclusions

All three models achieve a high F1-score, which may indicate overfitting given the small dataset. Collecting more data would help the models generalize to new samples; regularization techniques and simpler models can further reduce the risk of learning noise rather than the underlying patterns.

The best-performing model was XGBoost, with an F1-score of 0.98 and a multiclass AUC-ROC of 0.99.
## Folder Structure

The project directory is organized as follows:
```
├── README.md
├── data
│   └── wine.csv
├── notebooks
│   └── wine_classification.ipynb
├── models
│   ├── xgboost
│   ├── svm
│   └── random_forest
└── requirements.txt
```
## Resources

The code for this project is in a Jupyter notebook:

- `wine_classification.ipynb`: descriptive analysis of the data, plus model training, evaluation, and hyperparameter optimization.