GitHub - scardonac/wines
Classification of Grape Varieties in Wine

Introduction

This project aims to develop a model to classify grape varieties in wine (A, B, or C) using machine learning techniques. The model is trained using a dataset from Kaggle.

Data Description

The dataset contains 534 examples of wine, each with 13 features:

  • Alcohol
  • Malic acid
  • Ash
  • Alcalinity of ash
  • Magnesium
  • Total phenols
  • Flavanoids
  • Nonflavanoid phenols
  • Proanthocyanins
  • Color intensity
  • Hue
  • OD280/OD315 of diluted wines
  • Proline

The target variable is the wine class (A, B, or C).
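For reference, scikit-learn bundles a wine dataset with the same 13 features and three classes, which makes it easy to inspect the feature matrix; the repository itself reads `data/wine.csv` instead, so this loader is only an illustrative stand-in:

```python
# Illustrative: scikit-learn's bundled wine dataset has the same 13
# features and 3 classes; the repo itself reads data/wine.csv.
from sklearn.datasets import load_wine
import numpy as np

X, y = load_wine(return_X_y=True)
print(X.shape)       # (n_samples, 13) feature matrix
print(np.unique(y))  # three classes: 0, 1, 2 (A, B, C)
```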

Descriptive Analysis

Descriptive analysis of the data was performed to understand its distribution and correlations. Distribution plots, histograms, and correlation matrices were generated.

Data Preprocessing

Variables were normalized using standard scaling. Principal Component Analysis (PCA) was applied to reduce dimensionality and eliminate multicollinearity.
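A minimal sketch of this preprocessing step with scikit-learn; the 95% explained-variance threshold for PCA is an assumption for illustration, not necessarily the repository's setting:

```python
# Standard scaling followed by PCA, as described above.
# n_components=0.95 (keep 95% of variance) is an illustrative choice.
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)              # stand-in for data/wine.csv

X_scaled = StandardScaler().fit_transform(X)   # zero mean, unit variance
pca = PCA(n_components=0.95)                   # keep 95% of the variance
X_pca = pca.fit_transform(X_scaled)            # fewer, uncorrelated columns
print(X_pca.shape)
```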

Model Training

Three machine learning models were trained:

  • XGBoost
  • SVM
  • Random Forest
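All three models expose the same fit/predict interface. A sketch of the training loop using the two scikit-learn models; the XGBoost model would slot in the same way via `xgboost.XGBClassifier` (omitted here so the snippet needs only scikit-learn):

```python
# Train/test split and a simple training loop over the models.
# Hyperparameters are defaults, not the repo's tuned values.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

models = {
    "svm": SVC(probability=True, random_state=42),
    "random_forest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))
```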

Model Evaluation

The models were evaluated using the following metrics:

  • F1-score
  • AUC-ROC multiclass
  • Precision
  • Recall
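All four metrics are available in `sklearn.metrics`. A sketch with toy labels (the `y_true`/`y_pred`/`y_proba` values are made up for illustration); note that the multiclass AUC-ROC needs per-class probability estimates, not hard labels:

```python
# Computing the listed metrics; macro averaging treats the 3 classes equally.
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = np.array([0, 1, 2, 0, 1, 2, 0, 1])   # toy ground truth
y_pred = np.array([0, 1, 2, 0, 2, 2, 0, 1])   # toy predictions

# Toy probability estimates for AUC-ROC; rows must sum to 1.
y_proba = np.eye(3)[y_pred] * 0.8 + 0.1
y_proba /= y_proba.sum(axis=1, keepdims=True)

f1 = f1_score(y_true, y_pred, average="macro")
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")
auc = roc_auc_score(y_true, y_proba, multi_class="ovr")
print(f1, prec, rec, auc)
```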

Hyperparameter Optimization

Model hyperparameters were optimized using the Optuna library.

Predictions

Predictions of grape variety were generated for new wine examples.

new_data = [
    [13.72, 1.43, 2.5, 16.7, 108, 3.4, 3.67, 0.19, 2.04, 6.8, 0.89, 2.87, 1285],
    [12.37, 0.94, 1.36, 10.6, 88, 1.98, 0.57, 0.28, 0.42, 1.95, 1.05, 1.82, 520]
]
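The two samples above can be scored with any fitted classifier. A sketch using a Random Forest as a stand-in (the repository's best model was XGBoost, which is called the same way):

```python
# Fit a stand-in model on the bundled wine data, then classify the
# two new samples listed above.
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

new_data = [
    [13.72, 1.43, 2.5, 16.7, 108, 3.4, 3.67, 0.19, 2.04, 6.8, 0.89, 2.87, 1285],
    [12.37, 0.94, 1.36, 10.6, 88, 1.98, 0.57, 0.28, 0.42, 1.95, 1.05, 1.82, 520],
]

X, y = load_wine(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)
print(model.predict(new_data))   # one predicted class per sample
```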

Conclusions

All three models achieve a high F1-score, which, given the limited dataset size, suggests potential overfitting. Collecting more data would help the models generalize better to new samples. In addition, regularization techniques and simpler models can mitigate the risk of overfitting by discouraging the models from learning noise instead of the underlying patterns.

The model with the best performance was XGBoost, achieving an F1-score of 0.98 and an AUC-ROC multiclass of 0.99.

Resources

Folder Structure

The project directory structure is organized as follows:

├── README.md
├── data
│   └── wine.csv
├── notebooks
│   └── wine_classification.ipynb
├── models
│   ├── xgboost
│   ├── svm
│   └── random_forest
└── requirements.txt

Code

The code for this project is in Jupyter notebooks:

  • wine_classification.ipynb: descriptive analysis of the data, plus model training, evaluation, and hyperparameter optimization.
