- In this notebook, we implement a system that uses recipe-ingredient data from Kaggle's What's Cooking challenge to predict the cuisine of a given recipe
- We implement and evaluate techniques from the following domains:
- Simple (baseline) heuristics
- Baseline #1: For each ingredient in the given test ingredient list, find the cuisine in which that ingredient is used the most; among all such cuisines, pick the most frequently occurring one (sketched in code after this list)
- Baseline #2: Rank the training recipes by the number of ingredients they share with the test ingredient list, assign each recipe a weight based on its rank, and add that weight to the score of the recipe's cuisine; finally, choose the cuisine with the highest score (also sketched after this list)
- Machine Learning (trained on one-hot ingredient vectors; a featurization sketch appears after this list)
- Neural Network
- Support Vector Machine (SVM)
- Machine Learning + Network
- node2vec embeddings of the unipartite projection network (where each node is an ingredient) of the recipe-ingredient bipartite network are first obtained (this pipeline is sketched after this list)
- A Neural Network, an SVM, and a GRU are then trained with these embeddings as features
- Network-based (clustering) heuristics
- Cluster the ingredients with K-means on the node2vec embeddings of the ingredient-ingredient network, setting K = the number of cuisines. We analyze whether the generated clusters correspond one-to-one with the different cuisines; if so, these clusters yield a prediction heuristic that picks the cuisine whose cluster contains the most test ingredients (i.e., the cluster holding the most nodes out of the given set of nodes). A sketch appears after this list
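To make Baseline #1 concrete, here is a minimal Python sketch. It assumes recipes are dicts with `cuisine` and `ingredients` keys, as in Kaggle's JSON format, and is not a line-for-line copy of the notebook:

```python
from collections import Counter

def baseline1_predict(test_ingredients, train_recipes):
    """Baseline #1: each test ingredient votes for the cuisine
    that uses it the most; the most common vote wins."""
    # Count how often each ingredient appears in each cuisine.
    ingredient_cuisines = {}
    for recipe in train_recipes:
        for ing in recipe["ingredients"]:
            ingredient_cuisines.setdefault(ing, Counter())[recipe["cuisine"]] += 1

    # For each test ingredient, take the cuisine in which it is used the most...
    votes = Counter()
    for ing in test_ingredients:
        if ing in ingredient_cuisines:
            votes[ingredient_cuisines[ing].most_common(1)[0][0]] += 1

    # ...and pick the most frequently occurring cuisine among those.
    return votes.most_common(1)[0][0] if votes else None
```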
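Baseline #2 can be sketched in the same way. The rank-to-weight mapping below (`1/rank`) is an illustrative assumption; see the notebook for the exact weighting used:

```python
from collections import defaultdict

def baseline2_predict(test_ingredients, train_recipes):
    """Baseline #2: rank training recipes by ingredient overlap with
    the test list, weight them by rank, and score their cuisines."""
    test_set = set(test_ingredients)

    # Rank recipes by the number of ingredients shared with the test list.
    ranked = sorted(
        train_recipes,
        key=lambda r: len(test_set & set(r["ingredients"])),
        reverse=True,
    )

    # Add a rank-based weight to each recipe's cuisine; 1/rank is an
    # illustrative choice, not necessarily the notebook's weighting.
    scores = defaultdict(float)
    for rank, recipe in enumerate(ranked, start=1):
        if test_set & set(recipe["ingredients"]):
            scores[recipe["cuisine"]] += 1.0 / rank

    # Choose the cuisine with the highest score.
    return max(scores, key=scores.get) if scores else None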
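For the machine-learning models, each recipe is featurized as a one-hot (binary presence) vector over the ingredient vocabulary, as the "1-hot" rows in the results table indicate. Below is a minimal scikit-learn sketch of the SVM variant; the choice of `LinearSVC` and its default hyperparameters are assumptions, not necessarily the notebook's:

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

def train_onehot_svm(train_recipes):
    """Encode each recipe as a binary ingredient-presence vector
    and fit a linear SVM on top of it."""
    mlb = MultiLabelBinarizer()
    X = mlb.fit_transform([r["ingredients"] for r in train_recipes])
    y = [r["cuisine"] for r in train_recipes]
    clf = LinearSVC().fit(X, y)
    return mlb, clf

# Usage (ingredients unseen at training time are ignored by transform):
# mlb, clf = train_onehot_svm(train_recipes)
# pred = clf.predict(mlb.transform([["soy sauce", "ginger", "scallion"]]))
```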
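The embedding pipeline can be sketched as follows. This sketch uses networkx's bipartite projection and the `node2vec` PyPI package for brevity (the project itself references Grover's implementation; all hyperparameters here are illustrative):

```python
import networkx as nx
from networkx.algorithms import bipartite
from node2vec import Node2Vec  # pip install node2vec

def ingredient_embeddings(train_recipes, dimensions=64):
    """Build the recipe-ingredient bipartite graph, project it onto
    ingredients, and learn node2vec embeddings on the projection."""
    B = nx.Graph()
    for i, recipe in enumerate(train_recipes):
        B.add_node(("recipe", i), bipartite=0)
        for ing in recipe["ingredients"]:
            B.add_node(ing, bipartite=1)
            B.add_edge(("recipe", i), ing)

    # Unipartite projection: two ingredients are linked iff they co-occur
    # in at least one recipe (edge weight = co-occurrence count).
    ingredients = {n for n, d in B.nodes(data=True) if d["bipartite"] == 1}
    G = bipartite.weighted_projected_graph(B, ingredients)

    # Walk/window hyperparameters here are illustrative only.
    model = Node2Vec(G, dimensions=dimensions, walk_length=30,
                     num_walks=10, workers=4).fit(window=5, min_count=1)
    return {ing: model.wv[ing] for ing in ingredients}
```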
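Finally, a sketch of the clustering heuristic. Mapping each cluster to its majority cuisine over the training data is an assumption made for this illustration (it presumes the hoped-for cluster-to-cuisine correspondence); the notebook's exact mapping may differ:

```python
import numpy as np
from collections import Counter
from sklearn.cluster import KMeans

def fit_ingredient_clusters(embeddings, n_cuisines):
    """K-means over the ingredient embeddings, with K = number of cuisines."""
    names = list(embeddings)
    km = KMeans(n_clusters=n_cuisines, n_init=10, random_state=0)
    labels = km.fit_predict(np.array([embeddings[n] for n in names]))
    return dict(zip(names, labels))

def label_clusters(ingredient_cluster, train_recipes):
    """Map each cluster to its majority cuisine over the training data
    (an assumption for this sketch)."""
    per_cluster = {}
    for recipe in train_recipes:
        for ing in recipe["ingredients"]:
            if ing in ingredient_cluster:
                c = ingredient_cluster[ing]
                per_cluster.setdefault(c, Counter())[recipe["cuisine"]] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in per_cluster.items()}

def cluster_predict(test_ingredients, ingredient_cluster, cluster_cuisine):
    """Pick the cluster containing the most test ingredients; return its cuisine."""
    votes = Counter(ingredient_cluster[i] for i in test_ingredients
                    if i in ingredient_cluster)
    return cluster_cuisine[votes.most_common(1)[0][0]] if votes else None
```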
- Download the recipe-ingredient data from Kaggle's challenge page (go to the Data tab and click on Download All)
- Unzip and place the `whats-cooking` directory inside the root directory of this project
- The `cuisine-prediction.ipynb` notebook contains the full code along with the results obtained
- If GitHub is unable to render the above notebook in your browser, you can instead download and view its HTML export, `cuisine_prediction.html` (see the `Exports/` directory)
- The `Submissions/` directory contains predictions of various models on Kaggle's test data (these can be submitted directly on Kaggle)
- We split Kaggle's train data into `my_train_split.json` and `my_test_split.json` so that we can do a more sophisticated analysis of the results than accuracy alone; this is necessary since the ground truth of Kaggle's test data is not available (a minimal split sketch appears after this list)
- The `embeddings/` directory contains files that hold the node2vec embeddings of the nodes of the `my_train_split` network
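A minimal way to produce such a split is shown below; the 80/20 ratio, stratification by cuisine, and file paths are assumptions, not necessarily what the notebook does:

```python
import json
from sklearn.model_selection import train_test_split

# Load Kaggle's training data (path assumes the setup step above).
with open("whats-cooking/train.json") as f:
    recipes = json.load(f)

# Stratify by cuisine so both splits preserve the class distribution;
# the 80/20 ratio is an illustrative choice.
train, test = train_test_split(
    recipes, test_size=0.2, random_state=0,
    stratify=[r["cuisine"] for r in recipes],
)

with open("my_train_split.json", "w") as f:
    json.dump(train, f)
with open("my_test_split.json", "w") as f:
    json.dump(test, f)
```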
Note: See the notebook for an in-depth analysis of the results.
| Model | Accuracy (%) | Weighted F1-score | Unweighted F1-score |
|---|---|---|---|
| Baseline #1 | 53.20 | 0.459 | 0.268 |
| Baseline #2 | 40.92 | 0.319 | 0.140 |
| Baseline #2(b) | 52.97 | 0.450 | 0.249 |
| NN (1-hot) | 77.82 | 0.776 | 0.701 |
| SVM (1-hot) | 76.71 | 0.763 | 0.684 |
| NN (embedding) | 72.41 | 0.714 | 0.600 |
| SVM (embedding) | 69.25 | 0.660 | 0.489 |
| GRU (embedding) | 65.11 | 0.619 | 0.441 |
| Clustering heuristic | 40.03 | 0.382 | 0.246 |
- What's Cooking challenge at Kaggle
- node2vec: Scalable Feature Learning for Networks by Aditya Grover and Jure Leskovec
- Reference implementation of node2vec by Aditya Grover