Amazonstarprediction

Predicts star ratings from user reviews in the Amazon Appstore dataset, using NLTK and Sklearn.

The training data is a table of 670,839 unique reviews of various paid apps from the Amazon Appstore, with their associated star ratings and metadata. The testing data is a table of another 100,000 unique reviews. Its format is exactly the same as train_data.csv, except that the 'star_rating' field is missing. The task is to predict the star ratings of these reviews using the metadata provided in this table.

Project approaches:

There are 3 models implemented in this project:

1. Clustering in test data only, without using training data (uses only the review text):

Used WordNet to get the positive and negative score of each word in each review, and combined them into a vector (positive score, negative score) per review.
Calculated the final score as score = arctan(positive score / negative score) - pi/4.
Applied K-means++ to all scores.
Assigned each review to one of the 5 centers to predict its star rating.
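
A minimal sketch of the scoring and clustering steps above, not the notebook's actual code. It assumes the per-review (positive, negative) totals are already computed; the example pairs are hypothetical. `atan2` is used in place of a plain division so that a negative score of 0 does not raise an error, treating a review with no sentiment words as neutral is an assumption, and so is mapping clusters to stars by the order of their centers.

```python
import math

import numpy as np
from sklearn.cluster import KMeans


def angle_score(pos: float, neg: float) -> float:
    """arctan(pos / neg) - pi/4, written with atan2 so neg == 0 is safe."""
    if pos == 0 and neg == 0:
        return 0.0  # assumption: no sentiment words means a neutral review
    return math.atan2(pos, neg) - math.pi / 4


def stars_from_scores(scores, n_stars=5, seed=0):
    """Cluster 1-D scores with k-means++ and rank the clusters into stars."""
    X = np.asarray(scores, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_stars, init="k-means++", n_init=10,
                random_state=seed).fit(X)
    # Assumption: the lowest center is 1 star and the highest is n_stars.
    order = np.argsort(km.cluster_centers_.ravel())
    star_of_cluster = np.empty(n_stars, dtype=int)
    star_of_cluster[order] = np.arange(1, n_stars + 1)
    return star_of_cluster[km.labels_]


# Hypothetical (positive, negative) totals, one pair per review.
pairs = [(0.1, 2.0), (0.5, 1.5), (1.0, 1.0), (1.5, 0.5), (2.0, 0.0)]
scores = [angle_score(p, n) for p, n in pairs]
stars = stars_from_scores(scores)
```

The score is the angle of the (negative, positive) vector measured from the diagonal, so a review with equal positive and negative totals scores 0.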

2. Clustering in test data with centers obtained from training data (uses only the review text):

Extracted the reviews with star rating 1, ..., 5 from the training data.
Performed TF-IDF on both the training data and the testing data.
Calculated the centroid of each cluster (one per star rating) in the training data.
Assigned each testing review to the nearest of these centroids to predict its star rating.
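
The steps above can be sketched as a nearest-centroid classifier over TF-IDF vectors. This is an illustration under assumptions, not the notebook's code: the vectorizer is fitted on the training reviews and reused for the test reviews, cosine similarity is used as the closeness measure (the README does not say which distance was used), and the function names are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize


def fit_centroids(train_texts, train_stars):
    """Fit TF-IDF on the training reviews and average the rows per star rating."""
    vec = TfidfVectorizer()
    X = vec.fit_transform(train_texts)  # sparse matrix, one row per review
    y = np.asarray(train_stars)
    stars = np.unique(y)
    centroids = np.vstack([np.asarray(X[y == s].mean(axis=0)) for s in stars])
    # L2-normalise so that a dot product with a TF-IDF row is a cosine similarity.
    return vec, stars, normalize(centroids)


def predict_stars(vec, stars, centroids, texts):
    """Assign each review to the star rating whose centroid is most similar."""
    X = vec.transform(texts)  # TF-IDF rows are L2-normalised by default
    sims = np.asarray(X @ centroids.T)
    return stars[sims.argmax(axis=1)]


# Tiny hypothetical training set, one review per star rating.
train_texts = ["terrible waste of money", "bad and buggy", "okay but average",
               "good fun app", "great love it"]
train_stars = [1, 2, 3, 4, 5]
vec, stars, centroids = fit_centroids(train_texts, train_stars)
pred = predict_stars(vec, stars, centroids, ["terrible waste", "great love"])
```

Keeping the matrices sparse matters at the scale described here, since a dense TF-IDF matrix for several hundred thousand reviews would not fit in memory on most machines.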

3. Clustering in test data using centers calculated from a 'star rating' dictionary (uses only the review text):

Extracted the reviews with star rating 1, ..., 5 from the training data.
Performed TF-IDF on both the training data and the testing data to get feature words.
Designed a scoring model for the feature words in the training data (for each word in each review, score += i - k, where k is the prior).
Got the 5 center scores from the training data.
Applied these center scores to the feature words from the testing data to predict the star rating.
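
One possible reading of the scoring model above, sketched with the standard library only. Several details are assumptions because the README does not spell them out: `i` is taken to be the star rating of the review a word appears in, the prior `k` is passed in as a plain number, a review's score is the mean of its known word scores, and every whitespace-separated word is treated as a feature word (the TF-IDF selection step is skipped). All function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_word_scores(reviews, stars, prior):
    """Accumulate score += i - k for every word occurrence in a review rated i."""
    word_score = defaultdict(float)
    for text, i in zip(reviews, stars):
        for w in text.lower().split():
            word_score[w] += i - prior
    return word_score


def review_score(text, word_score):
    """Mean score of the words in the review that are in the dictionary."""
    words = [w for w in text.lower().split() if w in word_score]
    return mean(word_score[w] for w in words) if words else 0.0


def center_scores(reviews, stars, word_score):
    """One center per star rating: the mean review score within that rating."""
    by_star = defaultdict(list)
    for text, i in zip(reviews, stars):
        by_star[i].append(review_score(text, word_score))
    return {i: mean(v) for i, v in by_star.items()}


def predict_star(text, word_score, centers):
    """Pick the star rating whose center score is closest to the review score."""
    s = review_score(text, word_score)
    return min(centers, key=lambda i: abs(centers[i] - s))


# Hypothetical training reviews; prior=3 (the scale midpoint) is an assumption.
reviews = ["awful broken", "poor broken", "fine okay", "nice good", "great good"]
ratings = [1, 2, 3, 4, 5]
word_score = build_word_scores(reviews, ratings, prior=3)
centers = center_scores(reviews, ratings, word_score)
```

With this reading, a word that appears mostly in low-rated reviews ends up with a negative score and pulls a test review toward the low-star centers.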

Model 2 performed best, while model 3 had the worst accuracy.

whitenoiseL/Amazonstarprediction