This project implements a hybrid recommendation system that delivers personalized content recommendations based on user interactions, preferences, and collaborative filtering techniques. The API provides endpoints to fetch recommended posts by username, category, and mood.
Ensure you have the following installed:
- Python 3.7 or above
- Flask
- requests library
- Clone the repository:
  git clone https://github.com/Anas255-exe/Video_recommendation-.git
  cd Video_recommendation-
- Install dependencies:
  pip install -r requirements.txt
- Run the Flask app:
  python app.py
- Access the API at http://127.0.0.1:5000.
Fetch Recommended Posts by Username, Category, and Mood
- Endpoint: /feed
- Method: GET
- Parameters:
  - username (string): Username of the user.
  - category_id (optional, string): Category filter.
  - mood (optional, string): Mood filter.
- Example Request:
  curl "http://127.0.0.1:5000/feed?username=kinha&category_id=1&mood=happy"
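The example request above can also be issued from Python. Here is a minimal sketch using only the standard library to build the request URL; the host, port, and parameter names come from the examples in this README, while the helper function name is purely illustrative:

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:5000/feed"

def build_feed_url(username, category_id=None, mood=None):
    """Return the full /feed request URL; optional filters are omitted when None."""
    params = {"username": username}
    if category_id is not None:
        params["category_id"] = category_id
    if mood is not None:
        params["mood"] = mood
    return f"{BASE_URL}?{urlencode(params)}"

print(build_feed_url("kinha", category_id="1", mood="happy"))
# http://127.0.0.1:5000/feed?username=kinha&category_id=1&mood=happy
```

The resulting URL can then be fetched with any HTTP client once the Flask app is running.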
Fetch Recommended Posts by Category
- Endpoint: /feed/category
- Method: GET
- Parameters:
  - username (string): Username of the user.
  - category_id (string): Category filter.
- Example Request:
  curl "http://127.0.0.1:5000/feed/category?username=kinha&category_id=1"
Fetch Recommended Posts by Mood
- Endpoint: /feed/mood
- Method: GET
- Parameters:
  - username (string): Username of the user.
  - mood (string): Mood filter.
- Example Request:
  curl "http://127.0.0.1:5000/feed/mood?username=kinha&mood=excited"
- Input Stage: Collects user interactions, including:
  - Viewed Posts
  - Liked Posts
  - Inspired Posts
  - Rated Posts
- Content-Based Filtering: Filters posts based on:
  - Categories of interest.
  - Mood preferences.
- Collaborative Filtering: Scores posts using the preferences of similar users.
- Hybrid Recommendation Engine: Combines the content-based and collaborative scores into a single ranking.
- Output: Returns the top 10 recommended posts.
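The final blending stage can be sketched as follows. This is an illustrative assumption, not the project's actual implementation: the equal 0.5/0.5 weighting, the `{post_id: score}` dictionaries, and the function name are all hypothetical.

```python
def hybrid_top_posts(content_scores, collab_scores, k=10, alpha=0.5):
    """Blend two score dicts {post_id: score} into a ranked top-k list.

    alpha weights the content-based score; (1 - alpha) weights the
    collaborative score. Posts missing from one source default to 0.
    """
    post_ids = set(content_scores) | set(collab_scores)
    blended = {
        pid: alpha * content_scores.get(pid, 0.0)
             + (1 - alpha) * collab_scores.get(pid, 0.0)
        for pid in post_ids
    }
    # Highest blended score first; ties broken by post id for determinism.
    return sorted(blended, key=lambda pid: (-blended[pid], pid))[:k]

content = {"p1": 0.9, "p2": 0.4, "p3": 0.7}
collab = {"p2": 0.8, "p3": 0.6, "p4": 0.5}
print(hybrid_top_posts(content, collab, k=3))  # ['p3', 'p2', 'p1']
```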
During the development of this recommendation system, the initial approach was to implement the K-Nearest Neighbors (KNN) algorithm to provide video recommendations. This approach was chosen because of the simplicity and effectiveness of KNN in identifying patterns from user preferences based on ratings or features.
KNN Approach
KNN operates by calculating the similarity (or distance) between users (or items), identifying the nearest neighbors, and using their preferences or ratings to predict ratings for unseen items. However, during the implementation, a few challenges arose:
- Model Selection: Initially, I tried using KNN as a recommendation model. However, I encountered difficulties in tuning the model and processing the data effectively, which led me to explore alternative approaches such as content-based and collaborative filtering methods.
- Data Inconsistencies: One major issue was inconsistent variable names across datasets. Specifically, the variable user_id was sometimes referred to as id, which caused problems during data merging and distance computation for KNN. Resolving this inconsistency was critical to the successful implementation of the model.
- Similarity Calculations: KNN requires computing similarities between users or items. Initially, I used Euclidean Distance to measure the similarity between users. Later, I considered switching to Cosine Similarity for better performance on sparse datasets, as it accounts for the direction of ratings rather than their magnitude.
Equations Attempted in KNN
The following key equations were part of the KNN implementation and were used to calculate distances and predict ratings:
- Euclidean Distance
Euclidean Distance is a standard method used to calculate the similarity between two data points (users or items). The formula is:
d(x, y) = √(∑(xi - yi)²)
Where:
- x and y are data points (user/item feature vectors).
- n is the number of features/items.
- xi and yi are the values of the i-th feature.
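A direct translation of this formula into Python (the function name and sample ratings are illustrative):

```python
import math

def euclidean_distance(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) over aligned feature vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Two users' ratings over the same three items: sqrt(1 + 0 + 4) = sqrt(5)
print(euclidean_distance([5, 3, 4], [4, 3, 2]))
```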
- KNN Prediction Formula
The prediction for a user u on an item i is calculated as:
r̂(u,i) = (∑(v ∈ N_k(u)) sim(u,v) * r(v,i)) / (∑(v ∈ N_k(u)) sim(u,v))
Where:
- r̂(u,i) is the predicted rating for user u on item i.
- N_k(u) is the set of k nearest neighbors of user u.
- sim(u,v) is the similarity between user u and user v.
- r(v,i) is the actual rating given by user v for item i.
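The prediction formula maps directly to code. In this sketch the neighbor set is represented as two aligned lists of similarities and ratings; the names and the zero fallback for an empty denominator are my own conventions:

```python
def predict_rating(neighbor_sims, neighbor_ratings):
    """r_hat(u,i) = sum_v sim(u,v) * r(v,i) / sum_v sim(u,v).

    neighbor_sims and neighbor_ratings are aligned over the k nearest
    neighbors of user u that have rated item i.
    """
    numerator = sum(s * r for s, r in zip(neighbor_sims, neighbor_ratings))
    denominator = sum(neighbor_sims)
    return numerator / denominator if denominator else 0.0

# Three neighbors with similarities 0.9, 0.5, 0.1 rated the item 5, 3, 1;
# the most similar neighbor dominates the estimate.
print(predict_rating([0.9, 0.5, 0.1], [5, 3, 1]))
```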
- Cosine Similarity
An alternative to Euclidean distance is Cosine Similarity, which calculates the cosine of the angle between two vectors. The formula is:
sim(u,v) = (∑(r(u,i) * r(v,i))) / (√(∑(r(u,i)²)) * √(∑(r(v,i)²)))
Where:
- r(u,i) and r(v,i) are the ratings given by users u and v for item i.
- n is the number of items.
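The same formula in Python, with a guard for all-zero vectors (the zero fallback is my own choice, not part of the formula):

```python
import math

def cosine_similarity(u_ratings, v_ratings):
    """sim(u,v) = dot(u, v) / (||u|| * ||v||) over co-rated items."""
    dot = sum(a * b for a, b in zip(u_ratings, v_ratings))
    norm_u = math.sqrt(sum(a * a for a in u_ratings))
    norm_v = math.sqrt(sum(b * b for b in v_ratings))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Parallel rating vectors have similarity 1 regardless of magnitude,
# which is why cosine similarity suits sparse rating data.
print(cosine_similarity([1, 2], [2, 4]))
```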
- Weighted Average Prediction
For better accuracy, the prediction can be normalized by the absolute values of the similarities, which keeps the estimate well-behaved when similarities can be negative:
r̂(u,i) = (∑(v ∈ N_k(u)) sim(u,v) * r(v,i)) / (∑(v ∈ N_k(u)) |sim(u,v)|)
Where:
- The numerator is the weighted sum of ratings from the nearest neighbors.
- The denominator normalizes the weights.
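A small sketch of this variant (names and sample values are illustrative). With a negative similarity, such as one produced by mean-centered cosine, the absolute-value denominator lets the dissimilar neighbor pull the estimate down without distorting the normalization:

```python
def predict_rating_abs(neighbor_sims, neighbor_ratings):
    """Weighted prediction normalized by |sim(u,v)|, robust to
    negative similarities among the k nearest neighbors."""
    numerator = sum(s * r for s, r in zip(neighbor_sims, neighbor_ratings))
    denominator = sum(abs(s) for s in neighbor_sims)
    return numerator / denominator if denominator else 0.0

# A similar neighbor (0.8) rated 5; a dissimilar one (-0.2) rated 1.
# Numerator: 0.8*5 + (-0.2)*1 = 3.8; denominator: 0.8 + 0.2 = 1.0.
print(predict_rating_abs([0.8, -0.2], [5, 1]))
```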
Problems Faced: Inconsistencies in Variable Naming

One of the main issues I encountered was inconsistency in variable names, particularly with user_id: in some places it was referred to as id, and in others as user_id. This caused errors during data processing and model training, as the variables were not properly mapped. I had to carefully refactor the code and standardize the naming convention across the entire project.
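The kind of normalization described above can be sketched in plain Python. This is an illustrative reconstruction, not the project's actual refactor; the record shapes and function name are assumptions:

```python
def normalize_user_key(records):
    """Return copies of the records with any "id" key renamed to "user_id",
    so datasets using either name can be merged consistently."""
    normalized = []
    for rec in records:
        rec = dict(rec)  # copy, to avoid mutating the caller's data
        if "id" in rec and "user_id" not in rec:
            rec["user_id"] = rec.pop("id")
        normalized.append(rec)
    return normalized

# One dataset used "id", another already used "user_id".
ratings = [{"id": 1, "rating": 4}, {"user_id": 2, "rating": 5}]
print(normalize_user_key(ratings))
```

Running such a step before any merge or distance computation removes the mismatch at a single point instead of patching each downstream use.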
Thanks for taking my application into consideration. Feel free to reach out with any questions or feedback!