- Try the new web app: https://huggingface.co/spaces/dcrey7/Fifa19_webapp
- YouTube video: https://www.youtube.com/watch?v=_nmrCul-JOw
This web application helps football managers and scouts make data-driven decisions by providing player recommendations and market value predictions based on the FIFA 19 dataset (https://www.kaggle.com/datasets/javagarm/fifa-19-complete-player-dataset). It combines two machine learning models: KNN for player recommendations and XGBoost for bid predictions.
Football clubs face several challenges in player recruitment:
- Finding similar players to replace departing team members
- Identifying undervalued talents in the market
- Making informed decisions about player valuations
- Optimizing recruitment budget allocation
This application addresses these challenges by providing:
- Data-driven player recommendations based on similar playing styles and attributes
- Market value predictions to assist in negotiations
- Comprehensive player statistics for informed decision-making
- Uses the K-Nearest Neighbors (KNN) algorithm
- Considers 30+ player attributes, including:
  - Technical skills (dribbling, shooting, passing)
  - Physical attributes (pace, strength, stamina)
  - Mental attributes (vision, positioning, composure)
- Recommends similar players based on playing style and attributes
- Helps identify alternative recruitment targets
- Powered by an XGBoost regression model
- Predicts market value based on:
  - Player statistics and attributes
  - Age and potential
- Helps in budget planning and negotiations
- Identifies potentially undervalued players
- Flask-based web server
  - Route handling
  - API endpoint implementation
  - Model inference integration
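As an illustration of how a route can wrap model inference, here is a minimal Flask sketch. The route path `/api/predict`, the field names `overall` and `potential`, and the `predict_value` helper are all hypothetical; the helper is placeholder arithmetic standing in for the serialized model, not the project's actual code.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_value(features):
    """Hypothetical stand-in for the serialized model's predict call."""
    # Placeholder arithmetic so the sketch runs without a trained model.
    return 1000.0 * features["overall"] + 500.0 * features["potential"]


@app.route("/api/predict", methods=["POST"])
def predict():
    # Parse the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    missing = [key for key in ("overall", "potential") if key not in payload]
    if missing:
        return jsonify({"error": "missing fields: " + ", ".join(missing)}), 400
    return jsonify({"predicted_value": predict_value(payload)})
```

Validating the payload before calling the model keeps malformed requests from surfacing as server errors.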
Data Flow:
Raw Data → Preprocessing → Feature Engineering → Model Training → Serialization
Recommendation System (KNN)
└── Input → Feature Scaling → KNN Model → Similarity Scores → Top N Recommendations
Bid Prediction (XGBoost)
└── Input → Preprocessing → Feature Engineering → XGBoost Model → Value Prediction
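The recommendation flow above (scaling, KNN lookup, top N) can be sketched with scikit-learn as follows. The player names, the three attribute columns, and the `recommend` helper are made up for illustration; the real system uses 30+ attributes.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Toy attribute matrix: rows are players, columns are (dribbling, pace, vision).
names = ["A", "B", "C", "D", "E"]
X = np.array([
    [90, 85, 88],
    [88, 86, 87],
    [60, 70, 55],
    [91, 84, 90],
    [58, 72, 50],
], dtype=float)

# Feature scaling, then an index of all players in the scaled space.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
knn = NearestNeighbors(metric="euclidean").fit(X_scaled)


def recommend(player_idx, top_n=2):
    """Return the top_n most similar players as (name, distance) pairs."""
    # Ask for one extra neighbour because the query player matches itself.
    distances, indices = knn.kneighbors(X_scaled[[player_idx]], n_neighbors=top_n + 1)
    pairs = [
        (names[i], float(d))
        for i, d in zip(indices[0], distances[0])
        if i != player_idx
    ]
    return pairs[:top_n]


recommendations = recommend(0)
```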
- Python 3.8+
- Flask 1.1.2
- NumPy 1.19.5
- Pandas 1.2.4
- Scikit-learn 0.24.2
- XGBoost 1.4.2
- HTML5
- CSS3
- JavaScript
- Bootstrap 4.5
- Git for version control
- Jupyter Notebooks for model development
- VS Code/PyCharm for development
- Heroku Platform
- Gunicorn web server
- Removed special characters from monetary values (€, M, K)
- Converted height and weight to numerical values
- Standardized player positions
- Handled missing values using mean/median imputation
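The cleaning steps for monetary values, height, and weight might look like the sketch below. It assumes raw strings shaped like `€110.5M`, `5'7`, and `159lbs`; check these formats against the actual dataset columns before reusing it. The function names are illustrative.

```python
import re


def parse_money(text):
    """Convert strings such as '€110.5M' or '€565K' to a float in euros."""
    match = re.fullmatch(r"€?(\d+(?:\.\d+)?)([MK]?)", text.strip())
    if match is None:
        return None  # leave unparseable values for the imputation step
    number, suffix = match.groups()
    multiplier = {"M": 1_000_000, "K": 1_000, "": 1}[suffix]
    return float(number) * multiplier


def parse_height_cm(text):
    """Convert a feet'inches string such as "5'7" to centimetres."""
    feet, inches = text.split("'")
    return round((int(feet) * 12 + int(inches)) * 2.54, 1)


def parse_weight_lbs(text):
    """Strip the 'lbs' suffix and return the weight as a float."""
    return float(text.replace("lbs", "").strip())
```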
- Converted categorical variables using one-hot encoding
- Normalized numerical features using StandardScaler
- Created composite skill metrics
- Generated position-specific attributes
- Calculated age-based potential indicators
- Engineered monetary features (wage-to-value ratio)
- Aggregated team-level and position-wise features
- Created physical attribute indices
- Removed highly correlated features (correlation threshold > 0.85)
- Used feature importance from Random Forest to select top predictors
- Applied domain knowledge to retain crucial football attributes
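One common way to apply the 0.85 correlation threshold is to scan the upper triangle of the absolute correlation matrix and drop the later column of each highly correlated pair. The sketch below shows that approach on made-up columns; it is not necessarily the exact procedure used here, and `drop_correlated` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def drop_correlated(df, threshold=0.85):
    """Drop the later column of every pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop


# Toy data: sprint_speed nearly duplicates pace, vision is unrelated.
demo = pd.DataFrame({
    "pace": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "sprint_speed": [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 8.0],
    "vision": [5.0, 1.0, 4.0, 2.0, 5.0, 1.0, 4.0, 2.0],
})
reduced, dropped = drop_correlated(demo)
```

Which column of a pair survives depends on column order, so domain knowledge can be applied by placing must-keep attributes first.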
- Algorithm: K-Nearest Neighbors
- Distance Metric: Euclidean Distance (performed better than Manhattan)
- K Value: 5 (determined through cross-validation)
- Weights: Distance-weighted voting
- Standardization of all numerical features
- Features grouped into categories:
  - Technical Abilities (dribbling, shooting, passing, etc.)
  - Physical Attributes (pace, strength, stamina)
  - Mental Attributes (vision, positioning)
  - Position-specific metrics
- Normalized Euclidean distance
- Custom weighting for position-specific attributes
- Scaled similarity scores (0-100)
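A possible reading of the similarity calculation is sketched below: a weighted Euclidean distance on standardized attributes, mapped linearly onto 0-100. The linear mapping (100 for an identical player, 0 for the farthest candidate) and the weight values are assumptions, since the exact scaling is not specified here.

```python
import numpy as np


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two standardized attribute vectors."""
    a, b, weights = (np.asarray(v, dtype=float) for v in (a, b, weights))
    return float(np.sqrt(np.sum(weights * (a - b) ** 2)))


def similarity_scores(query, candidates, weights):
    """Map distances to a 0-100 scale, where 100 means identical to the query."""
    distances = np.array([weighted_distance(query, c, weights) for c in candidates])
    max_distance = distances.max()
    if max_distance == 0:
        # Every candidate is identical to the query.
        return np.full(len(candidates), 100.0)
    return 100.0 * (1.0 - distances / max_distance)


# Distances from the origin are 0, 5 and 10, giving scores 100, 50 and 0.
scores = similarity_scores([0, 0], [[0, 0], [3, 4], [6, 8]], weights=[1, 1])
```

Raising a weight above 1 makes differences in that attribute count more, which is one way to emphasize position-specific attributes.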
- Training Set: 80% (14,565 players)
- Testing Set: 20% (3,642 players)
- Stratified split based on player overall rating
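Because overall rating is numeric, stratifying on it usually means binning it first. The sketch below shows an 80/20 split stratified on rating bands using synthetic ratings; the band edges and the `rating_band` column name are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic ratings standing in for the real "overall" column.
rng = np.random.default_rng(42)
players = pd.DataFrame({"overall": rng.integers(46, 95, size=1000)})

# Bin ratings into integer-coded bands: up to 59, 60-69, 70-79, 80 and above.
players["rating_band"] = pd.cut(
    players["overall"], bins=[0, 59, 69, 79, 100], labels=False
)

train, test = train_test_split(
    players, test_size=0.2, stratify=players["rating_band"], random_state=42
)
```

Stratifying keeps the share of each rating band nearly identical in the training and testing sets, so rare high-rated players are not concentrated in one split.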
- Linear Regression
  - Baseline model
  - R² Score: 0.72
  - RMSE: 0.1543
- Random Forest
  - n_estimators: 100
  - max_depth: 15
  - R² Score: 0.85
  - RMSE: 0.1123
- XGBoost (Final Model)
  - Best performing model
  - Hyperparameters: `{ 'learning_rate': 0.1, 'max_depth': 5, 'min_child_weight': 1, 'n_estimators': 200, 'subsample': 0.8, 'colsample_bytree': 0.8, 'gamma': 0.1 }`
  - R² Score: 0.89
  - RMSE: 0.0891
  - MAE: 0.0654
- 5-fold cross-validation
- Stratified K-Fold for maintaining player rating distribution
- Grid Search CV for hyperparameter tuning
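The validation setup can be sketched roughly as follows. To keep the example dependent on scikit-learn only, `GradientBoostingRegressor` is used as a stand-in for XGBoost; the data is synthetic, the first column plays the role of player rating, and the small grid is illustrative rather than the one actually searched.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic features and target; column 0 stands in for player rating.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=200)

# StratifiedKFold needs discrete classes, so bin the rating into quartiles.
rating = X[:, 0]
bands = np.digitize(rating, np.quantile(rating, [0.25, 0.5, 0.75]))
folds = list(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, bands)
)

# Grid search over a small illustrative grid, reusing the stratified folds.
param_grid = {"learning_rate": [0.05, 0.1], "max_depth": [3, 5]}
search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, subsample=0.8, random_state=0),
    param_grid,
    cv=folds,
    scoring="neg_root_mean_squared_error",
)
search.fit(X, y)
best_params = search.best_params_
```

Passing precomputed folds to `cv` lets a regression search reuse splits that were stratified on a separate, binned variable.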
- Overall Rating (0.285)
- Potential (0.156)
- Age (0.098)
- International Reputation (0.087)
- Skill Moves (0.076)
- Weak Foot (0.065)
- Position (0.058)
- Composure (0.045)
- Reactions (0.042)
- Ball Control (0.038)
- Silhouette Score: 0.68
- Davies-Bouldin Index: 0.42
- Position Accuracy: 92%
- Playing Style Similarity: 85%
- Mean Absolute Percentage Error (MAPE): 8.76%
- R² Score: 0.89
- RMSE: 0.0891
- MAE: 0.0654
- Explained Variance Score: 0.892
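For reference, the regression metrics listed above can be computed from predictions as in this sketch. The four sample values are made up to show the arithmetic and are unrelated to the reported results.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute R², RMSE, MAE, MAPE (in percent) and explained variance."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    ss_res = np.sum(errors ** 2)                     # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
        # MAPE is undefined when any true value is zero.
        "mape_percent": float(100.0 * np.mean(np.abs(errors / y_true))),
        "explained_variance": float(1.0 - np.var(errors) / np.var(y_true)),
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

R² and explained variance coincide when the mean error is zero, which is why the two reported values (0.89 and 0.892) are so close.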
- Models saved using pickle format
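Saving and reloading with pickle follows the pattern below. A plain dictionary stands in for the fitted model, and the file name `model.pkl` is hypothetical; a fitted estimator round-trips the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted estimator; any picklable object works the same way.
model = {"name": "bid_predictor", "params": {"max_depth": 5, "n_estimators": 200}}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"  # hypothetical file name
    with path.open("wb") as fh:
        pickle.dump(model, fh)      # serialize after training
    with path.open("rb") as fh:
        restored = pickle.load(fh)  # deserialize at server start-up
```

Only unpickle files from a trusted source, and load them with the same library versions used for training.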
- Heroku free tier