Loan Application Approvement Project

The objective of this project is to classify whether an applicant will be approved for a loan. We use several classification algorithms and tune their hyperparameters to improve performance. We also use a pipeline to integrate some of the preprocessing and modeling steps.

Deployment

Data Set

Obtained from Kaggle: link

This dataset contains the personal characteristics and loan status of applicants, with 11 predictor features and 1 target feature. Of the 614 instances in the dataset, 68.73% (422) belong to the positive class (loan approved), and the remaining 192 belong to the negative class (loan not approved).

The dataset consists of 4 numerical and 8 categorical features. The 'Loan_Status' feature is used as the class label. Each instance represents one applicant.

Insights

Most applicants without a credit history were not approved.
The higher the income of the applicant and/or coapplicant, the higher the approval rate.
Applicants from urban and semiurban areas tended to be approved more often than those from rural areas.
Loan amount is inversely related to approval: larger loan amounts were approved less often.
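
Comparisons like these come from grouping applicants by one feature and computing the share approved in each group. The following is a minimal sketch in plain Python, not the notebook code from this project. The rows are made up for illustration, and the column name `Credit_History` and the `Y`/`N` encoding of `Loan_Status` are assumptions about the dataset.

```python
from collections import defaultdict

def approval_rate_by(rows, feature, target="Loan_Status", positive="Y"):
    """Return {feature value: share of rows whose target equals `positive`}."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[feature]
        total[key] += 1
        if row[target] == positive:
            approved[key] += 1
    return {key: approved[key] / total[key] for key in total}

# Made-up rows for illustration only; they are not taken from the dataset.
toy_rows = [
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "N"},
    {"Credit_History": 0, "Loan_Status": "N"},
    {"Credit_History": 0, "Loan_Status": "N"},
]

rates = approval_rate_by(toy_rows, "Credit_History")
```

The same helper can be pointed at any categorical column, for example the property area, to compare approval rates across regions.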

Recommendation

Credit history and income are the most influential factors. It may therefore be possible to set an income threshold and combine it with credit status for faster screening.
For applicants from different regions, it may be possible to recommend more suitable loan packages.

Cleaning

In our dataset, 5 features have null values. Four of them can be filled with the median (for numerical features) or the most frequent class (for categorical features). The exception is credit score, where we used a classification algorithm to predict the null cases from each applicant's other characteristics.
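
The text does not say which classifier or which predictors were used to fill the fifth feature. As a rough sketch of the general idea only, the example below trains a classifier on rows where the value is known and predicts it for rows where it is missing. It assumes scikit-learn and pandas, uses logistic regression as a stand-in, and runs on synthetic data; the helper `impute_with_classifier` and the column names `income`, `loan_amount`, and `credit` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def impute_with_classifier(df, target_col, predictor_cols):
    """Fill nulls in `target_col` with predictions from a classifier
    trained on the rows where `target_col` is known."""
    df = df.copy()
    known = df[target_col].notna()
    if known.all():
        return df  # nothing to fill
    model = LogisticRegression(max_iter=1000)  # stand-in classifier (assumption)
    model.fit(df.loc[known, predictor_cols], df.loc[known, target_col])
    df.loc[~known, target_col] = model.predict(df.loc[~known, predictor_cols])
    return df

# Synthetic data for illustration only (hypothetical column names).
rng = np.random.default_rng(0)
income = rng.normal(size=200)
demo = pd.DataFrame({
    "income": income,
    "loan_amount": rng.normal(size=200),
    "credit": (income + rng.normal(scale=0.5, size=200) > 0).astype(float),
})
demo.loc[demo.index[:20], "credit"] = np.nan  # simulate missing values

filled = impute_with_classifier(demo, "credit", ["income", "loan_amount"])
```

The predictors passed to such a helper must themselves be free of nulls, so in practice the median and most-frequent fills would run first.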

Preprocessing

Convert target variable to numerical format.
Fill null values using a simple imputer.
Apply a log transformation to numerical features.
Encode categorical features with one-hot encoding and scale numerical features with a standard scaler.
Split the dataset into 80% training and 20% testing.
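
The steps above can be sketched as a scikit-learn pipeline. This is an illustration under assumptions, not the project's actual code. The column names and rows are made up, `np.log1p` stands in for the unspecified log transformation, and the split is done before fitting the transformers (a choice made here so that statistics such as the median are learned from training rows only).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder, StandardScaler

# Hypothetical column names; adjust to the real dataset.
numeric_cols = ["ApplicantIncome", "LoanAmount"]
categorical_cols = ["Property_Area"]

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("log", FunctionTransformer(np.log1p)),  # log(1 + x) tolerates zeros
    ("scale", StandardScaler()),
])
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Made-up rows for illustration only.
df = pd.DataFrame({
    "ApplicantIncome": [5000, 3000, np.nan, 4200, 6100, 2500, 8000, 3900, 4700, 5200],
    "LoanAmount": [120, 100, 66, np.nan, 150, 90, 200, 110, 130, 140],
    "Property_Area": ["Urban", "Rural", "Semiurban", "Urban", "Urban",
                      "Rural", "Urban", "Semiurban", "Rural", "Urban"],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "N", "Y", "Y", "N", "Y"],
})

y = df["Loan_Status"].map({"Y": 1, "N": 0})  # convert target to numeric
X = df.drop(columns="Loan_Status")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

X_train_t = preprocess.fit_transform(X_train)  # fit on training rows only
X_test_t = preprocess.transform(X_test)
```

Appending a classifier as a final pipeline step would let the preprocessing and the model be tuned and saved as one object.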

Evaluation

The main evaluation metric in this project is the weighted F1 score. The dataset is small and its classes are imbalanced, so accuracy alone would not represent the model's actual performance.
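
To make the argument concrete, the sketch below computes the weighted F1 score from scratch for a trivial baseline that approves every applicant, using the class counts stated in the Data Set section (422 approved, 192 not approved). The helper names are illustrative. The weighting by each class's share of the true labels follows the usual definition of the weighted average.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    n = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, label) * y_true.count(label) / n
        for label in set(y_true)
    )

# Class counts from the dataset description: 422 approved, 192 not approved.
y_true = [1] * 422 + [0] * 192
y_pred = [1] * 614  # baseline that approves everyone

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = weighted_f1(y_true, y_pred)
```

This baseline reaches an accuracy of about 0.69 without learning anything, while its weighted F1 is only about 0.56, because the F1 for the rejected class is zero.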

Model Analysis

Among the classifiers, Logistic Regression performed best. Its F1-score of around 78% on the test set suggests that it generalizes reasonably well to new data. Despite its simplicity, the model has learned the rules underlying the data, and it overfits the training set less after hyperparameter tuning. Even so, there is room for improvement: we could try another sampling strategy or tune more hyperparameters to improve training and validation performance.
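
The tuning step could look roughly like the following grid search, which selects the model by cross-validated weighted F1. This is a sketch under assumptions: it uses scikit-learn, a synthetic dataset with a similar size and class balance instead of the loan data, and a hypothetical grid over the regularization strength `C`. The grid actually used in the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in with a similar size and class balance; not the loan data.
X, y = make_classification(
    n_samples=614, n_features=11, weights=[0.31, 0.69], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid over the inverse regularization strength C.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="f1_weighted",
    cv=5,
)
search.fit(X_train, y_train)

# Score the refit best model on the held-out test split.
test_f1 = f1_score(y_test, search.predict(X_test), average="weighted")
```

Comparing `search.best_score_` (cross-validated) with `test_f1` gives a quick check on whether the tuned model still overfits.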
