-
EDA on rent pricing in New York (NY) boroughs with interactive dashboards, along with the development of an ML regression model.
-
If you want to see the deployed application, click below and feel free to test the models with your own instances, interact with dynamic dashboards about the dataset, or visualize static ones:
-
Python3 and pip package manager:
sudo apt install python3 python3-pip build-essential python3-dev
-
virtualenv tool:
pip install virtualenv
-
Libraries: pandas, scikit-learn, mlxtend, xgboost, lightgbm, Streamlit, Dash, Plotly Express, Kaleido, Matplotlib, seaborn, numpy, WordCloud, Cerberus, joblib, gdown;
-
Environments: Jupyter.
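Before running anything, it can help to confirm the dependencies are importable in the active environment. The sketch below is a hypothetical helper (not part of the project) that checks which packages are missing; note that an import name can differ from the pip name (e.g. scikit-learn is imported as sklearn).

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the subset of import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]

# Import names for a few of the dependencies listed above.
required = ["pandas", "sklearn", "xgboost", "lightgbm", "streamlit", "joblib"]
print(missing_packages(required))  # empty list means all of these are available
```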
In this section, you can see the interactive and static dashboards screens made with Streamlit, as well as the predictor GUI.
-
Clone the repository
git clone https://github.com/juliorodrigues07/ny_rent_pricing.git
-
Enter the repository's directory
cd ny_rent_pricing
-
Create a virtual environment
python3 -m venv .venv
-
Activate the virtual environment
source .venv/bin/activate
-
Install the dependencies
pip install -r requirements.txt
-
You first need to be in the dashboards directory to run the commands.
-
With Streamlit:
streamlit run 1_π _Home.py
-
With Dash Plotly (dashboards only):
python3 dash_test.py
-
To visualize the notebooks online and run them (Google Colaboratory), click on the following links:
-
To run the notebooks locally, run the commands in the notebooks directory following the template:
jupyter notebook <file_name>.ipynb
-
EDA (Exploratory Data Analysis):
jupyter notebook 1_eda.ipynb
-
Preprocessing:
jupyter notebook 2_preprocessing.ipynb
-
Machine Learning:
jupyter notebook 3_ml_methods.ipynb
-
To run the Python scripts locally, you first need to be in the src directory and then run the command:
python3 main.py
.
├── README.md                    # Project's documentation
├── requirements.txt             # File containing all the required dependencies to run the project
├── plots                        # Directory containing all the graph plots generated in EDA
├── assets                       # Directory containing images used in README.md and in the deployed app
├── notebooks                    # Directory containing project's jupyter notebooks
│   ├── 1_eda.ipynb
│   ├── 2_preprocessing.ipynb
│   └── 3_ml_methods.ipynb
├── dashboards                   # Directory containing the web application
│   ├── 1_π _Home.py             <- Main page with the price predictor
│   ├── pages                    # Child pages directory
│   │   ├── 2_π_Interactive.py   <- Script responsible for generating the interactive dashboards
│   │   └── 3_π_Static.py        <- Script responsible for generating the static dashboards
│   └── dash_test.py             <- Interactive and static dashboards made with the Dash library
├── src                          # Directory containing all the Python scripts for data mining
│   ├── main.py                  <- Main script for evaluating ML models
│   └── datamining               # Directory containing scripts responsible for the whole KDD process
│       ├── data_visualization.py
│       ├── preprocessing.py
│       ├── ml_methods.py
│       └── __init__.py
├── datasets                     # Directory containing all datasets used or generated in the project
│   ├── pricing.csv              <- Original dataset
│   ├── reduced.parquet          <- Result after applying memory optimization techniques to the original dataset
│   ├── filled.parquet           <- Result after imputing missing values in the reduced.parquet dataset
│   ├── preprocessed.parquet     <- Result after applying preprocessing techniques to the filled.parquet dataset
│   └── feature_selected.parquet <- Final result after applying feature selection to the preprocessed.parquet dataset
└── models                       # Directory containing all models generated in the project
    ├── lgbm_model.pkl           <- LightGBM fitted model
    ├── xgb_model.pkl            <- XGBoost fitted model
    └── histgb_model.pkl         <- HistGradientBoosting fitted model
-
To uninstall all dependencies, run the following command:
pip uninstall -r requirements.txt -y
-
To deactivate the virtual environment, run the following command:
deactivate
-
To delete the virtual environment, run the following command:
rm -rf .venv