- Project Overview
- Project Structure
- Future Plans
- Learned
- Contributing
- Author
- Contact
- License
- Author Name: Althaf Muhammad
- Author Contact: Email and GitHub
- Dataset Source: Kaggle - Housing Price Prediction Data
- Project Name: House Price Prediction
- Objective: Build a Machine Learning model to predict house prices.
- Training Strategy: Batch Training
- Type of ML: Supervised Learning
- Type of Problem: Regression
- Evaluation Metrics: RMSE, MAE, and $R^2$ Score
- Python Version: 3.12.10
- Python Dependency Manager: `uv` (v0.7.2)
- Hardware: No hardware limitations. This is a lightweight project that runs on any modern machine, including local setups, Google Colab, and Kaggle.
- OS: Initially developed in a Debian Dev Container. I tried to keep the project OS-independent, but this is not guaranteed.
- Project Configuration and Dependencies: Listed in `pyproject.toml`
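For readers unfamiliar with the three evaluation metrics listed above, here is a minimal, dependency-free sketch of how RMSE, MAE, and $R^2$ can be computed. The function name `regression_metrics` and the sample prices are hypothetical and chosen only for illustration; this is not the project's `evaluate.py`, which may well use a library implementation instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, and R^2 for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: 1 minus the ratio of residual variation to total variation.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all targets are identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


# Hypothetical prices, for illustration only (not taken from the dataset).
actual = [200_000.0, 250_000.0, 300_000.0]
predicted = [210_000.0, 240_000.0, 300_000.0]
metrics = regression_metrics(actual, predicted)
```

RMSE and MAE are in the same unit as the target (price), while $R^2$ is unitless, which is why the three are often reported together.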
house-price-pred-ml/  # Root project directory
├── .devcontainer/  # Dev Container directory
│   ├── devcontainer.env  # Dev Container environment variables file passed to Docker
│   ├── devcontainer.json  # Dev Container configuration file
│   ├── Dockerfile  # Dockerfile used to build the Dev Container
│   └── postCreateCommand.sh  # Shell script run after Dev Container build completes
├── .git/  # Git version control metadata directory
├── .venv/  # Local Python virtual environment directory
├── data/  # Directory used to store all dataset-related files
│   ├── interim/  # Directory for intermediate datasets processed from raw data
│   │   └── interim.csv  # CSV file containing intermediate processed data
│   ├── processed/  # Directory for storing train-test split datasets
│   │   ├── test.csv  # CSV file containing the test split of the dataset
│   │   └── train.csv  # CSV file containing the train split of the dataset
│   └── raw/  # Directory for unprocessed raw data files
│       └── raw.csv  # CSV file containing raw dataset
├── logs/  # Directory for storing log files generated during execution
├── notebooks/  # Directory used for storing Jupyter notebooks for experimentation
│   └── 1-althaf07-experimentation.ipynb  # Notebook for initial experimentation and data analysis
├── reports/  # Directory for project reports and documentation
│   ├── figures/  # Directory for saving generated figures and plots
│   │   └── univariate/  # Directory for univariate analysis figures
│   │       ├── bathrooms.png
│   │       ├── bedrooms.png
│   │       ├── neighborhood.png
│   │       ├── numc_describe.md
│   │       ├── price.png
│   │       ├── square_feet.png
│   │       └── year_built.png
│   ├── environment.md  # Markdown file documenting the environment setup
│   └── experiment_document.md  # Markdown file detailing experiment results and analysis
├── src/  # Source code root directory
│   └── house_price_pred_ml/  # Main package containing all project modules
│       ├── plot/  # Plotting utilities module
│       │   └── univariate/  # Module for univariate plot functions
│       │       ├── cat_numd.py  # Plots for categorical vs numerical data distributions
│       │       └── numc.py  # Plots for numerical data distributions
│       ├── __init__.py  # Initializes the house_price_pred_ml Python package
│       ├── api.py  # Defines API endpoints for model serving
│       ├── auto_gen_table.py  # Automatically generates data summary tables
│       ├── config.py  # Handles configuration loading logic
│       ├── config.yaml  # YAML configuration file for model parameters and settings
│       ├── evaluate.py  # Evaluates model performance metrics
│       ├── predict.py  # Generates predictions using the trained model
│       ├── process_data.py  # Processes raw data into cleaned, usable format
│       ├── split_data.py  # Splits dataset into training and testing sets
│       ├── train.py  # Trains the machine learning model
│       ├── tree.py  # Implements tree-based model utilities
│       └── utils.py  # Utility functions for data processing and evaluation
├── tmp/  # Temporary directory for intermediate or disposable files
├── .gitignore  # Specifies files and directories ignored by Git
├── .pre-commit-config.yaml  # Configuration for pre-commit hooks to enforce code standards
├── .python-version  # Specifies the Python version used in the project
├── Dockerfile  # Dockerfile for building the project container for deployment
├── pyproject.toml  # Configuration file for Python dependencies and build system
├── README.md  # Main project documentation and usage instructions
└── uv.lock  # Lock file for managing UV-based dependencies
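
The `data/processed/` directory in the layout above holds separate train and test files. As a rough illustration of how such a split can be made reproducible, here is a minimal sketch using only the standard library. The function name `train_test_split_rows`, the 80/20 ratio, and the seed value are assumptions for this example; the README does not state what `split_data.py` actually does.

```python
import random


def train_test_split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and split into (train, test)."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be strictly between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical rows standing in for records read from an interim CSV file.
rows = [{"id": i} for i in range(10)]
train_rows, test_rows = train_test_split_rows(rows)
```

Fixing the seed means the same rows land in the same split on every run, so evaluation results stay comparable between experiments.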