ds-project-template

Template for creating simple data science projects

Project Overview

This repository provides a template for creating and organizing data science projects. It includes a basic project structure, essential dependencies, and guidance on setting up a virtual environment for development. The provided notebook (EDA.ipynb) demonstrates an exploratory data analysis workflow on a real estate dataset, from data cleaning through statistical testing to map visualization.

Project Goals

This project aims to:

- Clean and prepare the real estate dataset for analysis. This includes handling missing values, correcting data types, and identifying and addressing outliers.
- Explore the relationships between different features and the target variable (price). This involves calculating correlations, performing statistical tests, and visualizing key findings.
- Identify areas with the highest price premium for waterfront properties. This helps understand the impact of location and waterfront status on pricing.
- Analyze temporal patterns in property prices. This reveals how prices fluctuate throughout the year based on seasonality and waterfront status.
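
The goals above can be illustrated with a minimal pandas sketch. It is not the notebook's actual code: the toy data and the column names (price, waterfront, zipcode, date) are assumptions chosen for illustration, and the real dataset may use different names.

```python
import pandas as pd

# Toy data; column names and values are assumptions, not the real dataset.
df = pd.DataFrame(
    {
        "price": [300000, 450000, 900000, None, 520000, 1200000],
        "waterfront": [0, 0, 1, 0, 0, 1],
        "zipcode": ["98101", "98101", "98101", "98102", "98102", "98102"],
        "date": ["2015-01-15", "2015-06-01", "2015-06-20",
                 "2015-07-04", "2015-12-24", "2015-12-30"],
    }
)

# Cleaning: drop rows without a price and correct the data types.
clean = df.dropna(subset=["price"]).copy()
clean["date"] = pd.to_datetime(clean["date"])
clean["waterfront"] = clean["waterfront"].astype(bool)

# Waterfront premium: median price per area, waterfront vs. non-waterfront.
waterfront_median = clean[clean["waterfront"]].groupby("zipcode")["price"].median()
inland_median = clean[~clean["waterfront"]].groupby("zipcode")["price"].median()
premium_pct = (waterfront_median / inland_median - 1) * 100

# Temporal pattern: median price per calendar month.
monthly = clean.groupby(clean["date"].dt.month)["price"].median()

print(premium_pct.round(1))
print(monthly)
```

Medians are used here instead of means because a few very expensive properties would otherwise dominate the comparison; that is a design choice of this sketch, not necessarily of the notebook.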

Repository Structure

The repository is structured as follows:

EDA.ipynb: The main Jupyter notebook containing the data analysis code. This notebook includes:
- Data loading and initial data integrity checks.
- Missing value imputation using various techniques.
- Outlier detection and handling.
- Feature engineering to create relevant new columns.
- Statistical analysis to find correlations, perform ANOVA tests, and calculate effect sizes.
- Visualizations using Matplotlib and Seaborn to highlight key findings.
- A map visualization using Folium to display waterfront price premiums across different locations.
data/: Directory for storing the real estate dataset file (i.e., housing.csv).
GeoJSON/: Directory for storing the Seattle GeoJSON data used for the map visualization.
requirements.txt: A file containing the list of dependencies required for the project.
README.md: This file, providing an overview of the repository.
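
To make the statistical analysis step more concrete, here is a small sketch of a one-way ANOVA with eta squared as the effect size, written with the standard library only. The function name and the toy groups are hypothetical; the notebook may compute these quantities differently, for example with a statistics library.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of numeric groups."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)

    # Variation explained by group membership vs. variation inside groups.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)

    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared


# Toy values standing in for prices grouped by some categorical feature.
f_stat, eta_sq = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F = {f_stat:.2f}, eta squared = {eta_sq:.4f}")
```

Eta squared is the share of total variance explained by the grouping, so values near 1 indicate that the groups differ strongly. This sketch does not compute a p-value; scipy.stats.f_oneway can be used for that if SciPy is installed.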

Requirements

- pyenv: for managing Python versions (optional but recommended)
- Python 3.11.3: the specific Python version used in the project
- Node.js: required for using Plotly and JupyterLab

Setup

1. Set up Python Environment:

Install pyenv (Optional): If you don't have pyenv, follow the instructions at https://github.com/pyenv/pyenv.
Install Python 3.11.3:
```
pyenv install 3.11.3
pyenv local 3.11.3
```
Create a virtual environment:
```
python -m venv .venv
```

Activate the virtual environment:

source .venv/bin/activate  # macOS/Linux
.venv\Scripts\Activate.ps1 # Windows PowerShell
source .venv/Scripts/activate  # Windows Git Bash

Upgrade pip:
```
pip install --upgrade pip
```
Install dependencies:
```
pip install -r requirements.txt
```
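
After these steps, a quick way to confirm that the environment is set up as intended is a short Python check like the one below. The helper names are hypothetical; the sketch only relies on the standard sys module.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_matches(expected=(3, 11)):
    """Return True when the major and minor version match the expected pair."""
    return sys.version_info[:2] == expected


print("Python version:", sys.version.split()[0])
print("Virtual environment active:", in_virtualenv())
print("Matches Python 3.11:", python_matches())
```

If the second line prints False, the virtual environment is probably not activated in the current shell.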

2. Install Node.js:

Check Node version:
```
node -v
```
If Node is not installed, follow the instructions below.
Install Node.js using Homebrew (macOS):
```
brew update
brew install node
```

Install Node.js using Chocolatey (Windows):

choco upgrade chocolatey
choco install nodejs

Running the Notebook

Replace the placeholder data/housing.csv with your own real estate dataset.
Open the notebook in JupyterLab from your terminal and run all cells:
```
jupyter lab EDA.ipynb
```
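
When swapping in your own dataset, it can help to check the file's header first. The sketch below reports which expected columns are missing from a CSV file. The set of required column names is an assumption made for illustration; adjust it to the columns your analysis actually uses (see column_names.md).

```python
import csv
from pathlib import Path

# Assumed column names; replace with the ones your analysis relies on.
REQUIRED_COLUMNS = {"price", "waterfront", "zipcode", "date"}


def missing_columns(csv_path, required=REQUIRED_COLUMNS):
    """Return the sorted list of required columns absent from the CSV header."""
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        header = next(csv.reader(handle), [])
    present = {name.strip() for name in header}
    return sorted(set(required) - present)


data_file = Path("data/housing.csv")
if data_file.exists():
    print("Missing columns:", missing_columns(data_file))
else:
    print(f"{data_file} not found")
```

An empty list means every expected column is present in the header.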

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request for bug fixes, feature enhancements, or improvements to the analysis workflow.
