Vigrel/BigDataProject

Exploratory Data Analysis of Climate and Land-Use Data 🌍

This repository contains the code and documentation for a project exploring the relationship between climate change, land-use practices, and natural disasters. The study emphasizes Brazil while leveraging global datasets to provide insights into disaster trends and environmental factors.

📋 Summary

  • Objective: Analyze trends and patterns of natural disasters in relation to environmental factors.
  • Datasets:
  • Techniques:
    • Data cleaning and normalization
    • Exploratory data analysis (descriptive statistics and visualizations)
    • Predictive modeling using machine learning
    • Regional analysis focusing on Brazil
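As an illustration of the cleaning and normalization step, here is a minimal sketch using Pandas. It is not taken from `main.py`: the function name `clean_and_normalize`, the toy data, and the column names are hypothetical, and min-max scaling is assumed only as one common normalization choice.

```python
import pandas as pd


def clean_and_normalize(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Drop incomplete and duplicate rows, then min-max scale `columns` to [0, 1]."""
    cleaned = df.dropna(subset=columns).drop_duplicates().copy()
    for col in columns:
        lo, hi = cleaned[col].min(), cleaned[col].max()
        # A constant column has no spread; map it to 0.0 instead of dividing by zero.
        cleaned[col] = 0.0 if hi == lo else (cleaned[col] - lo) / (hi - lo)
    return cleaned


# Hypothetical toy data; these column names are illustrative, not the project's schema.
sample = pd.DataFrame(
    {
        "year": [2000, 2001, 2002, 2003],
        "forest_area_km2": [100.0, 90.0, None, 60.0],
        "temperature_change_c": [0.2, 0.4, 0.5, 1.0],
    }
)
normalized = clean_and_normalize(sample, ["forest_area_km2", "temperature_change_c"])
print(normalized)
```

The row with the missing forest area is dropped before scaling, so the minimum and maximum are computed only from complete records.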

🛠 Project Structure

├── data/
│   ├── raw/                # Raw datasets
│   ├── interim/            # Intermediate datasets
│   └── processed/          # Final processed datasets
├── main.py                 # Main script for data processing and analysis
├── ANALISE_EXPLORATORIA.ipynb  # Jupyter Notebook for analysis and visualizations
├── Exploratory Data Analysis of Climate and Land-Use Data.pdf  # Final report
└── README.md               # Project documentation

🚀 How to Run

Prerequisites

  • Python 3.9+
  • Libraries: Dask, Pandas, NumPy, Matplotlib, Seaborn

Steps

  1. Clone this repository:

    git clone https://github.com/Vigrel/BigDataProject.git
    cd BigDataProject
  2. Install the dependencies:

    pip install -r requirements.txt
  3. Run the main script:

    python main.py
  4. Explore the results in the output files:

    • Normalized data: data/interim/dataConcat_silver.csv
    • Processed data: data/processed/dataConcat_gold.csv
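To confirm that the run produced both files, a small check such as the following could be used. This is a sketch only: the helper `missing_outputs` is hypothetical and not part of the repository, and it assumes the script is run from the repository root.

```python
from pathlib import Path

# Output paths as listed in the steps above.
EXPECTED_OUTPUTS = (
    "data/interim/dataConcat_silver.csv",
    "data/processed/dataConcat_gold.csv",
)


def missing_outputs(project_root: str = ".") -> list[str]:
    """Return the expected output files that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("All expected outputs are present.")
```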

Optional: Analyze in Jupyter Notebook

Open the ANALISE_EXPLORATORIA.ipynb file to explore the analysis and visualizations interactively.

📝 Key Findings

  • Increasing Disaster Frequency: The number of natural disasters increased over time. The fitted trend rises by 4.09 events per year (R²=0.37, p=0.0003), so it is statistically significant but explains only about a third of the year-to-year variation.
  • Brazil Focus: The analysis identified deforestation rates and forest area as critical predictors of temperature change.
  • Best Predictive Model: Random Forest achieved the highest R² score and the lowest prediction error among the models evaluated.
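For readers who want to see how a figure such as "events per year" with an R² can be obtained, the sketch below fits an ordinary least-squares line to yearly event counts in plain Python. It assumes the reported trend came from a simple linear regression, which the README does not state explicitly. The function `linear_trend` and the counts are hypothetical and are not the project's data. The p-value is not computed here; `scipy.stats.linregress` reports one if SciPy is available.

```python
def linear_trend(years, counts):
    """Ordinary least-squares fit of counts against years.

    Returns (slope, intercept, r_squared).
    """
    n = len(years)
    if n < 2 or n != len(counts):
        raise ValueError("need two or more points and equal-length inputs")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    syy = sum((y - mean_y) ** 2 for y in counts)
    if sxx == 0:
        raise ValueError("all years are identical; the slope is undefined")
    slope = sxy / sxx                      # change in events per year
    intercept = mean_y - slope * mean_x
    # For a one-predictor fit, R² equals the squared correlation coefficient.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical yearly counts, for illustration only.
years = [2000, 2001, 2002, 2003, 2004]
counts = [10, 14, 19, 21, 26]
slope, intercept, r_squared = linear_trend(years, counts)
print(f"slope={slope:.2f} events/year, R²={r_squared:.3f}")
```

The slope is the quantity reported as events per year, and R² measures how much of the variation in yearly counts the straight line accounts for.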

🧠 Conclusions

The findings emphasize the importance of regional environmental policies and climate resilience strategies. Data science plays a crucial role in deriving actionable insights for sustainable decision-making.

👥 Contributors
