IBM Applied Data Science Capstone - The Battle of the Neighborhoods

This project is part of the IBM Data Science Professional Certificate on Coursera. It focuses on analyzing data from the Greater Taipei area to identify the best locations for opening a new venue, such as a restaurant, based on factors like population density, economic diversity, and existing venues.

Project Overview

Taipei is one of the most densely populated cities in the world, which makes it an attractive market for new venues. This project uses data analysis and K-means clustering to compare the districts of Greater Taipei by population density, economic diversity, and business potential, with the goal of identifying the best districts for opening a venue.

Key questions explored:

Which districts have the highest population density?
Which districts have the greatest economic diversity?
Which districts have the most potential for new businesses?

Files and Resources

The repository includes the following files:

README.md: This file contains the project description and setup instructions.
The Battle of the Neighborhoods I.ipynb: The Jupyter notebook for Week 1, where data collection and initial exploration were done.
The Battle of the Neighborhoods II.ipynb: The Jupyter notebook for Week 2, focusing on data processing, visualization, and clustering.
The Battle of the neighborhoods.pdf: The final report summarizing the project findings and analysis.
The Battle of the neighborhoods_PPT.pdf: A presentation summarizing the results of the analysis.

Data Collection

The data used in this project includes:

Geographic Data: Collected from Taiwan’s government data platform and Wikipedia for municipal information.
Venue Data: Retrieved via the Foursquare API to analyze existing venues in the districts.
Demographic Data: Scraped and processed from Wikipedia and government platforms.
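
The venue step above can be illustrated with a small parsing helper. This is a minimal sketch, not the notebook's actual code: it assumes the response has the nested `response -> groups -> items -> venue` shape of Foursquare's legacy v2 `venues/explore` endpoint, which may no longer be available. The function name `parse_venues` and the sample payload are hypothetical, and no network call is made.

```python
from typing import Any


def parse_venues(payload: dict[str, Any], district: str) -> list[dict[str, Any]]:
    """Flatten an explore-style JSON payload into one row per venue.

    Assumes the legacy v2 response shape; missing keys become None.
    """
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item.get("venue", {})
            location = venue.get("location", {})
            # A venue may have no categories; fall back to an empty dict.
            categories = venue.get("categories") or [{}]
            rows.append(
                {
                    "district": district,
                    "venue": venue.get("name"),
                    "lat": location.get("lat"),
                    "lng": location.get("lng"),
                    "category": categories[0].get("name"),
                }
            )
    return rows


# Made-up payload for illustration only; not real Foursquare data.
sample_payload = {
    "response": {
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Example Cafe",
                            "location": {"lat": 25.03, "lng": 121.54},
                            "categories": [{"name": "Cafe"}],
                        }
                    },
                    {
                        "venue": {
                            "name": "Example Park",
                            "location": {"lat": 25.04, "lng": 121.55},
                            "categories": [],
                        }
                    },
                ]
            }
        ]
    }
}

rows = parse_venues(sample_payload, "Example District")
```

The resulting list of dictionaries can be passed directly to `pandas.DataFrame(rows)` for the cleaning and analysis steps.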

Methodology

Data Cleaning: The demographic data was cleaned and organized with Python's pandas library.
Population Density Analysis: Population density was calculated for each district, and the geographic data was visualized using the folium library.
Venue Data Analysis: Venue data was collected from the Foursquare API and categorized by venue type to identify areas with diverse economic activity.
Clustering: K-means clustering was applied to group the districts into 8 clusters based on their venue types, with the elbow method used to choose the number of clusters.
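
The density and venue-categorization steps above can be sketched with pandas as follows. This is an illustrative example under stated assumptions: the district names, populations, areas, and venue categories are made up, and the column names (`population`, `area_km2`, `category`) are hypothetical rather than taken from the notebooks. The folium map is omitted.

```python
import pandas as pd

# Hypothetical district table; the real figures come from the sources
# listed under Data Collection.
districts = pd.DataFrame(
    {
        "district": ["District A", "District B", "District C"],
        "population": [200_000, 300_000, 50_000],
        "area_km2": [5.0, 10.0, 25.0],
    }
)

# Population density = residents per square kilometre.
districts["density_per_km2"] = districts["population"] / districts["area_km2"]
ranked = districts.sort_values("density_per_km2", ascending=False).reset_index(drop=True)

# Hypothetical venue rows, one per venue, as a venue search might return.
venues = pd.DataFrame(
    {
        "district": ["District A", "District A", "District B",
                     "District B", "District B", "District C"],
        "category": ["Cafe", "Noodle House", "Cafe", "Cafe", "Bakery", "Park"],
    }
)

# One-hot encode the categories, then average per district to get the
# share of each venue category in that district.
onehot = pd.get_dummies(venues["category"], dtype=float)
onehot.insert(0, "district", venues["district"])
category_freq = onehot.groupby("district").mean()
```

Each row of `category_freq` sums to 1, so districts with different venue counts are compared on the mix of categories rather than on raw totals. A table of this shape is a typical input for K-means.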

Results

The project identified several key findings:

Yonghe and Daan were found to have the highest population density.
Banqiao and Daan exhibited a diverse mix of economic activities, making them strong candidates for new businesses.
The clustering analysis highlighted 8 distinct groups of districts based on the types of venues.
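
For readers who want to see how an elbow curve and the final cluster labels could be produced, here is a minimal sketch with scikit-learn. It assumes a feature matrix of per-district venue-category shares and uses random placeholder data, so the curve it yields has no meaningful elbow. Only `n_clusters=8` comes from this project; the matrix size, seed, and range of `k` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder features: 30 hypothetical districts x 6 venue categories,
# with each row normalised to sum to 1 (category shares).
rng = np.random.default_rng(0)
raw = rng.random((30, 6))
features = raw / raw.sum(axis=1, keepdims=True)

# Elbow method: fit K-means for a range of k and record the inertia
# (within-cluster sum of squared distances). The "elbow" is the k after
# which inertia stops dropping sharply.
inertias = {}
for k in range(1, 11):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_

# Final model with the number of clusters reported in this project.
final = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)
labels = final.labels_  # one cluster id (0-7) per district
```

Plotting `inertias` against `k` gives the elbow curve, and `labels` can be joined back to the district table to color a map by cluster.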

Technologies Used

Python
Jupyter Notebooks
Pandas
Folium
GeoJSON
Foursquare API
K-means Clustering (Scikit-learn)

Installation

To run this project locally, follow these steps:

Clone the repository:

git clone https://github.com/boba-milktea/Coursera_Capstone.git

Install the required Python libraries:

pip install pandas folium geopy requests scikit-learn

Open the Jupyter notebooks to explore the project:
```
jupyter notebook
```
Run the notebooks to see the analysis and results.
