JellyBella/ny_taxi: This project explores the New York Yellow Taxi dataset, setting up a comprehensive on-premises data engineering environment, conducting exploratory data analysis, and providing insights and recommendations to improve the efficiency and service quality of NYC yellow taxis.

JellyBella/ny_taxi

Data Engineering: New York Yellow Taxi

This project explores the New York Yellow Taxi dataset available from the NYC Taxi & Limousine Commission, focusing on the December 2023 Parquet data.

The goal is to set up a comprehensive data engineering environment on-premises, conduct exploratory data analysis, and provide insights and recommendations to enhance the efficiency and service quality of NYC yellow taxis.

Project Motivation

The New York Yellow Taxi service is an integral part of the city's transportation network. This project aims to leverage data engineering practices to uncover insights that could lead to improved taxi services, optimized route management, and better customer satisfaction.

Tech Stack and Tools

Docker, Visual Studio Code, Jupyter Notebook, Python, Pandas, Git, Postgres

  • Containerization: Docker
  • Database: Postgres
  • Data Analysis & Exploration: SQL/Python
  • Data Visualization: Jupyter Notebook
  • Version Control: Git

Setup Instructions

  1. Environment Setup: Clone the repository and ensure Docker is installed on your system.
  2. Database Configuration: Use the provided Docker Compose file to set up a Postgres database container.
  3. Data Ingestion: Run the Python scripts to ingest data into the Postgres database.
  4. Analysis: Open the Jupyter notebooks to start exploring the data and generating insights.
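The repository's own ingestion scripts are not reproduced here. As a minimal sketch of step 3, assuming the rows have already been read from the Parquet file (for example with pandas), batched inserts through any DB-API 2.0 connection could look like the following. The standard-library sqlite3 module stands in for Postgres only so the example runs without a server; with psycopg2 the placeholder would be %s instead of ?. The column subset, the table name yellow_trips, and the batch size are hypothetical.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence

# Hypothetical column subset, named after the TLC yellow taxi schema (assumed).
COLUMNS = (
    "tpep_pickup_datetime",
    "tpep_dropoff_datetime",
    "passenger_count",
    "trip_distance",
    "PULocationID",
)


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def ingest(conn, table: str, rows: Iterable[Sequence], batch_size: int = 10_000) -> int:
    """Insert rows batch by batch through a DB-API connection; return the row count.

    `table` is interpolated into the SQL, so it must be a trusted identifier.
    """
    placeholders = ", ".join("?" for _ in COLUMNS)  # psycopg2 would use %s
    sql = f"INSERT INTO {table} ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    cur = conn.cursor()
    total = 0
    for batch in batched(rows, batch_size):
        cur.executemany(sql, batch)
        conn.commit()  # commit per batch: a failure leaves earlier batches committed
        total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the Postgres container
    conn.execute(f"CREATE TABLE yellow_trips ({', '.join(COLUMNS)})")
    sample = [("2023-12-01 08:05:00", "2023-12-01 08:20:00", 1, 2.5, 161)] * 3
    print(ingest(conn, "yellow_trips", sample, batch_size=2))  # 3
```

Batching keeps memory bounded for a full month of trips, and committing per batch is a trade-off between throughput and how much work is lost if a load fails partway.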

Exploratory Data Analysis (EDA)

The EDA looks for initial insights and patterns in the data, focusing on the following questions:

  1. What are the peak hours for taxi demand?
  2. How does passenger count vary throughout the day?
  3. What is the average duration of a taxi ride?
  4. Are there any trends in ride durations or distances over time?
  5. How does taxi usage vary by area?
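Each of these questions reduces to a simple group-by aggregation. The following is a minimal sketch in plain Python over a few toy records, so it runs without the database; against the Postgres container the same logic would be a GROUP BY on pickup hour, pickup date, or pickup zone. The column names follow the TLC yellow taxi schema as an assumption, the records are made up for illustration, and the cleaning rule (keep December 2023 pickups with a positive duration) is an assumed choice, not necessarily the author's.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"


def make_trip(pickup, dropoff, passengers, miles, zone):
    """Build one hypothetical record keyed by (assumed) TLC column names."""
    return {
        "tpep_pickup_datetime": datetime.strptime(pickup, FMT),
        "tpep_dropoff_datetime": datetime.strptime(dropoff, FMT),
        "passenger_count": passengers,
        "trip_distance": miles,
        "PULocationID": zone,
    }


# Toy records for illustration only; not real TLC data.
TRIPS = [
    make_trip("2023-12-01 08:05:00", "2023-12-01 08:20:00", 1, 2.5, 161),
    make_trip("2023-12-01 08:40:00", "2023-12-01 09:10:00", 2, 5.0, 237),
    make_trip("2023-12-02 18:00:00", "2023-12-02 18:12:00", 3, 1.5, 161),
    make_trip("2023-11-30 23:50:00", "2023-12-01 00:05:00", 1, 3.0, 132),
]


def duration_minutes(trip):
    delta = trip["tpep_dropoff_datetime"] - trip["tpep_pickup_datetime"]
    return delta.total_seconds() / 60


def clean(trips, year=2023, month=12):
    """Assumed cleaning rule: keep pickups in the target month with positive duration."""
    return [
        t for t in trips
        if (t["tpep_pickup_datetime"].year, t["tpep_pickup_datetime"].month) == (year, month)
        and duration_minutes(t) > 0
    ]


def trips_per_hour(trips):
    """Q1: trip counts by pickup hour; most_common() ranks the peak hours."""
    return Counter(t["tpep_pickup_datetime"].hour for t in trips)


def avg_passengers_by_hour(trips):
    """Q2: mean passenger count per pickup hour, skipping missing values."""
    by_hour = defaultdict(list)
    for t in trips:
        if t["passenger_count"] is not None:
            by_hour[t["tpep_pickup_datetime"].hour].append(t["passenger_count"])
    return {hour: mean(vals) for hour, vals in sorted(by_hour.items())}


def avg_duration(trips):
    """Q3: mean ride duration in minutes."""
    return mean(duration_minutes(t) for t in trips)


def daily_averages(trips):
    """Q4: per-day (mean minutes, mean miles), to look for trends over time."""
    by_day = defaultdict(list)
    for t in trips:
        by_day[t["tpep_pickup_datetime"].date()].append(t)
    return {
        day: (avg_duration(ts), mean(t["trip_distance"] for t in ts))
        for day, ts in sorted(by_day.items())
    }


def pickups_by_zone(trips):
    """Q5: trip counts by pickup zone ID."""
    return Counter(t["PULocationID"] for t in trips)


if __name__ == "__main__":
    trips = clean(TRIPS)
    print(trips_per_hour(trips).most_common(1))  # [(8, 2)]
    print(avg_passengers_by_hour(trips))  # {8: 1.5, 18: 3}
    print(avg_duration(trips))  # 19.0
    print(pickups_by_zone(trips).most_common())  # [(161, 2), (237, 1)]
```

On the real data, zone IDs would typically be joined to a zone lookup table to turn them into named areas before comparing usage.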

The insights generated from this analysis could inform strategic decisions to improve taxi efficiency and service quality, such as adjusting fleet size during peak hours, optimizing route planning, and tailoring services more closely to customer demand. The complete analysis is available in the repository, with an executive summary at the end of each section.
