GitHub - the-yanqi/DS1001project: NYC Business Longevity Analysis

NYC Business Longevity Analysis

Dataset

Project Goal

Using the two data sets above, our project aims to:

  1. predict the lifespan of a business;
  2. understand how different factors affect that lifespan.

Jupyter Notebooks

Table of Contents


  • Load Raw Data
  • Clean Raw Data
    • Clean raw business data
      • Remove non-NYC businesses
      • Keep columns that are relevant to the problem of interest
      • Check selection bias
      • Drop rows with NaN values
    • Clean raw NYC data
      • Keep only columns that are relevant to the problem of interest
      • Create more useful features from current columns
    • Merge the two dataframes on the ZIP column
  • Feature Processing
    • One-hot encode industry type
    • Create Target Variables
    • Generate clean data for modeling
  • Baseline: Simple Regression
  • Model 1: Kaplan-Meier Estimate
  • Model 2: Cox Proportional Hazards regression model
  • Model 3: Multiclass Regression
    • Algorithms: Random Forest, Decision Tree, K-nearest Neighbors, Logistic Regression
    • Metrics: Confusion Matrix, Precision, Recall, F1-score
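
To illustrate the Model 1 entry above, here is a minimal sketch of a Kaplan-Meier survival estimate in plain Python. It is not taken from the notebooks: the function name `kaplan_meier`, the toy durations, and the event flags are all hypothetical, and the notebooks may well rely on a survival-analysis library instead. The sketch assumes each business has a duration (time in business) and a flag saying whether its closure was actually observed or the record is censored (still open when the data ends).

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time with at least one observed closure.

    durations: time in business for each record (any consistent unit).
    observed:  True if the business closed (event), False if censored.
    """
    # Number of observed closures at each time.
    deaths = Counter(t for t, e in zip(durations, observed) if e)
    # Events and censorings both remove a record from the risk set.
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the share surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (hypothetical, not project data): years in business and
# whether the closure was observed.
toy_durations = [2, 3, 3, 5, 8, 8]
toy_observed = [True, True, False, True, False, True]

km_curve = kaplan_meier(toy_durations, toy_observed)
for t, s in km_curve:
    print(f"S({t}) = {s:.3f}")
```

Censored records still count toward the risk set until their last observed time, which is the reason to prefer this estimate over simply averaging the lifespans of businesses that have already closed.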

