melvynkim/predict


Predict

Lightweight prediction model (AUC 0.77 from just 800 rows of data)

See a demo

Predict does:

  • exploratory data analysis
  • feature engineering
  • predictive modeling

Installation

With pip, run:

pip install predict

To set up a development copy and run the tests:

git clone https://github.com/melvynkim/predict.git
cd predict
pip install -r requirements.txt
py.test tests

Getting Started

For rich visualizations, run Predict from a Jupyter notebook.

For classification, use:

%matplotlib inline

import predict

# Named clf rather than pd so it does not shadow the usual pandas alias
clf = predict.Classifier(
    train_data='train.csv',
    test_data='test.csv',
    target_col='Survived',
    id_col='PassengerId')

clf.analyze()
clf.model()

For regression, use the predict.Regressor class.

Tip: To prevent scrolling in notebooks, select Cell > Current Outputs > Toggle Scrolling.

Features

There are two primary methods:

  • analyze runs exploratory data analysis
  • model builds and evaluates different models

Optionally pass test data if you want to generate a CSV file with predictions.

Data

Data can be a file:

predict.Classifier(train_data='train.csv', ...)

Or a pandas data frame:

import pandas as pd

train_df = pd.read_csv('train.csv')

# do preprocessing
# ...

predict.Classifier(train_data=train_df, ...)

Specify datetime columns with:

predict.Classifier(datetime_cols=['created'], ...)
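The README does not say what Predict derives from datetime columns. As an illustration only, a common approach is to expand each timestamp into calendar parts that a tree model can split on. The function name `datetime_features`, the default format string, and the chosen parts are assumptions made for this sketch, not Predict's implementation.

```python
from datetime import datetime


def datetime_features(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Expand one timestamp string into numeric calendar features.

    Hypothetical helper: the feature set here is an assumption,
    not what Predict necessarily computes.
    """
    dt = datetime.strptime(value, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "weekday": dt.weekday(),  # Monday is 0, Sunday is 6
        "hour": dt.hour,
    }


features = datetime_features("2016-06-24 07:54:24")
```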

Evaluation

Predict supports several evaluation metrics.

Classification

  • accuracy - number of correct predictions / total predictions (default)
  • auc - area under the ROC curve
  • mlogloss - multiclass log loss
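To make the classification metrics above concrete, here is a minimal pure-Python sketch of their standard definitions. The function names mirror the metric names, but the code is illustrative and is not taken from Predict. The `auc` version assumes binary 0/1 labels and uses the pairwise ranking form, which is simple but quadratic in the number of rows.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve for binary 0/1 labels.

    Equal to the probability that a random positive is scored
    above a random negative, with ties counted as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mlogloss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true class.

    y_true holds class indices; probs holds one probability row per sample.
    Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for t, row in zip(y_true, probs):
        p = min(max(row[t], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)
```

Lower is better for mlogloss, while higher is better for accuracy and auc.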

Regression

  • rmse - root mean square error (default)
  • rmsle - root mean square logarithmic error
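The two regression metrics differ only in whether the error is measured on raw values or on log-transformed values. A minimal sketch of the standard definitions follows. It is illustrative rather than Predict's own code, and `rmsle` assumes all targets and predictions are greater than -1 so that `log1p` is defined.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(y_true, y_pred):
    """RMSE computed on log(1 + value), so relative errors matter more
    than absolute ones."""
    squared = [
        (math.log1p(t) - math.log1p(p)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))
```

rmsle is often preferred when the target spans several orders of magnitude, such as prices.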

Specify an eval metric with:

predict.Classifier(eval_metric='mlogloss', ...)

Modeling

Predict builds and compares different models. Currently, it uses:

  1. boosted trees
  2. simple benchmarks (mode for classification, mean and median for regression)
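The simple benchmarks in the list above predict one constant value for every row, which gives a floor that any real model should beat. A minimal sketch using only the standard library is shown here. The function names are hypothetical and this is not Predict's implementation.

```python
from statistics import mean, median, mode


def mode_benchmark(train_labels, n_rows):
    """Classification baseline: always predict the most common training label."""
    return [mode(train_labels)] * n_rows


def mean_benchmark(train_targets, n_rows):
    """Regression baseline: always predict the training mean."""
    return [mean(train_targets)] * n_rows


def median_benchmark(train_targets, n_rows):
    """Regression baseline: always predict the training median."""
    return [median(train_targets)] * n_rows


class_preds = mode_benchmark([0, 1, 0, 0, 1], 3)
mean_preds = mean_benchmark([1.0, 2.0, 6.0], 2)
median_preds = median_benchmark([1.0, 2.0, 6.0], 2)
```

If boosted trees score only marginally better than these baselines on the chosen eval metric, the features likely carry little signal.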

XGBoost is required for boosted trees. Install it with:

pip install xgboost

Performance

Dataset                  | Eval Metric          | v0.1    | Current
House Prices             | RMSLE                | 0.14069 | 0.13108
Rental Listing Inquiries | Multi Class Log Loss | -       | 0.61861
Titanic                  | Accuracy             | 0.77512 | 0.77512
