Predict

Lightweight prediction library (AUC 0.77 from just 800 rows of data)

Predict handles:

exploratory data analysis
feature engineering
predictive modeling

Installation

With pip, run:

pip install predict

Development

To work from source and run the test suite:

git clone https://github.com/melvynkim/predict.git
cd predict
pip install -r requirements.txt
pytest tests

Getting Started

For rich visualizations, run Predict from a Jupyter notebook.

For classification, use:

%matplotlib inline

import predict

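# Avoid naming this "pd", which would shadow the usual pandas alias
clf = predict.Classifier(
    train_data='train.csv',
    test_data='test.csv',
    target_col='Survived',
    id_col='PassengerId')

clf.analyze()  # exploratory data analysis
clf.model()    # build and evaluate models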

For regression, use the predict.Regressor class.

Tip: To prevent scrolling in notebooks, select Cell > Current Outputs > Toggle Scrolling.

Features

There are two primary methods:

analyze - runs exploratory data analysis
model - builds and evaluates different models
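This README does not spell out exactly what `analyze` reports. As a rough, hypothetical illustration of the per-column summary that exploratory analysis usually starts with, the sketch below counts missing and distinct values for each column using only the standard library. The `summarize` helper and the sample rows are made up for illustration and are not part of Predict.

```python
from collections import Counter


def summarize(rows):
    """Per-column summary: missing count, distinct count, most common value."""
    summary = {}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [row.get(col) for row in rows]
        # Treat empty strings and None as missing (csv.DictReader yields "" for blanks)
        present = [v for v in values if v not in ("", None)]
        counts = Counter(present)
        summary[col] = {
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return summary


# Hypothetical rows shaped like csv.DictReader output (all values are strings)
rows = [
    {"PassengerId": "1", "Sex": "male", "Age": "22"},
    {"PassengerId": "2", "Sex": "female", "Age": ""},
    {"PassengerId": "3", "Sex": "female", "Age": "26"},
]
for col, stats in summarize(rows).items():
    print(col, stats)
```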

Optionally, pass test data to generate a CSV file with predictions.
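The layout of the predictions file is an assumption here. A common layout for Kaggle-style submissions, which the Titanic example resembles, is a header row followed by one row per test example holding the ID column and the predicted target. The hypothetical `write_predictions` helper below sketches that with the standard `csv` module; Predict's actual file name and format may differ.

```python
import csv
import os
import tempfile


def write_predictions(path, ids, predictions, id_col, target_col):
    """Write a header row, then one (id, prediction) row per test example."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([id_col, target_col])
        writer.writerows(zip(ids, predictions))


# Hypothetical IDs and predicted labels for three test passengers
path = os.path.join(tempfile.gettempdir(), "predictions.csv")
write_predictions(path, [892, 893, 894], [0, 1, 0], "PassengerId", "Survived")

with open(path, newline="") as f:
    for row in csv.reader(f):
        print(row)
```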

Data

Data can be a file path

predict.Classifier(train_data='train.csv', ...)

Or a pandas DataFrame

import pandas as pd

train_df = pd.read_csv('train.csv')

# do preprocessing
# ...

predict.Classifier(train_data=train_df, ...)

Specify datetime columns with:

predict.Classifier(datetime_cols=['created'], ...)
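How Predict expands datetime columns is not documented here. A typical approach, shown below as a hypothetical sketch, is to split each timestamp (for example, a value from a column like `created`) into numeric parts such as year, month, day, hour, and day of week, so a tree model can split on them. The `datetime_features` helper and its default timestamp format are assumptions.

```python
from datetime import datetime


def datetime_features(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Expand one timestamp string into numeric features a tree model can split on."""
    ts = datetime.strptime(value, fmt)
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "dayofweek": ts.weekday(),  # Monday == 0, Sunday == 6
    }


print(datetime_features("2016-06-24 07:54:24"))
```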

Evaluation

Predict supports several evaluation metrics.

Classification

accuracy - # correct / total (default)
auc - area under the ROC curve
mlogloss - multiclass log loss
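For reference, the three classification metrics can be written in a few lines of plain Python. This is a minimal sketch of the standard definitions, not Predict's implementation: `auc` uses the pairwise (Mann-Whitney) formulation, where ties count as half, and `mlogloss` clips probabilities to avoid `log(0)`. The clipping constant `eps` is an assumption.

```python
import math


def accuracy(y_true, y_pred):
    """# correct / total."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative.

    Labels are 0/1; both classes must be present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mlogloss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true class.

    y_true holds class indices; each row of probs holds one probability per class.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        p = min(max(row[label], eps), 1 - eps)  # clip so log(0) cannot occur
        total -= math.log(p)
    return total / len(y_true)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(mlogloss([0, 1], [[0.9, 0.1], [0.2, 0.8]]))
```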

Regression

rmse - root mean square error (default)
rmsle - root mean square logarithmic error
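A matching sketch for the regression metrics follows. RMSLE is RMSE computed on `log(1 + y)`, so it penalizes relative rather than absolute errors and assumes non-negative targets. These helpers illustrate the standard formulas and are not Predict's code.

```python
import math


def rmse(y_true, y_pred):
    """Square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def rmsle(y_true, y_pred):
    """RMSE on log(1 + y); targets and predictions must be > -1 (normally >= 0)."""
    return rmse([math.log1p(t) for t in y_true], [math.log1p(p) for p in y_pred])


print(rmse([3, 5], [1, 5]))
print(rmsle([100, 200], [110, 180]))
```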

Specify an eval metric with:

predict.Classifier(eval_metric='mlogloss', ...)

Modeling

Predict builds and compares different models. Currently, it uses:

boosted trees
simple benchmarks (mode for classification, mean and median for regression)
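The simple benchmarks give a floor that the boosted trees should beat. A minimal sketch of those constant predictors, using Python's `statistics` module, is below. The function names are hypothetical and Predict's own implementation may differ; note that on ties `statistics.mode` returns the first mode encountered (Python 3.8+).

```python
from statistics import mean, median, mode


def classification_benchmark(y_train, n_test):
    """Predict the most frequent training label for every test row."""
    return [mode(y_train)] * n_test


def regression_benchmarks(y_train, n_test):
    """Two constant baselines: the training mean and the training median."""
    return {
        "mean": [mean(y_train)] * n_test,
        "median": [median(y_train)] * n_test,
    }


print(classification_benchmark([0, 0, 1, 0, 1], 3))
print(regression_benchmarks([100, 200, 600], 2))
```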

XGBoost is required for boosted trees. Install it with:

pip install xgboost

Performance

Dataset	Eval Metric	v0.1	Current
House Prices	RMSLE	0.14069	0.13108
Rental Listing Inquiries	Multiclass Log Loss	-	0.61861
Titanic	Accuracy	0.77512	0.77512
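RMSLE and multiclass log loss are errors (lower is better), while accuracy is higher-is-better. For House Prices, the drop from v0.1 to the current version works out to a relative improvement of roughly 6.8%:

```latex
\frac{0.14069 - 0.13108}{0.14069} = \frac{0.00961}{0.14069} \approx 0.0683
```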
