RAVEN: Reducing Attributes Via Evaluating Nearness 🐦‍⬛

An ultra-fast tool for reducing the attributes (features) of an insanely large dataset without degrading dataset quality. It does this by identifying clusters of linearly related (and therefore redundant) features and preserving only the feature 'nearest' to all the other features.
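
The idea above can be illustrated with a minimal sketch. This is not the code in raven.py; it is one plausible reading of the description, under loudly stated assumptions: "linearly related" is measured by absolute Pearson correlation, two features are linked when that correlation reaches a hypothetical threshold of 0.9, clusters are the connected components of the resulting graph, and the 'nearest' feature is the one with the highest mean correlation to the rest of its cluster. The function name `redundant_features_sketch` and the `threshold` parameter are made up for illustration.

```python
import networkx as nx
import numpy as np
import pandas as pd


def redundant_features_sketch(df, threshold=0.9):
    """Illustrative only (not raven.py): list columns judged redundant."""
    # Absolute Pearson correlation between every pair of numeric columns.
    corr = df.select_dtypes(include="number").corr().abs()
    cols = list(corr.columns)

    # One node per feature, one edge per strongly correlated pair (assumed rule).
    graph = nx.Graph()
    graph.add_nodes_from(cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] >= threshold:
                graph.add_edge(a, b)

    redundant = []
    for cluster in nx.connected_components(graph):
        if len(cluster) < 2:
            continue  # an isolated feature is not redundant with anything
        members = sorted(cluster)
        sub = corr.loc[members, members]
        # Mean correlation to the other members (subtract the self-correlation of 1).
        mean_corr = (sub.sum(axis=1) - 1.0) / (len(members) - 1)
        keeper = mean_corr.idxmax()  # assumed meaning of "nearest to all others"
        redundant.extend(m for m in members if m != keeper)
    return redundant


# Synthetic demo: three linearly related columns plus one independent column.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "x_scaled": 2.0 * x + 1.0,
    "x_noisy": x + 0.1 * rng.normal(size=200),
    "noise": rng.normal(size=200),
})
redundant = redundant_features_sketch(demo)
print(redundant)
```

In this demo the three related columns should form one cluster, of which a single representative is kept, while the independent column is left alone. The pairwise loop is quadratic in the number of features and is written for readability rather than speed.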

Dependencies

Make sure you have Pandas, NumPy, and NetworkX installed. You can install these packages using pip:

pip install pandas numpy networkx

Usage

To use Raven, simply download the raw raven.py file and import it as follows:

from raven import raven

Once you have it imported, you can identify redundant features. Here's an example usage:

import pandas as pd

# Load the dataset, identify the redundant features, then drop them.
really_huge_dataset = pd.read_csv('./really_huge_dataset.csv')

redundant_features = raven(really_huge_dataset)

smaller_dataset = really_huge_dataset.drop(columns=redundant_features)

