GitHub - fringewidth/raven: A graph based alternative to PCA for feature selection

fringewidth/raven

RAVEN: Reducing Attributes Via Evaluating Nearness 🐦‍⬛

An ultra-fast tool for reducing the attributes (features) of that insanely large dataset while aiming to preserve dataset quality. It does this by identifying clusters of linearly related (and therefore redundant) features and preserving only the one feature in each cluster that is 'nearest' to all the others.
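This README does not describe the internals of raven.py, so the following is only a rough sketch of the idea described above, not Raven's actual implementation. It assumes that "linearly related" means a high absolute Pearson correlation, that clusters are the connected components of a correlation graph, and that the "nearest" feature is the one with the highest total correlation to the rest of its cluster. The function name `redundant_features_sketch` and the `threshold` parameter (with its 0.9 default) are hypothetical.

```python
import networkx as nx
import numpy as np
import pandas as pd


def redundant_features_sketch(df, threshold=0.9):
    """Return columns to drop, keeping one representative per correlated cluster.

    Illustrative only: the threshold and the clustering rule are assumptions.
    """
    # Absolute Pearson correlation between every pair of numeric columns.
    corr = df.select_dtypes(include="number").corr().abs()
    columns = list(corr.columns)

    # One node per feature; an edge joins two strongly correlated features.
    graph = nx.Graph()
    graph.add_nodes_from(columns)
    for i, a in enumerate(columns):
        for b in columns[i + 1:]:
            if corr.loc[a, b] >= threshold:
                graph.add_edge(a, b)

    to_drop = []
    for cluster in nx.connected_components(graph):
        if len(cluster) < 2:
            continue  # isolated feature, nothing redundant here
        members = sorted(cluster)
        # "Nearest" member: highest total absolute correlation within the cluster.
        keep = max(members, key=lambda col: corr.loc[col, members].sum())
        to_drop.extend(col for col in members if col != keep)
    return to_drop


# Synthetic demo: three near-copies of x plus one independent feature y.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
demo = pd.DataFrame({
    "x": x,
    "x_scaled": 3 * x + 1,
    "x_noisy": x + 0.01 * rng.normal(size=200),
    "y": rng.normal(size=200),
})
dropped = redundant_features_sketch(demo)
smaller_demo = demo.drop(columns=dropped)
```

In this demo the three x-like columns form one cluster and a single representative survives, while the independent column y is left untouched. A real tool would likely need a tuned threshold and a faster way to build the graph than the double loop used here.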

Dependencies

Make sure you have Pandas, NumPy, and NetworkX installed. You can install them with pip:

pip install pandas numpy networkx

Usage

To use Raven, simply download the raw raven.py file and import it:

from raven import raven

Once it is imported, you can use it to identify redundant features. Here's an example:

import pandas as pd

# Load the data, find the redundant features, then drop them
really_huge_dataset = pd.read_csv('./really_huge_dataset.csv')
redundant_features = raven(really_huge_dataset)
smaller_dataset = really_huge_dataset.drop(columns=redundant_features)
