VevestaX

Track failed and successful Machine Learning experiments as well as features.

VevestaX is an open source Python package for ML engineers and data scientists. It includes modules for tracking features sourced from data, engineered features, and variables used in modelling. The output is an Excel file with tabs for data sourcing, feature engineering, and modelling. The library can be used in Jupyter notebooks, in IDEs such as Spyder, or when running a Python script from the command line. VevestaX is framework agnostic: you can use it with any machine learning or deep learning framework.

How to install the library:

pip install vevestaX
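
To confirm that the install succeeded before going further, a quick check like the one below can help. It is a minimal sketch that uses only the standard library; `is_installed` is a hypothetical helper name chosen for illustration and is not part of VevestaX.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None


# "vevestaX" is the import name used in this README.
print("vevestaX installed:", is_installed("vevestaX"))
```

If this prints `False`, the package was probably installed into a different Python environment than the one running the script.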

How to import the library and create the Experiment object

#import the vevesta Library
from vevestaX import vevesta as v
V=v.Experiment()

How to extract features present in input data.

Code snippet:

#read the dataset
import pandas as pd
df=pd.read_csv("salaries.csv")
df.head(2)

#Extract the column names of the features
V.ds=df
# you can also use:
#   V.dataSourcing = df

#Print the features being used
V.ds

How to extract engineered features

Code snippet:

#Extract the engineered features
V.fe=df
# you can also use:
#   V.featureEngineering = df

#Print the engineered features
V.fe
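
As a rough mental model of the two steps above, the sourced features can be thought of as the column names of the input data, and the engineered features as the columns that appear only after feature engineering. The sketch below illustrates that idea with the standard library. The column names, the sample rows, and the helper names are invented for illustration, and the whole thing is an assumption about the concept rather than VevestaX's implementation.

```python
import csv
import io

# Hypothetical stand-in for an input file; the columns are invented.
RAW_CSV = "years_experience,salary\n1,40000\n5,65000\n"


def column_names(csv_text):
    """Return the header row of a CSV document as a list of column names."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def new_columns(sourced, current):
    """Return the columns in `current` that are not in `sourced`, keeping order."""
    known = set(sourced)
    return [name for name in current if name not in known]


sourced_features = column_names(RAW_CSV)

# Pretend a feature-engineering step added one derived column.
columns_after_engineering = sourced_features + ["salary_per_year_experience"]
engineered_features = new_columns(sourced_features, columns_after_engineering)

print(sourced_features)
print(engineered_features)
```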

How to track variables used in the code.

V.start() and V.end() delimit a code block. The pair can be called multiple times in the code, and the variables used within each block are tracked. Any technique, such as XGBoost or a decision tree, can be used within the block.

Code snippet:

#Track variables which have been used for modelling
V.start()
# you can also use: V.startModelling()


# All the variables mentioned here will be tracked
epochs=100
seed=3
loss='rmse'


#end tracking of variables
V.end()
# or, you can also use : V.endModelling()
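
One way to picture what a start/end pair does is as two snapshots of the variables in scope, with the difference between them being what gets tracked. The sketch below shows that idea on a plain dictionary. It is an illustration under that assumption, not a description of how VevestaX works internally, and `snapshot` and `changed_variables` are hypothetical helper names.

```python
SIMPLE_TYPES = (int, float, str, bool)


def snapshot(namespace):
    """Copy the public, simple-typed variables out of a namespace dict."""
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("_") and isinstance(value, SIMPLE_TYPES)
    }


def changed_variables(before, after):
    """Return variables that are new or whose value changed between snapshots."""
    return {
        name: value
        for name, value in after.items()
        if name not in before or before[name] != value
    }


# A plain dict stands in for the variables of a script.
namespace = {"learning_rate": 0.1}

before = snapshot(namespace)  # analogous to the start of a tracked block
namespace.update(epochs=100, seed=3, loss="rmse")
after = snapshot(namespace)   # analogous to the end of a tracked block

tracked = changed_variables(before, after)
print(tracked)
```

In this model, `learning_rate` is not reported because it existed unchanged before the block began.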

How to dump the features and modelling variables into a given Excel file

Code snippet:

# Dump the data sourcing, engineered features and tracked variables into an xlsx file
V.dump(techniqueUsed='XGBoost',filename="vevestaDump1.xlsx",message="XGboost with data augmentation was used",version=1)

Alternatively, write the experiment to the default file, vevesta.xlsx.

Code snippet:

V.dump(techniqueUsed='XGBoost')
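
Conceptually, each dump adds one experiment record (version, technique, message, tracked variables) to a running log. The sketch below imitates that with an in-memory CSV instead of an Excel workbook. It is a stand-in for illustration only: the field layout, the `append_experiment` helper, and the second experiment's message are assumptions, and only the argument names mirror the `V.dump` call shown above.

```python
import csv
import io
import json

FIELDS = ["version", "techniqueUsed", "message", "variables"]


def append_experiment(log, version, technique_used, message, variables):
    """Append one experiment record to a CSV log, writing the header if empty."""
    writer = csv.DictWriter(log, fieldnames=FIELDS)
    if log.tell() == 0:
        writer.writeheader()
    writer.writerow(
        {
            "version": version,
            "techniqueUsed": technique_used,
            "message": message,
            # Store the tracked variables as a JSON string in a single cell.
            "variables": json.dumps(variables, sort_keys=True),
        }
    )


log = io.StringIO(newline="")
append_experiment(
    log, 1, "XGBoost", "XGboost with data augmentation was used",
    {"epochs": 100, "seed": 3, "loss": "rmse"},
)
append_experiment(
    log, 2, "XGBoost", "Same model, different seed",  # invented second run
    {"epochs": 100, "seed": 7, "loss": "rmse"},
)

# Read the log back to compare experiments by version.
rows = list(csv.DictReader(io.StringIO(log.getvalue(), newline="")))
for row in rows:
    print(row["version"], row["techniqueUsed"], json.loads(row["variables"])["seed"])
```

Keeping a version and a free-text message next to the tracked variables is what makes failed and successful runs comparable later.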

A sample output Excel file has been uploaded to Google Sheets. Its URL is here

Output snapshots

Sourced Data tab

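[image: Sourced Data tab]

Feature Engineering tab

[image: Feature Engineering tab]

Modelling tab

[image: Modelling tab]

Messages tab

[image: Messages tab]

EDA-correlation tab

[image: EDA-correlation tab]

Experiments performance plots

[images: experiment performance plots]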

If you like the library, please give us a GitHub star and retweet.

For additional features, explore our tool at Vevesta. For comments, suggestions and early access to the tool, reach out at vevestax@vevesta.com

We are looking for beta users for the library. Register here

We at Vevesta Labs maintain this library and welcome feature requests. Find a detailed blog post about VevestaX on Medium
