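Latest update:
The pipeline originally used featuretools + gplearn + AutoInt. Later testing showed that gplearn's SymbolicTransformer, whose fitness metric is the Pearson correlation coefficient, performed worse than a dedicated regressor or classifier, so it was replaced with EvolutionaryForest, which performed better in our tests.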
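This toy illustration shows one way a Pearson-based fitness can undervalue a feature. It is not a reproduction of the tests mentioned above and not gplearn's own code. The sketch computes the Pearson correlation coefficient in pure Python. The raw feature `x` scores 0 against the target `y = x**2`, even though `y` is fully determined by `x`, because Pearson only measures linear association. A nonlinear model such as a tree ensemble could still exploit `x` directly.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treat as uninformative
    return cov / math.sqrt(var_a * var_b)


# Toy data: the target depends on x, but not linearly.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [v * v for v in x]

print("raw feature x:", pearson(x, y))
print("engineered x**2:", pearson([v * v for v in x], y))
```

A fitness score like this rewards only features that are already linearly related to the target. A model-based score, such as the accuracy of a fitted regressor or classifier, can also credit features whose usefulness is nonlinear.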
The current implementation is built mainly on featuretools + EvolutionaryForest + DCN V2.
The original code is no longer maintained.
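DCN V2, used in the current pipeline, learns explicit feature crosses with stacked cross layers. In the full-rank form described in the DCN V2 paper, each layer computes $x_{l+1} = x_0 \odot (W_l x_l + b_l) + x_l$. Below is a minimal pure-Python sketch of that formula. It assumes dense float vectors and hand-supplied weights, and it is not this project's model code: a real implementation would use a deep-learning framework with trainable parameters.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cross_layer(x0: Vector, xl: Vector, w: Matrix, b: Vector) -> Vector:
    """One DCN V2 cross layer: x0 * (W @ xl + b) + xl (element-wise product with x0)."""
    d = len(x0)
    if len(xl) != d or len(b) != d or len(w) != d or any(len(row) != d for row in w):
        raise ValueError("dimension mismatch")
    # Linear projection of the previous layer's output.
    proj = [sum(w[i][j] * xl[j] for j in range(d)) + b[i] for i in range(d)]
    # Hadamard product with the original input, plus a residual connection.
    return [x0[i] * proj[i] + xl[i] for i in range(d)]


def cross_network(x0: Vector, weights: List[Matrix], biases: List[Vector]) -> Vector:
    """Stack cross layers; every layer crosses its input with the original x0."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
print(cross_network([1.0, 2.0], [identity, identity], [zeros, zeros]))
```

Each additional layer raises the maximum polynomial degree of the crosses by one. The residual term `+ xl` keeps lower-order interactions available to later layers.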
This project aims to automate feature engineering, extracting effective features directly from raw data without human involvement.
- First, derive new features from raw data based on the meaning of each field, such as deriving whether a date is a holiday.
- Second, relate fields whose meanings connect, such as date and consumption, to extract consumption during holiday periods.
- Next, apply combination operations to the features generated so far, such as adding or dividing two feature columns.
- Finally, extract features with algorithms such as AutoInt and clustering.
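The first three steps above can be made concrete with a small hand-written sketch. In the project itself these transformations are delegated to featuretools (for example its Deep Feature Synthesis), so treat this only as an illustration. The holiday calendar `HOLIDAYS`, the column names `user`, `date` and `amount`, and all function names are hypothetical. The fourth step, model-based extraction, is out of scope here.

```python
from datetime import date
from itertools import combinations
from typing import Dict, List, Optional

# Hypothetical holiday calendar; a real pipeline would use a proper holiday source.
HOLIDAYS = {date(2023, 1, 1), date(2023, 5, 1), date(2023, 10, 1)}


def add_date_features(rows: List[Dict]) -> None:
    """Step 1: derive new columns from one field's meaning (a date)."""
    for row in rows:
        d = row["date"]
        row["is_holiday"] = int(d in HOLIDAYS)
        row["is_weekend"] = int(d.weekday() >= 5)  # Saturday=5, Sunday=6


def holiday_spend_per_user(rows: List[Dict]) -> Dict[str, float]:
    """Step 2: relate two fields (date and amount): total holiday spend per user.

    Users with no holiday purchases are simply absent from the result.
    """
    totals: Dict[str, float] = {}
    for row in rows:
        if row["is_holiday"]:
            totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals


def add_pairwise_features(row: Dict, cols: List[str]) -> None:
    """Step 3: sum and ratio of every pair of numeric columns."""
    for a, b in combinations(cols, 2):
        row[f"{a}+{b}"] = row[a] + row[b]
        # Store None instead of dividing by zero.
        ratio: Optional[float] = row[a] / row[b] if row[b] != 0 else None
        row[f"{a}/{b}"] = ratio


rows = [
    {"user": "u1", "date": date(2023, 1, 1), "amount": 120.0},
    {"user": "u1", "date": date(2023, 1, 3), "amount": 30.0},
    {"user": "u2", "date": date(2023, 5, 1), "amount": 50.0},
]
add_date_features(rows)
spend = holiday_spend_per_user(rows)

feat = {"holiday_spend": spend["u1"], "total_spend": 150.0}
add_pairwise_features(feat, ["holiday_spend", "total_spend"])
print(spend, feat)
```

Pairwise combinations grow quadratically with the number of columns, which is why the generated features are then filtered by a learned model rather than kept wholesale.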
This tool uses the featuretools, gplearn, autoint, scikit-learn, and xgboost packages.
This project does not include feature preprocessing. Preprocessing belongs in the model training phase, because different models require different preprocessing methods.
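Keeping preprocessing in the training phase might look like this. Scaling statistics are fitted on the training split only and applied only for model families that need them. The sketch assumes plain lists of numeric rows. The mapping `NEEDS_SCALING` and the function names are hypothetical and are not part of this project.

```python
import math
from typing import Dict, List, Tuple

Rows = List[List[float]]

# Hypothetical mapping from model family to whether it needs standardized inputs.
NEEDS_SCALING: Dict[str, bool] = {"linear": True, "neural": True, "tree": False}


def fit_standardizer(train: Rows) -> Tuple[List[float], List[float]]:
    """Per-column mean and population std, computed on training rows only."""
    n = len(train)
    cols = list(zip(*train))
    means = [sum(c) / n for c in cols]
    # A constant column has std 0; fall back to 1.0 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0 for c, m in zip(cols, means)]
    return means, stds


def apply_standardizer(rows: Rows, means: List[float], stds: List[float]) -> Rows:
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def preprocess_for(model_family: str, train: Rows, test: Rows) -> Tuple[Rows, Rows]:
    """Choose preprocessing per model family at training time."""
    if not NEEDS_SCALING[model_family]:
        # Split-based trees are insensitive to monotonic rescaling of a feature.
        return train, test
    means, stds = fit_standardizer(train)
    return apply_standardizer(train, means, stds), apply_standardizer(test, means, stds)


print(preprocess_for("linear", [[1.0], [3.0]], [[5.0]]))
print(preprocess_for("tree", [[1.0], [3.0]], [[5.0]]))
```

Fitting the statistics on the training split alone avoids leaking test-set information into the features. The same engineered features can then be reused unchanged by every model.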
# Documentation: waiting for readthedocs.io