best iteration and feature importance information is lost after serialization · Issue #2294 · catboost/catboost · GitHub

best iteration and feature importance information is lost after serialization #2294


Open
Nevermetyou65 opened this issue Feb 9, 2023 · 2 comments

Nevermetyou65 commented Feb 9, 2023

Hi,
I am new to CatBoost and would like some clarification about saving and loading a model.
I save and load the model as shown here: https://catboost.ai/en/docs/concepts/python-usages-examples#load-from-file

I saved the trained model in .cbm format. The problem is that when I load the model back and call model.get_best_iteration(), it returns None. model.feature_importances_ also returns None. Here is my code:

import pickle
import time

from catboost import CatBoostClassifier

params = {
    'iterations': 10000,
    'early_stopping_rounds': 1000,
    'loss_function': 'Logloss',
    'eval_metric': 'F1',
    'task_type': 'GPU',
    'gpu_ram_part': 0.3,
    'train_dir': 'model/catboost_info',
}
model = CatBoostClassifier(**params)
print('fit the model...')

start = time.time()
# train_data and valid_data are assumed to be catboost.Pool objects that carry their labels
model.fit(
    X=train_data,
    use_best_model=True,
    eval_set=valid_data,
)
train_time = time.time() - start

print('done training...')
print('save the model and train time..')
model.save_model('model/catboost.cbm')
# Persist the training time so it can be read back below
with open('dump/catboost_train_time.pickle', 'wb') as f:
    pickle.dump(train_time, f)

print('load model and train time...')

# Load the model into a fresh, unfitted classifier
cat = CatBoostClassifier()
cat.load_model('model/catboost.cbm')
with open('dump/catboost_train_time.pickle', 'rb') as f:
    train_time = pickle.load(f)

print('finish..')
print(f'train time : {train_time / 60}')
# This is where None is returned after loading
print(f'best iteration: {cat.get_best_iteration()}')
@andrey-khropov
Member

Serialization of get_best_iteration() and other metrics-related information is now fixed as part of #1166 and is included in release 1.2.3. feature_importances_ is still not serialized; this is technically easy to implement but can increase the model size somewhat. If more people want this than not, we can do it.

andrey-khropov changed the title from "How to properly save/load model" to "best iteration and feature importance information is lost after serialization" Mar 9, 2024
@danotank

Maybe you could make this an option in save_model? I personally need feature_importances_ serialization really badly...
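
Until such an option exists, one possible workaround is to write the missing values to a small sidecar file next to the .cbm model at training time. The following is only a sketch under stated assumptions, not something proposed by the maintainers in this thread: the helper names `save_training_metadata` and `load_training_metadata` and the JSON layout are hypothetical, and the model is assumed to be a fitted CatBoost model exposing `get_best_iteration()`, `feature_importances_` and `feature_names_`.

```python
import json
from pathlib import Path


def save_training_metadata(model, path):
    """Write values that the .cbm file may not keep to a JSON sidecar.

    Hypothetical helper: `model` is assumed to be a fitted model exposing
    get_best_iteration(), feature_importances_ and feature_names_.
    """
    meta = {
        # May be None if no eval_set was used during training
        "best_iteration": model.get_best_iteration(),
        # Convert array elements to plain floats so json can encode them
        "feature_importances": [float(v) for v in model.feature_importances_],
        "feature_names": list(model.feature_names_),
    }
    Path(path).write_text(json.dumps(meta, indent=2))


def load_training_metadata(path):
    """Read the sidecar back as a plain dict."""
    return json.loads(Path(path).read_text())
```

With the code from the first post, this would be called as `save_training_metadata(model, 'model/catboost_meta.json')` right after `model.save_model(...)`, and `load_training_metadata('model/catboost_meta.json')` after `cat.load_model(...)`; the sidecar path is an arbitrary choice. Keeping the values outside the .cbm file leaves the model size unchanged, at the cost of having two files to keep in sync.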
