[Ray Train] - Add Options to Save Last checkpoint in Ray Train Checkpointing Config · Issue #40503 · ray-project/ray
Open
@kamal-rahimi

Description

Checkpointing in Ray Train (CheckpointConfig) currently has the following options:

num_to_keep
checkpoint_score_attribute
checkpoint_score_order
checkpoint_frequency
checkpoint_at_end

It would be highly useful to add an option that always keeps the last (most recent) checkpoint, in addition to the checkpoints retained by num_to_keep.

Use case

In many scenarios, it is desirable to keep the checkpoints with the best metric. However, when training is interrupted (for example, when the only worker is a spot instance and it gets terminated), training has to be restored from the latest checkpoint, not from the best one that was saved.
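To make the request concrete, here is a minimal sketch of the retention rule being asked for: keep the num_to_keep best-scoring checkpoints and, in addition, always keep the most recent one. This is plain Python for illustration only, not Ray Train code. `CheckpointRecord`, `select_checkpoints_to_keep`, and the `keep_last` flag are hypothetical names that do not exist in Ray; only `num_to_keep` and the max/min score order mirror existing CheckpointConfig options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CheckpointRecord:
    """Hypothetical bookkeeping entry; not a Ray Train class."""
    index: int    # order in which the checkpoint was reported
    score: float  # value of the metric used to rank checkpoints


def select_checkpoints_to_keep(
    records: List[CheckpointRecord],
    num_to_keep: Optional[int],
    score_order: str = "max",
    keep_last: bool = True,  # hypothetical option proposed in this issue
) -> List[CheckpointRecord]:
    """Return the checkpoints that survive pruning, in report order."""
    if score_order not in ("max", "min"):
        raise ValueError("score_order must be 'max' or 'min'")
    if not records:
        return []
    if num_to_keep is None:
        # No limit: every checkpoint is retained.
        return sorted(records, key=lambda r: r.index)

    # Rank by score and take the best num_to_keep checkpoints.
    ranked = sorted(records, key=lambda r: r.score,
                    reverse=(score_order == "max"))
    kept = {r.index: r for r in ranked[:num_to_keep]}

    if keep_last:
        # Always retain the most recent checkpoint so an interrupted
        # run can resume from where it stopped.
        latest = max(records, key=lambda r: r.index)
        kept[latest.index] = latest

    return [kept[i] for i in sorted(kept)]


history = [
    CheckpointRecord(index=0, score=0.70),
    CheckpointRecord(index=1, score=0.90),
    CheckpointRecord(index=2, score=0.85),
    CheckpointRecord(index=3, score=0.60),
]
survivors = select_checkpoints_to_keep(history, num_to_keep=2)
print([r.index for r in survivors])
```

With `keep_last=False` the latest checkpoint (index 3, the worst score) would be pruned, which is exactly the situation described above: a run interrupted after step 3 could only resume from step 2.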

Labels

P3: Issue moderate in impact or severity
enhancement: Request for new feature and/or capability
pending-cleanup: This issue is pending cleanup. It will be removed in 2 weeks after being assigned.
train: Ray Train Related Issue
