Fix OneCycle step length when in multiprocess #385

muellerzr · 2022-05-23T15:35:06Z

Fix improper number of times OneCycle scheduler is called

What does this add?

This PR fixes an issue where torch.optim.lr_scheduler.OneCycleLR.step expects to be called at most a fixed number of times (total_steps) and raises an error when called more often. This typically happens when the user did not specify drop_last=True in their DataLoaders. For now the fix is specific to OneCycle, but the API should stay consistent if PyTorch adds another scheduler that tracks its maximum number of steps the same way.

Note: All other schedulers work; only OneCycle is affected.
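
The overshoot described above comes down to rounding. The following is a minimal sketch of the arithmetic, not Accelerate's code: the helper name scheduler_calls_per_epoch is hypothetical, and it assumes that the step budget is taken from the unsharded DataLoader length, that each process steps the scheduler num_processes times per batch, and that without drop_last the per-process batch count is rounded up.

```python
import math


def scheduler_calls_per_epoch(num_samples, batch_size, num_processes, drop_last=False):
    """Return (planned_steps, actual_calls) for one epoch.

    planned_steps: number of batches in the unsharded DataLoader, which is
        what a user would typically give the scheduler as its step budget.
    actual_calls: scheduler.step() calls made on one process when the
        scheduler is stepped num_processes times per batch that process sees.
    """
    if drop_last:
        planned = num_samples // batch_size
        # Assumption: leftover batches that cannot be shared evenly are dropped.
        per_process = planned // num_processes
    else:
        planned = math.ceil(num_samples / batch_size)
        # Assumption: the last round is padded so every process gets a batch.
        per_process = math.ceil(planned / num_processes)
    return planned, per_process * num_processes


print(scheduler_calls_per_epoch(100, 8, 2))                  # (13, 14)
print(scheduler_calls_per_epoch(100, 8, 2, drop_last=True))  # (12, 12)
```

Under these assumptions, 100 samples with a batch size of 8 on 2 processes give a budget of 13 steps but 14 calls, which is one more call than a scheduler with a fixed step budget accepts. With drop_last=True the two numbers match.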

Why is it needed?

The cv_examples currently all break in multi-process settings because .step() is called an incorrect number of times.

What parts of the API does this impact?

User-facing:

Nothing

Internal structure:

Adds the following check to the per-process stepping loop used when self.split_batches is False:

for _ in range(num_processes):
+    if getattr(self.scheduler, "total_steps", 0) < self.scheduler.last_epoch:
        self.scheduler.step(*args, **kwargs)
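
The intent of such a check, skipping any scheduler step beyond the scheduler's declared budget instead of letting it raise, can be sketched independently of Accelerate. This is an illustration under stated assumptions, not the merged implementation: BudgetedScheduler is a hypothetical stand-in for a scheduler with a fixed step budget, guarded_step is a hypothetical helper, and the comparison used here is not the one in the diff above.

```python
class BudgetedScheduler:
    """Hypothetical stand-in for a scheduler with a fixed step budget."""

    def __init__(self, total_steps):
        self.total_steps = total_steps
        self.step_count = 0

    def step(self):
        # Mimic a scheduler that refuses to be stepped past its budget.
        if self.step_count >= self.total_steps:
            raise ValueError(
                f"Tried to step {self.step_count + 1} times, "
                f"but the budget is {self.total_steps}"
            )
        self.step_count += 1


def guarded_step(scheduler, num_processes):
    """Step up to num_processes times, skipping calls beyond the budget.

    Returns the number of steps actually performed.
    """
    performed = 0
    for _ in range(num_processes):
        total = getattr(scheduler, "total_steps", None)
        # Schedulers without a budget (total is None) are always stepped.
        if total is not None and scheduler.step_count >= total:
            continue  # budget exhausted: skip instead of raising
        scheduler.step()
        performed += 1
    return performed


scheduler = BudgetedScheduler(total_steps=13)
performed = sum(guarded_step(scheduler, num_processes=2) for _ in range(7))
print(performed)  # 13
```

Seven batches on two processes attempt 14 steps, but only 13 are performed and no error is raised. Schedulers that expose no total_steps attribute are stepped unconditionally.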

HuggingFaceDocBuilderDev · 2022-05-23T15:57:36Z

The documentation is not available anymore as the PR was closed or merged.

src/accelerate/scheduler.py

Special onecycle fix

db3367f

muellerzr added the bug and enhancement labels May 23, 2022

muellerzr requested a review from sgugger May 23, 2022 15:35

muellerzr marked this pull request as draft May 23, 2022 15:41

muellerzr added 2 commits May 23, 2022 11:43

LT or equal to

6f51087

Style

cf13000

muellerzr marked this pull request as ready for review May 23, 2022 15:54

sgugger reviewed May 23, 2022

src/accelerate/scheduler.py (review thread resolved)

muellerzr merged commit f9de557 into main May 23, 2022

muellerzr deleted the onecycle-fix branch May 23, 2022 16:28

miccio-dk mentioned this pull request Sep 9, 2022

underlying lr_scheduler.step() never called with OneCycleLR in single GPU #690

Closed
