A way to save and load the output of upstream model for speedup · Issue #546 · s3prl/s3prl · GitHub


Open
shreyas2206 opened this issue Jun 19, 2024 · 3 comments

@shreyas2206

Feature request

Is your feature request related to a problem? Please describe.
I am using this toolkit to run several experiments where the upstream is frozen and only the downstream is modified each time. I find that training is quite slow for some upstream models, even though the downstream models are computationally much lighter than the upstream. If there were a way to save and load the upstream outputs for each file, training would be significantly faster.

Describe the solution you'd like
Adding this feature is straightforward if only one kind of feature needs to be stored (for example, the last hidden layer): store the featurizer's output for each audio file to a cache path during the first epoch, then load it in subsequent epochs instead of recomputing it on the fly. But when the featurizer is trainable (the weighted-sum approach), all of the upstream's hidden layers must be stored, and handling that is more complicated, because the upstream's output is a dictionary whose size varies with the batch size and whose entries vary with the upstream model. This could be handled in the data loader itself, or separately inside the training loop.

I honestly feel it is worth spending some time adding this feature, as it will save a lot of compute resources and time.

@leo19941227
Member

Hi @shreyas2206 ,

Sure! I totally agree with you; this would be an extremely useful feature. However, I might not have enough time to add this feature and test backward compatibility across all the tasks. Any volunteers interested in implementing it are highly welcome. We care about acknowledging contributions and will give credit for major features in the main README log.

Many thanks for the proposal!

Sincerely,
Leo

@shreyas2206
Author

Thank you for the response, Leo!

I will shortly have an intern who will be working on this. We will evaluate the feasibility of implementing this feature in terms of storage requirements, look at suitable data formats, and try to come up with a solution. I will keep you updated, and reach out to you if we need any clarifications.

Regards,
Shreyas

@leo19941227
Member
leo19941227 commented Sep 2, 2024

Hi @shreyas2206,

Cool! This sounds super interesting.
I believe this requires a substantial amount of SSD storage to load all the hidden states efficiently, but I think that the SSD requirement may not be a significant bottleneck nowadays (it was back in 2021 in our lab).
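As a rough, purely illustrative estimate of that storage requirement (all numbers below are assumptions for a base-sized model and a LibriSpeech-scale corpus, not measurements):

```python
# Back-of-envelope estimate for caching every hidden layer of every
# utterance. Illustrative assumptions, not measured values.
layers = 13          # e.g. 12 transformer layers + CNN output
dim = 768            # hidden size of a base-sized model
frames_per_sec = 50  # typical SSL frame rate (20 ms hop)
bytes_per_value = 2  # fp16
hours = 960          # LibriSpeech-scale training data

total_bytes = layers * dim * frames_per_sec * bytes_per_value * hours * 3600
print(f"~{total_bytes / 1e12:.1f} TB")  # ~3.5 TB
```

So even with fp16, caching all layers for a large corpus can run into terabytes, which is why SSD capacity is the main practical constraint.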
I would be happy to help implement this feature.
If you encounter any problems, please don't hesitate to post them here so we can discuss.
Thanks!

Sincerely,
Leo
