Support exponential moving average in the default trainer. #2406
Conversation
@matt-gardner Could you give me any hints on why the checks failed? The documentation looks good to me, but it still reported errors.
```
Parameters
----------
model: ``Model``, required.
    An AllenNLP model to be optimized.
```
I am not convinced about this API. Why not just pass in the named parameters in the constructor (instead of the model) and then you can get rid of that option in the method calls.
I was trying to keep open the option of applying EMA to only a subset of the parameters. But since this might never be used, I will change it to what you suggested.
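For concreteness, here is a minimal sketch of the constructor-based API being suggested. The method names (`update_averages`, `assign`, `restore`) follow the naming discussed later in this thread; all of it is illustrative rather than the PR's final code:

```python
from typing import Dict, Iterable, Tuple

import torch


class ExponentialMovingAverage:
    """Takes the named parameters once in the constructor,
    instead of passing a whole ``Model`` around."""

    def __init__(self,
                 named_parameters: Iterable[Tuple[str, torch.nn.Parameter]],
                 decay: float = 0.9999) -> None:
        self._decay = decay
        # Materialize the iterable: model.named_parameters() returns a
        # generator, which would be exhausted after a single pass.
        self._parameters = list(named_parameters)
        self._shadows: Dict[str, torch.Tensor] = {
            name: param.data.clone() for name, param in self._parameters
        }
        self._backups: Dict[str, torch.Tensor] = {
            name: param.data.clone() for name, param in self._parameters
        }

    def update_averages(self) -> None:
        # shadow <- decay * shadow + (1 - decay) * param
        for name, param in self._parameters:
            self._shadows[name] = (self._decay * self._shadows[name]
                                   + (1.0 - self._decay) * param.data)

    def assign(self) -> None:
        # Back up the live weights, then load the averaged ones.
        for name, param in self._parameters:
            self._backups[name] = param.data.clone()
            param.data.copy_(self._shadows[name])

    def restore(self) -> None:
        # Put the live (non-averaged) weights back.
        for name, param in self._parameters:
            param.data.copy_(self._backups[name])
```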
```python
            self._average_values[name] = param.data.clone()
            self._backup_values[name] = param.data.clone()

    def apply(self, num_updates: int = None, named_parameters: Iterable = None) -> None:
```
I know this is what TensorFlow calls it, but I can't say that I like the name `apply` here. If it were me I would call it something like `update_averages` or something.
Ok, will do.
The optional `num_updates` parameter allows one to tweak the decay rate
dynamically. If passed, the actual decay rate used is:

`min(decay, (1 + num_updates) / (10 + num_updates))`
where does the second equation here come from?
You can refer to this TensorFlow implementation.
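For intuition: the second term is a warm-up schedule. Early in training the averages track the raw parameters closely (small effective decay), and the effective decay only approaches the configured cap as `num_updates` grows. A quick numeric check (the formula is from the docstring above; the script is just illustration):

```python
decay = 0.9999
for num_updates in (0, 9, 99, 999, 9999):
    effective = min(decay, (1 + num_updates) / (10 + num_updates))
    print(num_updates, round(effective, 5))
# 0    0.1      -> the average is dominated by the current parameters
# 9    0.52632
# 99   0.91743
# 999  0.99108
# 9999 0.9991   -> still below the 0.9999 cap; the min() binds even later
```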
```python
            named_parameters = self._model.named_parameters()
        for name, param in named_parameters:
            new_average_value = (1.0 - decay) * param.data + decay * self._average_values[name]
            self._average_values[name] = new_average_value.clone()
```
why do you need `clone` here? I would have written something like

```python
self._average_values[name] = (1.0 - decay) * param.data.detach() + decay * self._average_values[name]
```

Is there a reason why two steps is better?
Yes, this is not necessary. I will change that.
```python
        if named_parameters is None:
            named_parameters = self._model.named_parameters()
        for name, param in named_parameters:
            self._backup_values[name] = param.data.clone()
```
why do you need clone here?
`clone()` might be necessary here, to make sure that updates to the parameters will not affect the corresponding entries in `_backup_values`?
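The concern is justified. `param.data` is a live tensor, so storing it without `clone()` would make the backup alias the parameter's storage, and the optimizer's in-place updates would silently change the "backup" too. A standalone demonstration (not PR code):

```python
import torch

param = torch.ones(3)
backup_alias = param         # no clone: shares storage with the parameter
backup_copy = param.clone()  # clone: independent storage

param.add_(1.0)              # an in-place update, as optimizers perform

print(backup_alias)  # tensor([2., 2., 2.])  -- the "backup" changed too
print(backup_copy)   # tensor([1., 1., 1.])  -- preserved
```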
```python
        if named_parameters is None:
            named_parameters = self._model.named_parameters()
        for name, param in named_parameters:
            param.data = self._backup_values[name].clone()
```
and here?
Same as above. Using `clone()` is safer?
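To see where backup and restore fit, here is a hedged sketch of the intended usage in a training loop, reusing the `ExponentialMovingAverage` sketch from earlier in this thread (the loop itself is illustrative, not the trainer's code):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
ema = ExponentialMovingAverage(model.named_parameters(), decay=0.99)

for step in range(100):
    batch = torch.randn(8, 4)
    loss = model(batch).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema.update_averages()   # fold the freshly updated weights into the shadows

ema.assign()                # back up live weights, swap in the averages
with torch.no_grad():
    val_loss = model(torch.randn(8, 4)).pow(2).mean()  # evaluate smoothed weights
ema.restore()               # put the live weights back to continue training
```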
```python
from allennlp.models.model import Model


class ExponentialMovingAverage:
```
are there other kinds of moving averages we might care about? should there be a `MovingAverage` base class, and then the trainer can be agnostic to what type of moving average it's using, rather than hardcoding this one?
TensorFlow has a `moving_averages.py` file, but it seems that they only implemented the exponential moving average.
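For reference, a minimal sketch of what such a `MovingAverage` base class could look like, with the exponential version as one subclass. The names and signatures here are guesses, not the final PR API:

```python
from typing import Dict, Iterable, Optional, Tuple

import torch


class MovingAverage:
    """Base class: the trainer only needs these three operations."""

    def __init__(self, named_parameters: Iterable[Tuple[str, torch.nn.Parameter]]) -> None:
        self._parameters = list(named_parameters)
        self._shadows: Dict[str, torch.Tensor] = {
            name: param.data.clone() for name, param in self._parameters
        }
        self._backups: Dict[str, torch.Tensor] = {}

    def update_averages(self, num_updates: Optional[int] = None) -> None:
        raise NotImplementedError

    def assign(self) -> None:
        for name, param in self._parameters:
            self._backups[name] = param.data.clone()
            param.data.copy_(self._shadows[name])

    def restore(self) -> None:
        for name, param in self._parameters:
            param.data.copy_(self._backups[name])


class ExponentialMovingAverage(MovingAverage):
    def __init__(self, named_parameters, decay: float = 0.9999) -> None:
        super().__init__(named_parameters)
        self._decay = decay

    def update_averages(self, num_updates: Optional[int] = None) -> None:
        decay = self._decay
        if num_updates is not None:
            # Warm-up schedule from the docstring discussed above.
            decay = min(decay, (1 + num_updates) / (10 + num_updates))
        for name, param in self._parameters:
            self._shadows[name] = (decay * self._shadows[name]
                                   + (1.0 - decay) * param.data)
```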
@yizhongw do you want me to take a crack at the abstract `MovingAverage` base class?
@joelgrus Yes, sure! Then I could modify my implementation based on that.
@yizhongw ok, take a look at what I pushed up. I'm not 100% sure I got all the `clone()`s and `.data`s right, but the tests pass.
@joelgrus Awesome! Thanks for implementing all those things for me. I checked the `clone()`s and all of them look good. I will run my full model with this trainer to make sure everything works, but I think this code could already be checked in after fixing the pylint and doc errors.
Force-pushed from 37f4067 to 54c74e5.
since I now wrote a lot of this code, probably someone else should review it
LGTM, modulo the in-place copy.
allennlp/training/moving_average.py (Outdated)
```python
        decay = self._decay

        for name, parameter in self._parameters:
            self._shadows[name] = decay * self._shadows[name] + (1.0 - decay) * parameter.clone()
```
You can do this in-place:

```python
self._shadows[name].mul_(decay).add_((1 - decay) * parameter.data)
```

This is actually important, I think, because copying large parameters every step is expensive; e.g. all the optimizers use in-place operations.
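A sketch of the whole update written with the suggested in-place ops, with `_shadows` and `_parameters` shaped as in the snippet above. In recent PyTorch, `add_(other, alpha=...)` even avoids the temporary that `(1 - decay) * parameter.data` would allocate; treat the exact call style as an assumption about the PyTorch version:

```python
import torch


def update_averages_inplace(shadows, named_parameters, decay=0.9999):
    # shadow <- decay * shadow + (1 - decay) * param, without allocating
    # a fresh tensor per parameter on every training step.
    with torch.no_grad():
        for name, param in named_parameters:
            shadows[name].mul_(decay).add_(param.data, alpha=1.0 - decay)


# Tiny usage check:
model = torch.nn.Linear(4, 2)
shadows = {name: p.data.clone() for name, p in model.named_parameters()}
update_averages_inplace(shadows, list(model.named_parameters()), decay=0.99)
```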
allennlp/training/trainer.py (Outdated)

```
@@ -513,6 +535,11 @@ def _save_checkpoint(self, epoch: Union[int, str]) -> None:
        The epoch of training. If the checkpoint is saved in the middle
        of an epoch, the parameter is a string with the epoch and timestamp.
        """
        # If moving averagew are used for parameters, we save
```
Typo: `averagew` should be `averages`.
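The pattern that the truncated comment seems to describe: swap the averaged weights in before serializing, then swap the live weights back so training continues unaffected. A hedged standalone sketch (the helper name and file naming are hypothetical; the trainer's real checkpoint logic also saves optimizer and training state):

```python
import os

import torch


def save_checkpoint_with_averages(model, moving_average, serialization_dir, epoch):
    # Hypothetical helper: checkpoint the *averaged* weights, since those
    # are what we want at inference time, then restore the live weights.
    if moving_average is not None:
        moving_average.assign()
    path = os.path.join(serialization_dir, "model_state_epoch_{}.th".format(epoch))
    torch.save(model.state_dict(), path)
    if moving_average is not None:
        moving_average.restore()
```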
Oh, another idea: it's annoying to have to keep these EMAs on the same device as the parameters, because you're essentially tripling the memory requirement of the model. What if you added a …
sounds like a good idea, let me make an attempt at it
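The suggestion above is cut off, but given the memory concern it presumably involves keeping the averaged values on a cheaper device than the parameters. Purely as a guess at the idea (not the PR's code), the shadows could live on CPU at the cost of a transfer per update:

```python
import torch


def make_cpu_shadows(named_parameters):
    # Keep the running averages in CPU memory so the GPU holds only the
    # live parameters; updates then pay a device-to-host transfer instead.
    return {name: param.data.to("cpu").clone() for name, param in named_parameters}


def update_cpu_shadows(shadows, named_parameters, decay=0.9999):
    with torch.no_grad():
        for name, param in named_parameters:
            shadows[name].mul_(decay).add_(param.data.to("cpu"), alpha=1.0 - decay)
```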