Memory leak in high order derivative? (pytorch 0.2) · Issue #2498 · pytorch/pytorch

Closed · xuanqing94 opened this issue Aug 21, 2017 · 8 comments

xuanqing94 commented Aug 21, 2017

I'm not sure whether this implementation is efficient, but when calculating a Hessian-vector product in a loop, it terminates with an out-of-memory error after ~30 iterations (Titan X). Here is my implementation:

    import torch
    import torch.nn as nn
    import torchvision.datasets as dst
    import torchvision.transforms as tfs
    from torch.autograd import Variable
    from torch.utils.data import DataLoader

    net = VGG()  # could be any other model
    loss_f = nn.NLLLoss()
    net.cuda()
    loss_f.cuda()
    data = dst.CIFAR10("~/cifar10-py", download=True, train=True,
                       transform=tfs.Compose([tfs.ToTensor(),
                                              tfs.Normalize((.5, .5, .5), (.5, .5, .5))]))
    dataloader = DataLoader(data, batch_size=opt.batchSize, shuffle=True, num_workers=2)

    # fixed random vector v with the same shapes as the network parameters
    v = []
    for i, p in enumerate(net.parameters()):
        v.append(Variable(torch.FloatTensor(p.data.size()).normal_(0, 1).cuda(), requires_grad=False))

    for img, label in dataloader:
        img = img.cuda()
        label = label.cuda()
        input = Variable(img, requires_grad=True)
        target = Variable(label)
        output = net(input)
        loss = loss_f(output, target)
        # first-order gradients, keeping the graph so they can be differentiated again
        grad_params = torch.autograd.grad(loss, net.parameters(), create_graph=True)
        inner_prod = 0.0
        for vi, grad in zip(v, grad_params):
            inner_prod += torch.sum(grad * vi)
        # Hessian-vector product H·v
        Hv = torch.autograd.grad(inner_prod, net.parameters(), create_graph=True)

Basically, it first calculates the loss, then the gradient with respect to the weights, takes an inner product with v, and then takes the gradient with respect to the weights again. Similar code works fine in Theano.
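For reference, here is the same pattern boiled down to a minimal, self-contained sketch (a toy model stands in for VGG/CIFAR-10, and it is written against the current tensor-based autograd API rather than the 0.2 Variable wrapper):

    import torch
    import torch.nn as nn

    # Toy stand-in for the VGG/CIFAR-10 setup above, just to isolate the HVP pattern.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
    loss_f = nn.CrossEntropyLoss()
    x = torch.randn(8, 10)
    y = torch.randint(0, 3, (8,))

    # Fixed random vector v with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in model.parameters()]

    loss = loss_f(model(x), y)
    # First backward pass: keep the graph so the gradients can be differentiated again.
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    inner_prod = sum((g * vi).sum() for g, vi in zip(grads, v))
    # Second backward pass gives H·v; create_graph is only needed here if even
    # higher-order derivatives are required afterwards.
    Hv = torch.autograd.grad(inner_prod, model.parameters())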

Here is the traceback:

    THCudaCheck FAIL file=xxx/pytorch-0.2.0/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    Traceback (most recent call last):
      File "./main.py", line 50, in <module>
        Hv = torch.autograd.grad(inner_prod, grad_params, create_graph=True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 153, in grad
        inputs, only_inputs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/thnn/batchnorm_double_backwards.py", line 70, in batchnorm_double_backwards_fn
        gI_2t = (gOinmu_sum * sigma2_eps_neg_3_2).div(M) * (ggI_sum.div(M) - ggI)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 820, in __sub__
        return self.sub(other)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 332, in sub
        return self._sub(other, False)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 326, in _sub
        return Sub.apply(self, other, inplace)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 34, in forward
        return a.sub(b)
    RuntimeError: cuda runtime error (2) : out of memory at xxx/pytorch-0.2.0/torch/lib/THC/generic/THCStorage.cu:66

xuanqing94 (Author) commented:

I've found a similar topic in #2287, which should be fixed by #2328 and #2326.
But I think this issue is different: if I comment out

    Hv = torch.autograd.grad(inner_prod, net.parameters(), create_graph=True)

then it won't leak.
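(For anyone trying to reproduce this: a quick way to watch the growth is to log allocated GPU memory once per iteration. The helper below is a sketch that uses torch.cuda.memory_allocated, which only exists on newer builds; on 0.2 the same trend is visible from nvidia-smi.)

    import torch

    def log_cuda_memory(step):
        """Print currently allocated CUDA memory; call once per training iteration."""
        if torch.cuda.is_available():
            mb = torch.cuda.memory_allocated() / 1024 ** 2
            print("step %d: %.1f MiB allocated" % (step, mb))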

gchanan commented Aug 21, 2017

Can you try building trunk and running your test to see if it still leaks?

xuanqing94 (Author) commented:

@gchanan As far as I can tell, this issue is related to the BatchNorm layers. If I remove all of those layers, memory usage stays constant.
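A toy sketch of how one might isolate that hypothesis (not the original model): run the same double-backward step against two otherwise identical models, one with and one without BatchNorm, and watch which one grows.

    import torch
    import torch.nn as nn

    def hvp_step(model, x):
        """One Hessian-vector-product step, mirroring the loop in the original report."""
        loss = (model(x) ** 2).sum()          # stand-in for the NLL loss above
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        v = [torch.ones_like(g) for g in grads]
        inner = sum((g * vi).sum() for g, vi in zip(grads, v))
        return torch.autograd.grad(inner, model.parameters(), create_graph=True)

    # Identical toy models except for the BatchNorm layer; under the hypothesis above,
    # only the BatchNorm variant should show growing memory across iterations on 0.2.
    with_bn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
    no_bn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
    x = torch.randn(4, 3, 16, 16)
    for model in (with_bn, no_bn):
        for step in range(5):
            Hv = hvp_step(model, x)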

gchanan commented Aug 26, 2017

@LIU-Xuanqing did you try installing from source as I mentioned above?

xuanqing94 (Author) commented:

@gchanan Aha... I was using the wrong source, from https://github.com/pytorch/pytorch/releases/tag/v0.2.0

It works with the master branch, thanks!

chao1224 commented:

Hi Xuanqing, @xuanqing94

I'm also working on second-order derivatives. Could you help with two questions?

  1. It seems like the Hessian output has the same dimensions as the weight parameters, right? Since all of the gradients are accumulated w.r.t. the target tensor.
  2. When calculating the inner product, inner_prod += torch.sum(grad * vi), vi is drawn from a normal distribution. I'm curious why the normal distribution? I was wondering if we could try inner_prod += torch.sum(grad) instead.

Thank you in advance.

xuanqing94 (Author) commented:

@chao1224 Hi,

  1. The Hessian has dimension d × d, where d is the dimension of the weights. I'm calculating the Hessian-vector product, which has dimension d.
  2. vi can be any d-dimensional vector; I used a normal distribution just for testing. If you use torch.sum(grad), that is the special case vi = torch.ones(d) (see the sketch below).
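A quick toy check of point 2 (not from the original code, just to illustrate the special case):

    import torch

    # Toy scalar function f(w) = sum(w**2); its gradient is 2*w, its Hessian is 2*I.
    w = torch.randn(5, requires_grad=True)
    loss = (w ** 2).sum()
    grad, = torch.autograd.grad(loss, w, create_graph=True)

    v = torch.ones_like(w)
    # Explicit inner product with v = ones(d) ...
    hv_a, = torch.autograd.grad((grad * v).sum(), w, retain_graph=True)
    # ... gives the same Hessian-vector product as simply summing the gradient.
    hv_b, = torch.autograd.grad(grad.sum(), w)
    assert torch.allclose(hv_a, hv_b)   # both equal 2 * ones(5)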

chao1224 commented:

Thanks for answering, @xuanqing94.
And it would make much more sense if we set vi = \epsilon, right?
