Memory leak in high order derivative? (pytorch 0.2) · Issue #2498 · pytorch/pytorch

Closed · xuanqing94 opened this issue Aug 21, 2017 · 8 comments

xuanqing94 commented Aug 21, 2017

I'm not sure whether this implementation is efficient, but when calculating a Hessian-vector product in a loop, it terminates with an out-of-memory error after ~30 iterations (Titan X). Here is my implementation:

    import torch
    import torch.nn as nn
    import torchvision.datasets as dst
    import torchvision.transforms as tfs
    from torch.autograd import Variable
    from torch.utils.data import DataLoader

    net = VGG()  # could be any other model
    loss_f = nn.NLLLoss()
    net.cuda()
    loss_f.cuda()
    data = dst.CIFAR10("~/cifar10-py", download=True, train=True,
                       transform=tfs.Compose([tfs.ToTensor(),
                                              tfs.Normalize((.5, .5, .5), (.5, .5, .5))]))
    dataloader = DataLoader(data, batch_size=opt.batchSize, shuffle=True, num_workers=2)

    # fixed random vector v with the same shapes as the network parameters
    v = []
    for i, p in enumerate(net.parameters()):
        v.append(Variable(torch.FloatTensor(p.data.size()).normal_(0, 1).cuda(), requires_grad=False))

    for img, label in dataloader:
        img = img.cuda()
        label = label.cuda()
        input = Variable(img, requires_grad=True)
        target = Variable(label)
        output = net(input)
        loss = loss_f(output, target)
        # first-order gradients, keeping the graph so they can be differentiated again
        grad_params = torch.autograd.grad(loss, net.parameters(), create_graph=True)
        inner_prod = 0.0
        for vi, grad in zip(v, grad_params):
            inner_prod += torch.sum(grad * vi)
        # Hessian-vector product H·v
        Hv = torch.autograd.grad(inner_prod, net.parameters(), create_graph=True)

Basically, it first calculates the loss, then the gradient with respect to the weights, takes an inner product with v, and then takes the gradient with respect to the weights again. Similar code works fine in Theano.
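For reference, here is the same pattern boiled down to a minimal, self-contained sketch (a toy model stands in for VGG/CIFAR-10, and it is written against the current tensor-based autograd API rather than the 0.2 Variable wrapper):

    import torch
    import torch.nn as nn

    # Toy stand-in for the VGG/CIFAR-10 setup above, just to isolate the HVP pattern.
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
    loss_f = nn.CrossEntropyLoss()
    x = torch.randn(8, 10)
    y = torch.randint(0, 3, (8,))

    # Fixed random vector v with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in model.parameters()]

    loss = loss_f(model(x), y)
    # First backward pass: keep the graph so the gradients can be differentiated again.
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    inner_prod = sum((g * vi).sum() for g, vi in zip(grads, v))
    # Second backward pass gives H·v; create_graph is only needed here if even
    # higher-order derivatives are required afterwards.
    Hv = torch.autograd.grad(inner_prod, model.parameters())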

Here is the traceback:

    THCudaCheck FAIL file=xxx/pytorch-0.2.0/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    Traceback (most recent call last):
      File "./main.py", line 50, in <module>
        Hv = torch.autograd.grad(inner_prod, grad_params, create_graph=True)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/__init__.py", line 153, in grad
        inputs, only_inputs)
      File "/usr/local/lib/python2.7/dist-packages/torch/nn/_functions/thnn/batchnorm_double_backwards.py", line 70, in batchnorm_double_backwards_fn
        gI_2t = (gOinmu_sum * sigma2_eps_neg_3_2).div(M) * (ggI_sum.div(M) - ggI)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 820, in __sub__
        return self.sub(other)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 332, in sub
        return self._sub(other, False)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/variable.py", line 326, in _sub
        return Sub.apply(self, other, inplace)
      File "/usr/local/lib/python2.7/dist-packages/torch/autograd/_functions/basic_ops.py", line 34, in forward
        return a.sub(b)
    RuntimeError: cuda runtime error (2) : out of memory at xxx/pytorch-0.2.0/torch/lib/THC/generic/THCStorage.cu:66

xuanqing94 (Author) commented:

I've found a similar topic in #2287, which should be fixed by #2328 and #2326.
But I think this issue is different: if I comment out

    Hv = torch.autograd.grad(inner_prod, net.parameters(), create_graph=True)

then it won't leak.
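(For anyone trying to reproduce this: a quick way to watch the growth is to log allocated GPU memory once per iteration. The helper below is a sketch that uses torch.cuda.memory_allocated, which only exists on newer builds; on 0.2 the same trend is visible from nvidia-smi.)

    import torch

    def log_cuda_memory(step):
        """Print currently allocated CUDA memory; call once per training iteration."""
        if torch.cuda.is_available():
            mb = torch.cuda.memory_allocated() / 1024 ** 2
            print("step %d: %.1f MiB allocated" % (step, mb))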

gchanan commented Aug 21, 2017

Can you try building trunk and running your test to see if it still leaks?

xuanqing94 (Author) commented:

@gchanan As far as I can tell, this issue is related to the BatchNorm layers. If I remove all of those layers, memory usage stays constant.
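A toy sketch of how one might isolate that hypothesis (not the original model): run the same double-backward step against two otherwise identical models, one with and one without BatchNorm, and watch which one grows.

    import torch
    import torch.nn as nn

    def hvp_step(model, x):
        """One Hessian-vector-product step, mirroring the loop in the original report."""
        loss = (model(x) ** 2).sum()          # stand-in for the NLL loss above
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        v = [torch.ones_like(g) for g in grads]
        inner = sum((g * vi).sum() for g, vi in zip(grads, v))
        return torch.autograd.grad(inner, model.parameters(), create_graph=True)

    # Identical toy models except for the BatchNorm layer; under the hypothesis above,
    # only the BatchNorm variant should show growing memory across iterations on 0.2.
    with_bn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU())
    no_bn = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
    x = torch.randn(4, 3, 16, 16)
    for model in (with_bn, no_bn):
        for step in range(5):
            Hv = hvp_step(model, x)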

gchanan commented Aug 26, 2017

@LIU-Xuanqing did you try installing from source as I mentioned above?

xuanqing94 (Author) commented:

@gchanan Aha... I was using the wrong source, from https://github.com/pytorch/pytorch/releases/tag/v0.2.0

It works with the master branch, thanks!

chao1224 commented:

Hi Xuanqing, @xuanqing94

I'm also working on second-order derivatives. Could you help with two questions?

  1. It seems like the Hessian output has the same dimensions as the weight parameters, right? Since all of the gradients are accumulated w.r.t. the target tensor.
  2. When calculating the inner product, inner_prod += torch.sum(grad * vi), vi is drawn from a normal distribution. I'm curious why the normal distribution? I was wondering if we could try inner_prod += torch.sum(grad) instead.

Thank you in advance.

xuanqing94 (Author) commented:

@chao1224 Hi,

  1. The Hessian has dimension d × d, where d is the dimension of the weights. I'm calculating the Hessian-vector product, which has dimension d.
  2. vi can be any d-dimensional vector; I used a normal distribution just for testing. If you use torch.sum(grad), that is the special case vi = torch.ones(d) (see the sketch below).
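A quick toy check of point 2 (not from the original code, just to illustrate the special case):

    import torch

    # Toy scalar function f(w) = sum(w**2); its gradient is 2*w, its Hessian is 2*I.
    w = torch.randn(5, requires_grad=True)
    loss = (w ** 2).sum()
    grad, = torch.autograd.grad(loss, w, create_graph=True)

    v = torch.ones_like(w)
    # Explicit inner product with v = ones(d) ...
    hv_a, = torch.autograd.grad((grad * v).sum(), w, retain_graph=True)
    # ... gives the same Hessian-vector product as simply summing the gradient.
    hv_b, = torch.autograd.grad(grad.sum(), w)
    assert torch.allclose(hv_a, hv_b)   # both equal 2 * ones(5)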

chao1224 commented:

Thanks for answering, @xuanqing94.
And it would make much more sense if we set vi = \epsilon, right?
