8000 CuDNN 7 grouped convolution by kouyoumin · Pull Request #5879 · BVLC/caffe · GitHub
Open · wants to merge 4 commits into master

Conversation

kouyoumin

Uses cuDNN 7's native grouped convolution instead of a for loop over groups.

@fengziyong

@kouyoumin Have you tested the forward time?
I tried cuDNN v7 before, but I found it was slower than the original implementation in group mode.

@Noiredd
Member
Noiredd commented Nov 2, 2017

I have restarted the tests that failed prior to #5973; they pass now. However, I'm not sure whether Travis is even able to test cuDNN 7 right now. Could you take a look at #5972, where the dependencies script was also modified, and compare?

Also, this needs a squash before merging, but for review purposes it's good as is.

@twmht
Contributor
twmht commented Nov 3, 2017

Not fast enough.

Has anyone tested this on MobileNet?

@Noiredd
Member
Noiredd commented Nov 7, 2017

@kouyoumin Please note before rebasing that #5972 modified cudnn.hpp, adding the v7 cases.

Member
@Noiredd left a comment

@kouyoumin I did more thorough testing of this PR: from what I see (both on synthetic benchmark nets, like a 1x256x256x256 cube convolved with 5x5 filters in a group of 256, and on more realistic nets), it does not seem to be faster (on my old GTX TITAN Z), but the RAM saving is quite significant (from 4% to almost 60% - the larger the group, the better). Please take a look at the comments I left in the code review, squash and rebase, and I will merge this.

cudnn::dataType<Dtype>::one,
top_descs_[i], top_diff + top_offset_ * g,
cudnn::dataType<Dtype>::one,
bias_desc_, bias_diff + bias_offset_ * g));
#else
CUDNN_CHECK(cudnnConvolutionBackwardBias(handle_[g],

Shouldn't we leave CUDNN_CHECK(cudnnConvolutionBackwardBias(handle_[0*this->group_ + g], here?

case CUDNN_STATUS_RUNTIME_IN_PROGRESS:
return "CUDNN_STATUS_RUNTIME_IN_PROGRESS";
case CUDNN_STATUS_RUNTIME_FP_OVERFLOW:
return "CUDNN_STATUS_RUNTIME_FP_OVERFLOW";

Adding this fragment is no longer necessary (since #5972); please remove it while rebasing.

4 participants