[Bug] host memory leak in RTX 2080Ti in multisplit · Issue #187 · cudpp/cudpp · GitHub

[Bug] host memory leak in RTX 2080Ti in multisplit #187



Open

zhuyinheng opened this issue May 6, 2019 · 7 comments

Comments

@zhuyinheng

Environment:

  • Ubuntu 16.04
  • gcc/g++: 5.4
  • NVIDIA driver: 410
  • CUDA: 9.0
  • Devices: A: 2080Ti, B: 1080Ti, C: GV100 (each in a separate computer)

Test result

Tested with the official test script: cudpp/build/bin/cudpp_hash_testrig -all
Devices B and C work fine, but on device A host memory leaks while performing the key-value multisplit tests.
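
For reference, the growth is visible from outside the library. A minimal sketch (my own check, not part of the testrig) that prints the process RSS around context creation and the first cudaMalloc on Linux:

// Leak check: print the process resident set size (RSS) before and
// after the first CUDA calls. Assumes Linux (/proc is available).
#include <cuda_runtime.h>
#include <stdio.h>

// Read VmRSS from /proc/self/status, in kB; -1 on failure.
static long rss_kb(void) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    while (f && fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
    if (f) fclose(f);
    return kb;
}

int main(void) {
    printf("RSS before CUDA init : %ld kB\n", rss_kb());
    cudaFree(0);               // forces context creation (and any PTX JIT)
    printf("RSS after  CUDA init : %ld kB\n", rss_kb());
    void *d = NULL;
    cudaMalloc(&d, 1 << 20);   // first real device allocation
    printf("RSS after  cudaMalloc: %ld kB\n", rss_kb());
    cudaFree(d);
    return 0;
}

On a healthy setup the RSS grows by a bounded amount at context creation and stays flat afterwards.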

@changephilip
changephilip commented Jun 22, 2019

#185 may be the same problem; I hit it while linking CUDPP with CUDA 9.2 on an RTX 2080 Ti.
The program hangs after the first cudaMalloc and consumes all host memory.
I have also seen otherwise normal programs hang for about two minutes after the first CUDA API call on the RTX 2080 Ti.
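
These symptoms look like driver-side JIT compilation: when a binary ships no SASS for the device, the first CUDA call compiles the embedded PTX, which can take minutes and use a lot of host memory. A tiny sketch (my own, assumed setup) that times that first call:

// Time the first CUDA call; a multi-minute result suggests the driver is
// JIT-compiling PTX because the binary has no SASS for this GPU.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    cudaFree(0);   // first CUDA call: context creation plus any PTX JIT
    auto t1 = std::chrono::steady_clock::now();
    std::printf("first CUDA call took %.1f s\n",
                std::chrono::duration<double>(t1 - t0).count());
    return 0;
}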

@zhuyinheng
Author

@changephilip did you find any solution?

@changephilip
changephilip commented Aug 5, 2019

@cow8

@changephilip did you find any solution?

Try using CUB instead of CUDPP for the most commonly used functions.
cudpp_hash may need some extra coding on your part.
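
For the multisplit itself, one possible replacement (a sketch under my own assumptions, not a drop-in for the CUDPP API) is CUB's radix sort: a stable sort keyed on precomputed bucket IDs, restricted to the few bits that encode the bucket, behaves like a key-value multisplit:

// Emulate key-value multisplit with cub::DeviceRadixSort::SortPairs.
// Assumes d_keys_in already holds one bucket ID per element; sorting on
// just the low num_bucket_bits bits groups elements stably by bucket.
#include <cub/cub.cuh>
#include <cuda_runtime.h>

void multisplit_via_cub(const unsigned int *d_keys_in, unsigned int *d_keys_out,
                        const unsigned int *d_vals_in, unsigned int *d_vals_out,
                        int num_items, int num_bucket_bits /* e.g. 5 for 32 buckets */)
{
    void  *d_temp = nullptr;
    size_t temp_bytes = 0;
    // First call with a null buffer only queries the temp-storage size.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out,
                                    num_items, 0, num_bucket_bits);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out,
                                    num_items, 0, num_bucket_bits);
    cudaFree(d_temp);
}

Limiting end_bit to num_bucket_bits keeps the sort to a few radix passes instead of a full 32-bit sort.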

@tigeroses

I had the same memory-leak problem with CUDA 9.2 on a Titan V, and I fixed it by updating src/cudpp/CMakeLists.txt and src/cudpp_hash/CMakeLists.txt to add compute capabilities 60 and 70.

Then I re-compiled with CMake, and simpleCUDPP passed the test.

For example:

[screenshot of the CMakeLists.txt diff; the text version is in the next comment]

@tigeroses
tigeroses commented Dec 9, 2019

I guess the picture cannot be seen; here is the text of the diff:

diff --git a/src/cudpp/CMakeLists.txt b/src/cudpp/CMakeLists.txt
index f18704e..7627c4d 100644
--- a/src/cudpp/CMakeLists.txt
+++ b/src/cudpp/CMakeLists.txt
@@ -100,6 +100,8 @@ set(GENCODE_SM21 -gencode=arch=compute_20,code=sm_21 -gencode=arch=compute_20,code=compute_20)
 set(GENCODE_SM30 -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_30,code=compute_30)
 set(GENCODE_SM35 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_35,code=compute_35)
 set(GENCODE_SM50 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_50,code=compute_50)
+set(GENCODE_SM60 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_60,code=compute_60)
+set(GENCODE_SM70 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70)

 #set(GENCODE -gencode=arch=compute_20,code=compute_20) # at least generate PTX

@@ -125,11 +127,19 @@ option(CUDPP_GENCODE_SM30

 option(CUDPP_GENCODE_SM35
        "ON to generate code for Compute Capability 3.5 devices (e.g. Tesla K20)"
-       OFF)
+       ON)

 option(CUDPP_GENCODE_SM50
        "ON to generate code for Compute Capability 5.0 devices (e.g. GeForce GTX 750)"
-       OFF)
+       ON)
+
+option(CUDPP_GENCODE_SM60
+       "ON to generate code for Compute Capability 6.0 devices"
+       ON)
+
+option(CUDPP_GENCODE_SM70
+       "ON to generate code for Compute Capability 7.0 devices"
+       ON)

 if (CUDPP_GENCODE_SM12)
   set(GENCODE ${GENCODE} ${GENCODE_SM12})
@@ -159,6 +169,14 @@ if (CUDPP_GENCODE_SM50)
   set(GENCODE ${GENCODE} ${GENCODE_SM50})
 endif(CUDPP_GENCODE_SM50)

+if (CUDPP_GENCODE_SM60)
+  set(GENCODE ${GENCODE} ${GENCODE_SM60})
+endif(CUDPP_GENCODE_SM60)
+
+if (CUDPP_GENCODE_SM70)
+  set(GENCODE ${GENCODE} ${GENCODE_SM70})
+endif(CUDPP_GENCODE_SM70)
+
 if (CUDA_VERBOSE_PTXAS)
   set(VERBOSE_PTXAS --ptxas-options=-v)
 endif (CUDA_VERBOSE_PTXAS)
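
To decide which of the options above a given machine needs, a small helper (my own, not part of CUDPP) can print each device's compute capability; an RTX 2080 Ti reports 7.5, a Titan V or GV100 reports 7.0:

// List every CUDA device with its compute capability, so you know which
// CUDPP_GENCODE_SM* options have to be ON.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        std::printf("device %d: %s, compute capability %d.%d\n",
                    i, p.name, p.major, p.minor);
    }
    return 0;
}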

@wing435
wing435 commented Dec 19, 2019

@tigeroses Hi, I have the same problem. I tried your solution, but it does not work. Could you tell me your driver version, CUDA version, and GPU?
win7 64bit
CUDA: 10.0
driver: 418.91
GPU: 2080Ti
vs2013

@tigeroses

@wing435 Here is my env:
win7 64bit
CUDA: 9.2
driver: 398.75
GPU: Titan V * 2
vs2015

This week I attended GTC 2019 and asked NVIDIA's development experts on site how to use a hash table in CUDA. They told me that their team recently developed a library called HugeCTR, which implements a hash table and supports dynamic insertion. If you just need a hash table, you can try it. Of course, I will also study this repo.

https://github.com/NVIDIA/HugeCTR.git

GPU Hashtable makes the data preprocessing easier and enables dynamic insertion in HugeCTR 2.0. The input training data are hash values (64-bit long long type) instead of original indices. Thus embedding initialization is not required before training, and if you start training from scratch, only an initialized dense model is needed (using --model-init). A pair of <key,value> (random small weight) will be inserted during runtime only when a new key appears in the training data and the hashtable cannot find it.
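
The dynamic insertion described in that quote is just insert-on-miss; a host-side sketch (my illustration, not HugeCTR code) of the semantics:

// Insert-on-miss: an unseen key gets a random small weight the first time
// it appears; known keys return their stored weight unchanged.
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_map>

float lookup_or_insert(std::unordered_map<uint64_t, float> &table, uint64_t key) {
    auto it = table.find(key);
    if (it != table.end()) return it->second;         // key already known
    static std::mt19937 rng(42);
    std::uniform_real_distribution<float> small(-0.01f, 0.01f);
    float w = small(rng);                             // random small initial weight
    table.emplace(key, w);                            // inserted only on first miss
    return w;
}

int main() {
    std::unordered_map<uint64_t, float> table;
    const uint64_t keys[] = {11, 42, 11};             // 11 repeats: second hit reuses its weight
    for (uint64_t k : keys)
        std::printf("key %llu -> %f\n", (unsigned long long)k, lookup_or_insert(table, k));
    return 0;
}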
