[Bug] host memory leak in RTX 2080Ti in multisplit · Issue #187 · cudpp/cudpp · GitHub

[Bug] host memory leak in RTX 2080Ti in multisplit #187



Open

zhuyinheng opened this issue May 6, 2019 · 7 comments

Comments

@zhuyinheng

Environment:

  • Ubuntu 16.04
  • gcc/g++: 5.4
  • NVIDIA driver: 410
  • CUDA: 9.0
  • Devices: A: 2080Ti, B: 1080Ti, C: GV100 (each in a separate computer)

Test result

Tested with the official test script: cudpp/build/bin/cudpp_hash_testrig -all
Devices B and C work fine, but on device A host memory leaks while performing the key-value multisplit tests.
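
For reference, the growth is visible from outside the library. A minimal sketch (my own check, not part of the testrig) that prints the process RSS around context creation and the first cudaMalloc on Linux:

// Leak check: print the process resident set size (RSS) before and
// after the first CUDA calls. Assumes Linux (/proc is available).
#include <cuda_runtime.h>
#include <stdio.h>

// Read VmRSS from /proc/self/status, in kB; -1 on failure.
static long rss_kb(void) {
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;
    while (f && fgets(line, sizeof line, f))
        if (sscanf(line, "VmRSS: %ld kB", &kb) == 1) break;
    if (f) fclose(f);
    return kb;
}

int main(void) {
    printf("RSS before CUDA init : %ld kB\n", rss_kb());
    cudaFree(0);               // forces context creation (and any PTX JIT)
    printf("RSS after  CUDA init : %ld kB\n", rss_kb());
    void *d = NULL;
    cudaMalloc(&d, 1 << 20);   // first real device allocation
    printf("RSS after  cudaMalloc: %ld kB\n", rss_kb());
    cudaFree(d);
    return 0;
}

On a healthy setup the RSS grows by a bounded amount at context creation and stays flat afterwards.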

@changephilip
changephilip commented Jun 22, 2019

#185 may be the same problem; I hit it while linking CUDPP with CUDA 9.2 on an RTX 2080 Ti.
The program hangs after the first cudaMalloc and consumes all host memory.
I have also seen otherwise normal programs hang for about two minutes after the first CUDA API call on the RTX 2080 Ti.
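
These symptoms look like driver-side JIT compilation: when a binary ships no SASS for the device, the first CUDA call compiles the embedded PTX, which can take minutes and use a lot of host memory. A tiny sketch (my own, assumed setup) that times that first call:

// Time the first CUDA call; a multi-minute result suggests the driver is
// JIT-compiling PTX because the binary has no SASS for this GPU.
#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>

int main() {
    auto t0 = std::chrono::steady_clock::now();
    cudaFree(0);   // first CUDA call: context creation plus any PTX JIT
    auto t1 = std::chrono::steady_clock::now();
    std::printf("first CUDA call took %.1f s\n",
                std::chrono::duration<double>(t1 - t0).count());
    return 0;
}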

@zhuyinheng
Author

@changephilip did you find any solution?

@changephilip
changephilip commented Aug 5, 2019

@cow8

@changephilip did you find any solution?

Try using CUB instead of CUDPP for the most commonly used functions.
cudpp_hash may need some extra coding on your part.
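
For the multisplit itself, one possible replacement (a sketch under my own assumptions, not a drop-in for the CUDPP API) is CUB's radix sort: a stable sort keyed on precomputed bucket IDs, restricted to the few bits that encode the bucket, behaves like a key-value multisplit:

// Emulate key-value multisplit with cub::DeviceRadixSort::SortPairs.
// Assumes d_keys_in already holds one bucket ID per element; sorting on
// just the low num_bucket_bits bits groups elements stably by bucket.
#include <cub/cub.cuh>
#include <cuda_runtime.h>

void multisplit_via_cub(const unsigned int *d_keys_in, unsigned int *d_keys_out,
                        const unsigned int *d_vals_in, unsigned int *d_vals_out,
                        int num_items, int num_bucket_bits /* e.g. 5 for 32 buckets */)
{
    void  *d_temp = nullptr;
    size_t temp_bytes = 0;
    // First call with a null buffer only queries the temp-storage size.
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out,
                                    num_items, 0, num_bucket_bits);
    cudaMalloc(&d_temp, temp_bytes);
    cub::DeviceRadixSort::SortPairs(d_temp, temp_bytes,
                                    d_keys_in, d_keys_out,
                                    d_vals_in, d_vals_out,
                                    num_items, 0, num_bucket_bits);
    cudaFree(d_temp);
}

Limiting end_bit to num_bucket_bits keeps the sort to a few radix passes instead of a full 32-bit sort.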

@tigeroses

I had the same memory-leak problem with CUDA 9.2 on a Titan V, and I fixed it by updating src/cudpp/CMakeLists.txt and src/cudpp_hash/CMakeLists.txt to add compute capabilities 60 and 70.

Then I re-compiled with CMake, and simpleCUDPP passed the test.

For example:

[screenshot of the CMakeLists.txt diff; the text version is in the next comment]

@tigeroses
tigeroses commented Dec 9, 2019

I guess the picture cannot be seen; here is the text of the diff:

diff --git a/src/cudpp/CMakeLists.txt b/src/cudpp/CMakeLists.txt
index f18704e..7627c4d 100644
--- a/src/cudpp/CMakeLists.txt
+++ b/src/cudpp/CMakeLists.txt
@@ -100,6 +100,8 @@ set(GENCODE_SM21 -gencode=arch=compute_20,code=sm_21 -gencode=arch=compute_20,code=compute_20)
 set(GENCODE_SM30 -gencode=arch=compute_30,code=sm_30 -gencode=arch=compute_30,code=compute_30)
 set(GENCODE_SM35 -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_35,code=compute_35)
 set(GENCODE_SM50 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_50,code=compute_50)
+set(GENCODE_SM60 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_60,code=compute_60)
+set(GENCODE_SM70 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_70,code=compute_70)

 #set(GENCODE -gencode=arch=compute_20,code=compute_20) # at least generate PTX

@@ -125,11 +127,19 @@ option(CUDPP_GENCODE_SM30

 option(CUDPP_GENCODE_SM35
        "ON to generate code for Compute Capability 3.5 devices (e.g. Tesla K20)"
-       OFF)
+       ON)

 option(CUDPP_GENCODE_SM50
        "ON to generate code for Compute Capability 5.0 devices (e.g. GeForce GTX 750)"
-       OFF)
+       ON)
+
+option(CUDPP_GENCODE_SM60
+       "ON to generate code for Compute Capability 6.0 devices"
+       ON)
+
+option(CUDPP_GENCODE_SM70
+       "ON to generate code for Compute Capability 7.0 devices"
+       ON)

 if (CUDPP_GENCODE_SM12)
   set(GENCODE ${GENCODE} ${GENCODE_SM12})
@@ -159,6 +169,14 @@ if (CUDPP_GENCODE_SM50)
   set(GENCODE ${GENCODE} ${GENCODE_SM50})
 endif(CUDPP_GENCODE_SM50)

+if (CUDPP_GENCODE_SM60)
+  set(GENCODE ${GENCODE} ${GENCODE_SM60})
+endif(CUDPP_GENCODE_SM60)
+
+if (CUDPP_GENCODE_SM70)
+  set(GENCODE ${GENCODE} ${GENCODE_SM70})
+endif(CUDPP_GENCODE_SM70)
+
 if (CUDA_VERBOSE_PTXAS)
   set(VERBOSE_PTXAS --ptxas-options=-v)
 endif (CUDA_VERBOSE_PTXAS)
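
To decide which of the options above a given machine needs, a small helper (my own, not part of CUDPP) can print each device's compute capability; an RTX 2080 Ti reports 7.5, a Titan V or GV100 reports 7.0:

// List every CUDA device with its compute capability, so you know which
// CUDPP_GENCODE_SM* options have to be ON.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        std::printf("device %d: %s, compute capability %d.%d\n",
                    i, p.name, p.major, p.minor);
    }
    return 0;
}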

@wing435
wing435 commented Dec 19, 2019

@tigeroses Hi, I have the same problem. I tried your solution, but it does not work. Could you tell me your driver version, CUDA version, and GPU?
win7 64bit
CUDA: 10.0
driver: 418.91
GPU: 2080Ti
vs2013

@tigeroses

@wing435 Here is my env:
win7 64bit
CUDA: 9.2
driver: 398.75
GPU: Titan V * 2
vs2015

This week I attended GTC 2019 and asked NVIDIA's development experts on site how to use a hash table in CUDA. They told me that their team recently developed a library called HugeCTR, which implements a hash table and supports dynamic insertion. If you just need a hash table, you can try it. Of course, I will also study this repo.

https://github.com/NVIDIA/HugeCTR.git

GPU Hashtable makes the data preprocessing easier and enables dynamic insertion in HugeCTR 2.0. The input training data are hash values (64-bit long long type) instead of original indices. Thus embedding initialization is not required before training, and if you start training from scratch, only an initialized dense model is needed (using --model-init). A pair of <key,value> (random small weight) will be inserted during runtime only when a new key appears in the training data and the hashtable cannot find it.
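
The dynamic insertion described in that quote is just insert-on-miss; a host-side sketch (my illustration, not HugeCTR code) of the semantics:

// Insert-on-miss: an unseen key gets a random small weight the first time
// it appears; known keys return their stored weight unchanged.
#include <cstdint>
#include <cstdio>
#include <random>
#include <unordered_map>

float lookup_or_insert(std::unordered_map<uint64_t, float> &table, uint64_t key) {
    auto it = table.find(key);
    if (it != table.end()) return it->second;         // key already known
    static std::mt19937 rng(42);
    std::uniform_real_distribution<float> small(-0.01f, 0.01f);
    float w = small(rng);                             // random small initial weight
    table.emplace(key, w);                            // inserted only on first miss
    return w;
}

int main() {
    std::unordered_map<uint64_t, float> table;
    const uint64_t keys[] = {11, 42, 11};             // 11 repeats: second hit reuses its weight
    for (uint64_t k : keys)
        std::printf("key %llu -> %f\n", (unsigned long long)k, lookup_or_insert(table, k));
    return 0;
}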
