chuqingG/hybrid_test

Hybrid Test

  • Test the performance of calling CUDA from Python.
  • Test the usage of TVM's C++ interface.

Usage

Run source setup.sh to build the library.

Result

Scatter Op

depth = 2, seqlen = 3

| (batch size, hidden size) | (64, 64) | (128, 128) | (256, 256) | (512, 512) |
|---|---|---|---|---|
| TVM relay (ms) | 0.0164 | 0.0173 | 0.0144 | 0.1141 |
| Ours (ms) | 0.0045 | 0.0029 | 0.0041 | 0.0088 |

In our tests, a single gather and a single scatter take roughly the same time within the same network.
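As a quick reading aid for the table above, the per-shape speedup can be computed directly from the reported timings. This is only a sketch: the numbers are copied from the table, and the helper name speedups is hypothetical, not part of the repository.

```python
# Timings copied from the Scatter Op table above, in milliseconds.
shapes = [(64, 64), (128, 128), (256, 256), (512, 512)]
tvm_relay_ms = [0.0164, 0.0173, 0.0144, 0.1141]
ours_ms = [0.0045, 0.0029, 0.0041, 0.0088]


def speedups(baseline, candidate):
    """Return baseline / candidate for each pair of timings."""
    return [b / c for b, c in zip(baseline, candidate)]


ratios = speedups(tvm_relay_ms, ours_ms)
for shape, ratio in zip(shapes, ratios):
    print(f"{shape}: {ratio:.1f}x faster than TVM relay")
```

On these figures the ratio falls between roughly 3.5x and 13x, with the largest gap at (512, 512).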

Total

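Taking (10, 100, 64, 64) as an example, we approximate the kernel overhead as follows:

  • cell: 0.0127 ms (mm)
    • Number of calls
      • mm: seqlen * depth
      • or bmm: 2 (seqlen + depth) - 1 (half mm, half bmm)
  • gather/scatter: 0.0045 ms
    • Ideally, one copy takes about as long as one scatter
    • Number of calls
      • mm: about 5 * seqlen * depth
      • or bmm: about 7 * (seqlen + depth)
      • In the current implementation, the bottleneck is the number of copies (each corresponds to an append to the scan result list in Python)

With bmm (for which cell measurements are not yet available), the total is slightly more than 2.781 + 3.465 = 6.246 ms.

  • Results measured in the FT library: ft = 7.962 ms, CuDNN = 14.298 ms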
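The arithmetic behind the estimate above can be reproduced with a short sketch. It assumes the tuple (10, 100, 64, 64) reads as (depth, seqlen, batch size, hidden size); the README does not say so explicitly, but both call-count formulas are symmetric in depth and seqlen, so the order of the first two values does not change the result. The function name estimate_ms is hypothetical, and the mm cell timing stands in for the unmeasured bmm cell.

```python
def estimate_ms(depth, seqlen, cell_ms=0.0127, move_ms=0.0045, use_bmm=True):
    """Return (cell_total, move_total, total) in ms from the call counts listed above."""
    if use_bmm:
        cell_calls = 2 * (seqlen + depth) - 1   # half mm, half bmm
        move_calls = 7 * (seqlen + depth)       # approximate gather/scatter/copy count
    else:
        cell_calls = seqlen * depth
        move_calls = 5 * seqlen * depth         # approximate gather/scatter/copy count
    cell_total = cell_calls * cell_ms
    move_total = move_calls * move_ms
    return cell_total, move_total, cell_total + move_total


# Assumed reading of the example: depth = 10, seqlen = 100.
cell, move, total = estimate_ms(depth=10, seqlen=100)
print(f"bmm: cell {cell:.3f} ms + gather/scatter {move:.3f} ms = {total:.3f} ms")

mm_cell, mm_move, mm_total = estimate_ms(depth=10, seqlen=100, use_bmm=False)
print(f"mm:  cell {mm_cell:.3f} ms + gather/scatter {mm_move:.3f} ms = {mm_total:.3f} ms")
```

With bmm this gives 219 cell calls and 770 gather/scatter calls, which matches the 2.781 ms and 3.465 ms terms above. The mm variant is included only for comparison using the same per-call timings.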
