GitHub - Ratbuyer/ptx-demo: CUDA code that demonstrates how to use the mma and mma.sp PTX instructions

Ratbuyer/ptx-demo


ptx-demo

CUDA code that demonstrates how to use the mma and mma.sp PTX instructions.

Requires an NVIDIA GPU with compute capability 7.5 or higher and a compatible CUDA toolkit installed.
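
To check whether a GPU meets the 7.5 minimum, one option is to compare its compute capability numerically. The sketch below is not part of this repository: cc_at_least is a hypothetical helper name, and it assumes the minor version is a single digit (as in 7.5, 8.6, or 9.0). On recent drivers the capability can usually be read with nvidia-smi --query-gpu=compute_cap --format=csv,noheader, but treat that as an assumption about your driver version. Also note that the PTX ISA documentation lists mma.sp as targeting sm_80 and above, so the sparse demo may need a newer GPU than the dense one; verify this against your toolkit's documentation.

```shell
# Hypothetical helper (not part of this repo): succeed if compute capability
# $1 (e.g. "8.6") is at least $2 (e.g. "7.5").
# Assumes the minor version is a single digit, so "8.6" compares as 86.
cc_at_least() {
    have=$(printf '%s' "$1" | tr -d '.')
    need=$(printf '%s' "$2" | tr -d '.')
    [ "$have" -ge "$need" ]
}

# Example: an RTX 3060 (8.6) against the 7.5 minimum stated above.
if cc_at_least 8.6 7.5; then
    echo "supported"
else
    echo "unsupported"
fi
```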

Compile the code with

nvcc -arch=sm_{your sm version} -O3 mma.cu -lcudart -lcuda -o demo

and

nvcc -arch=sm_{your sm version} -O3 mma.sp.cu -lcudart -lcuda -o demo

For example, on an RTX 3060, which has compute capability 8.6, use

nvcc -arch=sm_86 -O3 mma.sp.cu -lcudart -lcuda -o demo
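
The -arch value is the compute capability with the dot removed and an sm_ prefix. A minimal sketch of that mapping follows; cc_to_arch is a hypothetical helper name, not something this repository provides.

```shell
# Hypothetical helper (not part of this repo): turn a compute capability
# such as "8.6" into the matching nvcc architecture name "sm_86".
cc_to_arch() {
    # Remove the dot, then add the sm_ prefix: "8.6" -> "86" -> "sm_86"
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc_to_arch 8.6   # prints sm_86

# Possible use when building (left commented out because it needs nvcc):
# nvcc -arch="$(cc_to_arch 8.6)" -O3 mma.sp.cu -lcudart -lcuda -o demo
```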

Run the binary with

./demo M N K ITERATIONS

where M, N, and K are the matrix dimensions and ITERATIONS is the number of times the matrix multiplication is repeated for profiling.

For example

./demo 1024 1024 1024 50
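
When interpreting profiling results, it can help to know how much arithmetic a run represents. A dense M x N x K matrix multiplication performs roughly 2*M*N*K floating-point operations (one multiply and one add per inner-product term). The sketch below computes that total for a whole run; total_flops is a hypothetical helper, not part of this repository, and dividing its result by an elapsed time you measure yourself gives an approximate throughput.

```shell
# Hypothetical helper (not part of this repo): total floating-point operations
# for ITERATIONS dense M x N x K multiplications, counting each
# multiply-add as 2 operations.
# Arguments: M N K ITERATIONS
total_flops() {
    echo $(( 2 * $1 * $2 * $3 * $4 ))
}

# Matches the example run above: ./demo 1024 1024 1024 50
total_flops 1024 1024 1024 50   # prints 107374182400
```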
