Tags · damionfan/cutlass

v2.8.0

Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (NVIDIA#375)

Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit.

GPUs under test:

    NVIDIA A100
    NVIDIA A2
    NVIDIA TitanV
    NVIDIA GeForce 2080 Ti

Dec 6, 2021
5fe09c2

v2.7.0

CUTLASS 2.7 (NVIDIA#318)

CUTLASS 2.7

* Mainloop fusion for GEMM: summation over A or B
* Strided DGRAD (optimized iterators)
* Half-precision GELU_taylor activation functions
    * Use these when accumulation and epilogue compute types are all cutlass::half_t
* Tuning and bug fixes to fused GEMM + GEMM example
* Support for Convolutions with smaller than 128b alignment: see examples
* Caching of results to accelerate Convolution unit tests
    * Enabled or disabled at configure time; for example, cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF disables it
* Corrections and bug fixes reported by the CUTLASS community
    * Thank you for filing these issues!

authored-by: Haicheng Wu <haichengw@nvidia.com>, Manish Gupta <manigupta@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>

Sep 20, 2021
2e07c4c

v2.6.1

CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning

* cutlass 2.6 update

* remove debug prints

* cutlass 2.6.1 (minor update)

* Updated CHANGELOG.

* Minor edit to readme to indicate patch version.

* Minor edit to readme.

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>

Sep 3, 2021
6c2f8f2

v2.6.0

Merge pull request NVIDIA#308 from dongxiao92/patch-1

fix typo in doc

Aug 8, 2021
a01feb9

v2.5.0

Create PUBLICATIONS.md (NVIDIA#189)

Mar 3, 2021
0f10563

v2.4.0

cutlass 2.4 documentation only update

Nov 23, 2020
ccb697b

v2.3.0

Merge pull request NVIDIA#135 from NVIDIA/cutlass_2.3_final

CUTLASS 2.3.0

Sep 25, 2020
c2b80ad

v2.2.0

Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (NVIDIA#100)

- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar

Jun 15, 2020
1ab1027

v2.1.0

update tools/library/CMakeLists to require python 3.6 according to NVIDIA#70 (NVIDIA#82)

NVIDIA#70 only updates the documentation. This commit applies the same Python version bump to the CMake configuration as well.

Apr 8, 2020
e33d90b

v2.0.0

Need Python 3.6 to use enum.auto() (NVIDIA#70)

Nov 22, 2019
7c0cd26
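
For context on the Python 3.6 requirement named in the commit title: enum.auto() was added to the standard library in Python 3.6, so scripts that use it fail on older interpreters. A minimal sketch follows; the ExampleKind class and its members are hypothetical names chosen for illustration and are not taken from the CUTLASS sources.

```python
import enum
import sys

# enum.auto() exists only in Python 3.6 and later.
assert sys.version_info >= (3, 6), "enum.auto() requires Python 3.6+"


class ExampleKind(enum.Enum):
    """Hypothetical enumeration; names are illustrative only."""

    Gemm = enum.auto()    # auto() assigns 1 to the first member
    Conv2d = enum.auto()  # and increments by one for each later member


if __name__ == "__main__":
    for kind in ExampleKind:
        print(kind.name, kind.value)
```

With the default behavior of auto(), members receive consecutive integer values starting at 1, which removes the need to number them by hand.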