TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
C++ Apache License 2.0 Updated Jul 9, 2025
AITemplate Public
Forked from facebookincubator/AITemplate
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Python Apache License 2.0 Updated Oct 4, 2022
Tensile Public
Forked from ROCm/Tensile
Stretching GPU performance for GEMMs and tensor contractions.
Python MIT License Updated Jun 22, 2021
HIPIFY Public
Forked from ROCm/HIPIFY
HIPIFY: Convert CUDA to Portable C++ Code
C++ Updated Jun 15, 2021
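As an illustration of the source-to-source translation HIPIFY performs, the sketch below pairs a CUDA host call sequence with the HIP equivalent that hipify-perl or hipify-clang would produce; the kernel, buffer names, and launch geometry are placeholders, not taken from the repository.

```cpp
#include <hip/hip_runtime.h>

__global__ void kernel(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;   // placeholder kernel body
}

int main() {
    const int n = 1024;
    float h_x[n] = {};
    float* d_x = nullptr;

    // CUDA original:  cudaMalloc(&d_x, n * sizeof(float));
    // HIP translation:
    hipMalloc(&d_x, n * sizeof(float));
    hipMemcpy(d_x, h_x, n * sizeof(float), hipMemcpyHostToDevice);

    // CUDA original:  kernel<<<grid, block>>>(d_x, n);
    // HIP translation (older HIP releases emitted hipLaunchKernelGGL;
    // newer ones also accept the triple-chevron syntax directly):
    hipLaunchKernelGGL(kernel, dim3(n / 256), dim3(256), 0, 0, d_x, n);

    hipMemcpy(h_x, d_x, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(d_x);
    return 0;
}
```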
rocFFT Public
Forked from ROCm/rocFFT
Next generation FFT implementation for ROCm
C++ MIT License Updated Dec 9, 2020
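A minimal sketch of the plan/execute flow rocFFT exposes; the header path, buffer name, and the 1D single-precision in-place configuration are assumptions chosen for illustration.

```cpp
#include <hip/hip_runtime.h>
#include <rocfft.h>

int main() {
    const size_t length = 1024;

    // Device buffer of interleaved complex floats (real, imag).
    float2* d_signal = nullptr;
    hipMalloc(&d_signal, length * sizeof(float2));

    rocfft_setup();

    // 1D, single-precision, in-place complex forward transform.
    rocfft_plan plan = nullptr;
    rocfft_plan_create(&plan, rocfft_placement_inplace,
                       rocfft_transform_type_complex_forward,
                       rocfft_precision_single,
                       1 /*dimensions*/, &length,
                       1 /*number of transforms*/, nullptr);

    void* buffers[] = {d_signal};
    rocfft_execute(plan, buffers, nullptr, nullptr);

    rocfft_plan_destroy(plan);
    rocfft_cleanup();
    hipFree(d_signal);
    return 0;
}
```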
rocBLAS Public
Forked from ROCm/rocBLAS
Next generation BLAS implementation for ROCm platform
Shell MIT License Updated May 5, 2020
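For context, the sketch below calls rocBLAS's single-precision GEMM on device buffers; the square problem size, leading dimensions, and header path are illustrative assumptions.

```cpp
#include <hip/hip_runtime.h>
#include <rocblas.h>   // newer ROCm releases install this as <rocblas/rocblas.h>

int main() {
    const rocblas_int n = 512;          // square matrices for simplicity
    const float alpha = 1.0f, beta = 0.0f;

    float *d_A, *d_B, *d_C;
    hipMalloc(&d_A, n * n * sizeof(float));
    hipMalloc(&d_B, n * n * sizeof(float));
    hipMalloc(&d_C, n * n * sizeof(float));

    rocblas_handle handle;
    rocblas_create_handle(&handle);

    // C = alpha * A * B + beta * C (column-major, no transposes).
    rocblas_sgemm(handle,
                  rocblas_operation_none, rocblas_operation_none,
                  n, n, n,
                  &alpha, d_A, n, d_B, n,
                  &beta, d_C, n);

    rocblas_destroy_handle(handle);
    hipFree(d_A); hipFree(d_B); hipFree(d_C);
    return 0;
}
```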
HIP-Performance-Optmization-on-VEGA64 Public
Forked from fsword73/HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
C++ Updated Nov 8, 2019
SGEMM_on_VEGA Public
Forked from fsword73/SGEMM_on_VEGA
An alternative SGEMM implementation on AMD Vega Series
Assembly Updated Oct 16, 2019
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ BSD 3-Clause "New" or "Revised" License Updated Jul 10, 2019
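As a sketch of what the CUTLASS templates look like at the call site, the snippet below instantiates a single-precision, column-major device GEMM, roughly in the style of the library's basic GEMM example; the pointer names, leading dimensions, and problem size are placeholders.

```cpp
#include "cutlass/gemm/device/gemm.h"

// Single-precision GEMM with all operands in column-major layout.
using Gemm = cutlass::gemm::device::Gemm<
    float, cutlass::layout::ColumnMajor,   // A
    float, cutlass::layout::ColumnMajor,   // B
    float, cutlass::layout::ColumnMajor>;  // C

cutlass::Status run_gemm(int M, int N, int K,
                         float alpha, const float* d_A, int lda,
                         const float* d_B, int ldb,
                         float beta, float* d_C, int ldc) {
    Gemm gemm_op;
    // Arguments: problem size, A, B, C (source), D (destination), epilogue scalars.
    return gemm_op({{M, N, K},
                    {d_A, lda},
                    {d_B, ldb},
                    {d_C, ldc},
                    {d_C, ldc},
                    {alpha, beta}});
}
```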
bug_opencl_boost_compute Public
Minimal example for reproducing a segfault issue with Boost.Compute
CMake Updated May 29, 2019
compute Public
Forked from boostorg/compute
A C++ GPU Computing Library for OpenCL
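A minimal Boost.Compute usage sketch in the style of the library's documentation; the square-root transform and the vector size are arbitrary choices for illustration.

```cpp
#include <vector>
#include <boost/compute/core.hpp>
#include <boost/compute/algorithm/copy.hpp>
#include <boost/compute/algorithm/transform.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/functional/math.hpp>

namespace compute = boost::compute;

int main() {
    // Pick the default OpenCL device and set up a context and queue.
    compute::device gpu = compute::system::default_device();
    compute::context ctx(gpu);
    compute::command_queue queue(ctx, gpu);

    std::vector<float> host(10000, 2.0f);

    // Copy to the device, apply sqrt on the GPU, copy back.
    compute::vector<float> dev(host.size(), ctx);
    compute::copy(host.begin(), host.end(), dev.begin(), queue);
    compute::transform(dev.begin(), dev.end(), dev.begin(),
                       compute::sqrt<float>(), queue);
    compute::copy(dev.begin(), dev.end(), host.begin(), queue);
    return 0;
}
```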