Stars
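A great project for campus recruiting (autumn and spring hiring) and internships: build, hands-on and from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.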
A high-throughput and memory-efficient inference and serving engine for LLMs
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.🎉
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A toolbox for deep learning model deployment using C++ YoloX | YoloV7 | YoloV8 | Gan | OCR | MobileVit | Scrfd | MobileSAM | StableDiffusion
Accelerate your Stable Diffusion inference with the library's universal C/C++ framework design, powered by ONNXRuntime & across platforms.
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
rajeevsrao / TensorRT
Forked from NVIDIA/TensorRT. TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
Fast and memory-efficient exact attention
Transformer related optimization, including BERT, GPT
Large Language Model Deployment Toolkit
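A great project for campus recruiting (autumn and spring hiring) and internships! Implement a high-performance deep learning inference library step by step, from scratch, with support for inference on models such as llama2, Unet, Yolov5, and Resnet.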
The Triton backend for TensorRT.
OpenVINO backend for Triton.
Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Release for Improved Denoising Diffusion Probabilistic Models
OneDiff: An out-of-the-box acceleration library for diffusion models.
Official PyTorch & Diffusers implementation of "Text-Guided Texturing by Synchronized Multi-View Diffusion"
Simple samples for TensorRT programming
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
DeepEP: an efficient expert-parallel communication library
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
🚀 Easier & Faster YOLO Deployment Toolkit for NVIDIA 🛠️
Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high perf…
Optimizing Mobile Deep Learning on ARM GPU with TVM
An Open Source Machine Learning Framework for Everyone