dgqsyujian / Starred · GitHub

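A strong project for campus recruiting (autumn and spring hiring seasons) and internships: build an LLM inference framework that supports LLama2/3 and Qwen2.5, hands-on and from scratch.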

C++ 376 99 Updated Jun 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 50,899 8,374 Updated Jun 27, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.🎉

Cuda 5,012 541 Updated Jun 27, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,481 630 Updated Jun 23, 2025

A toolbox for deep learning model deployment using C++ YoloX | YoloV7 | YoloV8 | Gan | OCR | MobileVit | Scrfd | MobileSAM | StableDiffusion

C 540 28 Updated Feb 6, 2025

Accelerate your Stable Diffusion inference with the library's universal C/C++ framework design, powered by ONNXRuntime & across platforms.

C++ 446 71 Updated Aug 16, 2024

PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.

Python 1,699 257 Updated Mar 28, 2024

TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.

C++ 19 8 Updated Mar 7, 2024

Fast and memory-efficient exact attention

Python 18,051 1,772 Updated Jun 25, 2025

Transformer related optimization, including BERT, GPT

C++ 6,221 910 Updated Mar 27, 2024

Acuity Model Zoo

JavaScript 145 23 Updated Nov 3, 2022

Large Language Model Deployment Toolkit

Cuda 3,220 485 Updated Jun 16, 2025

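A strong project for campus recruiting (autumn and spring hiring seasons) and internships! Build a high-performance deep learning inference library from scratch, supporting inference for models such as the large language model llama2, Unet, Yolov5, and Resnet. Implement a high-performance deep learning inference library step by step.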

C++ 2,996 333 Updated Jun 22, 2025

The Triton backend for TensorRT.

C++ 77 32 Updated Jun 27, 2025

OpenVINO backend for Triton.

C++ 32 17 Updated Jun 18, 2025

Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch

Python 377 22 Updated Jun 27, 2025

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

Python 146,143 29,471 Updated Jun 27, 2025

Release for Improved Denoising Diffusion Probabilistic Models

Python 3,603 520 Updated Jul 18, 2024

OneDiff: An out-of-the-box acceleration library for diffusion models.

Jupyter Notebook 1,905 124 Updated May 8, 2025

Official PyTorch & Diffusers implementation of "Text-Guided Texturing by Synchronized Multi-View Diffusion"

Python 169 12 Updated Mar 18, 2025

Simple samples for TensorRT programming

Python 1,618 350 Updated May 27, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 29,519 6,067 Updated Jun 27, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,218 822 Updated Jun 27, 2025

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,770 2,212 Updated Jun 26, 2025

🚀 Easier & Faster YOLO Deployment Toolkit for NVIDIA 🛠️

C++ 1,290 141 Updated May 30, 2025

Cross-platform, customizable multimedia/video processing framework. With strong GPU acceleration, heterogeneous design, multi-language support, easy to use, multi-framework compatible and high perf…

C++ 942 103 Updated Jun 27, 2025

Optimizing Mobile Deep Learning on ARM GPU with TVM

C 181 27 Updated Oct 15, 2018

An Open Source Machine Learning Framework for Everyone

C++ 190,502 74,718 Updated Jun 28, 2025