bsdcfp

0 followers · 2 following

Stars

cumulo-autumn / StreamDiffusion

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python 10,278 782 Updated Dec 4, 2024

mit-han-lab / radial-attention

Radial Attention Official Implementation

Python 297 12 Updated Jul 6, 2025

bytedance / LatentSync

Taming Stable Diffusion for Lip Sync!

Python 4,504 716 Updated Jun 20, 2025

NVIDIA / NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 413 56 Updated Jul 2, 2025

obsidian-tasks-group / obsidian-tasks

Task management for the Obsidian knowledge base.

TypeScript 2,985 283 Updated Jul 7, 2025

facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,655 381 Updated Apr 1, 2025

tianweiy / CausVid

(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Python 741 33 Updated May 17, 2025

Tencent-Hunyuan / HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Jupyter Notebook 4,188 350 Updated Jan 13, 2025

ankandrew / fast-alpr

Fast Automatic License Plate Recognition (ALPR) framework.

Python 154 41 Updated Jul 1, 2025

we0091234 / Chinese_license_plate_detection_recognition

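YOLOv5 license plate detection and recognition for Chinese license plates; supports 12 types of Chinese license plates, including double-layer plates.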

Python 1,617 267 Updated Nov 25, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,500 160 Updated Jul 7, 2025

mirage-project / mirage

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 1,533 92 Updated Jul 7, 2025

unslothai / unsloth

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 41,650 3,322 Updated Jul 7, 2025

ELS-RD / kernl

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook 1,575 98 Updated Feb 16, 2024

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 16,067 2,098 Updated Jul 8, 2025

triton-inference-server / vllm_backend

Python 271 31 Updated Jun 10, 2025

NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,531 447 Updated Jul 8, 2025