xlite-dev · GitHub
@xlite-dev

xlite-dev

Develop ML/AI toolkits and ML/AI/CUDA Learning resources.

Pinned

  1. lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, FaceFusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TensorRT.

    C++ 4.1k 739

  2. LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

    Cuda 4.4k 459

  3. Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

    Python 4k 277

  4. lihang-notes Public

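    📚 Notes on Li Hang's "Statistical Learning Methods" (统计学习方法), from principles to implementation: a very detailed 200-page PDF of study notes, with step-by-step hand derivations of the formulas and implementations in R. 🎉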

    Shell 458 55

  5. torchlm Public

    💎A high level pipeline for face landmarks detection: train, eval, inference (Python/C++) and 100+ data augmentations.

    Python 258 24

  6. ffpa-attn Public

    📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

    Cuda 175 7

Repositories

Showing 10 of 24 repositories
  • Awesome-LLM-Inference Public

    📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

    Python 4,009 GPL-3.0 277 2 0 Updated May 18, 2025
  • Awesome-DiT-Inference Public

    📚A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉

    235 GPL-3.0 14 0 0 Updated May 17, 2025
  • LeetCUDA Public

    📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

    Cuda 4,353 GPL-3.0 459 3 0 Updated May 17, 2025
  • lihang-notes Public

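    📚 Notes on Li Hang's "Statistical Learning Methods" (统计学习方法), from principles to implementation: a very detailed 200-page PDF of study notes, with step-by-step hand derivations of the formulas and implementations in R. 🎉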

    Shell 458 GPL-3.0 55 2 0 Updated May 17, 2025
  • .github Public
    1 0 0 0 Updated May 17, 2025
  • SageAttention Public Forked from thu-ml/SageAttention

    Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

    Cuda 0 Apache-2.0 110 0 0 Updated May 13, 2025
  • SpargeAttn Public Forked from thu-ml/SpargeAttn

    SpargeAttention: A training-free sparse attention that can accelerate any model inference.

    Cuda 6 Apache-2.0 34 0 0 Updated May 11, 2025
  • HGEMM Public

    ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.

    Cuda 77 GPL-3.0 3 0 0 Updated May 10, 2025
  • ffpa-attn Public

    📚FFPA(Split-D): Extend FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x↑🎉 faster than SDPA EA.

    Cuda 175 GPL-3.0 7 3 0 Updated May 10, 2025
  • lite.ai.toolkit Public

    🛠 A lite C++ AI toolkit: 100+🎉 models (Stable-Diffusion, FaceFusion, YOLO series, Det, Seg, Matting) with MNN, ORT and TensorRT.

    C++ 4,093 GPL-3.0 739 1 0 Updated Apr 28, 2025
