demonbibi / Starred · GitHub

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,344 593 Updated May 16, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,662 769 Updated May 12, 2025

Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).

Python 1,060 201 Updated May 15, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 32,425 13,469 Updated May 17, 2025

A collection of compiler learning resources.

Python 2,384 347 Updated Mar 19, 2025

Mamba SSM architecture

Python 14,878 1,302 Updated May 9, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 14,428 2,870 Updated May 17, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda 4,244 453 Updated May 12, 2025

A TensorFlow Extension: GPU performance tools for TensorFlow.

Python 26 7 Updated Jul 27, 2023

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ 865 164 Updated Dec 30, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,172 553 Updated May 17, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 496 42 Updated Apr 21, 2025

NVIDIA Linux open GPU kernel module source

C 15,781 1,400 Updated May 12, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,413 422 Updated May 17, 2025

Making large AI models cheaper, faster and more accessible

Python 40,881 4,508 Updated May 16, 2025

System for AI Education Resource.

Python 3,993 507 Updated Oct 25, 2024

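AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.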

Jupyter Notebook 13,569 1,944 Updated May 7, 2025

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Python 2,904 738 Updated Apr 30, 2025

MetaBalance algorithm for multi-task learning

Python 58 5 Updated Feb 9, 2022

Official PyTorch Implementation for Conflict-Averse Gradient Descent (CAGrad)

Python 122 17 Updated Nov 9, 2023

LLM inference in C/C++

C++ 80,375 11,791 Updated May 16, 2025

How to learn PyTorch and OneFlow

427 27 Updated Mar 22, 2024

Reference implementation for DPO (Direct Preference Optimization)

Python 2,568 212 Updated Aug 11, 2024

DLRover: An Automatic Distributed Deep Learning System

Python 1,447 178 Updated May 16, 2025

Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.

Python 503 145 Updated Apr 18, 2025

A permissively licensed C and C++ Task Scheduler for creating parallel programs. Requires C++11 support.

C++ 1,834 154 Updated Jan 27, 2025

A list of awesome papers and resources of recommender system on large language model (LLM).

1,857 140 Updated Mar 17, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,479 671 Updated May 14, 2025
C++ 4,697 509 Updated May 15, 2025