Shenggan (shenggan) / Starred · GitHub
🎯
Flying
  • National University of Singapore
  • Singapore

Highlights

  • Pro

Organizations

@cosmo-cube


A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,762 294 Updated Mar 10, 2025

An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional variability in sampling steps

Python 126 5 Updated Feb 17, 2025

Enhance-A-Video: Better Generated Video for Free

Python 522 27 Updated Mar 17, 2025

A generative world for general-purpose robotics & embodied AI learning.

Python 24,994 2,222 Updated May 14, 2025

Fast low-bit matmul kernels in Triton

Python 299 23 Updated May 13, 2025

Democratizing AlphaFold3: a PyTorch reimplementation to accelerate protein structure prediction

Python 30 2 Updated Dec 16, 2024

AlphaFold 3 inference pipeline.

Python 6,460 813 Updated May 13, 2025

Official inference framework for 1-bit LLMs

C++ 19,186 1,413 Updated May 8, 2025

A flexible and efficient training framework for large-scale alignment tasks

Python 347 27 Updated May 14, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 807 36 Updated May 10, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,028 1,386 Updated May 14, 2025

MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)

Python 50 5 Updated May 29, 2024

An official implementation of Pangu-Weather

Python 1,192 222 Updated Jan 12, 2024

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 828 56 Updated May 14, 2025

Applied AI experiments and examples for PyTorch

Python 267 27 Updated Apr 28, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 612 48 Updated May 5, 2025

The official Meta Llama 3 GitHub site

Python 28,686 3,376 Updated Jan 26, 2025

An interference-aware scheduler for fine-grained GPU sharing

Python 133 23 Updated Jan 26, 2025

USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference

Python 493 42 Updated Apr 21, 2025

Thunder gives you PyTorch models superpowers for training and inference. Unlock out-of-the-box optimizations for performance, memory and parallelism, or roll out your own.

Python 1,342 93 Updated May 14, 2025

Repository for MLCommons Chakra schema and tools

Python 96 52 Updated Mar 14, 2025

PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.

Python 138 64 Updated May 8, 2025

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 351 51 Updated May 14, 2025

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Python 866 47 Updated Jan 3, 2025

VideoSys: An easy and efficient system for video generation

Python 1,963 128 Updated Mar 9, 2025

ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale

C++ 359 135 Updated May 7, 2025

The official implementation of "Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization"

Python 16 1 Updated Mar 14, 2024

ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching

Python 102 8 Updated May 23, 2024

Lossless Training Speed Up by Unbiased Dynamic Data Pruning

Python 333 19 Updated Sep 24, 2024

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python 5,947 551 Updated Apr 11, 2025