Starred repositories
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5x+ speedup on typical end-side chips
A PyTorch native platform for training generative AI models
LLM Transparency Tool (LLM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Check out the demo at https://huggingface.co/spaces/facebook/l…
Pretraining code for a large-scale depth-recurrent language model
SGLang is a fast serving framework for large language models and vision language models.
A PyTorch quantization backend for Optimum
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
An interpreter for RASP as described in the ICML 2021 paper "Thinking Like Transformers"
Efficient 2:4 sparse training algorithms and implementations (see the 2:4 sparsity sketch after this list)
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
A series of technical reports on Slow Thinking with LLMs
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Efficient Triton implementation of Native Sparse Attention.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepEP: an efficient expert-parallel communication library
Some common Hugging Face Transformers models in maximal update parametrization (µP)
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
[ACM CoNEXT22 Best Paper Award] NTSocks: An ultra-low latency and compatible PCIe interconnect for rack-scale disaggregation.
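Several of the starred projects above (2:4 sparse training, SparseGPT, MaskLLM) revolve around 2:4 semi-structured sparsity. The snippet below is only a minimal PyTorch sketch of what the pattern itself means, keeping the two largest-magnitude weights in every contiguous group of four; it is not the pruning or training algorithm any of those repositories actually implements, and the function name `prune_2_to_4` is made up for illustration.

```python
import torch


def prune_2_to_4(weight: torch.Tensor) -> torch.Tensor:
    """Zero out the 2 smallest-magnitude values in every contiguous group of 4
    along the last dimension, producing a 2:4 semi-structured sparse matrix."""
    out_features, in_features = weight.shape
    assert in_features % 4 == 0, "2:4 pruning needs the inner dim to be a multiple of 4"
    groups = weight.reshape(out_features, in_features // 4, 4)
    # Indices of the 2 smallest-magnitude entries in each group of 4.
    _, drop_idx = groups.abs().topk(2, dim=-1, largest=False)
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(-1, drop_idx, False)
    return (groups * mask).reshape(out_features, in_features)


if __name__ == "__main__":
    w = torch.randn(8, 16)
    w_sparse = prune_2_to_4(w)
    # Every group of 4 now holds at most 2 non-zeros, which is the pattern
    # that GPU sparse tensor cores (and the 2:4 / MaskLLM work above) exploit.
    nnz_per_group = (w_sparse.reshape(8, -1, 4) != 0).sum(dim=-1)
    assert (nnz_per_group <= 2).all()
```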