Ferdinand Mom 3outeille

🎯

https://www.youtube.com/watch?v=VYPi0qcHWvQ&ab_channel=ABANIMETION

Research engineer

260 followers · 787 following

Stars

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python · 268 stars · 17 forks · Updated Jul 12, 2025

Quentin-Anthony / nanoMPI

Simple MPI implementation for prototyping or learning

C · 262 stars · 9 forks · Updated Jun 27, 2025

gensyn-ai / noloco

Experimental repository for research implementation of NoLoCo.

Python · 19 stars · 1 fork · Updated Jun 15, 2025

fpganinja / taxi

AXI, AXI stream, Ethernet, and PCIe components in SystemVerilog

SystemVerilog · 286 stars · 51 forks · Updated Jun 18, 2025

arjundevraj / stragglar

Cuda · 5 stars · Updated May 30, 2025

PrimeIntellect-ai / prime-pipeline

Research sandbox for decentralized pipelined inference

Python · 8 stars · 1 fork · Updated May 13, 2025

NousResearch / atropos

Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse environments

Python · 536 stars · 126 forks · Updated Jul 12, 2025

PrimeIntellect-ai / prime-rl

prime-rl is a codebase for decentralized async RL training at scale

Python · 366 stars · 52 forks · Updated Jul 13, 2025

spyysalo / lumi-fineweb-replication

Scripts and instructions for replicating the original FineWeb experiments on LUMI

Shell · 8 stars · Updated Apr 25, 2025

samsja / muon_fsdp_2

Muon fsdp 2

Python · 16 stars · 2 forks · Updated Jul 12, 2025

deepseek-ai / profile-data

Analyze computation-communication overlap in V3/R1.

1,076 stars · 144 forks · Updated Mar 21, 2025

deepseek-ai / DualPipe

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,827 stars · 298 forks · Updated Mar 10, 2025

deepseek-ai / EPLB

Expert Parallelism Load Balancer

Python · 1,231 stars · 194 forks · Updated Mar 24, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python · 5,517 stars · 639 forks · Updated Jul 2, 2025

huggingface / gpu-fryer

Where GPUs get cooked 👩‍🍳🔥

Rust · 237 stars · 12 forks · Updated Mar 4, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 8,270 stars · 848 forks · Updated Jul 11, 2025

deepseek-ai / open-infra-index

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,862 stars · 280 forks · Updated May 15, 2025

salykova / sgemm.c

Multi-Threaded FP32 Matrix Multiplication on x86 CPUs

C · 350 stars · 22 forks · Updated Apr 21, 2025

PySpur-Dev / pyspur

A visual playground for agentic workflows: Iterate over your agents 10x faster

TypeScript · 5,289 stars · 380 forks · Updated Jul 6, 2025

fal-ai-community / video-starter-kit

Enable AI models for video production in the browser

TypeScript · 1,900 stars · 226 forks · Updated Jun 12, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python · 25,024 stars · 2,331 forks · Updated Jul 10, 2025

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Python · 2,895 stars · 217 forks · Updated Jul 13, 2025

huggingface / picotron

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python · 1,588 stars · 110 forks · Updated Jul 7, 2025

bloc97 / DeMo

DeMo: Decoupled Momentum Optimization

Python · 189 stars · 9 forks · Updated Dec 2, 2024

tancheng / CGRA-Flow

CGRA-Flow is an integrated framework for CGRA compilation, exploration, synthesis, and development.

Python · 133 stars · 20 forks · Updated Jun 17, 2025

ConnollyLeon / awesome-Auto-Parallelism

A baseline repository of Auto-Parallelism in Training Neural Networks

Python · 144 stars · 19 forks · Updated Jun 25, 2022

alibaba / Megatron-LLaMA

Forked from NVIDIA/Megatron-LM

Best practice for training LLaMA models in Megatron-LM

Python · 657 stars · 57 forks · Updated Jan 2, 2024

alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.

Python · 1,200 stars · 176 forks · Updated Jul 7, 2025

VIA-Research / vTrain

Python · 73 stars · 12 forks · Updated May 27, 2025

gau-nernst / quantized-training

Explore training for quantized models

Python · 20 stars · 2 forks · Updated Jul 12, 2025

Lists (5)

CUDA

Epita-Image

Pruning

Quantization

tooling
