3outeille (Ferdinand Mom) / Starred · GitHub
Organizations

@huggingface


A Quirky Assortment of CuTe Kernels

Python 268 17 Updated Jul 12, 2025

Simple MPI implementation for prototyping or learning

C 262 9 Updated Jun 27, 2025

Experimental repository for research implementation of NoLoCo.

Python 19 1 Updated Jun 15, 2025

AXI, AXI stream, Ethernet, and PCIe components in System Verilog

SystemVerilog 286 51 Updated Jun 18, 2025
Cuda 5 Updated May 30, 2025

Research sandbox for decentralized pipelined inference

Python 8 1 Updated May 13, 2025

Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse environments

Python 536 126 Updated Jul 12, 2025

prime-rl is a codebase for decentralized async RL training at scale

Python 366 52 Updated Jul 13, 2025

Scripts and instructions for replicating the original FineWeb experiments on LUMI

Shell 8 Updated Apr 25, 2025

Muon fsdp 2

Python 16 2 Updated Jul 12, 2025

Analyze computation-communication overlap in V3/R1.

1,076 144 Updated Mar 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,827 298 Updated Mar 10, 2025

Expert Parallelism Load Balancer

Python 1,231 194 Updated Mar 24, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,517 639 Updated Jul 2, 2025

Where GPUs get cooked 👩‍🍳🔥

Rust 237 12 Updated Mar 4, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,270 848 Updated Jul 11, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,862 280 Updated May 15, 2025

Multi-Threaded FP32 Matrix Multiplication on x86 CPUs

C 350 22 Updated Apr 21, 2025

A visual playground for agentic workflows: Iterate over your agents 10x faster

TypeScript 5,289 380 Updated Jul 6, 2025

Enable AI models for video production in the browser

TypeScript 1,900 226 Updated Jun 12, 2025

Fully open reproduction of DeepSeek-R1

Python 25,024 2,331 Updated Jul 10, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 2,895 217 Updated Jul 13, 2025

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 1,588 110 Updated Jul 7, 2025

DeMo: Decoupled Momentum Optimization

Python 189 9 Updated Dec 2, 2024

CGRA-Flow is an integrated framework for CGRA compilation, exploration, synthesis, and development.

Python 133 20 Updated Jun 17, 2025

A baseline repository of Auto-Parallelism in Training Neural Networks

Python 144 19 Updated Jun 25, 2022

Best practice for training LLaMA models in Megatron-LM

Python 657 57 Updated Jan 2, 2024

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 1,200 176 Updated Jul 7, 2025
Python 73 12 Updated May 27, 2025

Explore training for quantized models

Python 20 2 Updated Jul 12, 2025