- SNUCSE 18
- Seoul, South Korea
Starred repositories
Accommodating Large Language Model Training over Heterogeneous Environment.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Latency and Memory Analysis of Transformer Models for Training and Inference
deepspeedai / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer language models at scale, including: BERT & GPT-2
This repository is established to store personal notes and annotated papers during daily research.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)
🔥Highlighting the top ML papers every week.
[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its applications
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
Training and serving large-scale neural networks with auto parallelization.
Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.
Hands-On GPU Programming with Python and CUDA, published by Packt
2024-2-CID-TEAM-A / nntrainer
Forked from nnstreamer/nntrainer
NNtrainer is a software framework for training neural network models on devices.
Large Language Model (LLM) Systems Paper List
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.
Study parallel programming - CUDA, OpenMP, MPI, Pthread
Transformer: PyTorch Implementation of "Attention Is All You Need"
Implementation of TDConvED for video captioning
RoboGrammar: Graph Grammar for Terrain-Optimized Robot Design (SIGGRAPH Asia 2020)
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.