sugsugsug (UGyeong Song) / Starred · GitHub
  • SNUCSE 18
  • Seoul, South Korea

Starred repositories

Accommodating Large Language Model Training over Heterogeneous Environment.

Python 20 8 Updated Mar 13, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 32,382 13,455 Updated May 14, 2025

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 411 46 Updated Apr 19, 2025
Python 7 Updated Dec 10, 2024

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 2,069 352 Updated Mar 24, 2025

This repository is established to store personal notes and annotated papers during daily research.

122 8 Updated Apr 22, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 38,333 4,365 Updated May 14, 2025

[ATC '24] Metis: Fast automatic distributed training on heterogeneous GPUs (https://www.usenix.org/conference/atc24/presentation/um)

Python 26 15 Updated Nov 18, 2024

Awesome Papers related to Mamba.

1,358 69 Updated Oct 17, 2024

🔥Highlighting the top ML papers every week.

11,228 686 Updated Apr 11, 2025

[Mamba-Survey-2024] Paper list for State-Space-Model/Mamba and its Applications

710 41 Updated Mar 17, 2025

Mamba SSM architecture

Python 14,863 1,299 Updated May 9, 2025

Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.

Python 63 8 Updated Mar 20, 2025

Training and serving large-scale neural networks with auto parallelization.

Python 3,131 359 Updated Dec 9, 2023

Awesome-LLM-KV-Cache: A curated list of 📙Awesome LLM KV Cache Papers with Codes.

295 18 Updated Mar 3, 2025

Hands-On GPU Programming with Python and CUDA, published by Packt

Python 380 170 Updated Aug 10, 2024

Fast CUDA matrix multiplication from scratch

Cuda 713 107 Updated Dec 28, 2023

NNtrainer is a Software Framework for Training Neural Network Models on Devices.

C++ 1 Updated Dec 18, 2024

NNtrainer is a Software Framework for Training Neural Network Models on Devices.

C++ 155 84 Updated May 12, 2025

Welcome to Tizen .NET

C# 230 32 Updated Apr 18, 2025

TCP proxy in ANSI C

C 395 153 Updated Oct 12, 2024

Large Language Model (LLM) Systems Paper List

1,221 68 Updated May 10, 2025

📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism etc.

Python 3,994 277 Updated May 12, 2025

Study parallel programming - CUDA, OpenMP, MPI, Pthread

Cuda 56 14 Updated Jul 3, 2022

Transformer: PyTorch Implementation of "Attention Is All You Need"

Python 3,699 527 Updated Aug 6, 2024

implementation of TDConvED for video captioning

Python 13 1 Updated Mar 18, 2020

RoboGrammar: Graph Grammar for Terrain-Optimized Robot Design (SIGGRAPH Asia 2020)

C++ 211 66 Updated Jan 21, 2023

95.47% on CIFAR10 with PyTorch

Python 6,180 2,159 Updated Feb 24, 2023

Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.

Python 432 81 Updated May 15, 2023