Stars
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…
A small build system with a focus on speed
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.
IREE's PyTorch Frontend, based on Torch Dynamo.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
Large Language Model Text Generation Inference
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
An extremely fast Python package and project manager, written in Rust.
[TMLR 2024] Efficient Large Language Models: A Survey
⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR… (a minimal usage sketch of the Python API appears after this list)
FlashInfer: Kernel Library for LLM Serving
Tools for merging pretrained large language models.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
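
To make the TensorRT-LLM entry above concrete, here is a minimal sketch of its high-level Python LLM API. It assumes a recent tensorrt_llm release with the LLM API available; the model ID and sampling values are illustrative placeholders, not prescriptive settings.

    # Minimal sketch of the TensorRT-LLM high-level Python API.
    # Assumes a recent tensorrt_llm release; the model ID and sampling
    # parameters below are illustrative, not prescriptive.
    from tensorrt_llm import LLM, SamplingParams

    # Load a Hugging Face model and build a TensorRT engine for it
    # (weights are downloaded on first use).
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

    # Sampling settings for generation.
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Generate completions for a batch of prompts and print the text.
    for output in llm.generate(["What is a CUDA kernel?"], params):
        print(output.outputs[0].text)

The API deliberately mirrors other Python serving libraries: define the model once, then issue batched generate calls with per-request sampling parameters.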