ita9naiwa (Hyunsung Lee) / Starred · GitHub

Organizations

@octoml @iree-org

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda · 1,089 stars · 162 forks · Updated Jul 29, 2023

ComfyUI plugin of Nunchaku

Python · 1,637 stars · 49 forks · Updated Jul 8, 2025

a small build system with a focus on speed

C++ · 12,098 stars · 1,699 forks · Updated Jul 11, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python · 2,360 stars · 121 forks · Updated Jul 13, 2025

An Infinitely Large Napkin

TeX · 1,544 stars · 153 forks · Updated Jul 8, 2025

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

196 stars · 9 forks · Updated May 6, 2025

Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python · 41,948 stars · 3,350 forks · Updated Jul 12, 2025

PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily write your own.

Python · 1,376 stars · 102 forks · Updated Jul 12, 2025

IREE's PyTorch Frontend, based on Torch Dynamo.

Python · 92 stars · 65 forks · Updated Jul 12, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ · 3,218 stars · 725 forks · Updated Jul 12, 2025

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Python · 472 stars · 48 forks · Updated Jul 11, 2025

The ultimate Vim configuration (vimrc)

Vim Script · 31,365 stars · 7,317 forks · Updated Oct 6, 2024

MLIR For Beginners tutorial

C++ · 1,012 stars · 91 forks · Updated Feb 7, 2025

Large Language Model Text Generation Inference

Python · 10,317 stars · 1,207 forks · Updated Jul 8, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM · 33,461 stars · 14,449 forks · Updated Jul 12, 2025

Isolating mlir tutorial dialect implementation

C++ · 20 stars · 4 forks · Updated Apr 8, 2024

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Python · 1,384 stars · 176 forks · Updated Jul 12, 2025

Python · 4,020 stars · 378 forks · Updated Jun 13, 2025

An extremely fast Python package and project manager, written in Rust.

Rust · 61,171 stars · 1,753 forks · Updated Jul 12, 2025

PyTorch native post-training library

Python · 5,325 stars · 652 forks · Updated Jul 12, 2025

[TMLR 2024] Efficient Large Language Models: A Survey

1,183 stars · 97 forks · Updated Jun 23, 2025

⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks

TypeScript · 27,595 stars · 3,120 forks · Updated Jul 12, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ · 10,994 stars · 1,578 forks · Updated Jul 12, 2025

CUDA Kernel Benchmarking Library

Cuda · 680 stars · 80 forks · Updated Jul 11, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda · 3,353 stars · 375 forks · Updated Jul 11, 2025

Tools for merging pretrained large language models.

Python · 6,013 stars · 579 forks · Updated Jun 19, 2025

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python · 6,013 stars · 560 forks · Updated Apr 11, 2025

Jupyter Notebook · 546 stars · 24 forks · Updated Aug 23, 2024

llama INT4 cuda inference with AWQ

C++ · 54 stars · 6 forks · Updated Jan 20, 2025