NVIDIA - Shanghai, China (UTC +08:00) - https://www.aneureka.com - @aneureka
Stars
A course on LLM inference serving on Apple Silicon for systems engineers.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
CUDA Python: Performance meets Productivity
Open source, Database, AI, Business, Minds. git clone --depth 1 https://github.com/digoal/blog
FlashInfer: Kernel Library for LLM Serving
A personal experimental C++ Syntax 2 -> Syntax 1 compiler
DeeperGEMM: crazy optimized version (ademeure's fork of deepseek-ai/DeepGEMM)
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Nameof operator for modern C++, simply obtain the name of a variable, type, function, macro, and enum
std::tuple like methods for user defined types without any macro or boilerplate code
A C++14 macro to get the type of the current class without naming it
FlashMLA: Efficient MLA decoding kernels
A modern, powerful, and user-friendly C++ language server built from scratch
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
A high-throughput and memory-efficient inference and serving engine for LLMs
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Visual Studio Code extension for clangd
The road to hack SysML and become a systems expert
Permanent Apple Intelligence + Xcode Predictive Code Completion for Chinese-market Mac computers