Aneureka (Hiki) / Starred · GitHub
😇
learning CUDA, MLIR and LLM

Organizations

@Leftovers4


A Quirky Assortment of CuTe Kernels

Python 285 17 Updated Jul 12, 2025

The true story of the hardship and darkness behind the development of the Noah Pangu large model.

10,988 1,353 Updated Jul 9, 2025

An introductory guide to LLVM IR

LLVM 1,433 156 Updated Jan 31, 2024

A course of learning LLM inference serving on Apple Silicon for systems engineers.

Python 2,765 154 Updated Jun 14, 2025

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 33,479 14,454 Updated Jul 14, 2025

CUDA Python: Performance meets Productivity

Python 2,827 177 Updated Jul 12, 2025

Opensource, Database, AI, Business, Minds. git clone --depth 1 https://github.com/digoal/blog

PLpgSQL 8,284 1,903 Updated Jul 14, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,356 376 Updated Jul 14, 2025

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

C++ 5,779 263 Updated Jul 6, 2025

DeeperGEMM: crazy optimized version

Cuda 69 Updated May 5, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,519 640 Updated Jul 2, 2025

Fastest kernels written from scratch

Cuda 290 39 Updated Apr 3, 2025
Cuda 110 16 Updated Mar 17, 2025

Nameof operator for modern C++, simply obtain the name of a variable, type, function, macro, and enum

C++ 2,215 117 Updated Oct 14, 2024

std::tuple like methods for user defined types without any macro or boilerplate code

C++ 1,404 163 Updated Jun 28, 2025

Code release for DynamicTanh (DyT)

Python 981 81 Updated Mar 30, 2025

A C++14 macro to get the type of the current class without naming it

C++ 25 2 Updated May 25, 2024

FlashMLA: Efficient MLA decoding kernels

Cuda 11,650 875 Updated Apr 29, 2025

A modern, powerful, and user-friendly C++ language server built from scratch

C++ 691 38 Updated Jul 13, 2025

MLIR For Beginners tutorial

C++ 1,013 91 Updated Feb 7, 2025

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it

C++ 592 121 Updated Jul 9, 2025

Solve puzzles. Learn CUDA.

Jupyter Notebook 11,264 855 Updated Sep 1, 2024

KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems

Python 476 48 Updated Jul 14, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 52,170 8,683 Updated Jul 14, 2025

Paragraph-by-paragraph close readings of classic and new deep learning papers

30,764 2,674 Updated Mar 22, 2025

Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍

Shell 23,215 3,523 Updated Jul 1, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,576 664 Updated Aug 18, 2024

Visual Studio Code extension for clangd

TypeScript 712 134 Updated Jul 1, 2025

The road to hacking SysML and becoming a systems expert

Emacs Lisp 492 62 Updated Sep 25, 2024

Permanent Apple Intelligence + Xcode Predictive Code Completion for Chinese-market Mac computers

Shell 903 40 Updated Feb 28, 2025