NVIDIA - Shanghai, China (UTC +08:00) - https://www.aneureka.com - @aneureka
Stars
A course on LLM inference serving on Apple Silicon for systems engineers.
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
CUDA Python: Performance meets Productivity
Open source, Database, AI, Business, Minds. git clone --depth 1 https://github.com/digoal/blog
FlashInfer: Kernel Library for LLM Serving
A personal experimental C++ Syntax 2 -> Syntax 1 compiler
DeeperGEMM: crazy optimized version (ademeure's fork of deepseek-ai/DeepGEMM)
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Nameof operator for modern C++, simply obtain the name of a variable, type, function, macro, and enum
std::tuple like methods for user defined types without any macro or boilerplate code
A C++14 macro to get the type of the current class without naming it
FlashMLA: Efficient MLA decoding kernels
A modern, powerful, and user-friendly C++ language server built from scratch
cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
A high-throughput and memory-efficient inference and serving engine for LLMs
Oh my tmux! My self-contained, pretty & versatile tmux configuration made with 💛🩷💙🖤❤️🤍
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Visual Studio Code extension for clangd
The road to hack SysML and become a systems expert
Permanent Apple Intelligence + Xcode Predictive Code Completion for Chinese-market Mac computers