Stars
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
IPC is a C++ library that provides inter-process communication using shared memory on Windows. A .NET wrapper is available which allows interaction with C++ as well.
cjdb / libassert
Forked from jeremy-rifkin/libassert
The most over-engineered C++ assertion library
cjdb / cpptrace
Forked from jeremy-rifkin/cpptrace
Simple, portable, and self-contained stacktrace library for C++11 and newer
cjdb / subspace
Forked from chromium/subspace
A concept-centered standard library for C++20, enabling safer and more reliable products and a more modern feel for C++ code; also home of Subdoc, the code-documentation generator.
LLM implementation one matrix multiplication at a time
Simple and easy to understand PyTorch implementation of Large Language Model (LLM) GPT and LLAMA from scratch with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding …
The simplest, fastest repository for training/finetuning small-sized VLMs.
Conversion to/from half-precision floating point formats
FlatBuffers: Memory Efficient Serialization Library
Acceleration package for neural networks on multi-core CPUs
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Portable (POSIX/Windows/Emscripten) thread pool for C/C++
Collective communications library with various primitives for multi-machine training.
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
A C library that may be linked into a C/C++ program to produce symbolic backtraces
Universal cross-platform tokenizers binding to HF and sentencepiece
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
A C++ header-only HTTP/HTTPS server and client library
📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.
Single header process & system information library. Written in C++17.
C++ IPC Library: High-performance inter-process communication using shared memory on Linux/Windows.
A minimal docker baseimage to ease creation of X graphical application containers