knote2019 / Starred · GitHub

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 1,139 160 Updated Jun 5, 2025

Fast C++ IPC using shared memory

C++ 587 84 Updated Aug 15, 2022

IPC is a C++ library that provides inter-process communication using shared memory on Windows. A .NET wrapper is available which allows interaction with C++ as well.

C++ 498 122 Updated Aug 31, 2022

The most over-engineered C++ assertion library

C++ 1 Updated May 28, 2025

Simple, portable, and self-contained stacktrace library for C++11 and newer

C++ 1 Updated Jun 3, 2025

A concept-centered standard library for C++20, enabling safer and more reliable products and a more modern feel for C++ code; Also home of Subdoc the code-documentation generator.

C++ 2 Updated May 28, 2025

LLM implementation one matrix multiplication at a time

Jupyter Notebook 12 4 Updated Aug 8, 2024

Simple and easy to understand PyTorch implementation of Large Language Model (LLM) GPT and LLAMA from scratch with detailed steps. Implemented: Byte-Pair Tokenizer, Rotational Positional Embedding …

Python 4 Updated Nov 18, 2024

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 3,537 307 Updated Jun 25, 2025

Conversion to/from half-precision floating point formats

C++ 354 97 Updated Jul 31, 2024

FlatBuffers: Memory Efficient Serialization Library

C++ 24,378 3,351 Updated Jun 25, 2025

Acceleration package for neural networks on multi-core CPUs

C 1,688 316 Updated Jun 11, 2024

High-efficiency floating-point neural network inference operators for mobile, server, and Web

C 2,052 425 Updated Jun 25, 2025

Portable (POSIX/Windows/Emscripten) thread pool for C/C++

C++ 371 147 Updated Jun 16, 2024

The OpenTelemetry C++ Client

C++ 1,071 484 Updated Jun 24, 2025

A microbenchmark support library

C++ 9,579 1,690 Updated Jun 12, 2025

Collective communications library with various primitives for multi-machine training.

C++ 1,318 331 Updated Jun 17, 2025

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,071 160 Updated Jul 29, 2023

A C library that may be linked into a C/C++ program to produce symbolic backtraces

C 1,076 246 Updated Apr 10, 2025

Universal cross-platform tokenizers binding to HF and sentencepiece

C++ 349 86 Updated Jun 25, 2025

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

HTML 821 189 Updated Jun 24, 2025

C++ 29 Updated Feb 3, 2025

PDF textbooks for all primary, middle, and high school grades, and for university.

Roff 41,160 9,128 Updated May 18, 2025

A C++ header-only HTTP/HTTPS server and client library

C++ 14,503 2,443 Updated Jun 24, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

Python 4,157 287 Updated Jun 20, 2025

Single header process & system information library. Written in C++17.

C++ 7 Updated Nov 16, 2020

C++ IPC Library: High-performance inter-process communication using shared memory on Linux/Windows.

C++ 1,984 365 Updated May 24, 2025

A minimal docker baseimage to ease creation of X graphical application containers

Shell 1,382 199 Updated Jun 25, 2025

A simple wrapper of FA for C++

C++ 1 Updated Dec 22, 2024