topenkoff (Denis Kayshev) / Starred · GitHub
💩
Zm9yayB1

Organizations

@hit-box


Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 3,843 385 Updated Jul 6, 2025

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

356 32 Updated Feb 22, 2025

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,225 146 Updated Jul 6, 2025

Collection of benchmarks to measure basic GPU capabilities

C++ 387 56 Updated Feb 11, 2025

CUDA on non-NVIDIA GPUs

Rust 12,006 755 Updated Jul 4, 2025

🚴 Call stack profiler for Python. Shows you why your code is slow!

Python 7,186 246 Updated Jul 2, 2025

Frame profiler

C++ 12,207 823 Updated Jul 4, 2025

LLM training in simple, raw C/CUDA

Cuda 27,074 3,113 Updated Jun 26, 2025

Dynamic Memory Management for Serving LLMs without PagedAttention

C 400 31 Updated May 30, 2025

CUDA Kernel Benchmarking Library

Cuda 676 79 Updated Jul 4, 2025

Unified Collective Communication Library

C 259 110 Updated Jul 2, 2025

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 413 56 Updated Jul 2, 2025

Efficient platform for inference and serving local LLMs, including an OpenAI-compatible API server.

Rust 389 40 Updated Jul 1, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 835 38 Updated Jun 5, 2025

Nvidia Instruction Set Specification Generator

Python 280 12 Updated Jul 9, 2024

cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it

C++ 587 121 Updated Jun 12, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,038 1,272 Updated May 23, 2024

System design patterns for machine learning

2,676 312 Updated Oct 7, 2021

High performance server-side application framework

C++ 8,735 1,620 Updated Jul 6, 2025

Safe Rust wrapper around CUDA toolkit

Rust 867 106 Updated Jun 27, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,867 930 Updated Jul 4, 2025

Minimalist ML framework for Rust

Rust 17,541 1,138 Updated Jun 27, 2025

High-performance containers and utilities for concurrent and asynchronous programming

Rust 425 26 Updated May 27, 2025

Asynchronous Programming in Rust, published by Packt

Rust 240 71 Updated Dec 10, 2024

High-level, optionally asynchronous Rust bindings to llama.cpp

Rust 222 37 Updated Jun 5, 2024

A heap memory profiler for Linux

C++ 3,690 225 Updated Jun 30, 2025

Code samples for the book Patterns in C

C 184 58 Updated Apr 18, 2016

“Zero setup” cross compilation and “cross testing” of Rust crates

Rust 7,484 415 Updated May 24, 2025

ix package manager, statically build packages, for darwin/linux, with clang

Shell 157 18 Updated Jul 2, 2025