topenkoff (Denis Kayshev) / Starred · GitHub

💩

Zm9yayB1

Denis Kayshev topenkoff

8 followers · 7 following

Moscow
https://kayshev.com

Achievements

Highlights

Developer Program Member

Lists (3)

ml

14 repositories

study

18 repositories

tools

Stars

vllm-project / aibrix

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 3,843 385 Updated Jul 6, 2025

rkinas / cuda-learning

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

356 32 Updated Feb 22, 2025

yandex / perforator

Perforator is a cluster-wide continuous profiling tool designed for large data centers

C++ 3,225 146 Updated Jul 6, 2025

RRZE-HPC / gpu-benches

collection of benchmarks to measure basic GPU capabilities

C++ 387 56 Updated Feb 11, 2025

vosen / ZLUDA

CUDA on non-NVIDIA GPUs

Rust 12,006 755 Updated Jul 4, 2025

joerick / pyinstrument

🚴 Call stack profiler for Python. Shows you why your code is slow!

Python 7,186 246 Updated Jul 2, 2025

wolfpld / tracy

Frame profiler

C++ 12,207 823 Updated Jul 4, 2025

ColfaxResearch / cutlass-kernels

Cuda 214 33 Updated Jul 11, 2024

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 27,074 3,113 Updated Jun 26, 2025

microsoft / vattention

Dynamic Memory Management for Serving LLMs without PagedAttention

C 400 31 Updated May 30, 2025

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda 676 79 Updated Jul 4, 2025

openucx / ucc

Unified Collective Communication Library

C 259 110 Updated Jul 2, 2025

NVIDIA / NVTX

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 413 56 Updated Jul 2, 2025

EricLBuehler / candle-vllm

Efficient platform for inference and serving local LLMs including an OpenAI compatible API server.

Rust 389 40 Updated Jul 1, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 835 38 Updated Jun 5, 2025

kuterd / nv_isa_solver

Nvidia Instruction Set Specification Generator

Python 280 12 Updated Jul 9, 2024

NVIDIA / cudnn-frontend

cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it

C++ 587 121 Updated Jun 12, 2025

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,038 1,272 Updated May 23, 2024

mercari / ml-system-design-pattern

System design patterns for machine learning

2,676 312 Updated Oct 7, 2021

scylladb / seastar

High performance server-side application framework

C++ 8,735 1,620 Updated Jul 6, 2025

coreylowman / cudarc

Safe rust wrapper around CUDA toolkit

Rust 867 106 Updated Jun 27, 2025

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 9,867 930 Updated Jul 4, 2025

huggingface / candle

Minimalist ML framework for Rust

Rust 17,541 1,138 Updated Jun 27, 2025

wvwwvwwv / scalable-concurrent-containers

High-performance containers and utilities for concurrent and asynchronous programming

Rust 425 26 Updated May 27, 2025

PacktPublishing / Asynchronous-Programming-in-Rust

Asynchronous Programming in Rust, published by Packt

Rust 240 71 Updated Dec 10, 2024

edgenai / llama_cpp-rs

High-level, optionally asynchronous Rust bindings to llama.cpp

Rust 222 37 Updated Jun 5, 2024

KDE / heaptrack

A heap memory profiler for Linux

C++ 3,690 225 Updated Jun 30, 2025

adamtornhill / PatternsInC

Code samples for the book Patterns in C

C 184 58 Updated Apr 18, 2016

cross-rs / cross

“Zero setup” cross compilation and “cross testing” of Rust crates

Rust 7,484 415 Updated May 24, 2025

stal-ix / ix

ix package manager, statically build packages, for darwin/linux, with clang

Shell 157 18 Updated Jul 2, 2025
