Stars
Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs
A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, rocWMMA), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA environments.
A collection of benchmarks to measure basic GPU capabilities
Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
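fastllm is a high-performance large-model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. Deploying the original full-precision DeepSeek model on a dual-socket 9004/9005 server with a single GPU gives 20 tps at single concurrency; the INT4-quantized model gives 30 tps at single concurrency and 60+ tps with multiple concurrent requests.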
Interact with your documents using the power of GPT, 100% privately, no data leaks
A platform for building proxies to bypass network restrictions.
CalcProgrammer1 / NVFC
Forked from graphitemaster/NVFC
Open-source tool for monitoring, configuring and overclocking NVIDIA GPUs
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
Tools for people envious of nvidia's blob driver.
Mythril is a symbolic-execution-based security analysis tool for EVM bytecode. It detects security vulnerabilities in smart contracts built for Ethereum and other EVM-compatible blockchains.
SQL-based streaming analytics platform at scale
Official repository of the AWS EC2 FPGA Hardware and Software Development Kit
Beringei is a high performance, in-memory storage engine for time series data.
A curated list of Deep Learning hardware, cycle/memory optimisation techniques
A pure front-end web UI for you-know-which bbs.