Stars
FlashMLA: Efficient MLA decoding kernels
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
PyTorch Tutorial for Deep Learning Researchers
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Dapr is a portable runtime for building distributed applications across cloud and edge, combining event-driven architecture with workflow orchestration.
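Curated learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning, along with a compilation of architecture and algorithm resources for search, recommendation, and advertising systems. A collection of notes from top algorithm experts.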
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Dataframes powered by a multithreaded, vectorized query engine, written in Rust
A curated list of awesome parallel computing resources
C++ multi-dimensional labeled arrays and dataframe based on xtensor
C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
oneAPI Threading Building Blocks (oneTBB)
Symbolic Expression and Statement Module for new DSLs
C++ Mathematical Expression Parser Benchmark
Performance-portable, length-agnostic SIMD with runtime dispatch
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
C++11/14/17 std::expected with functional-style extensions
Probably the fastest coroutine lib in the world!