Stars
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
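As a quick illustration of how such a pool is used, here is a minimal sketch assuming the v4 API, where `submit_task` returns a `std::future` (treat the exact names as an assumption based on that version's documentation):

```cpp
// Minimal sketch of submitting work to BS::thread_pool (assumes the v4 API).
// Compile with -std=c++17 or later; the library is a single header.
#include "BS_thread_pool.hpp"
#include <future>
#include <iostream>

int main() {
    BS::thread_pool pool(4); // pool with 4 worker threads

    // Submit a task; the returned future lets us wait for the result.
    std::future<int> result = pool.submit_task([] { return 6 * 7; });

    std::cout << "task result: " << result.get() << '\n'; // prints 42
}
```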
This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai
MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels.
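For context, kernels in benchmarks like this are typically measured against a naive baseline such as the following (a generic C++ sketch, not code from the repository):

```cpp
// Naive single-threaded reference matmul: C += A * B, row-major.
// Optimized kernels are usually benchmarked against a baseline like this.
#include <cstddef>
#include <vector>

void matmul_ref(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C,
                std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < K; ++k) {
            const float a = A[i * K + k];
            for (std::size_t j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j]; // i-k-j order keeps B and C accesses sequential
        }
}
```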
A high-throughput and memory-efficient inference and serving engine for LLMs
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
A curated list of Large Language Model resources, covering model training, serving, fine-tuning, and building LLM applications.
Official inference framework for 1-bit LLMs
An open-source RAG-based tool for chatting with your documents.
The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for float…
INACTIVE (archived; see http://mzl.la/ghe-archive) - FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
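A short example of Eigen's everyday usage (standard public API; the specific system solved here is made up for illustration):

```cpp
// Solving a small dense linear system Ax = b with Eigen.
#include <Eigen/Dense>
#include <iostream>

int main() {
    Eigen::Matrix3d A;
    A << 2, -1,  0,
        -1,  2, -1,
         0, -1,  2;
    Eigen::Vector3d b(1, 0, 1);

    // Column-pivoting Householder QR: a robust general-purpose dense solver.
    Eigen::Vector3d x = A.colPivHouseholderQr().solve(b);
    std::cout << "x =\n" << x << '\n';
}
```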
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
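OpenBLAS implements the standard CBLAS interface, so a single-precision GEMM call looks like this (standard CBLAS signature; link with -lopenblas):

```cpp
// C = alpha * A * B + beta * C through the CBLAS interface OpenBLAS provides.
#include <cblas.h>
#include <iostream>
#include <vector>

int main() {
    const int M = 2, K = 3, N = 2;
    std::vector<float> A = {1, 2, 3,
                            4, 5, 6};   // M x K, row-major
    std::vector<float> B = {7,  8,
                            9, 10,
                           11, 12};     // K x N, row-major
    std::vector<float> C(M * N, 0.0f);  // M x N result

    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A.data(), K,      // lda = K for row-major A
                B.data(), N,            // ldb = N
                0.0f, C.data(), N);     // ldc = N

    for (float v : C) std::cout << v << ' '; // 58 64 139 154
    std::cout << '\n';
}
```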
TinyChatEngine: On-Device LLM Inference Library
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Accessible large language models via k-bit quantization for PyTorch.
Encapsulates frequently used AVX instructions as independent modules to reduce repeated development work.
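To illustrate the pattern of wrapping raw intrinsics behind a reusable helper, here is a generic sketch (my own example, not code from the repository):

```cpp
// A tiny module wrapping AVX intrinsics: elementwise float addition.
// Compile with -mavx on GCC/Clang.
#include <immintrin.h>
#include <cstddef>

void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {                  // 8 floats per 256-bit register
        const __m256 va = _mm256_loadu_ps(a + i); // unaligned loads
        const __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i)                            // scalar tail for leftover elements
        out[i] = a[i] + b[i];
}
```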
High-efficiency floating-point neural network inference operators for mobile, server, and Web