Stars
Modified K-means Algorithm with Local Optimality Guarantees (ICML 2025)
Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
Parse and disassemble .metallib files in the browser
Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)
A Python package for optimal 1D k-means clustering.
Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
Official Implementation of "KBLaM: Knowledge Base augmented Language Model"
Local Deep Research is an AI-powered assistant that transforms complex questions into comprehensive, cited reports by conducting iterative analysis using any LLM across diverse knowledge sources in…
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit".
Production-ready LLM model compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Machine learning software for extracting information from scholarly documents
A collection of app themes based on some Nostromo UI from Alien.
Mini-V is a compact core-xy printer with a build volume of 180mm³ using 2020 extrusions. Inspired to be a mini-Voron.
Virtual whiteboard for sketching hand-drawn like diagrams
Entropy Based Sampling and Parallel CoT Decoding
A monospaced pixel font with a lo-fi, techy vibe
Exploring the scalable matrix extension of the Apple M4 processor
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
Code execution exploit for Tony Hawk's video game series
The homepage of OneBit model quantization framework.
Code repo for the paper "SpinQuant: LLM quantization with learned rotations"
A collection of tricks and tools to speed up transformer models