Stars
Cadence is a distributed, durable, and highly available orchestration engine for executing asynchronous, long-running business logic in a scalable and resilient way.
Here, I implement every single component in typical LLM architectures from scratch: from data preparation to multi-head self-attention modules to instruction fine-tuning of open-source models!
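As an illustration of the kind of component such a from-scratch walkthrough covers, here is a minimal multi-head self-attention sketch in plain NumPy (the function name, weight layout, and single-sequence shape `(seq_len, d_model)` are simplifying assumptions for this sketch, not the repository's actual code, which omits batching, masking, and biases):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Minimal multi-head self-attention: project, split heads, attend, merge."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Linear projections into queries, keys, values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Split into heads: (num_heads, seq_len, d_head)
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention, computed independently per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v
    # Merge heads back to (seq_len, d_model) and apply the output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

# Tiny usage example with random weights
rng = np.random.default_rng(0)
d_model, seq_len, heads = 8, 4, 2
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))
y = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads=heads)
print(y.shape)  # (4, 8) — same shape as the input
```

The output has the same shape as the input, which is what lets such blocks be stacked with residual connections in a transformer.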
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
This is a book for getting started with the Phi family of SLMs. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…
Official implementation (PyTorch) of "Inversion-based Latent Bayesian Optimization", NeurIPS 2024
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
This repository houses supplementary code and Jupyter notebooks that accompany the AI Pocket Reference project.
FlashInfer: Kernel Library for LLM Serving
Making large AI models cheaper, faster, and more accessible
A streamlined reference manual for AI practitioners, students, and developers to quickly look up core concepts and implementations.
Fast and memory-efficient exact attention
🤗 smolagents: a barebones library for agents that think in code.
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents