Intel - Shanghai (UTC+08:00)

Stars
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Real time interactive streaming digital human
Open Source framework for voice and multimodal conversational AI
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
FlashInfer: Kernel Library for LLM Serving
Fast inference from large language models via speculative decoding
An Application Framework for AI Engineering
[CVPR 2025] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation
LLaVA-CoT, a visual language model capable of spontaneous, systematic reasoning
Development repository for the Triton language and compiler
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Evaluation, benchmarks, and scorecards targeting throughput and latency performance, accuracy on popular evaluation harnesses, safety, and hallucination
GenAI components at the microservice level; a GenAI service composer to create mega-services
1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Standardized Serverless ML Inference Platform on Kubernetes
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Noise reduction in python using spectral gating (speech, bioacoustics, audio, time-domain signals)
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡
Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual"
A curated list of neural network pruning resources.
Awesome machine learning model compression research papers, quantization, tools, and learning material.