Parallel Scaling Law for Language Models: Beyond Parameter and Inference Time Scaling
MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models.
Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, and video, and of performing real-time speech generation.
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, Llama 2, Qwen, GLM, Claude, etc.) on 100+ datasets.
A curated list of resources on efficient large language models.
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts".
Awesome Reasoning LLM Tutorial/Survey/Guide
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
A high-throughput and memory-efficient inference and serving engine for LLMs
Retrieval and Retrieval-augmented LLMs
Finetune Qwen3, Llama 4, TTS, DeepSeek-R1 & Gemma 3 LLMs 2x faster with 70% less memory! 🦥
Fully open reproduction of DeepSeek-R1
Code, documentation, and discussion around the MIMIC-CXR database
CryptoNets is a demonstration of the use of neural networks over data encrypted with homomorphic encryption. Homomorphic encryption allows performing operations such as addition and multiplication …
LLM training code for Databricks foundation models
A framework for the evaluation of autoregressive code generation language models.
A framework for few-shot evaluation of language models.
The missing star history graph of GitHub repos - https://star-history.com
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666.
Awesome LLM compression research papers and tools.