Stars
Official PyTorch implementation of EMOVA, CVPR 2025 (https://arxiv.org/abs/2409.18042)
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
SALMONN: Speech Audio Language Music Open Neural Network
The first Large Audio Language Model to enable native in-depth thinking, trained on large-scale audio Chain-of-Thought data.
Official Implementation of EnCLAP (ICASSP 2024)
Code for the paper "LLMs Can Evolve Continually on Modality for X-Modal Reasoning" (NeurIPS 2024)
Code and models for the ICML 2024 paper "NExT-GPT: Any-to-Any Multimodal Large Language Model"
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Open source platform for the machine learning lifecycle
A Collection of BM25 Algorithms in Python
Expand, Highlight, Generate: RL-driven Document Generation for Passage Reranking. Accepted at the main track of EMNLP 2023.
A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
Official inference framework for 1-bit LLMs
Contriever: Unsupervised Dense Information Retrieval with Contrastive Learning
Analysing the Robustness of Dual Encoders for Dense Retrieval Against Misspellings
Typo-Robust Sentence Representation Learning for Dense Retrieval
State-of-the-Art Text Embeddings
A tool for extracting plain text from Wikipedia dumps
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
Aligning pretrained language models with instruction data generated by themselves.
Decoupling Reasoning from Observations for Efficient Augmented Language Models
AudioBench: A Universal Benchmark for Audio Large Language Models