Stars
Gradient Reversal Layer for Domain Adaptation
A PyTorch native platform for training generative AI models
A low-bitrate single-codebook 16 kHz speech codec based on focal modulation
A lightweight data processing framework built on DuckDB and 3FS.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS
🤗 smolagents: a barebones library for agents that think in code.
Python bindings for symphonia/opus - read various audio formats from Python and write Opus files
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
Awesome speech/audio LLMs, representation learning, and codec models
Helpful tools and examples for working with flex-attention
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A tool for transcribing orthographic text as IPA (International Phonetic Alphabet)
A simple and efficient Mamba implementation in pure PyTorch and MLX.
Inference and training library for high-quality TTS models.
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Manage scalable open LLM inference endpoints in Slurm clusters
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/