Stars
✨✨VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
A Conversational Speech Generation Model
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Data manipulation and transformation for audio signal processing, powered by PyTorch
A TTS model capable of generating ultra-realistic dialogue in one pass.
State-of-the-art audio codec with a 90x compression factor. Supports 44.1 kHz, 24 kHz, and 16 kHz mono/stereo audio.
A text-to-speech (TTS), speech-to-text (STT) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech analysis on Apple Silicon.
Dataset of dry/wet pairs for audio effects research
Framework for differentiable black-box and gray-box audio effects modeling
YuE: Open Full-song Music Generation Foundation Model, similar to Suno.ai but open
A practical implementation of GradNorm (Gradient Normalization for Adaptive Loss Balancing) in PyTorch
Vector (and Scalar) Quantization, in PyTorch
Inspired by "Neural Networks Fail to Learn Periodic Functions and How to Fix It"
A large-scale dataset of caption-annotated MIDI files.
A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding.
ModernBERT model optimized for Apple Neural Engine.
A comprehensive collection of Music Information Retrieval (MIR) and AI music labs.