Stars
This is the official repo for the paper DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility.
Awesome RL Reasoning Recipes ("Triple R")
Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models
Large Concept Models: Language modeling in a sentence representation space
Vector (and Scalar) Quantization, in PyTorch
🏆🏅 Repository for the GEB team's winning solutions in the IEEE Hybrid Energy Forecasting and Trading Competition (HEFTCom).
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
A curated list of speaker-embedding, speaker-verification, and speaker-identification resources.
AcademiCodec: An Open Source Audio Codec Model for Academic Research
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
The MOS system combines components from DNSMOS, NISQA, MOSSSL, and SIGMOS, using the librosa library to process audio waveforms.
A Python library for computing the Mel-Cepstral Distance (Mel-Cepstral Distortion, MCD) between two inputs. This implementation is based on the method proposed by Robert F. Kubichek in "Mel-Cepstra…
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
An open source implementation of CLIP.
Comparative Analysis of Deep Learning Approaches for Facial Age Estimation. Accepted to CVPR 2024
Collection of self-supervised models for speaker and language recognition tasks.
Code for ACL 2021 paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information"
Official code implementation for the ICLR paper "LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading"
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Just 1 minute of voice data can be used to train a good TTS model (few-shot voice cloning)!