Stars
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Transformer (Attention Is All You Need) implementation in PyTorch
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Korean streaming ASR (with denoiser and Conformer CTC)
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io
HeliosVirtualCockpit / Helios
Forked from BlueFinBima/Helios. Helios Distribution
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/
Provides an improved web interface for use with the ADS-B decoders readsb / dump1090-fa
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
Korean Grammar Correction Model based on LLM
A latent text-to-image diffusion model
Transcription, forced alignment, and audio indexing with OpenAI's Whisper
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Code for the paper Hybrid Spectrogram and Waveform Source Separation
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Korean text normalization and language preparation package for LMs in Kaldi-based ASR systems
An unofficial PyTorch implementation of the audio LM VALL-E
Easy-to-use implementation of the ISMIR 2013 Audio Degradation Toolbox
Conformer-based Metric GAN for speech enhancement
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.
Original transformer paper: Implementation of Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector