ishine

speech asr/speech-recognition tts/text-to-speech vc/voice-conversion ac/accent-conversion

153 followers · 217 following

gerzz.inc
shanghai
dubbing-ai.com dubbingai.io

TTS.cpp Public
Forked from mmwillet/TTS.cpp

TTS support with GGML

C++ MIT License Updated Jun 19, 2025
finally_based_speech_enhancement Public
Forked from markunya/finally_based_speech_enhancement

Jupyter Notebook Updated Jun 18, 2025
linearvc Public
Forked from kamperh/linearvc

Voice conversion with just linear regression.

Jupyter Notebook MIT License Updated Jun 18, 2025
VietASR Public
Forked from zzasdf/VietASR

Python 1 Apache License 2.0 Updated Jun 18, 2025
Stream-Omni Public
Forked from ictnlp/Stream-Omni

Stream-Omni is an end-to-end language-vision-speech chatbot that simultaneously supports interaction across various modality combinations.

Python GNU General Public License v3.0 Updated Jun 18, 2025
icefall Public
Forked from k2-fsa/icefall

Python Apache License 2.0 Updated Jun 17, 2025
ArTST Public
Forked from mbzuai-nlp/ArTST

Python Updated Jun 17, 2025
tts_impl Public
Forked from uthree/tts_impl

implementation of text to speech models

Python MIT License Updated Jun 17, 2025
FBK-fairseq1 Public
Forked from hlt-mt/FBK-fairseq

Repository containing the open source code of works published at the FBK MT unit.

Python Other Updated Jun 16, 2025
SongGeneration Public
Forked from tencent-ailab/SongGeneration

Python Other Updated Jun 16, 2025
dasheng-glap Public
Forked from xiaomi-research/dasheng-glap

Official Implementation of GLAP - General Language Audio Pretraining

Python Apache License 2.0 Updated Jun 16, 2025
TS-ASR-Whisper Public
Forked from BUTSpeechFIT/TS-ASR-Whisper

Python Apache License 2.0 Updated Jun 16, 2025
X-Codec-2.0 Public
Forked from zhenye234/X-Codec-2.0

Codec for paper: LLaSA: Scaling Train Time and Test Time Compute for LLaMA based Speech Synthesis.

Python MIT License Updated Jun 16, 2025
A-DMA Public
Forked from ZhikangNiu/A-DMA

[INTERSPEECH 2025] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"

Python MIT License Updated Jun 16, 2025
noise-robust-asr Public
Forked from debanjan06/noise-robust-asr

🔊 Advanced Noise-Robust ASR System with Dynamic Adaptation Cutting-edge speech recognition system achieving 47% WER improvement in noisy conditions through novel noise-aware attention mechanisms an…

Python Updated Jun 14, 2025
mkl-vc Public
Forked from alobashev/mkl-vc

[Interspeech 2025] Official implementation of "Training-Free Voice Conversion with Factorized Optimal Transport"

Jupyter Notebook Updated Jun 13, 2025
chatterbox Public
Forked from resemble-ai/chatterbox

SoTA open-source TTS

Python MIT License Updated Jun 13, 2025
MonkeyOCR Public
Forked from Yuliang-Liu/MonkeyOCR

A lightweight LMM-based Document Parsing Model

Python Apache License 2.0 Updated Jun 13, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

LLM based TTS model, providing inference/training/deployment full-stack ability.

Python Apache License 2.0 Updated Jun 13, 2025
stylish-tts Public
Forked from Stylish-TTS/stylish-tts

Python 1 MIT License Updated Jun 13, 2025
LatentSync Public
Forked from bytedance/LatentSync

Taming Stable Diffusion for Lip Sync!

Python Apache License 2.0 Updated Jun 13, 2025
EzAudio Public
Forked from haidog-yaqub/EzAudio

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python MIT License Updated Jun 12, 2025
Bert-VITS2 Public
Forked from fishaudio/Bert-VITS2

vits2 backbone with bert

Python GNU Affero General Public License v3.0 Updated Jun 11, 2025
ClearerVoice-Studio Public
Forked from modelscope/ClearerVoice-Studio

ClearVoice

Python 1 Apache License 2.0 Updated Jun 11, 2025
ComfyUI_MegaTTS3 Public
Forked from billwuhao/ComfyUI_MegaTTS3

Lightweight and Efficient, 🎧Ultra High-Quality Voice Cloning, Chinese and English.

Python Apache License 2.0 Updated Jun 11, 2025
MSenC Public
Forked from kimtaesu24/MSenC

[INTERSPEECH'25] Official repository for "Towards Human-like Multimodal Conversational Agent by Generating Engaging Speech"

Python Updated Jun 10, 2025
CMSP-ST Public
Forked from Akito-Go/CMSP-ST

CMSP-ST: Cross-modal Mixup with Speech Purification for End-to-End Speech Translation

Python Apache License 2.0 Updated Jun 10, 2025
mbrs Public
Forked from naist-nlp/mbrs

A library for minimum Bayes risk (MBR) decoding

Python MIT License Updated Jun 10, 2025
ASR-TTS-paper-daily Public
Forked from halsay/ASR-TTS-paper-daily

Update ASR paper everyday

Python Apache License 2.0 Updated Jun 10, 2025
TASTE-SpokenLM Public
Forked from mtkresearch/TASTE-SpokenLM

Python Updated Jun 9, 2025
