Stars
This is the official repository of a new SOTA diffusion-model-based method for speech enhancement
We introduce the MoE and LoRA structures into knowledge distillation, and improve the effect of knowledge distillation of language models by reducing the gap between the student model and the teacher…
liyunlongaaa / pyrirgen
Forked from yoshipon/pyrirgen
Room Impulse Response Generator
liyunlongaaa / d2l-zh
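Forked from d2l-ai/d2l-zh
"Dive into Deep Learning": written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at 300 universities in 55 countries.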
**Official** Hung-yi Lee (李宏毅) Machine Learning 2022 Spring
liyunlongaaa / ssast
Forked from YuanGongND/ssast
Code for the AAAI 2022 paper "SSAST: Self-Supervised Audio Spectrogram Transformer".
liyunlongaaa / ast
Forked from YuanGongND/ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
liyunlongaaa / asv-subtools
Forked from Snowdar/asv-subtools
Open Source Tools for Speaker Recognition
liyunlongaaa / sslsvit
Forked from theolepage/sslsv
Collection of self-supervised learning (SSL) methods for speaker verification (SV).
liyunlongaaa / LGLS1
Forked from TaoRuijie/Loss-Gated-Learning
ICASSP 2022: 'Self-supervised Speaker Recognition with Loss-gated Learning'
liyunlongaaa / exp2
Forked from s3prl/s3prl
Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit
Some comprehensive papers about speaker diarization
CHiME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture
Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch
Generative Models by Stability AI
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
IVA-Xception: a model based on IVA and Xception that achieves high performance in identifying multiple birds from overlapping bird-sound recordings
Adapting Self-Supervised Models to Multi-Talker Speech Recognition Using Speaker Embeddings
INTERSPEECH 2023: Target Active Speaker Detection with Audio-visual Cues
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
Toolkit for training and evaluating Self-Supervised Learning (SSL) frameworks for Speaker Verification (SV).