Stars
Muzic: Music Understanding and Generation with Artificial Intelligence
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Chinese speech pretrained models
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Keyword spotting on Arm Cortex-M Microcontrollers
Production First and Production Ready End-to-End Keyword Spotting Toolkit
Infrastructure to enable deployment of ML models to low-power resource-constrained embedded targets (including microcontrollers and digital signal processors).
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
A PyTorch implementation of Conv-TasNet described in "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" with Permutation Invariant Training (PIT).
A PyTorch implementation of "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation"
A PyTorch implementation of "Dynamic Convolution: Attention over Convolution Kernels" (CVPR 2020)
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-lite, ONNX, and real-time audio processing support.
Source code for publication: "Spectrum Correction: Acoustic Scene Classification with Mismatched Recording Devices"
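PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision algorithms such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.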
Implementation of paper "DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement"
This repository is a curated list of awesome Speech Keyword Spotting (Wake-Up Word Detection) resources.
Densely Connected Convolutional Networks, in CVPR 2017 (Best Paper Award).
An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
An implementation of FastSpeech based on PyTorch.
Open-source tools for speaker recognition
A faster and more elegant TensorFlow implementation of the paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks"
Production First and Production Ready End-to-End Speech Recognition Toolkit
An implementation of deep-voice-conversion using PyTorch
Voice conversion model for real-time speech synthesis using PPG (Phonetic PosteriorGram) as an intermediate feature, written in PyTorch.
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams (Interspeech'19)