Stars
AI Audio Datasets (AI-ADS) 🎵: speech, music, and sound-effect datasets that can provide training data for generative AI, AIGC, AI model training, intelligent audio tool development, and audio a…
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/2102.02611
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
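Speech activity detection, the first of those building blocks, can be illustrated with a toy energy-threshold detector. This is a hedged NumPy sketch of the idea, not pyannote's API; all names and parameters here are illustrative:

```python
import numpy as np

def energy_vad(signal, frame_len=400, hop=160, threshold_db=-30.0):
    """Label each frame as speech (True) or non-speech (False) by
    comparing its log energy to a threshold relative to the loudest
    frame. A toy stand-in for a learned speech activity detector."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energies = np.array([np.mean(f ** 2) + 1e-12 for f in frames])
    log_e = 10.0 * np.log10(energies)
    return log_e > (log_e.max() + threshold_db)

# toy signal: 1 s silence, 1 s of a 440 Hz tone, 1 s silence
sr = 16000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr), np.sin(2 * np.pi * 440 * t), np.zeros(sr)])
mask = energy_vad(sig)  # True only on frames inside the tone
```

Real systems replace the energy heuristic with a trained neural frame classifier, but the framing/thresholding scaffolding is the same.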
TF/Keras code for DiffStride, a pooling layer with learnable strides.
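The trick that makes strides learnable is pooling in the Fourier domain, where the crop size can vary smoothly. A minimal fixed-stride version of that spectral pooling, in plain NumPy (illustrative sketch, not the DiffStride code, which additionally makes `stride` a trainable parameter):

```python
import numpy as np

def spectral_pool_1d(x, stride=2.0):
    """Downsample a signal by cropping its high frequencies in the
    DFT domain. DiffStride's insight is that the crop size (i.e. the
    stride) can be a continuous, learnable parameter; here it is
    fixed for illustration."""
    X = np.fft.rfft(x)
    keep = int(np.ceil(len(X) / stride))   # low-frequency bins to keep
    out_len = int(round(len(x) / stride))  # downsampled length
    # inverse transform at the reduced length; divide by the stride
    # to preserve amplitude under numpy's 1/n normalization
    return np.fft.irfft(X[:keep], n=out_len) / stride

# a pure low-frequency cosine survives 2x spectral pooling exactly
x = np.cos(2 * np.pi * 4 * np.arange(64) / 64)
y = spectral_pool_1d(x, stride=2.0)
```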
A toolkit for reproducible evaluation, diagnosis, and error analysis of speaker diarization systems
LEAF is a learnable alternative to audio features such as mel-filterbanks: it can be initialized as an approximation of mel-filterbanks and then trained for the task at hand, while using a ve…
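The fixed baseline that such learnable frontends approximate at initialization can be built in a few lines. A sketch of a standard triangular mel filterbank over FFT bins (function names are mine, not LEAF's):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Triangular mel filters over FFT bins: the classic fixed
    filterbank that a learnable frontend is initialized to match
    before being trained end to end."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

fb = mel_filterbank()  # (40 filters) x (257 frequency bins)
```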
Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models
Fast PyTorch based DSP for audio and 1D signals
Analyze and manipulate EEG data using PyEEGLab.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
PyTorch implementation of the 2D Discrete Wavelet Transform (DWT) and Dual-Tree Complex Wavelet Transform (DTCWT), plus a DTCWT-based ScatterNet
Wavelet scattering transforms in Python with GPU acceleration
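The simplest instance of the transforms these two libraries build on is a single-level Haar DWT: split a signal into coarse and detail coefficients, then invert exactly. A self-contained sketch (real libraries cascade longer filters over many levels):

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform: averages of
    adjacent sample pairs give the approximation, differences give
    the detail, each scaled by 1/sqrt(2) to keep the transform
    orthonormal (energy-preserving)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of one Haar level: reconstructs the input exactly."""
    even = (approx + detail) / np.sqrt(2.0)
    odd = (approx - detail) / np.sqrt(2.0)
    out = np.empty(2 * len(approx))
    out[0::2], out[1::2] = even, odd
    return out

x = np.arange(8.0)
a, d = haar_dwt(x)
```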
Gammatone-based spectrograms, using gammatone filterbanks or Fourier transform weightings.
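The gammatone impulse response itself is compact: a gamma-shaped envelope modulating a tone at the center frequency, with bandwidth set by the ERB scale. A hedged NumPy sketch, independent of that repository's API (constants follow the common Glasberg & Moore ERB formula):

```python
import numpy as np

def gammatone_ir(fc, sr=16000, duration=0.05, order=4):
    """Impulse response of a gammatone filter at center frequency fc:
    t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), normalized to unit
    peak. Banks of these at ERB-spaced frequencies model cochlear
    filtering."""
    t = np.arange(int(duration * sr)) / sr
    erb = 24.7 + fc / 9.265   # equivalent rectangular bandwidth (Hz)
    b = 1.019 * erb           # gammatone bandwidth factor
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

ir = gammatone_ir(1000.0)  # 50 ms response of a 1 kHz channel
```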
ABNet is a neural network trained with a "same/different"-based loss.
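One common form of such a loss pulls the cosine similarity of "same" pairs toward 1 and pushes "different" pairs toward orthogonality. A minimal sketch of that idea (the exact loss ABNet uses may differ; names here are illustrative):

```python
import numpy as np

def same_diff_loss(e1, e2, same):
    """Pairwise 'same/different' loss on two embedding vectors:
    for a 'same' pair the loss is 0 when the embeddings are aligned
    (cosine = 1); for a 'different' pair the loss is 0 when they are
    orthogonal (cosine = 0). Squaring the cosine for different pairs
    penalizes both positive and negative alignment."""
    cos = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    if same:
        return (1.0 - cos) / 2.0
    return cos ** 2
```

In training, a shared encoder maps both inputs to embeddings and this loss is averaged over a batch of labeled same/different pairs.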