Stars
Command line utility for forced alignment using Kaldi
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
A curated list of awesome papers on contextualizing E2E ASR outputs
Codes and datasets for our ICASSP2023 paper, Evaluating parameter-efficient transfer learning approaches on SURE benchmark for speech understanding
This repository surveys papers focusing on prompting and adapters for speech processing.
TigerBot: A multi-language multi-task LLM
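Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
[LLMs Nine-Level Demon Tower] Shares hands-on practice and experience with LLMs in natural language processing (ChatGLM, Chinese-LLaMA-Alpaca, Vicuna, LLaMA, GPT4ALL, etc.), information retrieval (langchain), speech synthesis, speech recognition, multimodal applications (Stable Diffusion, MiniGPT-4, VisualGLM-6B, Ziya-Visual, etc.), and other areas.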
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
LLM training code for Databricks foundation models
⚡LLM Zoo is a project that provides data, models, and evaluation benchmarks for large language models.⚡
Chinese NLP solutions (large models, data, models, training, inference)
A self-supervised learning framework for audio-visual speech
Deep-Learning-Based Audio-Visual Speech Enhancement and Separation
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
Attempt at tracking states of the art and recent results (bibliography) on speech recognition.
Repository for the paper "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning"
MPNet: Masked and Permuted Pre-training for Language Understanding https://arxiv.org/pdf/2004.09297.pdf
XLNet: Generalized Autoregressive Pretraining for Language Understanding
End-to-end ASR/LM implementation with PyTorch
Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments
An implementation of SpecAugment, introduced by Google Brain, in TensorFlow and PyTorch
Python interface to the WebRTC Voice Activity Detector
This project is a real-time visualization of a network recognizing digits from the user's input.
DenseNet3D model in "LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild", https://arxiv.org/abs/1810.06990
The proposed method in LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild