Stars
Notes and links from the book club meetings
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
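A comprehensive collection of C++ resources (Chinese edition): standard libraries, web application frameworks, artificial intelligence, databases, image processing, machine learning, logging, code analysis, and more. Maintained and updated by the teams behind the WeChat official accounts 「开源前哨」 and 「CPP开发者」.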
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Code for our INTERSPEECH paper Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
NeMo text processing for ASR and TTS
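Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments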
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Speech, Language, Audio, Music Processing with Large Language Model
🔊 Text-Prompted Generative Audio Model
Grapheme-to-Phoneme for Mixed Chinese (Mandarin or Cantonese) and English.
1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
[ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection
Forced alignment and Goodness of Pronunciation (GOP) with DNN support. Based on Kaldi.
Text Normalization & Inverse Text Normalization
Port of OpenAI's Whisper model in C/C++
Faster Whisper transcription with CTranslate2
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Whispering Tiger - OpenAI's Whisper (and other models) with OSC and WebSocket support, allowing live transcription / translation in VRChat and overlays in most streaming applications