Stars
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
High fidelity, lightweight, end-to-end, streaming, convolution-based neural audio codec
The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
FACodec: Speech Codec with Attribute Factorization used for NaturalSpeech 3
Recognition of 38 speech commands in Russian. Based on Yandex Cup 2021 ML Challenge: ASR
Deep Learning Autonomous Car based on Raspberry Pi, SunFounder PiCar-V Kit, TensorFlow, and Google's EdgeTPU Co-Processor
luiszeni / yolact_onnx
Forked from dbolya/yolact
A simple, fully convolutional model for real-time instance segmentation.
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translatio…
A custom MicroPython firmware integrating TensorFlow Lite for Microcontrollers and ulab to implement the TensorFlow Micro examples.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Data manipulation and transformation for audio signal processing, powered by PyTorch
ASR/NLP/TTS deep learning inference library for NVIDIA Jetson using PyTorch and TensorRT
Simple Python package for breaking Russian words into syllables
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
This repository contains the scripts for the models of deep unsupervised learning of vocal entrainment
License plate recognition. Model training and conversion to TFLite
Russian speech recognition using TensorFlow, trained on the Voxforge dataset
End-to-end speech to text recognition
Speech recognition dataset based on a Russian audiobook, split at the sentence level