Starred repositories
Efficient Triton Kernels for LLM Training
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
DSPy: The framework for programming—not prompting—language models
Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
Web Interface for Vision Language Models Including InternVLM2
Toolkit for linearizing PDFs for LLM datasets/training
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
ESC-50: Dataset for Environmental Sound Classification
Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning
Famous Vision Language Models and Their Architectures
Agent Laboratory is an end-to-end autonomous research workflow meant to assist you, the human researcher, in implementing your research ideas
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Research and Production Oriented Speaker Verification, Recognition and Diarization Toolkit
DeepFaceLab is the leading software for creating deepfakes.
Clustering-based methods for overlapping diarization
A specializer for Gaussian Mixture Models, based on the ASP framework
Python3 code for the IEEE SPL paper "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap"
This repo is for the SPL paper "Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap"
Tools for merging pretrained large language models.
Robust Speech Recognition via Large-Scale Weak Supervision
How to use OpenAI's Whisper to transcribe and diarize audio files
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
Python re-implementation of the (constrained) spectral clustering algorithms used in Google's speaker diarization papers.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization