Stars
Reference-aware automatic speech evaluation toolkit
Awesome music generation model: MG²
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Noise suppression using deep filtering
Convenient for developers to call inference models from v1 to v3 through an API, supporting streaming transmission and transfer of files of a specified type.
Text-to-speech using an autoregressive transformer and VITS
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense f…
A feature-rich command-line audio/video downloader
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome.
800,000 step-level correctness labels on LLM solutions to MATH problems
open-o1: Using GPT-4o with CoT to Create o1-like Reasoning Chains
[ICLR 2023] ReAct: Synergizing Reasoning and Acting in Language Models
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
CoNLI: a plug-and-play framework for ungrounded hallucination detection and reduction
Official implementation of “GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting” by Kyusun Cho, Joungbin Lee, Heeji Yoon, Yeobin Hong, Jaehoon Ko,…
A rewrite of Deep Retrieval in PyTorch
🟣 Recommendation Systems interview questions and answers to help you prepare for your next machine learning and data science interview in 2025.
Welcome to the LLMs Interview Prep Guide! This GitHub repository offers a curated set of interview questions and answers tailored for Data Scientists. Enhance your understanding of Large Language M…
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Summarise YouTube videos and save time!
ML-based Rainfall Estimator is a machine learning tool that estimates rainfall in areas where no rain gauge data is available.