Stars
pyCrossfade is the result of a personal project to use beat matching, gradual BPM shifts on bars, and EQ modification to provide smooth and tunable transitions between music files.
Agentless🐱: an agentless approach to automatically solve software development problems
Janus-Series: Unified Multimodal Understanding and Generation Models
We study toy models of skill learning.
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
This repository contains LLM (large language model) interview questions asked at top companies such as Google, Nvidia, Meta, Microsoft, and Fortune 500 companies.
Run Streamlit apps serverlessly on AWS with HTTPS
Simple Python library/structure to ablate features in LLMs that are supported by TransformerLens
The Universe of Evaluation. All about the evaluation for LLMs.
Universal Romanizer that can convert any Unicode script to Roman (Latin) script
Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis.
Official PyTorch implementation of BigVGAN (ICLR 2023)
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
Train high-quality text-to-image diffusion models in a data & compute efficient manner
Let us democratise high-resolution generation! (CVPR 2024)
All generative models in one for a better TTS model
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
OneDiff: An out-of-the-box acceleration library for diffusion models.
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
A library for mechanistic interpretability of GPT-style language models