XinhaoMei / Starred · GitHub

191 results for source starred repositories

Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python · 146 stars · 5 forks · Updated Jun 6, 2025

Efficient Training of Audio Transformers with Patchout

Python · 343 stars · 52 forks · Updated Jan 12, 2024

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML · 354 stars · 47 forks · Updated Sep 30, 2024

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python · 163 stars · 3 forks · Updated Dec 13, 2024

Python · 59 stars · 7 forks · Updated May 13, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python · 283 stars · 35 forks · Updated Jun 15, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 17,466 stars · 2,243 forks · Updated Feb 1, 2025

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python · 789 stars · 88 forks · Updated Apr 1, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python · 8,592 stars · 663 forks · Updated Jul 16, 2025

Scalable and Performant Data Loading

Python · 288 stars · 15 forks · Updated Jul 19, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 8,662 stars · 756 forks · Updated Jul 19, 2025

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Python · 33 stars · 4 forks · Updated Sep 27, 2024

Implementation of Band Split Roformer, a SOTA attention network for music source separation from ByteDance AI Labs

Python · 577 stars · 23 forks · Updated Jul 14, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,807 stars · 134 forks · Updated Apr 21, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python · 1,673 stars · 100 forks · Updated Sep 27, 2024

Official repository for the paper PLLaVA

Python · 660 stars · 46 forks · Updated Jul 28, 2024

Utilities intended for use with Llama models.

Python · 7,150 stars · 1,214 forks · Updated Jul 15, 2025

Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end-to-end problems using Llama mode…

Jupyter Notebook · 17,644 stars · 2,554 forks · Updated Jul 18, 2025

Gemma open-weight LLM library, from Google DeepMind

Python · 3,532 stars · 488 forks · Updated Jul 18, 2025

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python · 1,310 stars · 72 forks · Updated Apr 21, 2025

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python · 928 stars · 48 forks · Updated Apr 1, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell · 22,605 stars · 1,530 forks · Updated Jun 26, 2025

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook · 8,318 stars · 522 forks · Updated May 18, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 7,583 stars · 676 forks · Updated May 31, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook · 8,330 stars · 792 forks · Updated Mar 15, 2025

Audio Codec Speech processing Universal PERformance Benchmark

Python · 263 stars · 25 forks · Updated Jul 2, 2025

Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Python · 10 stars · 1 fork · Updated May 14, 2025

A simple library for Fréchet Audio Distance (FAD) calculation

Python · 224 stars · 24 forks · Updated May 26, 2025

Vector (and Scalar) Quantization, in PyTorch

Python · 3,416 stars · 277 forks · Updated Jun 16, 2025

A lightweight library for PyTorch training tools and utilities

Python · 1,701 stars · 288 forks · Updated Jul 8, 2025