XinhaoMei / Starred · GitHub

Efficient Training of Audio Transformers with Patchout

Python 334 51 Updated Jan 12, 2024

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML 338 44 Updated Sep 30, 2024

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 133 2 Updated Dec 13, 2024
Python 46 2 Updated May 13, 2025

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 265 33 Updated Mar 12, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,273 2,233 Updated Feb 1, 2025

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python 764 83 Updated Apr 1, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

Python 8,134 615 Updated Apr 27, 2025

Scalable and Performant Data Loading

Python 259 14 Updated May 20, 2025

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 8,220 694 Updated May 20, 2025

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Python 33 4 Updated Sep 27, 2024

Implementation of Band Split Roformer, SOTA Attention network for music source separation out of ByteDance AI Labs

Python 540 19 Updated Jan 9, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,726 128 Updated Apr 21, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,554 87 Updated Sep 27, 2024

Official repository for the paper PLLaVA

Python 649 46 Updated Jul 28, 2024

Utilities intended for use with Llama models.

Python 7,007 1,155 Updated May 7, 2025

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…

Jupyter Notebook 17,330 2,481 Updated May 14, 2025

Gemma open-weight LLM library, from Google DeepMind

Jupyter Notebook 3,280 445 Updated May 19, 2025

MobileLLM Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python 1,299 71 Updated Apr 21, 2025

code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 860 42 Updated Apr 1, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,405 1,413 Updated May 20, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 7,944 489 Updated May 18, 2025

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python 7,300 649 Updated May 31, 2024

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook 8,265 789 Updated Mar 15, 2025

Audio Codec Speech processing Universal PERformance Benchmark

Python 254 25 Updated Apr 14, 2025

Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Python 10 1 Updated May 14, 2025

A simple library for Fréchet Audio Distance (FAD) calculation

Python 205 24 Updated May 19, 2025

Vector (and Scalar) Quantization, in Pytorch

Python 3,247 262 Updated May 3, 2025

A lightweight library for PyTorch training tools and utilities

Python 1,697 285 Updated May 12, 2025

Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset

Python 81 12 Updated Jan 25, 2024