XinhaoMei

PhD student @ Uni of Surrey

52 followers · 28 following

CVSSP @ University of Surrey
Guildford
xinhaomei.github.io

Stars

191 starred repositories (filtered to source repositories)

ddlBoJack / MMAR

Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python · 146 stars · 5 forks · Updated Jun 6, 2025

kkoutini / PaSST

Efficient Training of Audio Transformers with Patchout

Python · 343 stars · 52 forks · Updated Jan 12, 2024

SuperKogito / SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML · 354 stars · 47 forks · Updated Sep 30, 2024

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python · 163 stars · 3 forks · Updated Dec 13, 2024

fschmid56 / PretrainedSED

Python · 59 stars · 7 forks · Updated May 13, 2025

zhenye234 / X-Codec-2.0

Codec for the paper "LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis"

Python · 283 stars · 35 forks · Updated Jun 15, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 17,466 stars · 2,243 forks · Updated Feb 1, 2025

facebookresearch / SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python · 789 stars · 88 forks · Updated Apr 1, 2025

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python · 8,592 stars · 663 forks · Updated Jul 16, 2025

facebookresearch / spdl

Scalable and Performant Data Loading

Python · 288 stars · 15 forks · Updated Jul 19, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 8,662 stars · 756 forks · Updated Jul 19, 2025

merlresearch / tssep

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Python · 33 stars · 4 forks · Updated Sep 27, 2024

lucidrains / BS-RoFormer

Implementation of Band Split RoFormer, a state-of-the-art attention network for music source separation from ByteDance AI Labs

Python · 577 stars · 23 forks · Updated Jul 14, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,807 stars · 134 forks · Updated Apr 21, 2025

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python · 1,673 stars · 100 forks · Updated Sep 27, 2024

magic-research / PLLaVA

Official repository for the paper PLLaVA

Python · 660 stars · 46 forks · Updated Jul 28, 2024

meta-llama / llama-models

Utilities intended for use with Llama models.

Python · 7,150 stars · 1,214 forks · Updated Jul 15, 2025

meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We also show you how to solve end-to-end problems using Llama mode…

Jupyter Notebook · 17,644 stars · 2,554 forks · Updated Jul 18, 2025

google-deepmind / gemma

Gemma open-weight LLM library, from Google DeepMind

Python · 3,532 stars · 488 forks · Updated Jul 18, 2025

facebookresearch / MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).

Python · 1,310 stars · 72 forks · Updated Apr 21, 2025

buoyancy99 / diffusion-forcing

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python · 928 stars · 48 forks · Updated Apr 1, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell · 22,605 stars · 1,530 forks · Updated Jun 26, 2025

FoundationVision / VAR

[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". A…

Jupyter Notebook · 8,318 stars · 522 forks · Updated May 18, 2025

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 7,583 stars · 676 forks · Updated May 31, 2024

jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook · 8,330 stars · 792 forks · Updated Mar 15, 2025

voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Python · 263 stars · 25 forks · Updated Jul 2, 2025

WangHelin1997 / Aty-TTS

Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Python · 10 stars · 1 fork · Updated May 14, 2025

microsoft / fadtk

A simple library for Fréchet Audio Distance (FAD) calculation

Python · 224 stars · 24 forks · Updated May 26, 2025

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in PyTorch

Python · 3,416 stars · 277 forks · Updated Jun 16, 2025

pytorch / tnt

A lightweight library for PyTorch training tools and utilities

Python · 1,701 stars · 288 forks · Updated Jul 8, 2025
