XinhaoMei

PhD student @ Uni of Surrey

52 followers · 28 following

CVSSP @ University of Surrey
Guildford
xinhaomei.github.io

Stars

kkoutini / PaSST

Efficient Training of Audio Transformers with Patchout

Python · 334 stars · 51 forks · Updated Jan 12, 2024

SuperKogito / SER-datasets

A collection of datasets for the purpose of emotion recognition/detection in speech.

HTML · 338 stars · 44 forks · Updated Sep 30, 2024

JishengBai / AudioSetCaps

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python · 133 stars · 2 forks · Updated Dec 13, 2024

fschmid56 / PretrainedSED

Python · 46 stars · 2 forks · Updated May 13, 2025

zhenye234 / X-Codec-2.0

Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python · 265 stars · 33 forks · Updated Mar 12, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python · 17,273 stars · 2,233 forks · Updated Feb 1, 2025

facebookresearch / SONAR

SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.

Python · 764 stars · 83 forks · Updated Apr 1, 2025

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.

Python · 8,134 stars · 615 forks · Updated Apr 27, 2025

facebookresearch / spdl

Scalable and Performant Data Loading

Python · 259 stars · 14 forks · Updated May 20, 2025

kyutai-labs / moshi

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 8,220 stars · 694 forks · Updated May 20, 2025

merlresearch / tssep

TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings

Python · 33 stars · 4 forks · Updated Sep 27, 2024

lucidrains / BS-RoFormer

Implementation of Band Split RoFormer, a SOTA attention network for music source separation from ByteDance AI Labs

Python · 540 stars · 19 forks · Updated Jan 9, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python · 1,726 stars · 128 forks · Updated Apr 21, 2025

LTH14 / mar

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python · 1,554 stars · 87 forks · Updated Sep 27, 2024

magic-research / PLLaVA

Official repository for the paper PLLaVA

Python · 649 stars · 46 forks · Updated Jul 28, 2024

meta-llama / llama-models

Utilities intended for use with Llama models.

Python · 7,007 stars · 1,155 forks · Updated May 7, 2025

meta-llama / llama-cookbook

Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using Llama mode…

Jupyter Notebook · 17,330 stars · 2,481 forks · Updated May 14, 2025

google-deepmind / gemma

Gemma open-weight LLM library, from Google DeepMind

Jupyter Notebook · 3,280 stars · 445 forks · Updated May 19, 2025

facebookresearch / MobileLLM

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.

Python · 1,299 stars · 71 forks · Updated Apr 21, 2025

buoyancy99 / diffusion-forcing

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python · 860 stars · 42 forks · Updated Apr 1, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.

Shell · 21,405 stars · 1,413 forks · Updated May 20, 2025

FoundationVision / VAR

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook · 7,944 stars · 489 forks · Updated May 18, 2025

facebookresearch / DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Python · 7,300 stars · 649 forks · Updated May 31, 2024

jasonppy / VoiceCraft

Zero-Shot Speech Editing and Text-to-Speech in the Wild

Jupyter Notebook · 8,265 stars · 789 forks · Updated Mar 15, 2025

voidful / Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark

Python · 254 stars · 25 forks · Updated Apr 14, 2025

WangHelin1997 / Aty-TTS

Aty-TTS: Improving fairness for spoken language understanding in atypical speech with Text-to-Speech

Python · 10 stars · 1 fork · Updated May 14, 2025

microsoft / fadtk

A simple library for Fréchet Audio Distance (FAD) calculation

Python · 205 stars · 24 forks · Updated May 19, 2025

lucidrains / vector-quantize-pytorch

Vector (and Scalar) Quantization, in PyTorch

Python · 3,247 stars · 262 forks · Updated May 3, 2025

pytorch / tnt

A lightweight library for PyTorch training tools and utilities

Python · 1,697 stars · 285 forks · Updated May 12, 2025

merlresearch / cocktail-fork-separation

Baseline multi-resolution cross network model trained using the Divide and Remaster Dataset

Python · 81 stars · 12 forks · Updated Jan 25, 2024
