sinhprous (Bao-Sinh Nguyen) / Starred · GitHub
  • Stealth Web 3 startup
  • Remote


[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching

Jupyter Notebook 29 5 Updated Feb 9, 2025

CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms

21 Updated May 23, 2025

[INTERSPEECH'2025] Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset

1 Updated May 23, 2025

ffn - a financial function library for Python

Python 2,264 331 Updated Apr 1, 2025

Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research.

Python 5,242 710 Updated May 12, 2025

Open-source, accurate and easy-to-use video speech recognition and clipping tool with integrated LLM-based AI clipping.

Python 4,594 525 Updated Mar 11, 2025

A high-performance algorithmic trading platform and event-driven backtester

Python 5,984 856 Updated May 23, 2025

A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation

Jupyter Notebook 28 4 Updated May 21, 2025

Based on the official code of "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". This work uses phoneme-level forced alignment to stabilize the generation process.

Jupyter Notebook 1 Updated Jan 3, 2025

Based on the official code of "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". This work uses phoneme-level forced alignment to stabilize the generation process.

Jupyter Notebook 2 Updated Jan 3, 2025

Python 784 38 Updated May 22, 2025

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

Python 568 27 Updated May 15, 2025

A song aesthetic evaluation toolkit trained on SongEval.

Python 147 9 Updated May 23, 2025

An attempt to replicate the architecture of MiniMaxTTS as described in its technical report.

31 Updated May 14, 2025

[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer

Python 58 2 Updated Nov 1, 2024

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 3,770 190 Updated May 5, 2025

Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining

8 Updated May 10, 2025

Collection of Open Source Speech Data

157 6 Updated Nov 8, 2024

TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation

23 Updated May 9, 2025

Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"

Python 76 6 Updated May 15, 2025

prime-rl is a codebase for decentralized RL training at scale

Python 284 23 Updated May 23, 2025

Implementation of all RAG techniques in a simpler way

Jupyter Notebook 1,797 275 Updated May 12, 2025

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python 1,026 132 Updated Sep 5, 2024

Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix

Python 113 1 Updated May 20, 2025

DreamO: A Unified Framework for Image Customization

Python 1,291 85 Updated May 13, 2025

Data manipulation and transformation for audio signal processing, powered by PyTorch

Python 8 Updated Sep 30, 2024

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

TypeScript 11,605 1,154 Updated May 22, 2025

A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline

Python 133 2 Updated Dec 13, 2024

Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"

Python 2,740 232 Updated May 7, 2025