Stars
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms
[INTERSPEECH'2025] Towards Emotionally Consistent Text-Based Speech Editing: Introducing EmoCorrector and The ECD-TSE Dataset
ffn - a financial function library for Python
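A minimal usage sketch for ffn, based on its documented quick-start API (ffn.get and calc_stats); the tickers and start date below are arbitrary examples, not anything from this list:

```python
# Fetch adjusted close prices and summarize performance with ffn.
import ffn

# Download prices for two tickers into a pandas DataFrame.
prices = ffn.get('spy,agg', start='2015-01-01')

# Compute per-series performance statistics (returns, drawdowns, Sharpe, etc.)
# and print a formatted summary table.
stats = prices.calc_stats()
stats.display()
```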
Find your trading edge, using the fastest engine for backtesting, algorithmic trading, and research.
Open-source, accurate, and easy-to-use video speech recognition & clipping tool with LLM-based AI clipping integrated.
A high-performance algorithmic trading platform and event-driven backtester
A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
entn-at / F5-TTS_phoneme
Forked from sinhprous/F5-TTS. Based on the official code of "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". This work uses phoneme-level forced alignment to stabilize the generation process.
DevMacsAnalytics / F5-TTS_phoneme
Forked from sinhprous/F5-TTS. Based on the official code of "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching". This work uses phoneme-level forced alignment to stabilize the generation process.
JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
A song aesthetic evaluation toolkit trained on SongEval.
An attempt to replicate the architecture of MiniMaxTTS as described in its technical report
[ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
prime-rl is a codebase for decentralized RL training at scale
Implementation of all RAG techniques in a simpler way
Official PyTorch implementation of BigVGAN (ICLR 2023)
Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
DreamO: A Unified Framework for Image Customization
huangruizhe / audio
Forked from pytorch/audio. Data manipulation and transformation for audio signal processing, powered by PyTorch
DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.
A 6-Million Audio-Caption Paired Dataset Built with an LLM- and ALM-Based Automatic Pipeline
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"