- Seoul, Korea (UTC +09:00)
- https://velog.io/@jhko
Highlights
- Pro
Lists (13)
Starred repositories
iSTFTNet: Fast and Lightweight Mel-spectrogram Vocoder Incorporating Inverse Short-time Fourier Transform
Elucidated Text-To-Audio (ETTA) is a SOTA text-to-audio model with a holistic understanding of the design space and trained with synthetic captions.
[ICML 2025 Spotlight] Direct Discriminative Optimization: Supercharging Diffusion/Autoregressive with GAN-type Discrimination
Try to replicate the architecture of MiniMaxTTS mentioned in its technical report
An open source graphics editor for 2025: comprehensive 2D content creation tool suite for graphic design, digital art, and interactive real-time motion graphics — featuring node-based procedural ed…
An open-source AI agent that brings the power of Gemini directly into your terminal.
Compositional Differentiable Programming Library
Self-Supervised Speech Pre-training and Representation Learning Toolkit
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
A plugin that does one thing only: Detect and manage duplicate items in Zotero.
We introduce VLM-Mamba, the first Vision-Language Model built entirely on State Space Models (SSMs), specifically leveraging the Mamba architecture.
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
OmniGen2: Exploration to Advanced Multimodal Generation.
A curated list of awesome papers on contextualizing E2E ASR outputs
Daily tracking of awesome audio papers, including music generation, zero-shot tts, asr, audio generation
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
The fastest Fourier transform in the Rhein (so far). Pure Nim.
Korean Streaming ASR (with Denoiser and Conformer CTC)
Explorations into NEAT and some of its derivative research
Official repo of INTERSPEECH 2024 paper Genhancer: High-Fidelity Speech Enhancement via Generative Modeling on Discrete Codec Tokens. This repo provides additional audio samples.
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
Different implementations of "Weighted Prediction Error" for speech dereverberation
A tool for creating and running Linux containers using lightweight virtual machines on a Mac. It is written in Swift, and optimized for Apple silicon.
Containerization is a Swift package for running Linux containers on macOS.