- National Taiwan University of Science and Technology
- Taiwan
- https://xiaosean.github.io/
- in/yong-xiang-lin
- https://medium.com/@xiaosean5408
Highlights
- Pro
Starred repositories
Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU
Official implementations for paper: VACE: All-in-One Video Creation and Editing
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
Minimalistic 4D-parallelism distributed training framework for educational purposes
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model with performance approaching GPT-4o
fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing d…
Paper list and datasets for industrial image anomaly/defect detection (continuously updated).
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Eagle Family: Exploring Model Designs, Data Recipes and Training Strategies for Frontier-Class Multimodal LLMs
🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
A lightweight, powerful framework for multi-agent workflows
Production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF, vLLM, and SGLang.
[ICLR 2025] Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
👾 Fast and simple video download library and CLI tool written in Go
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
[ICLR 2025] Official implementation of MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Fine-Grained Open Domain Image Animation with Motion Guidance
Official repository of In-Context LoRA for Diffusion Transformers
Official code repository of the AAAI 2025 oral paper "VersaGen: Unleashing Versatile Visual Control for Text-to-Image Synthesis"
A generative world for general-purpose robotics & embodied AI learning.
Official inference repo for FLUX.1 models