UC Berkeley
jiaxin.ge
Stars
Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)
🧩 Official code repository for “Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint.”
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
SEED-Story: Multimodal Long Story Generation with Large Language Model
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Scrape from Twitter using Nitter instances
Gemma open-weight LLM library, from Google DeepMind
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
JAX Implementation of Black Forest Labs' Flux.1 family of models
Code and data for "Does Spatial Cognition Emerge in Frontier Models?"
🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"
Implementation of Diffusion Transformer (DiT) in JAX
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
📚 GPT4o Prompts Dictionary | Curated Collection of AI Image Generation Prompts
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
GPU Accelerated t-SNE for CUDA with Python bindings