UC Berkeley
- jiaxin.ge
Stars
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
A Twitter client for agents, no API key necessary
Scrape from Twitter using Nitter instances
Gemma open-weight LLM library, from Google DeepMind
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
JAX Implementation of Black Forest Labs' Flux.1 family of models
Code and data for "Does Spatial Cognition Emerge in Frontier Models?"
🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"
Implementation of Diffusion Transformer (DiT) in JAX
Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"
📚 GPT4o Prompts Dictionary | Curated Collection of AI Image Generation Prompts
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"
GPU Accelerated t-SNE for CUDA with Python bindings
Minimal reproduction of DeepSeek R1-Zero
A fork to add multimodal model training to open-r1
Solve Visual Understanding with Reinforced VLMs
Witness the aha moment of VLM with less than $3.
verl: Volcano Engine Reinforcement Learning for LLMs
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.