para-lost (Jiaxin Ge) / Starred · GitHub

Code for the paper "AutoPresent: Designing Structured Visuals From Scratch" (CVPR 2025)

Python 97 7 Updated May 26, 2025

🧩 Official code repository for “Puzzled by Puzzles: When Vision-Language Models Can’t Take a Hint.”

Python 9 Updated Jun 16, 2025
Python 81 7 Updated May 25, 2025
Python 1,212 46 Updated Jun 21, 2025

Numbers every LLM developer should know

4,234 140 Updated Jan 16, 2024

Open-source unified multimodal model

Python 4,246 356 Updated Jun 17, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 29,434 6,049 Updated Jun 21, 2025

Referring Expression Datasets API

Jupyter Notebook 521 82 Updated Aug 27, 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Python 850 69 Updated Oct 11, 2024

Multimodal Models in Real World

Jupyter Notebook 513 21 Updated Feb 24, 2025

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,255 159 Updated Feb 16, 2025

Scrape from Twitter using Nitter instances

Python 230 33 Updated May 24, 2025

Alternative Twitter front-end

Nim 11,109 607 Updated May 1, 2025

Gemma open-weight LLM library, from Google DeepMind

Jupyter Notebook 3,414 470 Updated Jun 20, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,280 276 Updated Jun 4, 2025

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,913 130 Updated Oct 30, 2024

JAX Implementation of Black Forest Labs' Flux.1 family of models

Python 34 2 Updated Oct 20, 2024

Code and data for "Does Spatial Cognition Emerge in Frontier Models?"

Python 16 1 Updated Apr 18, 2025

🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"

Python 35 3 Updated Jun 16, 2025

Implementation of Diffusion Transformer (DiT) in JAX

Python 278 6 Updated Jun 11, 2024

Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"

Python 191 7 Updated Apr 19, 2025

📚 GPT4o Prompts Dictionary | Curated Collection of AI Image Generation Prompts

HTML 374 24 Updated May 15, 2025

Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"

Python 1,561 123 Updated May 27, 2025

[ICLR 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,482 64 Updated Jun 21, 2025

Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"

Jupyter Notebook 39 1 Updated Jun 2, 2025
Python 509 30 Updated Nov 26, 2024

GPU Accelerated t-SNE for CUDA with Python bindings

Cuda 1,868 134 Updated Oct 2, 2024

NanoGPT (124M) in 3 minutes

Python 2,698 329 Updated Jun 20, 2025