para-lost (Jiaxin Ge) / Starred · GitHub

Multimodal Models in Real World

Jupyter Notebook 500 21 Updated Feb 24, 2025

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Python 2,185 153 Updated Feb 16, 2025

A Twitter client for agents-- no API key necessary

TypeScript 1,627 401 Updated Mar 10, 2025

Scrape from Twitter using Nitter instances

Python 218 33 Updated Apr 6, 2025

Alternative Twitter front-end

Nim 10,872 584 Updated May 1, 2025

Gemma open-weight LLM library, from Google DeepMind

Jupyter Notebook 3,232 438 Updated May 1, 2025

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,082 262 Updated Apr 21, 2025

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,898 129 Updated Oct 30, 2024

JAX Implementation of Black Forest Labs' Flux.1 family of models

Python 31 2 Updated Oct 20, 2024

Code and data for "Does Spatial Cognition Emerge in Frontier Models?"

Python 13 Updated Apr 18, 2025

🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resampling"

Python 26 4 Updated Apr 18, 2025

Implementation of Diffusion Transformer (DiT) in JAX

Python 272 6 Updated Jun 11, 2024

Code for "MetaMorph: Multimodal Understanding and Generation via Instruction Tuning"

Python 154 5 Updated Apr 19, 2025

📚 GPT4o Prompts Dictionary | Curated Collection of AI Image Generation Prompts

HTML 262 20 Updated Apr 17, 2025

Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"

Python 1,426 111 Updated Apr 14, 2025

[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,375 58 Updated Apr 28, 2025

Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"

Jupyter Notebook 38 1 Updated Mar 16, 2025
Python 488 29 Updated Nov 26, 2024

GPU Accelerated t-SNE for CUDA with Python bindings

Cuda 1,857 134 Updated Oct 2, 2024

NanoGPT (124M) in 3 minutes

Python 2,522 293 Updated Apr 26, 2025

LLM training in simple, raw C/CUDA

Cuda 26,505 3,047 Updated May 1, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,700 1,480 Updated Apr 24, 2025

s1: Simple test-time scaling

Python 6,332 743 Updated Apr 4, 2025

A fork to add multimodal model training to open-r1

Python 1,245 60 Updated Feb 8, 2025

Solve Visual Understanding with Reinforced VLMs

Python 4,860 303 Updated Apr 21, 2025

Witness the aha moment of VLM with less than $3.

Python 3,623 285 Updated Mar 1, 2025
JavaScript 3,320 455 Updated Apr 26, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 7,564 843 Updated May 5, 2025

Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 10,189 724 Updated May 4, 2025