MultiModal
[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
AI suite powered by state-of-the-art models, providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highli…
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Emu Series: Generative Multimodal Models from BAAI
Lumina-T2X is a unified framework for Text to Any Modality Generation
✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org
Towards Large Multimodal Models as Visual Foundation Agents
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding