likw99's list / MultiModal · GitHub
Stars

MultiModal

27 repositories

[ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model"

Python 239 12 Updated Dec 30, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,834 172 Updated May 20, 2025

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highli…

TypeScript 6,423 1,485 Updated May 22, 2025

CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

Python 311 16 Updated Apr 13, 2024

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 19,465 1,403 Updated May 15, 2025

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Python 1,419 90 Updated May 31, 2023

Emu Series: Generative Multimodal Models from BAAI

Python 1,721 85 Updated Sep 27, 2024

Lumina-T2X is a unified framework for Text to Any Modality Generation

Python 2,189 91 Updated Feb 16, 2025
Python 8,621 505 Updated Oct 9, 2024
TypeScript 283 20 Updated Jun 4, 2024

✨✨[CVPR 2025] Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

554 20 Updated May 8, 2025

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

Python 2,564 205 Updated May 19, 2025

Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥

Python 1,679 132 Updated Jan 14, 2025

Open Source AI Math Notes

Python 486 39 Updated Jun 15, 2024

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Python 2,002 113 Updated Jul 29, 2024

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,906 132 Updated Oct 30, 2024

Medical Multimodal LLMs

Python 292 21 Updated Apr 23, 2025

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation

Python 761 44 Updated Aug 5, 2024

🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org

Python 12,606 1,339 Updated May 23, 2025
Jupyter Notebook 1,699 163 Updated Sep 27, 2024
Python 3,850 360 Updated May 6, 2025

Towards Large Multimodal Models as Visual Foundation Agents

Python 212 6 Updated Apr 24, 2025

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python 384 10 Updated Jul 9, 2024

Next-Token Prediction is All You Need

Python 2,127 80 Updated Mar 17, 2025

Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.

Shell 21,528 1,422 Updated May 22, 2025

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Python 4,836 1,745 Updated Feb 26, 2025