shure-dev's list / ⭐ VLM · GitHub

Stars

⭐ VLM

24 repositories

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,902 2,527 Updated Aug 12, 2024

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Python 5,886 381 Updated Mar 14, 2024

A state-of-the-art-level open visual language model | multimodal pre-trained model

Python 6,598 430 Updated May 29, 2024

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,257 212 Updated Mar 5, 2024

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

Python 25,689 2,939 Updated Sep 2, 2024

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,747 104 Updated Aug 29, 2023

Multimodal-Procedural-Planning

Python 92 3 Updated Jun 1, 2023

LAVIS - A One-stop Library for Language-Vision Intelligence

Jupyter Notebook 10,689 1,045 Updated Nov 18, 2024

133 Updated Dec 22, 2023

Official implementation of HuBo-VLM.

7 Updated Aug 24, 2023

Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.

475 30 Updated Jun 24, 2025

A library for advanced large language model reasoning

Python 2,154 191 Updated Jun 10, 2025

ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without relying on all combinations of paired data.

Python 51 2 Updated Sep 4, 2023

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

Python 709 92 Updated Feb 20, 2025

VisualGPT, CVPR 2022 Proceeding, GPT as a decoder for vision-language models

Python 334 54 Updated May 16, 2023

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 30,046 4,048 Updated Jul 17, 2024

Recent LLM-based CV and related works. Welcome to comment/contribute!

866 36 Updated Mar 8, 2025

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Jupyter Notebook 1,698 125 Updated Jan 29, 2024

Official code for VisProg (CVPR 2023 Best Paper!)

Python 731 69 Updated Aug 26, 2024

An open-source framework for training large multimodal models.

Python 3,961 306 Updated Aug 31, 2024

A self-hosted, offline, ChatGPT-like chatbot. Powered by Llama 2. 100% private, with no data leaving your device. New: Code Llama support!

TypeScript 10,984 711 Updated Apr 23, 2024

VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks

Python 385 12 Updated Jul 9, 2024

SAM with text prompt

Python 2,259 260 Updated May 10, 2025

Reading list for research topics in embodied vision

631 77 Updated Jun 13, 2025