Stars
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
VisualWebArena is a benchmark for multimodal agents.
Public release for "Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections"
This repo contains documentation and code needed to use the PACO dataset: data loaders, training and evaluation scripts for object, part, and attribute prediction models, query evaluation scripts…
A repo listing papers related to LLM-based agents
Must-read Papers on Large Language Model (LLM) Planning.
[ICCV'23] LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
Jupyter book for Biologically Intelligent eXploration
Tools to simulate biological exploration.
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
Machine learning utilities from Sinzlab
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
Code to analyze V1 data from the Mitchell lab