Stars
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual information for complex reasoning, planning, and generation.
Tool for n-gram overlap analysis between test and training sequences
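The n-gram overlap check this description refers to is a common contamination test: count how many of a test sequence's n-grams also occur in the training data. A minimal sketch of the idea (the function names `ngrams` and `ngram_overlap` are illustrative, not the tool's actual API):

```python
def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence, as hashable tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_overlap(test_text, train_texts, n=8):
    """Fraction of the test text's word n-grams that also appear in any training document."""
    test_grams = set(ngrams(test_text.split(), n))
    if not test_grams:
        return 0.0
    train_grams = set()
    for doc in train_texts:
        train_grams.update(ngrams(doc.split(), n))
    return len(test_grams & train_grams) / len(test_grams)
```

An overlap near 1.0 suggests the test sequence was memorized from (or duplicated in) the training set; real tools typically tokenize more carefully and hash n-grams for scale.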
The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.
Multimodal Large Language Models for Code Generation under Multimodal Scenarios
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
Witness the "aha moment" of a VLM for less than $3.
Minimal reproduction of DeepSeek R1-Zero
Columbia Robot Studio project implementation: code, CAD, 3MF files, etc.
Get your documents ready for gen AI
List of Computer Science courses with video lectures.
A curated list of foundation models for vision and language tasks
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
PyTorch implementation of Twelve Labs' Video Foundation Model evaluation framework & open embeddings
Optimized primitives for collective multi-GPU communication
Examples and guides for using the OpenAI API
A collection of papers on the continuing line of work that started from World Models.
A curated list of awesome self-supervised learning methods in videos
llama3 implementation one matrix multiplication at a time
Pandora: Towards General World Model with Natural Language Actions and Video States
Code for FLAVR: A fast and efficient frame interpolation technique.
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
This is a list of awesome prototype-based papers for explainable artificial intelligence.
Tracking and collecting papers/projects/others related to Segment Anything.
Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
🔥🔥🔥 Latest papers, code, and datasets on Vid-LLMs.