Harmon Public
Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
-
wusize.github.io Public
Forked from academicpages/academicpages.github.io
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
-
WISE Public
Forked from PKU-YuanGroup/WISE
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Python · Updated Apr 3, 2025
-
Show-o Public
Forked from showlab/Show-o
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Python · Apache License 2.0 · Updated Feb 28, 2025
-
Janus Public
Forked from deepseek-ai/Janus
Janus-Series: Unified Multimodal Understanding and Generation Models
Python · MIT License · Updated Feb 1, 2025
-
lmms-eval Public
Forked from EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Python · Updated Jan 26, 2025
-
Open-MAGVIT2 Public
Forked from vinyesm/Open-MAGVIT2
A packaging of Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Python · Apache License 2.0 · Updated Oct 6, 2024
-
RADIO Public
Forked from NVlabs/RADIO
Official repository for "AM-RADIO: Reduce All Domains Into One"
-
F-LMM Public
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
-
OMG-Seg Public
Forked from lxtGH/OMG-Seg
OMG-LLaVA and OMG-Seg codebase
-
chameleon Public
Forked from facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Python · Other · Updated Jul 3, 2024
-
MMT-Bench Public
Forked from OpenGVLab/MMT-Bench
ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Python · Updated Jun 18, 2024
-
Visual-CoT Public
Forked from deepcs233/Visual-CoT
Visual CoT: Unleashing Chain-of-Thought Reasoning in the Multi-Modal Language Model
Python · Apache License 2.0 · Updated May 2, 2024
-
DeepSpeed Public
Forked from deepspeedai/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Python · Apache License 2.0 · Updated Mar 25, 2024
-
CLIPSelf Public
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
-
CLIM Public
[AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation
-
LLaVA-Grounding Public
Forked from UX-Decoder/LLaVA-Grounding
Python · Apache License 2.0 · Updated Jan 22, 2024
-
UNINEXT Public
Forked from MasterBin-IIAU/UNINEXT
[CVPR'23] Universal Instance Perception as Object Discovery and Retrieval
Python · MIT License · Updated Nov 9, 2023
-
ovdet Public
[CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection
-
CAT-Seg Public
Forked from cvlab-kaist/CAT-Seg
Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
Python · Updated Sep 11, 2023
-
RegionCLIP Public
Forked from microsoft/RegionCLIP
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
Python · Apache License 2.0 · Updated Aug 13, 2023
-
SAN Public
Forked from MendelXu/SAN
Open-vocabulary Semantic Segmentation
Python · MIT License · Updated May 9, 2023
-
open_clip-1 Public
Forked from mlfoundations/open_clip
An open source implementation of CLIP.
Python · Other · Updated Apr 23, 2023
-
multiview_pose Public
[ICCV2021] Code Release of Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
-
colorization Public
Code for the colorization project of the National Innovation Program.