School of Software, Tsinghua University
Stars
Fully open reproduction of DeepSeek-R1
GLM-4 series: Open Multilingual Multimodal Chat LMs
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o, an open-source multimodal chat model with performance approaching GPT-4o
[ICASSP 2024] The official repo for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection
Data and code for NeurIPS 2022 Paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)
Command-line program to download videos from YouTube.com and other video sites
Hate-CLIPper: Multimodal Hateful Meme Classification with Explicit Cross-modal Interaction of CLIP features - Accepted at EMNLP 2022 Workshop
Chinese version of CLIP that achieves Chinese cross-modal retrieval and representation generation.
Memes processing pipeline that enables the tracking of memes across multiple Web communities.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
✨✨Latest Advances on Multimodal Large Language Models
"他山之石、可以攻玉":复旦白泽智能发布面向国内开源和国外商用大模型的Demo数据集JADE-DB
Stable Diffusion web UI
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge. https://arxiv.org/abs/2012.12975