
Liquid: Language Models are Scalable and Unified Multi-modal Generators

Junfeng Wu1,2 · Yi Jiang2† · Chuofan Ma2,3
Yuliang Liu1 · Hengshuang Zhao3
Zehuan Yuan2 · Song Bai2* · Xiang Bai1*

1HUST   2ByteDance   3HKU
†project lead   *corresponding author

Paper PDF · Project Page

This repo implements Liquid, a scalable and unified autoregressive generation paradigm that seamlessly integrates multimodal comprehension and generation.

[teaser figure]

📰 News

2025-03-25: Data processing and model pretraining scripts have been updated in Data.md and TRAIN.md.

2025-03-04: Text-to-image and visual understanding evaluation scripts for Liquid are released in EVAL.md.

2025-02-28: Paper, demo, model, and project page for Liquid are all released.

📑 Open-Source Plan

  • Liquid-7B-IT (Instruction-Tuned Multimodal Model with Instruction-Following Ability)
    • [✅] Web Demo
    • [✅] Evaluation
    • [✅] Checkpoints
    • [✅] Training Codes
  • Liquid-0.5B~32B-Pretrain (Multimodal extension models at six scales, from 0.5B to 32B, across three model families)
    • Checkpoints

📽️ Inference

Using Liquid for inference or evaluation doesn't require complex environment dependencies. Since it is essentially a HuggingFace-format language model, you only need the transformers library and a few basic components to run it. Refer to EVAL.md for recommended versions.
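As a minimal sketch of what this looks like (assuming a recent transformers release and the Junfeng5/Liquid_V1_7B checkpoint used in the examples below; the inference scripts may apply a different prompt format), loading and running the model follows the standard HuggingFace pattern:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Junfeng5/Liquid_V1_7B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,  # assumed dtype; pick what your GPU supports
    device_map="auto",
)

# Plain text-to-text generation works like any causal LM.
inputs = tokenizer("Write me a poem about Machine Learning.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))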

Run the Gradio Demo locally

If deploying on a GPU with less than 30GB of VRAM, you may need to enable load_in_8bit in the AutoModelForCausalLM.from_pretrained call in app.py to avoid out-of-memory errors during image generation.
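For reference, the change would look roughly like this (a sketch; the variable names in app.py may differ, and load_in_8bit requires the bitsandbytes package):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Junfeng5/Liquid_V1_7B",
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes to cut VRAM use
    device_map="auto",
)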

pip install gradio==4.44.1
pip install gradio_client==1.3.0

cd evaluation
python app.py

Single inference

# Pure-language dialogue
python inference_t2t.py --model_path Junfeng5/Liquid_V1_7B --prompt "Write me a poem about Machine Learning."

# Image understanding
python inference_i2t.py --model_path Junfeng5/Liquid_V1_7B --image_path samples/baklava.png --prompt 'How to make this pastry?'

# Image generation; add --load_8bit on GPUs with less than 30GB of VRAM
python inference_t2i.py --model_path Junfeng5/Liquid_V1_7B --prompt "young blue dragon with horn lightning in the style of dd fantasy full body"

⚙️ Installation and Training

See Data.md and TRAIN.md.

📖 Introduction

  • We present Liquid, an auto-regressive generation paradigm that seamlessly integrates visual comprehension and generation.

  • Unlike previous multimodal large language models (MLLMs), Liquid achieves this integration with a single large language model (LLM), eliminating the need for external pretrained visual embeddings such as CLIP.

  • For the first time, Liquid uncovers a scaling law: the performance drop unavoidably brought by unified training of visual and language tasks diminishes as model size increases.

  • Furthermore, the unified token space enables visual generation and comprehension tasks to mutually enhance each other, as the sketch below illustrates.
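Conceptually, the unified token space means a single next-token objective over one vocabulary holding both text tokens and discrete image codes. The sketch below is illustrative only; the vocabulary sizes, offset scheme, and variable names are assumptions, not Liquid's actual implementation:

# Illustrative: text tokens and VQ image codes share one id space,
# so one autoregressive LM covers both comprehension and generation.
TEXT_VOCAB_SIZE = 32_000      # assumed text vocabulary size
IMAGE_CODEBOOK_SIZE = 8_192   # assumed VQ codebook size

def image_code_to_token_id(code: int) -> int:
    # Offset image codes past the text vocabulary so both token kinds
    # share one embedding table and one output softmax.
    assert 0 <= code < IMAGE_CODEBOOK_SIZE
    return TEXT_VOCAB_SIZE + code

# A mixed sequence: text ids followed by image token ids, trained
# end to end with the same next-token prediction loss throughout.
text_ids = [101, 2023, 2003]     # placeholder text token ids
image_codes = [7, 4096, 812]     # placeholder VQ codes
sequence = text_ids + [image_code_to_token_id(c) for c in image_codes]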

🔥 Multimodal Generation

  • Liquid is a scalable and versatile unified multimodal generator that supports visual understanding, visual generation, and multimodal generation.

[teaser figure]

  • Liquid can generate high-quality, photorealistic images at any aspect ratio from language prompts in an autoregressive paradigm.

[teaser figure]

🔥 Scaling Law for Multimodal Generation

  • Liquid shows a clear scaling law in multimodal generation across model sizes (0.5B to 32B).

[teaser figure]

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you find this project useful, please consider citing:

@article{wu2024liquid,
  title={Liquid: Language models are scalable multi-modal generators},
  author={Wu, Junfeng and Jiang, Yi and Ma, Chuofan and Liu, Yuliang and Zhao, Hengshuang and Yuan, Zehuan and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2412.04332},
  year={2024}
}