This repository contains the official implementation of VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics.
VICON was evaluated on three fluid dynamics datasets:
- PDEArena-Incomp (incompressible Navier-Stokes)
- PDEBench-Comp-HighVis (compressible Navier-Stokes)
- PDEBench-Comp-LowVis (compressible Navier-Stokes with numerical-zero viscosity)
Refer to dataset_prepare/README.md for details.
We use Hydra for configuration management, allowing flexible parameter modifications via command line or config files. Example:
```bash
# Train the model with specific configurations.
# Assumes GPUs 0 and 1; enables wandb logging.
# TODO: Define your dataset directories either in script or config file
CUDA_VISIBLE_DEVICES="0,1" python src/train.py \
    plot=0 board=1 amp=0 dataset_workers=2 multi_gpu=1 \
    datasets.train_batch_size=30 loss.min_ex=5 \
    model.transformer.num_layers=10 \
    model.use_patch_pos_encoding=True model.use_func_pos_encoding=True \
    datasets.types.COMPRESSIBLE2D.folder=$COMPRESSIBLE2D_DIR \
    datasets.types.EULER2D.folder=$EULER2D_DIR \
    datasets.types.NS2D.folder=$NS2D_DIR
```
Refer to the configs folder for detailed configuration options.
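For orientation, here is a minimal sketch of what a Hydra entry point typically looks like; the actual `src/train.py` may differ, and the `config_path`/`config_name` values below are assumptions:

```python
# Minimal Hydra entry-point sketch (illustrative only; see src/train.py
# and the configs folder for the actual implementation).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="../configs", config_name="config", version_base=None)
def main(cfg: DictConfig) -> None:
    # Every dotted field in the command above (e.g. loss.min_ex,
    # model.transformer.num_layers) is reachable as an attribute here.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```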
Current approaches to operator learning for PDEs face challenges that limit their practical application:

- **Single Operator Learning**
  - Requires complete retraining whenever the equation type or parameters change
  - Impractical for real-world deployment where system conditions vary
- **Pretrain-Finetune Approach**
  - Can handle multiple PDE types during pretraining
  - Still requires substantial data collection (hundreds of frames/trajectories) for finetuning
  - Challenging in downstream applications with limited data availability, e.g., online environments
ICON (Yang et al., 2024) introduced a novel perspective inspired by in-context learning in LLMs:
- Defines physical fields before/after a certain timestep as query/answer (or COND/QoI) pairs (see the sketch below)
- Extracts dynamics directly from a few such pairs without requiring finetuning
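As a concrete illustration, here is a hypothetical sketch of how frames of a single trajectory could be grouped into such COND/QoI pairs; the function and argument names are ours, not the repository's:

```python
import numpy as np


def make_cond_qoi_pairs(traj: np.ndarray, stride: int = 1, num_pairs: int = 5):
    """Group frames of one trajectory into (COND, QoI) pairs.

    traj: array of shape (T, H, W, C) -- one simulated trajectory.
    Each pair is (frame at t, frame at t + stride); a few such pairs
    serve as the in-context examples from which dynamics are inferred,
    with no gradient updates at inference time.
    """
    pairs = []
    for t in range(num_pairs):
        cond, qoi = traj[t], traj[t + stride]
        pairs.append((cond, qoi))
    return pairs
```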
However, ICON faces an architectural limitation:
- Represents each discretized point of the physical fields as an individual token in the query/answer sequences
- Results in extremely long transformer sequences
- Becomes computationally infeasible for real-scale, high-dimensional data
Figure 1: Schematic overview of VICON architecture.
Inspired by Vision Transformers (ViT), which efficiently handle large images by processing them in patches, VICON overcomes these limitations while maintaining the benefits of in-context learning. Our contributions include:
- First implementation of in-context learning for 2D PDEs without requiring explicit PDE information
- State-of-the-art empirical results compared to existing methods (at the time of VICON's development)
- Flexible rollout capabilities through learning to extract dynamics from pairs with varying timestep sizes
VICON combines the in-context operator learning framework with vision transformer architecture through several key components:
- Divides input physical fields into manageable patches
- Significantly reduces sequence length compared to the token-per-point approach in the original ICON (see the sketch below)
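A minimal sketch of this patchification step; the patch size and tensor layout are assumptions, so consult the source for the actual implementation:

```python
import torch


def patchify(field: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Split a physical field (B, C, H, W) into flattened patches.

    Returns (B, N, patch*patch*C) with N = (H // patch) * (W // patch),
    so the transformer sees N tokens per field rather than H*W tokens
    (one per grid point), shrinking the sequence by a factor of patch**2.
    """
    B, C, H, W = field.shape
    x = field.unfold(2, patch, patch).unfold(3, patch, patch)  # (B, C, H/p, W/p, p, p)
    x = x.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch * patch)
    return x
```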
Our system uses two types of positional encoding to capture precise spatial and functional relationships between tokens:
a) Patch Position Encoding
- Encodes relative spatial relationships between patches
- Maintains awareness of physical space structure
b) Function Position Encoding
- Indicates the role of each patch in the sequence:
- Which pair in the sequence it belongs to
- Whether it's part of the input (query) or output (answer) in that pair
- Crucial for maintaining the in-context learning structure (see the sketch below)
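Below is a hypothetical sketch of the function position encoding; the module name, shapes, and use of learned embeddings are assumptions, and the actual repository code may differ:

```python
import torch
import torch.nn as nn


class FunctionPosEncoding(nn.Module):
    """Hypothetical sketch of the function position encoding.

    Adds two learned embeddings to every patch token: one for the index
    of the (COND, QoI) pair it belongs to, and one for its role within
    that pair (0 = query/COND, 1 = answer/QoI). The spatial patch
    position encoding would be added to the tokens analogously.
    """

    def __init__(self, max_pairs: int, dim: int):
        super().__init__()
        self.pair_emb = nn.Embedding(max_pairs, dim)
        self.role_emb = nn.Embedding(2, dim)

    def forward(self, tokens, pair_idx, role_idx):
        # tokens: (B, N, dim); pair_idx and role_idx: (B, N) integer tensors
        return tokens + self.pair_emb(pair_idx) + self.role_emb(role_idx)
```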
VICON's unique training approach enables versatile rollout schemes:
- Forms pairs with varying timestep sizes during training
- Allows a single trained model to:
  - Extract dynamics at different time scales
  - Perform rollouts with various timestep strides
  - Reduce the number of rollout steps when appropriate, thereby minimizing error accumulation in long-term predictions
Figure 2: VICON's flexible rollout strategy: Starting with timestep dt=1, the model progressively accumulates flow fields needed for larger stride predictions up to dt=5, enabling efficient long-term predictions with larger timestep strides.
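The following is a minimal sketch of the progressive-stride rollout that Figure 2 describes; the model call signature is an assumption for illustration only:

```python
import torch


@torch.no_grad()
def progressive_rollout(model, context, frames, steps, max_stride=5):
    """Sketch of the progressive-stride rollout in Figure 2 (hypothetical API).

    frames: list of known fields [u_0, ...] ordered in time, one per step.
    The stride grows as predicted frames accumulate: early frames are
    predicted with stride 1, and once max_stride frames exist each new
    frame u_t is predicted from u_{t - max_stride} in a single call, so
    the dependency chain behind any frame is roughly t / max_stride model
    applications instead of t, limiting error accumulation.
    """
    preds = []
    for _ in range(steps):
        stride = min(max_stride, len(frames))      # largest stride the history allows
        cond = frames[-stride]                     # field `stride` steps back in time
        nxt = model(context, cond, stride=stride)  # assumed call signature
        frames.append(nxt)
        preds.append(nxt)
    return preds
```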
Compared with the MPP baseline, VICON achieves:

- Reduction in scaled L2 error for long-term predictions:
  - 40% for PDEBench-Comp-LowVis
  - 61.6% for PDEBench-Comp-HighVis
- 67% reduction in turbulence kinetic energy prediction error for PDEBench-Comp-LowVis
- 3x faster inference time
Figure 3: Comparison of turbulence kinetic energy predictions: VICON demonstrates superior accuracy over MPP in both RMSE metrics and advanced physical statistics.
VICON demonstrates exceptional adaptability in handling varying time strides:
- Successfully maintains prediction accuracy with previously unseen larger strides (smax = 6, 7), despite being trained only on smax = 1 through 5
- Particularly effective on the PDEArena-Incomp dataset, where multi-step rollout (smax = 5) substantially outperforms single-step prediction
- Enables direct application to experimental settings with hardware-constrained sampling rates, eliminating the need for interpolation or model retraining
Figure 4: Comparison with state-of-the-art methods. VICON is additionally evaluated with different timestep strides, including previously unseen ones. For the PDEArena-Incomp dataset, the maximum stride size (smax=5) achieves optimal rollout performance.
```bibtex
@article{cao2024vicon,
  title={VICON: Vision In-Context Operator Networks for Multi-Physics Fluid Dynamics Prediction},
  author={Cao, Yadi and Liu, Yuxuan and Yang, Liu and Yu, Rose and Schaeffer, Hayden and Osher, Stanley},
  journal={arXiv preprint arXiv:2411.16063},
  year={2024}
}
```
This work was supported by the US Army Research Office (Army-ECASE W911NF-23-1-0231), US Department of Energy, IARPA HAYSTAC Program, CDC, DARPA AIE FoundSci, DARPA YFA, NSF grants (#2205093, #2100237, #2146343, #2134274), AFOSR MURI (FA9550-21-1-0084), NSF DMS 2427558, NUS Presidential Young Professorship, STROBE NSF STC887 DMR 1548924, and ONR N00014-20-1-2787.