This is a fun repo: it combines Masked AutoEncoders (MAE) with Channel Vision Transformers (ChannelViT) into Channel Masked AutoEncoders (ChannelMAE). Essentially, the image channels are rolled out into separate tokens during MAE pretraining. The repo also supports subsequent fine-tuning.
Masked AutoEncoders (MAE): A powerful self-supervised pretraining method in which a large fraction of image patches (typically 75%) is masked out and the model is trained to reconstruct the missing content.
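As a rough illustration, here is a minimal sketch of MAE's per-sample random masking, modeled on the reference facebookresearch/mae implementation (the code in this repo may differ in detail):

```python
import torch

def random_masking(x, mask_ratio=0.75):
    """Randomly drop a fraction of patch tokens per sample (MAE-style).

    x: (N, L, D) sequence of patch embeddings.
    Returns the kept tokens, the binary mask, and the restore indices.
    """
    N, L, D = x.shape
    len_keep = int(L * (1 - mask_ratio))

    noise = torch.rand(N, L, device=x.device)      # per-token noise in [0, 1)
    ids_shuffle = torch.argsort(noise, dim=1)      # ascending: smallest noise is kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)

    ids_keep = ids_shuffle[:, :len_keep]
    x_masked = torch.gather(x, dim=1, index=ids_keep.unsqueeze(-1).repeat(1, 1, D))

    mask = torch.ones(N, L, device=x.device)       # 1 = masked, 0 = kept
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, dim=1, index=ids_restore)
    return x_masked, mask, ids_restore
```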
Channel Vision Transformers (ChannelViT): Useful for non-traditional image applications like cell-painting or satellite images, where each channel conveys very different information and it does not make sense to stack the channels into a single patch embedding. Instead, each channel is tokenized separately.
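A hypothetical sketch of the channel-wise patch embedding idea (class and attribute names are illustrative and not taken from models_vit.py):

```python
import torch
import torch.nn as nn

class ChannelPatchEmbed(nn.Module):
    """Embed each input channel independently, so every token corresponds to a
    (channel, patch) pair. Hypothetical sketch, not the repo's exact code."""

    def __init__(self, img_size=224, patch_size=16, in_chans=5, embed_dim=768):
        super().__init__()
        # Single-channel projection shared across all channels.
        self.proj = nn.Conv2d(1, embed_dim, kernel_size=patch_size, stride=patch_size)
        # Learnable per-channel embedding so the model can tell channels apart.
        self.channel_embed = nn.Parameter(torch.zeros(in_chans, embed_dim))

    def forward(self, x):                          # x: (N, C, H, W)
        N, C, H, W = x.shape
        x = x.reshape(N * C, 1, H, W)              # treat each channel as its own image
        x = self.proj(x)                           # (N*C, D, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)           # (N*C, L, D), L = (H/p)*(W/p)
        x = x.reshape(N, C, -1, x.shape[-1])       # (N, C, L, D)
        x = x + self.channel_embed[:C, None, :]    # broadcast channel embedding
        return x.flatten(1, 2)                     # (N, C*L, D): channels rolled out
```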
Channel Masked AutoEncoders (ChannelMAE): Combines the two, applying MAE-style masked pretraining to the channel-wise token sequence of a ChannelViT encoder. This is useful for pretraining models on non-traditional image applications like cell-painting or satellite images.
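Putting the two sketches above together, the core idea is to apply MAE masking over the rolled-out channel-patch token sequence (again a hypothetical illustration, not the exact code in models_chamae.py, which also includes the decoder and reconstruction loss):

```python
import torch

# Builds on the ChannelPatchEmbed and random_masking sketches above.
images = torch.randn(8, 5, 224, 224)                       # e.g. a batch of 5-channel cell-painting crops
tokens = ChannelPatchEmbed(in_chans=5)(images)             # (8, 5 * 196, 768) channel-patch tokens
visible, mask, ids_restore = random_masking(tokens, 0.75)  # encoder sees only the visible 25%
print(visible.shape)                                       # torch.Size([8, 245, 768])
```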
This repo is based on the MAE and ViT implementation at https://github.com/facebookresearch/mae, modified to add ChannelViT and ChannelMAE.
- Pre-training code for MAE and ChannelMAE
- Fine-tuning code for ViT and ChannelViT (the encoders of MAE and ChannelMAE)
- Linear-probing code for ViT and ChannelViT (the encoders of MAE and ChannelMAE)
We implemented ChannelMAE in models_chamae.py and ChannelViT in models_vit.py.
To run a sample pre-training test locally:
python main_pretrain.py
For submitting jobs, see the instructions in PRETRAIN.md.
To run a sample fine-tuning or linear-probing test locally:
python main_finetune.py
python main_linprobe.py
For submitting jobs, see the instructions in FINETUNE.md.
Masked Autoencoders Are Scalable Vision Learners
Channel Vision Transformer: An Image Is Worth C x 16 x 16 Words
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.