Open
Description
I am only a beginner in AI, so apologies if this is a basic question. What is the difference between the Perceiver Resampler and the Vision Decoder? Do they both use the same idea as the "class token": adding a set of learnable token vectors to the transformer decoder, and then using the outputs at the corresponding positions as the module output?
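To make the question concrete, here is a minimal sketch of the mechanism as I understand it. It is not taken from either module's actual code. It assumes a single attention head with no learned projections, and the names `latents`, `cross_attend`, and `resample` are hypothetical. A fixed set of learnable tokens acts as the queries, the visual features act as the keys and values, and the outputs at the token positions become the module output, so the output length is fixed regardless of how many input features there are.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    # Simplified single-head cross-attention with no learned projections:
    # every query attends over all feature vectors (used as keys and values).
    d = len(features[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(d) for f in features]
        weights = softmax(scores)
        outputs.append([sum(w * f[k] for w, f in zip(weights, features)) for k in range(d)])
    return outputs


random.seed(0)
dim, num_latents = 4, 2

# Hypothetical learnable tokens. In a real model these would be trained
# parameters; here they are just random values for illustration.
latents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_latents)]


def resample(features):
    # The module output is the attention output at the latent-token positions.
    return cross_attend(latents, features)


short_input = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
long_input = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(10)]

# Both inputs are mapped to the same number of output tokens.
print(len(resample(short_input)), len(resample(long_input)))
```

If this picture is right, then my question is whether the Vision Decoder does the same thing, or whether the two modules differ in how the learnable tokens are used.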