Hi, in my computer science project, I'm using CLIP's text encoder and vision encoder to extract image and text features:
image_features = self.encode_image(image)
text_features = self.encode_text(text)
If I compute the loss directly from these image_features and text_features with a custom loss function (without invoking CLIP's forward function), would this affect the interpretability of the project?
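For context, here is a minimal sketch of what the question is comparing. It assumes the OpenAI reference implementation of CLIP, whose forward function takes the encoder outputs, L2-normalizes them, and multiplies their cosine similarities by a learned temperature (`logit_scale.exp()`). The random tensors stand in for the outputs of `encode_image` and `encode_text`; the batch size, feature width, helper names, and the symmetric cross-entropy loss are illustrative assumptions, not part of my project.

```python
import torch
import torch.nn.functional as F


def clip_style_logits(image_features, text_features, logit_scale):
    """Mirror the post-encoder steps of CLIP's forward (assumed: OpenAI reference code)."""
    # L2-normalize so the dot product below is a cosine similarity.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # Scale cosine similarities by the temperature to get logits.
    logits_per_image = logit_scale * image_features @ text_features.t()
    return logits_per_image, logits_per_image.t()


def symmetric_contrastive_loss(logits_per_image, logits_per_text):
    """Hypothetical custom loss: the i-th image is paired with the i-th text."""
    targets = torch.arange(logits_per_image.shape[0], device=logits_per_image.device)
    loss_image = F.cross_entropy(logits_per_image, targets)
    loss_text = F.cross_entropy(logits_per_text, targets)
    return 0.5 * (loss_image + loss_text)


torch.manual_seed(0)
# Stand-ins for self.encode_image(image) and self.encode_text(text).
image_features = torch.randn(4, 512)
text_features = torch.randn(4, 512)
# Stand-in for model.logit_scale.exp(); 1 / 0.07 is the value CLIP initializes to.
logit_scale = torch.tensor(1.0 / 0.07)

logits_per_image, logits_per_text = clip_style_logits(
    image_features, text_features, logit_scale
)
loss = symmetric_contrastive_loss(logits_per_image, logits_per_text)
```

Under these assumptions, the encoders produce the same features either way. What a custom loss applied to the raw features skips is the normalization and temperature scaling, so the similarities it sees are not cosine similarities unless those two steps are reproduced by hand.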