In `test_zeroshot.py`, the code that computes the final test accuracy looks like this:
```python
    i_features.append(concat_all_gather(image_features))
    sim_logits.append(concat_all_gather(similarity))
    labels.append(concat_all_gather(class_id))

if dist.get_rank() == 0:
    ## half-classes evaluation
    sim, la = sim_logits[0], labels[0]
    vid_feat = i_features[0]
    for i in range(1, len(sim_logits)):
        sim = torch.cat((sim, sim_logits[i]), 0)
        la = torch.cat((la, labels[i]), 0)
        vid_feat = torch.cat((vid_feat, i_features[i]), 0)

    text_feat = cls_feature / cls_feature.norm(dim=-1, keepdim=True)
    acc_split, acc_split_top5 = multi_split_test(vid_feat.cpu(), text_feat.cpu(), la.cpu())
    accuracy_split, accuracy_split_std = np.mean(acc_split), np.std(acc_split)
    accuracy_split_top5, accuracy_split_top5_std = np.mean(acc_split_top5), np.std(acc_split_top5)
```
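The accuracy here is computed from the original image features (`image_features`), regardless of `video_header`. In the lines shown, the gathered `similarity` is concatenated into `sim`, but only `vid_feat`, `text_feat`, and `la` are passed to `multi_split_test`.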
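To illustrate why this matters, here is a minimal sketch with toy data showing that accuracy scored from feature/text similarity and accuracy scored from a head's logits are separate quantities. `topk_accuracy`, `frame_feat`, and `head_logits` are hypothetical names that are not part of the repository, the numbers are made up, and NumPy stands in for PyTorch.

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of rows whose true label is among the k highest-scoring classes."""
    topk = np.argsort(-scores, axis=1)[:, :k]       # indices of the k best classes per row
    hits = (topk == labels[:, None]).any(axis=1)    # True where the label is in the top k
    return float(hits.mean())

# Toy setup (assumed values): 3 samples, 2 classes.
labels = np.array([0, 1, 0])
text_feat = np.array([[1.0, 0.0], [0.0, 1.0]])      # unit-norm class embeddings

# Stand-in for the gathered image features, L2-normalized per row.
frame_feat = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]])
frame_feat = frame_feat / np.linalg.norm(frame_feat, axis=1, keepdims=True)

# Path A: score directly from the features, bypassing any head.
feat_scores = frame_feat @ text_feat.T

# Path B: logits as a hypothetical head might produce them.
head_logits = np.array([[2.0, 0.5], [0.1, 1.5], [1.2, 0.3]])

acc_from_features = topk_accuracy(feat_scores, labels)
acc_from_logits = topk_accuracy(head_logits, labels)
print(acc_from_features, acc_from_logits)
```

On this toy data the two paths give different accuracies, so a metric built only from the features says nothing about what the head contributes.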