multi-label softmax support #3268
Conversation
- Support N labels along the softmax axis. The final loss is the average loss over all labels.
- Combined with ignore_label, it supports a variable number of labels per instance (see the sketch below).
…average loss value over all given labels.
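For illustration (this example is mine, not from the PR; the padding value and shapes are assumed), a fixed-shape (N, K) label blob can hold a variable number of labels per instance by padding the unused slots with ignore_label:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical illustration (values assumed, not from the PR): with K = 3
// label slots per instance and ignore_label = -1 used as padding, instances
// with fewer than K labels still fit a fixed (N, K) label blob.
int main() {
  const int ignore_label = -1;
  // Label blob of shape (N, K) = (3, 3); -1 pads short label lists.
  std::vector<std::vector<int>> labels = {
      {5, 17, -1},  // instance 0 has 2 labels
      {2, -1, -1},  // instance 1 has 1 label
      {0,  4,  9},  // instance 2 has 3 labels
  };
  for (size_t i = 0; i < labels.size(); ++i) {
    int count = 0;
    for (int l : labels[i]) count += (l != ignore_label);
    std::printf("instance %zu: %d valid labels\n", i, count);
  }
  return 0;
}
```

The loss is then averaged only over the non-ignored entries.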
/cc @mtamburrano
…abel, since the size of the label blob is doubled. When the label blob has shape (10, 1, 2, 3), it also causes a check failure at an accuracy threshold of 5e-5.
}
DCHECK_GE(label_value, 0);
DCHECK_LT(label_value, prob_.shape(softmax_axis_));
loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                     Dtype(FLT_MIN)));
Are you sure this is right?
Shouldn't it be something like
loss -= log(std::max(prob_data[(i * dim) + (dim/label_num_*k) + label_value * inner_num_ + j], Dtype(FLT_MIN)));
?
I'm not sure how you plan to feed bottom[0] to match the dimension of the labels; shouldn't it be larger, with a size of previous_size * label_num_?
Let's say we had single labels over 3 classes, so an INNER_PRODUCT layer with num_output: 3 was enough. Now, if each input has 2 labels, each over 3 classes, the INNER_PRODUCT layer should have num_output: 6, and you should iterate over the prob_ blob with an offset that accounts for both the number of classes and the number of labels.
Is that right, or am I missing something?
I am not addressing the multi-class problem, but the multi-label problem, where multiple labels are assigned to each instance. In your example, supposing instance i is assigned the 1st and 3rd classes, the loss for that instance is simply the average of the losses on those two classes, i.e. of -log(prob_[i * dim + 0 * inner_num_ + j]) and -log(prob_[i * dim + 2 * inner_num_ + j]). The INNER_PRODUCT layer still has num_output: 3 in this case.
Generally, if there are M classes and each instance has K labels, the shape of the data blob is still (N, M) and the shape of the label blob is (N, K). Originally only (N, 1) was allowed.
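As a minimal CPU sketch of that scheme, assuming inner_num_ = 1 and an ignore_label parameter (this is my reading of the approach, not the PR's actual code; the function name and signature are hypothetical):

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>

// Sketch of the forward loss under the (N, M) prob / (N, K) label scheme
// described above. Illustration only, not the PR's code; the spatial
// dimension inner_num_ is taken to be 1 for simplicity.
template <typename Dtype>
Dtype MultiLabelSoftmaxLoss(const Dtype* prob_data,  // shape (N, M)
                            const int* label_data,   // shape (N, K)
                            int num, int classes, int label_num,
                            int ignore_label) {
  Dtype loss = 0;
  int count = 0;  // number of non-ignored labels actually used
  for (int i = 0; i < num; ++i) {
    for (int k = 0; k < label_num; ++k) {
      const int label_value = label_data[i * label_num + k];
      if (label_value == ignore_label) continue;  // variable label counts
      // Each label indexes into the SAME M-dimensional distribution,
      // so num_output stays M; only the label blob widens to K.
      loss -= std::log(std::max(prob_data[i * classes + label_value],
                                Dtype(FLT_MIN)));
      ++count;
    }
  }
  return count > 0 ? loss / count : Dtype(0);  // average over all labels
}
```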
OK, I get it; I assumed you were addressing multi-class problems.
Thank you
Just for reference, there was a Multi label Data and MultiLabel Accuracy PR (#523) previously.
Playing devil's advocate:
I'm looking for multi-label functionality in Caffe. How should I understand this task? @bhack, I've seen your comments on all the PR branches, so maybe you know more details.
@taras-sereda I got frustrated with the multiple PRs, so the other day I decided to take the time to lay out a few solutions I know of; the discussion thread is here: https://groups.google.com/forum/#!topic/caffe-users/RuT1TgwiRCo
@beniz Thanks for sharing.
@BlGene Regarding your first suggestion, I think the problem is how you get the N * K output. For each instance, supposing there are N labels and num_output is K, you would need to duplicate the K-dimensional output N times. This is much more expensive than simply reading N positions from the K-dimensional output. PR #523 only proposed a multilabel accuracy layer, which is for the test phase. Furthermore, a multilabel loss may often be used for web-tag classification, where the number of tags is huge. In our experiments we use around 30,000 tags (classes), and there are only about 20 positive tags per instance on average, so the labels are quite sparse. In this case a multi-hot encoded label ([1, 0, 0, 1, 0, 1]) is not practical.
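To put rough numbers on that sparsity argument (my arithmetic; the batch size is assumed, the tag counts are those quoted above):

```cpp
#include <cstdio>

// Back-of-the-envelope comparison of dense multi-hot labels vs the
// sparse (N, K) index form discussed above. Batch size is assumed.
int main() {
  const long batch = 256;      // assumed batch size
  const long classes = 30000;  // ~30,000 tags, as quoted above
  const long k = 20;           // ~20 positive tags per instance on average
  const long dense_bytes = batch * classes * sizeof(float);  // multi-hot
  const long sparse_bytes = batch * k * sizeof(float);       // (N, K) indices
  std::printf("dense:  %ld bytes (~%.1f MB)\n", dense_bytes, dense_bytes / 1e6);
  std::printf("sparse: %ld bytes (~%.1f KB)\n", sparse_bytes, sparse_bytes / 1e3);
  return 0;
}
```

At these figures the sparse form is roughly three orders of magnitude smaller per batch.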
In terms of classification performance, how does this multi-label version of the Softmax loss compare with the …
How do you load the multiple labels for the data? I cannot find any examples in your project. It seems that you didn't modify the data layer.