multi-label softmax support by xdshang · Pull Request #3268 · BVLC/caffe · GitHub

multi-label softmax support #3268


Closed
wants to merge 4 commits into from

Conversation

@xdshang commented Nov 1, 2015
  • Support N labels along the softmax axis. The final loss is the average loss over all labels (formalized just below).
  • Combined with ignore_label, it supports a variable number of labels per instance.
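
Formally (my own notation for the description above, following the author's later comment: N instances, M classes; $p_{i,m}$ is the softmax output for instance $i$ and class $m$, and $\ell_{i,k}$ is the $k$-th of $K$ labels for instance $i$):

$$
L = -\frac{1}{\lvert\{(i,k) : \ell_{i,k} \neq \text{ignore\_label}\}\rvert}
\sum_{i=1}^{N} \sum_{k=1}^{K}
\mathbb{1}\!\left[\ell_{i,k} \neq \text{ignore\_label}\right] \log p_{i,\ell_{i,k}}
$$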

@bhack (Contributor) commented Nov 1, 2015

/cc @mtamburrano

Xindi Shang added 2 commits November 1, 2015 22:25
…abel, since the size of the label blob is doubled. When the size of the label blob is (10, 1, 2, 3), it also causes a check failure at a checking accuracy of 5e-5.
```cpp
    }
    DCHECK_GE(label_value, 0);
    DCHECK_LT(label_value, prob_.shape(softmax_axis_));
    loss -= log(std::max(prob_data[i * dim + label_value * inner_num_ + j],
                         Dtype(FLT_MIN)));
```
Contributor:

Are you sure this is right? Shouldn't it be something like
loss -= log(std::max(prob_data[(i * dim) + (dim/label_num_*k) + label_value * inner_num_ + j], Dtype(FLT_MIN))); ?
I'm not sure how you plan to feed bottom[0] to match the dimension of the labels; shouldn't it be larger, with a size of previous_size * label_num_?
Let's say we had 3-class single labels, so an INNER_PRODUCT layer with num_output: 3 was enough. Now, if we have 2 labels for each input, each with 3 classes, the INNER_PRODUCT should have num_output: 6, and you should iterate over the prob_ blob with an offset that accounts for both the number of classes and the size of the labels.
Is that right, or am I missing something?

Author:


I am not addressing a multi-class problem, but a multi-label problem, where multiple labels are assigned to each instance. In your example, supposing instance i is assigned the 1st and 3rd classes, the loss for that instance is simply the average of the losses on those two classes, i.e. log(prob_[i * dim + 0 * inner_num_ + j]) and log(prob_[i * dim + 2 * inner_num_ + j]). The INNER_PRODUCT still has num_output: 3 in this case.
Generally, if there are M classes and each instance has K labels, the shape of the data blob is still (N, M) and the shape of the label blob is (N, K). Originally only (N, 1) was allowed.
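
To make the indexing concrete, here is a minimal standalone sketch of this forward pass, written for this thread rather than taken from the PR's actual code; it assumes inner_num_ == 1 (so dim == M), and the function name and signature are illustrative:

```cpp
#include <algorithm>
#include <cfloat>
#include <cmath>
#include <vector>

// Hypothetical sketch: average softmax loss over K labels per instance.
// `prob` holds softmax outputs of shape (N, M); `labels` holds K class
// indices per instance, shape (N, K). Entries equal to `ignore_label` are
// skipped, which is what allows a variable number of labels.
float MultiLabelSoftmaxLoss(const std::vector<float>& prob,  // N * M
                            const std::vector<int>& labels,  // N * K
                            int N, int M, int K, int ignore_label = -1) {
  float loss = 0.0f;
  int count = 0;
  for (int i = 0; i < N; ++i) {
    for (int k = 0; k < K; ++k) {
      const int label_value = labels[i * K + k];
      if (label_value == ignore_label) continue;  // variable label count
      // Same indexing as the diff above, with inner_num_ == 1 and dim == M.
      loss -= std::log(std::max(prob[i * M + label_value], FLT_MIN));
      ++count;
    }
  }
  return count > 0 ? loss / count : 0.0f;  // average over all labels
}
```

With M = 3 and instance i labelled {0, 2}, this averages -log prob[i * 3 + 0] and -log prob[i * 3 + 2], matching the example above.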

Contributor:


OK, I get it; I assumed you were addressing multi-class problems.
Thank you.

@BlGene (Contributor) commented Nov 11, 2015

Just for reference, there was a Multi-label Data and MultiLabel Accuracy PR (#523) previously, which was followed by #1380. #523 was closed because @shelhamer said:

we concluded that losses and layers are capable of handling multilabel problems

Playing devil's advocate:

  1. As a comparison, what would be the closest way to implement this functionality with existing layers? I presume an inner product with num_output: N*K, reshaped into (N, K), and then a normal softmax that compares this to an (N, 1) label.
  2. And how is this PR better than that?

@taras-sereda

I'm looking for multi-label functionality in Caffe.
What I've seen is a lot of PRs that have been abandoned.
Can somebody clarify what exactly needs to be done to perform such classification: which parts are missing now, and what is already present in the framework?

Here is how I understand this task: having pairs of an image and a vector of labels (hot-encoded), e.g. [1, 0, 0, 1, 0, 1],
it would be enough to have only a CrossEntropyLoss, which would in fact minimise the KL-divergence between the true label distribution (normalised) and the one predicted through SoftMax, right?
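
As a sketch of that idea (illustrative code written for this thread, not Caffe's API): cross-entropy between a normalised multi-hot target and a softmax prediction, which differs from KL(target || prediction) only by the constant entropy of the target.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical sketch: softmax cross-entropy against a normalised
// multi-hot target, e.g. multi_hot = {1, 0, 0, 1, 0, 1}.
float SoftmaxCrossEntropy(const std::vector<float>& logits,
                          const std::vector<float>& multi_hot) {
  // Softmax with max-subtraction for numerical stability.
  float max_logit = logits[0];
  for (float z : logits) max_logit = std::max(max_logit, z);
  std::vector<float> unnorm(logits.size());
  float denom = 0.0f;
  for (size_t c = 0; c < logits.size(); ++c) {
    unnorm[c] = std::exp(logits[c] - max_logit);
    denom += unnorm[c];
  }
  // Normalise the multi-hot vector into a target distribution.
  float num_pos = 0.0f;
  for (float t : multi_hot) num_pos += t;
  float loss = 0.0f;
  for (size_t c = 0; c < logits.size(); ++c) {
    const float target = multi_hot[c] / num_pos;
    if (target > 0.0f) loss -= target * std::log(unnorm[c] / denom);
  }
  return loss;
}
```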

@bhack I've seen your comments on all the PR branches, so maybe you know more details.
I'm ready to contribute and to create a tutorial and examples of model definition. I just need an update on this problem to understand what should be added to make it real.
Thanks in advance.

@beniz commented Nov 11, 2015

@taras-sereda got frustrated with the multiple PRs, so the other day I decided to take the time to lay out a few solutions I know of; the discussion thread is here: https://groups.google.com/forum/#!topic/caffe-users/RuT1TgwiRCo

@taras-sereda

@beniz Thanks for sharing.
What I mean is: if there are multiple ways to solve this problem, it would be nice to create some examples, and I'm ready to work on that part.

@xdshang (Author) commented Nov 11, 2015

@BlGene Regarding your first suggestion, I think the problem is how you get the N * K output. For each instance, supposing there are N labels and num_output is K, you would need to duplicate the K-dimensional output N times. That is much more costly than simply reading N positions from the K-dimensional output.

PR #523 only proposed a multilabel accuracy layer, which is for the test phase. Furthermore, a multilabel loss may often be used in webly tag classification, where the number of tags is huge. In our experiments we use around 30,000 tags (classes), with only about 20 positive tags per instance on average, so the labels are quite sparse. In that case a hot-encoded label ([1, 0, 0, 1, 0, 1]) is not appropriate.
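
For illustration, such sparse labels could be packed into a fixed-width (N, K_max) label blob like this (a sketch written for this thread; PackSparseLabels and K_max are made-up names, and -1 stands in for whatever ignore_label is configured):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: pack a variable-length list of positive tag indices
// into a fixed-size row of K_max entries, padding with ignore_label.
std::vector<int> PackSparseLabels(const std::vector<int>& positive_tags,
                                  int K_max, int ignore_label = -1) {
  std::vector<int> row(K_max, ignore_label);  // padded with ignore_label
  const size_t n = std::min(positive_tags.size(),
                            static_cast<size_t>(K_max));
  for (size_t k = 0; k < n; ++k) {
    row[k] = positive_tags[k];
  }
  return row;
}
```

With 30,000 classes and K_max around 20, this stores about 20 integers per instance instead of a 30,000-entry multi-hot vector, roughly a 1,500x reduction.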

@bhack (Contributor) commented Nov 12, 2015

@beniz and others. See also #3326

@diPDew commented Dec 3, 2015

In terms of classification performance, how does this multi-label version of the Softmax loss compare with the SigmoidCrossEntropyLoss layer?

@huhusuperma

How do you load the multiple labels for the data? I cannot find any examples in your project. It seems that you didn't modify the data layer.

@xdshang closed this Jun 2, 2017