something wrong in your code #8
Comments
@timomernick, first of all, thank you for sharing your code. After playing around with it, I've found something (maybe) wrong in your `squash` function. According to the paper, we should squash the vectors along the capsule dimension, i.e. `dim=1` for a tensor of shape `(batch_size, 8, 1152)`, not along `dim=2`. A sketch of what I mean is below.
When I ran the code with this modification in my environment, model convergence looked much faster and the reconstructed images also look okay, although I trained the model only up to 2 epochs. I hope this helps. Thank you again for sharing the code!
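A minimal sketch of the fix (assuming the primary-capsule tensor really has shape `(batch_size, 8, 1152)`; the function name and the epsilon are my own, not from this repo):

```python
import torch

def squash(s, dim=1):
    # Squashing non-linearity from the paper (Eq. 1):
    #   v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
    # For a tensor of shape (batch_size, 8, 1152), dim=1 is the
    # 8-dimensional capsule axis, so each capsule gets its own length.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + 1e-8)
```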
If either of you wants to submit a pull request with a measurement of before & after accuracy, I would love to take your fixes!
Hello Timo,

the paper says:

"We want the length of the output vector of a capsule to represent the probability that the entity represented by the capsule is present in the current input. We therefore use a non-linear "squashing" function to ensure that short vectors get shrunk to almost zero length and long vectors get shrunk to a length slightly below 1."

The output vector of the primary capsules in your case is `(batch_size, 8, 1152)`, and therefore the squashing non-linearity should be calculated along `dim=1`. At this point the model is so complex that both options work and train well; however, the correct application of the squashing non-linearity is the one described in the paper.

Also: if you calculate the primary capsules (without routing) so that the output size is `(batch_size, 1152, 8)`, such as:
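(Something along these lines; a minimal sketch assuming the standard MNIST CapsNet sizes, 32 capsule types on a 6×6 grid giving 1152 capsules of dimension 8. The module and names are illustrative, not the repo's actual code.)

```python
import torch
import torch.nn as nn

class PrimaryCapsules(nn.Module):
    # One convolution producing 32 * 8 = 256 channels, reshaped so the
    # output is (batch_size, 1152, 8): 1152 = 32 * 6 * 6 capsules, each
    # an 8-dimensional vector on the last axis.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(256, 32 * 8, kernel_size=9, stride=2)

    def forward(self, x):
        u = self.conv(x)                         # (batch, 256, 6, 6)
        u = u.view(x.size(0), 32, 8, 6 * 6)      # split capsule types / dims
        u = u.permute(0, 1, 3, 2).contiguous()   # (batch, 32, 36, 8)
        return u.view(x.size(0), 32 * 6 * 6, 8)  # (batch, 1152, 8)
```

With this layout the capsule vectors sit on the last axis, so the squashing is simply applied along `dim=-1`.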
you could get rid of the transpose in the forward pass. Thanks for your implementation!
Firstly, thank you for your code,
but as I tried to read your source code, I found what may be errors in your squash function.
Problem 1:
From your README file I read the TensorFlow source code it links to;
in its comments, the squashing is done along the vec_len dimension. But in your code,
since you have not written comments, we can only read the code itself, and it squashes along `dim=2`.
It is easy to see that we should do the squashing along `dim=1`, not `dim=2`.
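A quick way to see the difference (shapes as in this repo's `(batch, 8, 1152)` layout; the sizes are just for illustration):

```python
import torch

x = torch.randn(4, 8, 1152)   # (batch, capsule_dim, num_capsules)

norm_dim1 = x.norm(dim=1)     # (4, 1152): one length per capsule -- what the paper wants
norm_dim2 = x.norm(dim=2)     # (4, 8): one length per component, mixed across all 1152 capsules

print(norm_dim1.shape, norm_dim2.shape)
```

Squashing should rescale each capsule by its own length, so the norm has to be taken over the 8-dimensional axis (`dim=1` here).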
Problem 2:
How can `x` with shape `(batch, features, num_units, in_units, 1)` and `w` with shape `(batch, features, in_units, unit_size, num_units)` do a matmul operation? The last two dimensions do not line up as matrices, and the leading dimensions do not broadcast.
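If I understand `torch.matmul` correctly, the last two dimensions are multiplied as matrices and everything before them must broadcast, so these shapes cannot be multiplied as given. A minimal sketch with made-up sizes showing one way the shapes could be made to line up (the sizes and the permute are my guesses, not necessarily the repo's intent):

```python
import torch

batch, features, num_units, in_units, unit_size = 2, 1152, 10, 8, 16

x = torch.randn(batch, features, num_units, in_units, 1)
w = torch.randn(batch, features, in_units, unit_size, num_units)

# torch.matmul(x, w) fails twice over: the leading dims (..., num_units) vs
# (..., in_units) do not broadcast, and the matrix dims (in_units, 1) @
# (unit_size, num_units) do not align. Rearranging w fixes both:
w2 = w.permute(0, 1, 4, 3, 2)   # (batch, features, num_units, unit_size, in_units)
u_hat = torch.matmul(w2, x)     # (batch, features, num_units, unit_size, 1)
print(u_hat.shape)              # torch.Size([2, 1152, 10, 16, 1])
```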
I did not manage to run your code (because of the data), so I do not know whether this is right.
Best Wishes!!