Merge module for Convolution #94
base: main
Conversation
Nice addition, it seems OK for the merge module. I'm not sure about dropping the global device, though: why remove it?
@@ -69,7 +218,6 @@ def __init__(
        name: str | None = None,
        s_growth_is_needed: bool = False,
    ) -> None:
-       device = device if device is not None else global_device()
Why remove this line?
def construct_full_activity(self) -> torch.Tensor:
    assert self.previous_modules, f"No previous modules for {self.name}."
    full_activity = torch.ones(
        (self.previous_modules[0].input.shape[0], self.total_in_features),
        device=self.device,
    )
I think this won't work: you create a 2-order tensor, while a 3-order tensor is expected when computing S. I think full_activity should be of shape (n, total_in_features, out_channels), where n is the number of examples in the batch. Anyway, if I am not mistaken, you will see it immediately with the sanity checks.
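A minimal sketch of what the reviewer suggests, assuming the module exposes an out_channels attribute (that attribute and the exact shape are taken from the comment above, not from the PR's actual code):

import torch

def construct_full_activity(self) -> torch.Tensor:
    # Hypothetical sketch, not the PR's implementation: allocate a 3-order
    # tensor (n, total_in_features, out_channels) so the later computation
    # of S receives the expected number of dimensions.
    assert self.previous_modules, f"No previous modules for {self.name}."
    n = self.previous_modules[0].input.shape[0]  # number of examples in the batch
    return torch.ones(
        (n, self.total_in_features, self.out_channels),  # out_channels is assumed
        device=self.device,
    )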
It's looking good. If I may, I would suggest doing a first, separate pull request where we move compute_optimal_delta from the child classes to MergeModule, so that this pull request would change neither growing_module nor linear_growing_module. (I could do this separate PR if you want.)
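A minimal sketch of the suggested refactor, assuming MergeModule is the shared parent of the linear and convolutional merge classes (the child class names and the empty method signature below are placeholders, not the project's actual API):

# Hypothetical sketch: define compute_optimal_delta once on the parent class
# so the child classes inherit it instead of each keeping its own copy.
class MergeModule:
    def compute_optimal_delta(self):
        # shared logic previously duplicated in the child classes
        ...

class LinearMergeModule(MergeModule):
    # no compute_optimal_delta override needed anymore
    ...

class ConvMergeModule(MergeModule):
    ...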
def __out_dimention(self, dim: int) -> int:
    return (
        int(
            (self.input_size[dim] - self.kernel_size[dim] + 2 * self.padding[dim])
            / self.stride[dim]
        )
        + 1
    )
If you want, you could use the function compute_output_shape_conv from gromo.utils.tools, which already does (almost) this computation and has been tested.
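For reference, a standalone sketch of the standard output-size formula that __out_dimention implements; the exact signature of compute_output_shape_conv is not quoted in this thread, so the helper below is illustrative only:

import math

def conv_output_size(size: int, kernel: int, padding: int, stride: int) -> int:
    # Standard convolution output-size formula per spatial dimension:
    # floor((size + 2 * padding - kernel) / stride) + 1
    return math.floor((size + 2 * padding - kernel) / stride) + 1

# Example: a 32-wide input with a 3-wide kernel, padding 1, stride 1 stays 32 wide.
assert conv_output_size(32, 3, 1, 1) == 32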
@property
def in_features(self) -> int:
    return self.in_channels * self.input_size[0] * self.input_size[1] + self.use_bias
Very nice and quite useful!
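As a worked example (values invented for illustration, not taken from the PR):

# With in_channels=3, an input of 32x32 and use_bias=True, the property returns
# the flattened input size plus one extra feature for the bias term.
in_channels, input_size, use_bias = 3, (32, 32), True
in_features = in_channels * input_size[0] * input_size[1] + use_bias  # 3 * 32 * 32 + 1 = 3073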
No description provided.