Hooks working

I am exploring the hooks part of the book chapter and trying to understand the output of a hook.

I tried the code provided in the book for hooks:

class Hook():
    def hook_func(self, m, i, o):
        self.stored = o.detach().clone()

hook_output = Hook()
hook = model_saved.avg_pool.register_forward_hook(hook_output.hook_func)

with torch.no_grad():
    for data, label in test_loader:
        model(data)

act = hook_output.stored[0]
act.shape
torch.Size([64, 1, 1])

My first question is: how can we interpret this shape? Is it a 1 x 1 feature map with 64 channels?

My second question: I also tried hooks in a different form, and there I get the batch size as well:

feat_result_input = []
feat_result_output = []

def get_features_hook(module, data_input, data_output):
    feat_result_input.append(data_input)
    feat_result_output.append(data_output)

h = model_saved.avg_pool.register_forward_hook(get_features_hook)

with torch.no_grad():
    for data, label in test_loader:
        model(data)

Result: the shape is [100, 64, 1, 1]. Why is there a difference?

Hello,

To answer your first question: yes, your understanding of the variable act is sound; the spatial size is 1 X 1, and there are 64 channels, i.e., the channel dimension is 64.

The reason you are getting the batch size in the second example is that, by default, hooking a module’s forward pass includes the batch axis as well; that is, the output of the average pooling layer over all the input samples in the batch is returned.

However, in your first snippet, the line act = hook_output.stored[0] indexes into the batch dimension and extracts the output for solely the first data point. Therefore, there is a mismatch between the shapes.
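
For instance (a minimal sketch, assuming the hook_output, model_saved, model, and test_loader from your post, with the hook already registered and the data already passed through the model):

# hook_output.stored holds the full batched output of avg_pool for the last batch.
act_batch = hook_output.stored       # shape: [batch_size, 64, 1, 1]
act_first = hook_output.stored[0]    # shape: [64, 1, 1], first sample of that batch only
print(act_batch.shape, act_first.shape)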

Please let me know if you require further clarifications.

Thank you, @BobMcDear, for your clarification. I am still confused about the batch size, though.
For example, I have an output of shape [100, 64, 1, 1]. Does that mean I only have one batch of inputs? So, let's say I have 50,000 examples; is this output giving me only 100 of them? Is that correct?

How can I get the hook's output for all batches of the given input?

Hello,

Using the Hook class that you included in your post, only the output of the final batch fed to your module is retained. For instance, suppose your dataset consists of 500,000 samples, with a batch size of 100. Regardless of how many batches are run through the model, hook_output.stored would contain exclusively the output of the average pooling layer associated with the last batch (i.e., of shape 100 X 64 X 1 X 1).

On the other hand, you may desire to evaluate the hook’s output for every data point in your dataset. A simple approach would be to have a mega-batch composed of your entire dataset, but that is infeasible and impractical. Your get_features_hook is a more elegant strategy; after iterating through the dataset and passing individual batches to the model, feat_result_output would have 5,000 items (5,000 = 500,000/100, the total number of batches), each of shape 100 X 64 X 1 X 1, corresponding to the output of the average pooling layer per batch.
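
For reference, here is a hedged sketch of how your Hook class could be adapted to accumulate the output of every batch instead of overwriting it. It assumes the same model_saved, model, and test_loader as in your post, and that model and model_saved refer to the same network:

import torch

class AccumulatingHook():
    # Variant of your Hook class: appends each batch's output instead of overwriting self.stored.
    def __init__(self):
        self.stored = []

    def hook_func(self, m, i, o):
        self.stored.append(o.detach().clone())

acc_hook = AccumulatingHook()
handle = model_saved.avg_pool.register_forward_hook(acc_hook.hook_func)

with torch.no_grad():
    for data, label in test_loader:
        model(data)

handle.remove()
# len(acc_hook.stored) == number of batches; each item has shape [batch_size, 64, 1, 1]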

Is that helpful?

Alright, it means hook_output.stored would contain exclusively the output of the average pooling layer associated with the last batch (i.e., of shape 100 X 64 X 1 X 1).

You are saying that getting all the batches of data points is not a feasible approach. Fair enough.

I could not understand this part of your explanation.

Your get_features_hook is a more elegant strategy; after iterating through the dataset and passing individual batches to the model, feat_result_output would have 5,000 items (5,000 = 500,000/100, the total number of batches), each of shape 100 X 64 X 1 X 1, corresponding to the output of the average pooling layer per batch.

Hello,

Getting the outputs of a hook for your entire dataset within a single forward pass is intractable. In other words, you cannot simply bundle together every data point into a gigantic batch and feed it to the model to acquire the hook’s output for every sample due to the comically large memory usage.

However, in the segment below (copy-pasted from your original post),

feat_result_input = []
feat_result_output = []

def get_features_hook(module, data_input, data_output):
    feat_result_input.append(data_input)
    feat_result_output.append(data_output)

h = model_saved.avg_pool.register_forward_hook(get_features_hook)

with torch.no_grad():
    for data, label in test_loader:
        model(data)

what you are doing is going through every batch in the dataset, extracting the hook’s output for that particular batch, and storing it in feat_result_output. Thus, feat_result_output would contain the result of the average pooling layer for every batch in your data loader, with the ith element corresponding to the ith batch in test_loader. If you would like to have the output of the hook in a single tensor in lieu of many smaller ones, you can do feat_result_output = torch.cat(feat_result_output).
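
For example (a quick sketch, reusing the 500,000-sample / batch-size-100 figures from my earlier reply):

all_feats = torch.cat(feat_result_output)    # concatenates along dim 0, the batch dimension
print(len(feat_result_output))               # number of batches, e.g. 5000
print(all_feats.shape)                       # e.g. torch.Size([500000, 64, 1, 1])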

Does that make things clearer?

I appreciate you taking the time to reply to my queries and clear up my doubts. Here is my model:
ResNet(
  (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (layer1): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): ResidualBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avg_pool): AvgPool2d(kernel_size=8, stride=8, padding=0)
  (fc): Linear(in_features=64, out_features=10, bias=True)
)

For the avg_pool layer, I am getting an input of shape [100, 64, 8, 8] and an output of shape [100, 64, 1, 1]. Why am I getting an output of 1 x 1?

Hello,

No worries at all.

Are you familiar with how average pooling works? If not, this article can shed some light on the topic. In your case, the average pooling layer’s input has a spatial dimension of 8 X 8, and since the kernel size is 8 X 8, the data’s height and width are downsampled by a factor of 8, i.e., the output would be 1 X 1.
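
To see this concretely, here is a small standalone sketch with random data (the shapes mirror yours, but the variable names are mine):

import torch
import torch.nn as nn

avg_pool = nn.AvgPool2d(kernel_size=8, stride=8, padding=0)   # same as your model's avg_pool
x = torch.randn(100, 64, 8, 8)                                # same shape as your avg_pool input
y = avg_pool(x)
print(y.shape)   # torch.Size([100, 64, 1, 1]): each 8 X 8 feature map is averaged to a single value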

Is that informative?

Yes, thank you
