Here is at least one way to do it … thoughts?
import torch.nn as nn
from torchvision import models

res50_model = models.resnet50(pretrained=True)
res50_conv = nn.Sequential(*list(res50_model.children())[:-2])
This grabs a pretrained ResNet-50 model courtesy of the torchvision package and then builds a sequential model from it that excludes the final two modules (i.e., the one that does average pooling and the fully connected one).
for param in res50_conv.parameters():
    param.requires_grad = False
No need to backprop through the model since I’m using it purely for feature extraction.
inputs, labels = next(iter(dataloaders['train']))
inputs, labels = Variable(inputs), Variable(labels)
outputs = res50_conv(inputs)
To test, I grab 4 examples and run them through my modified model.
outputs.data.shape # => torch.Size([4, 2048, 7, 7])
And voila, I get the 2048x7x7 output I expected!
It feels both weird and cool to be able to pass images of any size into the network and have it just work. I burnt a few minutes here and there trying to get the model to tell me the output size of this layer or that layer before realizing that only works for the fully connected layers because, I believe, those are the only ones that do have a definitive input and output shape. The convolutional layers' input/output shapes will be dynamic based on the shape of your examples … which, like I said, feels weird coming from Theano/TF but also very cool.