I am trying to use Densenet for multiclass image classification but am having trouble loading and using it on a Paperspace P5000 instance. The weird thing is, I can train every other net like Resnet 18, Resnext 50 and many other architectures using exactly the same code, but Densenet fails for some reason.
The same happens whether I use Densenet121, 161 or 169. Here is the code that fails:
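It's the standard fastai routine, something along these lines (the path is a placeholder):

from fastai.conv_learner import *

PATH = 'data/images/'  # placeholder dataset path
sz = 224               # DenseNet's usual input size
arch = densenet121

tfms = tfms_from_model(arch, sz)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)  # kernel dies during precompute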
What happens is that it downloads the model and starts precomputing, but then a message appears: “The kernel appears to have died. It will restart automatically.” I have tried different batch sizes, from 1 to 64, without luck. The image size is 224x224, which as far as I know is the default input size for Densenet, but correct me if I am wrong.
I haven’t seen many topics on Densenet on the forums, and I only found it when looking at the fastai library source code. Perhaps it has not been fully implemented? Anyone else had a similar experience using Densenet and the fastai library?
I also ran into issues trying to use Densenet with the usual routine. The solution was to replace arch=densenet121 with arch=dn121. I have no idea why that should be necessary since ‘densenet121’ does seem to be the name it should be and the one exported by the usual modules…
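In code form, the fix is just swapping the arch (the surrounding routine stays the same):

arch = dn121  # instead of arch = densenet121
learn = ConvLearner.pretrained(arch, data, precompute=True)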
If you look at fastai/torch_imports.py you’ll see densenet121 being imported from torchvision.models. Further below, dn121 is defined so that it automatically loads the pretrained weights.
Now why is it like this? Best I can tell: the pytorch model’s layers are grouped into two top-level children. The first is an nn.Sequential containing all the conv layers; the second is the fully-connected ‘classifier head’. Fastai builds its own classifier head for pretrained models, so the dn121 definition only takes the first element of densenet121's children, i.e. the conv Sequential.
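From memory, the relevant pieces of fastai/torch_imports.py look roughly like this (paraphrased, not verbatim):

from torchvision.models import densenet121

def children(m): return list(m.children())  # fastai helper, shown here for completeness

# keep only the first child (the conv Sequential) and drop torchvision's
# classifier layer, so fastai can attach its own head later
def dn121(pre): return children(densenet121(pre))[0]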
Then when you call ConvLearner.pretrained(...), the pretrained weights are loaded for the conv portion of the model, and a new fully-connected classifier head is built on top.
You can see this yourself by starting up a jupyter notebook and running:
from fastai.conv_learner import *
# len = 2: the conv Sequential plus the classifier layer
len(children(densenet121(False)))
# this'll display the model's layers
children(densenet121(False))
# this'll download pretrained weights for & display dn121
children(dn121(True))
Edit: Just to add a little more information. Most fastai models basically replace the last two layers of the existing model (which are average pooling and a linear layer) with adaptive concat pooling, a flatten, and one or two linear layers (the first linear layer followed by a ReLU), ending with log-softmax. The advantage adaptive concat pooling provides is that the input size need not be fixed, as the adaptive pooling takes care of that part. It is quite easy to implement any fastai model if you have the base model (which you can get from Cadene’s pytorch repository).
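A minimal sketch of that kind of head, assuming the structure described above (class and function names here are illustrative, not the exact fastai code):

import torch
import torch.nn as nn

class AdaptiveConcatPool2d(nn.Module):
    # concatenates adaptive max pooling and adaptive average pooling,
    # so any input spatial size collapses to a fixed-size feature vector
    def __init__(self, sz=1):
        super().__init__()
        self.ap = nn.AdaptiveAvgPool2d(sz)
        self.mp = nn.AdaptiveMaxPool2d(sz)
    def forward(self, x):
        return torch.cat([self.mp(x), self.ap(x)], dim=1)

class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.size(0), -1)

def custom_head(num_features, num_classes):
    # num_features = channel count of the base model's final conv block
    # (doubled below because concat pooling stacks max + avg results)
    return nn.Sequential(
        AdaptiveConcatPool2d(1),
        Flatten(),
        nn.Linear(num_features * 2, 512),
        nn.ReLU(),
        nn.Linear(512, num_classes),
        nn.LogSoftmax(dim=1),
    )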
Thanks for the link, good writeup; it clarifies things a bit regarding how FAI builds the top layer for the pretrained networks. Coincidentally, I was also wondering about this in the context of running a model with only pytorch dependencies in production, using FAI weights. The weight names between pytorch and FAI are different, and I’m having a bit of a hard time loading them from FAI into pytorch.
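In case it helps anyone, one possible approach is remapping the keys by position; a minimal sketch, assuming both models contain exactly the same layers in the same order (the weights file name is hypothetical):

import torch
from torchvision.models import densenet121

def remap_state_dict(src_state, dst_model):
    # pair parameters by position and reuse the destination's key names;
    # assumes both state_dicts describe the exact same layers in the same order
    dst_keys = dst_model.state_dict().keys()
    return {dk: sv for dk, sv in zip(dst_keys, src_state.values())}

backbone = densenet121().features  # the conv Sequential, the same cut dn121 makes
src = torch.load('fai_backbone.pth', map_location='cpu')  # hypothetical saved FAI conv weights
backbone.load_state_dict(remap_state_dict(src, backbone))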
Densenet works fine when using dn121, dn161, etc. as described above.
Yeah, it seems densenet does indeed work properly. Unfortunately, there were quite a few models, like senet and se_resnext50, which were not working out of the box from the fastai library, so I tried to understand what exactly was happening and wrote up my findings. They might help when using any other model.