Yep, I did. Still the same error. In fact, none of the DenseNet models are working!
Sounds like I better look into it…
Hi Jeremy - was the issue with densenet fixed?
No, it turns out I had to teach 7 classes this week, which has kept me busy. It’s high on my priority list though.
Any update on a fix for densenet? I’ve been using an older version of fastai so that I can keep using it, since it happens to perform really well compared to other models.
I’m not sure I’ll have time this week. I’ll try! (But feel free to have a go yourself at a fix since I think you’d find it an interesting area of fastai to learn about…)
The DenseNet implementation in PyTorch has repeated modules, and its layers are stored in OrderedDicts. The repeated module key names cause the optimizer to throw this error:
ValueError: some parameters appear in more than one parameter group
I found this PyTorch issue helpful in understanding the problem.
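PyTorch's optimizer raises this error when two parameter groups share a parameter: each time a group is added (`torch.optim.Optimizer.add_param_group`), it checks that the new group's parameters are disjoint from those already registered. The stdlib-only sketch below mimics that check so it runs without PyTorch; `FakeParam` and `build_param_groups` are hypothetical names standing in for tensors and for however layer groups get turned into parameter groups, not fastai or PyTorch APIs.

```python
class FakeParam:
    """Hypothetical stand-in for a tensor parameter; hashed and compared by identity."""
    def __init__(self, name):
        self.name = name


def build_param_groups(layer_groups, lrs):
    """Turn layer groups into optimizer-style param groups, mimicking the disjointness check."""
    seen = set()
    groups = []
    for params, lr in zip(layer_groups, lrs):
        param_set = set(params)
        if not seen.isdisjoint(param_set):
            raise ValueError("some parameters appear in more than one parameter group")
        seen |= param_set
        groups.append({"params": list(params), "lr": lr})
    return groups


w1, w2, w3 = FakeParam("conv.weight"), FakeParam("norm.weight"), FakeParam("fc.weight")

# Disjoint layer groups are accepted, one learning rate per group
ok = build_param_groups([[w1], [w2], [w3]], [1e-4, 1e-3, 1e-2])

# Overlapping layer groups (w2 appears in two groups) are rejected
try:
    build_param_groups([[w1, w2], [w2], [w3]], [1e-4, 1e-3, 1e-2])
    overlap_error = None
except ValueError as e:
    overlap_error = str(e)
```

So whatever the root cause, the error means some module ended up in more than one layer group.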
I did the following to fix the error:
- Modified the torchvision.models.densenet code to append the block numbers to the layer names.
- Copied the weights from the torchvision.models.densenet models to the models with updated layer names.
- Saved the state_dict for the updated model.
With these changes I am able to load and train all of the DenseNet models.
Here is the code I used to transfer the model weights:
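import torch
import torchvision
from collections import OrderedDict
from tqdm import tqdm
# densenet.py: modified copy of torchvision.models.densenet with block numbers appended to layer names
from densenet import densenet121, densenet169, densenet201, densenet161

dn_models = {
    'densenet121': densenet121,
    'densenet169': densenet169,
    'densenet201': densenet201,
    'densenet161': densenet161,
}
torch_models = {
    'densenet121': torchvision.models.densenet121,
    'densenet169': torchvision.models.densenet169,
    'densenet201': torchvision.models.densenet201,
    'densenet161': torchvision.models.densenet161,
}
# Placeholder output paths: model_locs was not defined in the original snippet
model_locs = {m: f"{m}_fixed.pth" for m in dn_models}

for m in tqdm(dn_models.keys()):
    print(f"Fixing {m}")
    # densenet with layer names fixed (randomly initialised)
    dnetm = dn_models[m]()
    # original pretrained torchvision densenet
    dnet = torch_models[m](pretrained=True).eval()
    # get the state dict of the original model
    dnet_sdict = dnet.state_dict()
    d_keys = list(dnet_sdict.keys())
    dm_keys = list(dnetm.state_dict().keys())  # modified densenet keys
    # keys are paired by position, so both models must list the same tensors in the same order
    assert len(d_keys) == len(dm_keys), f"{m}: key counts differ"
    dnetm.load_state_dict(OrderedDict(zip(dm_keys, dnet_sdict.values())))
    dnetm.eval()
    dnetm_sdict = dnetm.state_dict()
    # check every copied tensor matches the original
    for k1, k2 in zip(d_keys, dm_keys):
        assert torch.equal(dnet_sdict[k1], dnetm_sdict[k2]), f"{k1}!={k2}"
    print(f"Saving to {model_locs[m]}\n")
    torch.save(dnetm_sdict, model_locs[m])
print("Done!")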
Modified DenseNet code
Fixed DenseNet weights
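Matching keys with zip relies purely on order. A slightly stricter variant also checks that each paired entry has the same shape before renaming it. The sketch below uses plain dicts mapping key names to shape tuples in place of real tensors (with tensors you would compare `.shape`); `remap_by_position` and the key names are hypothetical, chosen only for illustration.

```python
from collections import OrderedDict


def remap_by_position(src, dst):
    """Give src's values dst's key names, pairing entries by position.

    Both arguments map key names to shapes; a count or shape mismatch raises.
    """
    if len(src) != len(dst):
        raise ValueError(f"key count mismatch: {len(src)} vs {len(dst)}")
    out = OrderedDict()
    for (k_src, v_src), (k_dst, v_dst) in zip(src.items(), dst.items()):
        if v_src != v_dst:
            raise ValueError(f"shape mismatch: {k_src} {v_src} vs {k_dst} {v_dst}")
        out[k_dst] = v_src
    return out


# Hypothetical original and renamed keys with their tensor shapes
src = OrderedDict([("layer.norm.weight", (64,)), ("layer.conv.weight", (32, 64, 1, 1))])
dst = OrderedDict([("layer1.norm.weight", (64,)), ("layer1.conv.weight", (32, 64, 1, 1))])
remapped = remap_by_position(src, dst)

# Swapping the order on one side is caught instead of silently loading wrong weights
swapped = OrderedDict(reversed(list(dst.items())))
try:
    remap_by_position(src, swapped)
    mismatch = None
except ValueError as e:
    mismatch = str(e)
```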
@jeremy Does this look like a valid solution? Or is there a better way of fixing this issue?
Thanks and welcome!
I’ve fixed this in fastai. It wasn’t actually the reason that @vikram suggested (although still awesome that his fix worked anyway!) but was due to a bug in how layer groups were created. I’ve fixed it in a rather hacky way for now, which works in my testing - but let me know if anyone sees any problems. I’ll endeavor to find a cleaner API for creating layer groups in the future…
@jeremy Glad that there was a simple fix! However, after the latest pull I see this error upon calling learn.unfreeze():
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-932c6b65ac49> in <module>()
----> 1 learn.unfreeze()
~/vikram/fast_ai/fastai/courses/dl1/fastai/conv_learner.py in unfreeze(self)
182 None
183 """
--> 184 self.freeze_to(0)
185 self.precompute = False
~/vikram/fast_ai/fastai/courses/dl1/fastai/learner.py in freeze_to(self, n)
64 c=self.get_layer_groups()
65 for l in c: set_trainable(l, False)
---> 66 for l in c[n:]: set_trainable(l, True)
67
68 def unfreeze(self): self.freeze_to(0)
~/vikram/miniconda3/envs/py36/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
51
52 def __getitem__(self, idx):
---> 53 if not (-len(self) <= idx < len(self)):
54 raise IndexError('index {} is out of range'.format(idx))
55 if idx < 0:
TypeError: '<=' not supported between instances of 'int' and 'slice'
I figured out that this arises from the updated get_layer_groups function, specifically when we ask for the fully connected block:
if do_fc:
    return self.fc_model
Compared with the previous implementation, the updated code returns a torch.nn.modules.container.Sequential, whereas earlier it returned a list. Maybe torch.nn.modules.container.Sequential does not support slicing? I see the error when I do c[n:].
Anyway, I did an easy fix by casting the return value to a list of the children of fc_model, and unfreeze works:
if do_fc:
    return list(children(self.fc_model))
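The traceback shows why the slice fails: in the PyTorch version used here, `Sequential.__getitem__` assumed an integer index and evaluated `-len(self) <= idx < len(self)`, which raises a TypeError when `idx` is a slice. The stdlib-only sketch below reproduces that guard in a hypothetical `OldSequential` class (a simplified stand-in, not PyTorch's code) and shows that a plain list slices fine. Newer PyTorch releases do accept slices on `nn.Sequential`.

```python
class OldSequential:
    """Simplified stand-in for the old nn.Sequential indexing shown in the traceback."""
    def __init__(self, *modules):
        self._modules = list(modules)

    def __len__(self):
        return len(self._modules)

    def __getitem__(self, idx):
        # Same guard as in the traceback: only meaningful for integer indices
        if not (-len(self) <= idx < len(self)):
            raise IndexError('index {} is out of range'.format(idx))
        return self._modules[idx]


fc_model = OldSequential("dropout", "linear", "relu")

# Slicing trips the integer-only guard, as in c[n:]
try:
    fc_model[0:]
    slice_error = None
except TypeError as e:
    slice_error = type(e).__name__

# A plain list of the children slices without trouble
as_list = list(fc_model._modules)
tail = as_list[1:]
```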
I hope I got this right this time.
What is the right way of getting the layer groups? Can you help me figure out how I can improve the code?
Ah sorry - I was running out of time before class so it seems I didn’t test it properly… Anyway I’ve wrapped it in a list now and it’s OK again. (There’s no need to make each child of fc_model a separate layer - we can treat it all as one layer).
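Wrapping the head in a list keeps it as a single layer group, and `c[n:]` then slices an ordinary Python list. The sketch below mirrors the two `freeze_to` loops from the traceback on hypothetical `Layer` objects with a `trainable` flag; `set_trainable` here is a simplified stand-in for fastai's helper, and the two-body-groups-plus-head split is an assumption for illustration.

```python
class Layer:
    """Hypothetical module stand-in with a trainable flag."""
    def __init__(self, name):
        self.name = name
        self.trainable = True


def set_trainable(layer, flag):
    # Simplified stand-in: the real helper sets requires_grad on every parameter
    layer.trainable = flag


def freeze_to(layer_groups, n):
    # Same two loops as Learner.freeze_to in the traceback
    for l in layer_groups:
        set_trainable(l, False)
    for l in layer_groups[n:]:
        set_trainable(l, True)


body_early, body_late, fc_model = Layer("early"), Layer("late"), Layer("head")
# The whole head is one group, appended as a single list element
layer_groups = [body_early, body_late] + [fc_model]

freeze_to(layer_groups, 2)   # train only the head
frozen_state = [l.trainable for l in layer_groups]
freeze_to(layer_groups, 0)   # what unfreeze() does: train everything
unfrozen_state = [l.trainable for l in layer_groups]
```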
I don’t know - if I knew I would have implemented it!
@vikram While working on the Chestnet dataset, did you get this error with the metrics for a multilabel dataset: accuracy_multi() missing 1 required positional argument: 'thresh'? I get it using a resnet model (I noticed in a previous post that you used resnet before moving to densenet).
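That message means the metric was called with only predictions and targets, while the function also requires a threshold. One way to supply it is to bind `thresh` in advance with `functools.partial`, which yields a two-argument metric. The sketch below uses a hypothetical list-based `accuracy_multi` (the fastai version works on tensors), and the 0.5 threshold is an arbitrary choice for illustration.

```python
from functools import partial


def accuracy_multi(preds, targs, thresh):
    """Hypothetical list-based version: fraction of labels predicted correctly at thresh."""
    hits = [(p > thresh) == bool(t)
            for prow, trow in zip(preds, targs)
            for p, t in zip(prow, trow)]
    return sum(hits) / len(hits)


# Calling it the way a training loop calls a metric, metric(preds, targs), fails
try:
    accuracy_multi([[0.9]], [[1]])
    missing_arg = None
except TypeError as e:
    missing_arg = str(e)

# Binding the threshold first gives a metric with the expected two arguments
acc_05 = partial(accuracy_multi, thresh=0.5)
preds = [[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]]
targs = [[1, 0, 0], [0, 1, 1]]
score = acc_05(preds, targs)  # 4 of the 6 labels are correct at 0.5
```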
When using densenet121 (the same architecture used by Ng, https://arxiv.org/pdf/1711.05225.pdf), I’m getting another error altogether:
RuntimeError: invalid argument 2: 3D or 4D (batch mode) tensor expected for input, but got: [64 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:22
This is the complete stack trace:
RuntimeError Traceback (most recent call last)
<ipython-input-27-2cf15c2f4ff1> in <module>()
1 # determine learning rate using learning rate finder
----> 2 lrf=learn.lr_find()
3 learn.sched.plot()
~/fastai/courses/dl1/fastai/learner.py in lr_find(self, start_lr, end_lr, wds)
249 layer_opt = self.get_layer_opt(start_lr, wds)
250 self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr)
--> 251 self.fit_gen(self.model, self.data, layer_opt, 1)
252 self.load('tmp')
253
~/fastai/courses/dl1/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, **kwargs)
158 n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
159 fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 160 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
161
162 def get_layer_groups(self): return self.models.get_layer_groups()
~/fastai/courses/dl1/fastai/model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
84 batch_num += 1
85 for cb in callbacks: cb.on_batch_begin()
---> 86 loss = stepper.step(V(x),V(y))
87 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
88 debias_loss = avg_loss / (1 - avg_mom**batch_num)
~/fastai/courses/dl1/fastai/model.py in step(self, xs, y)
38 def step(self, xs, y):
39 xtra = []
---> 40 output = self.m(*xs)
41 if isinstance(output,(tuple,list)): output,*xtra = output
42 self.opt.zero_grad()
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
65 def forward(self, input):
66 for module in self._modules.values():
---> 67 input = module(input)
68 return input
69
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
~/fastai/courses/dl1/fastai/layers.py in forward(self, x)
8 self.ap = nn.AdaptiveAvgPool2d(sz)
9 self.mp = nn.AdaptiveMaxPool2d(sz)
---> 10 def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
11
12 class Lambda(nn.Module):
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
222 for hook in self._forward_pre_hooks.values():
223 hook(self, input)
--> 224 result = self.forward(*input, **kwargs)
225 for hook in self._forward_hooks.values():
226 hook_result = hook(self, input, result)
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/pooling.py in forward(self, input)
820
821 def forward(self, input):
--> 822 return F.adaptive_max_pool2d(input, self.output_size, self.return_indices)
823
824 def __repr__(self):
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/functional.py in adaptive_max_pool2d(input, output_size, return_indices)
383 return_indices: whether to return pooling indices
384 """
--> 385 return _functions.thnn.AdaptiveMaxPool2d.apply(input, output_size, return_indices)
386
387
~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/_functions/thnn/pooling.py in forward(ctx, input, output_size, return_indices)
501 backend.SpatialAdaptiveMaxPooling_updateOutput(backend.library_state,
502 input, output, indices,
--> 503 ctx.output_size[1], ctx.output_size[0])
504 if ctx.return_indices:
505 ctx.save_for_backward(input, indices)
RuntimeError: invalid argument 2: 3D or 4D (batch mode) tensor expected for input, but got: [64 x 1000] at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THCUNN/generic/SpatialAdaptiveMaxPooling.cu:22
I appreciate any help!
Thanks
P.S. I’m not sure what causes RuntimeError: invalid argument 2: 3D or 4D (batch mode) tensor expected for input, but got: [64 x 1000]. I understand that 64 is my default batch size, but why 1000?
When I call data.classes, I get the correct 15 classes (including “No Finding”):
['Atelectasis',
'Cardiomegaly',
'Consolidation',
'Edema',
'Effusion',
'Emphysema',
'Fibrosis',
'Hernia',
'Infiltration',
'Mass',
'No Finding',
'Nodule',
'Pleural_Thickening',
'Pneumonia',
'Pneumothorax']
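On the [64 x 1000] shape: the pooling layer needs a 3D or 4D feature map but received a 2D tensor of batch size 64 by 1000, and 1000 is the number of ImageNet classes the pretrained torchvision DenseNet models output. That suggests the custom head is being applied after the network's final classifier rather than after its convolutional features. The shape-only sketch below tracks shapes through a DenseNet-121-style pipeline for a 224x224 input; it uses tuples instead of tensors, and the three stage functions are simplified assumptions, not the actual torchvision or fastai modules.

```python
def features(shape):
    # Convolutional body of DenseNet-121: 224x224 input -> 1024 channels at 7x7
    n, _, h, w = shape
    return (n, 1024, h // 32, w // 32)


def classifier(shape):
    # Global average pool + flatten + Linear(1024, 1000) ImageNet classifier
    return (shape[0], 1000)


def concat_pool_head(shape):
    # Adaptive max/avg pooling needs a spatial feature map (simplified to 4D here)
    if len(shape) != 4:
        raise RuntimeError(f"4D (batch mode) tensor expected for input, but got: {list(shape)}")
    n, c = shape[0], shape[1]
    return (n, 2 * c, 1, 1)  # max- and avg-pooled maps concatenated on channels


batch = (64, 3, 224, 224)

# Head placed after the feature extractor: works
good = concat_pool_head(features(batch))

# Head placed after the full classifier: fails like the traceback
try:
    concat_pool_head(classifier(features(batch)))
    bad = None
except RuntimeError as e:
    bad = str(e)
```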
Could you please post how you are instantiating ConvLearner? I suspect this is because of the loss function. Since this is a multi-label classification, we should use something like the F-beta score. Check planet.ipynb from fastai.
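For reference, the F-beta score combines precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); with beta = 2 (the f2 name used in the planet notebook) recall is weighted more heavily than precision. The sketch below is a generic pure-Python version for multi-label predictions at a fixed threshold, pooled over every (sample, label) pair; the 0.5 threshold and this micro-averaging are assumptions, and fastai's own f2 may compute it differently.

```python
def fbeta_multilabel(preds, targs, beta=2.0, thresh=0.5):
    """Micro-averaged F-beta over every (sample, label) pair."""
    tp = fp = fn = 0
    for prow, trow in zip(preds, targs):
        for p, t in zip(prow, trow):
            predicted = p > thresh
            if predicted and t:
                tp += 1
            elif predicted and not t:
                fp += 1
            elif not predicted and t:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


preds = [[0.9, 0.2, 0.7], [0.1, 0.8, 0.4]]
targs = [[1, 0, 0], [0, 1, 1]]
score = fbeta_multilabel(preds, targs)  # 2 TP, 1 FP, 1 FN -> P = R = 2/3
```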
My ConvLearner looks like this: learn = ConvLearner.pretrained(arch, data, precompute=False).
I also tried the metrics from planet and did the following: learn = ConvLearner.pretrained(arch, data, metrics=metrics), where metrics = [f2], just like Lesson 2 shows: https://github.com/fastai/fastai/blob/master/courses/dl1/lesson2-image_models.ipynb
Also, in the Planet notebook the tensor used is [torch.FloatTensor of size 64x17], while mine for chestnet is [torch.FloatTensor of size 32x15]. However
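In shapes like 64x17 and 32x15, the first dimension is the batch size and the second is the number of classes (17 labels for Planet, 15 here). For multi-label data each row is a multi-hot vector with a 1 for every label present in the image. The sketch below builds such rows from label sets using the 15 class names listed above; `multi_hot` is a hypothetical helper, not a fastai function.

```python
CLASSES = ['Atelectasis', 'Cardiomegaly', 'Consolidation', 'Edema', 'Effusion',
           'Emphysema', 'Fibrosis', 'Hernia', 'Infiltration', 'Mass',
           'No Finding', 'Nodule', 'Pleural_Thickening', 'Pneumonia', 'Pneumothorax']
IDX = {c: i for i, c in enumerate(CLASSES)}


def multi_hot(labels):
    """Return a 0/1 list with a 1 at the index of every label present."""
    row = [0] * len(CLASSES)
    for lbl in labels:
        row[IDX[lbl]] = 1
    return row


# A batch of two targets: rows x classes, like the 32x15 tensor but with 2 rows
batch = [multi_hot({'Cardiomegaly', 'Effusion'}), multi_hot({'No Finding'})]
```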
I am trying dn121 on a medical dataset from Kaggle, but I’m getting the following error:
I checked data.classes, read and plotted an image from my training set, and checked img.shape too. They all look okay. How do I go about this?
I am getting the same error on a different problem.
Did you find the solution?