learn.summary() throws ValueError: Expected more than 1 value per channel when training, got input size [1, 1024]


(Hiromi Suenaga) #1

I have attached a stripped down version of lesson2-image_models.ipynb below.

What is strange is that learn.summary without parentheses works, and so does learn. But when I run learn.summary(), it throws the following error:


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-bc39e9e85f86> in <module>()
----> 1 learn.summary()

~/fastai/courses/dl1/fastai/conv_learner.py in summary(self)
   117         precompute = self.precompute
   118         self.precompute = False
--> 119         res = super().summary()
   120         self.precompute = precompute
   121         return res

~/fastai/courses/dl1/fastai/learner.py in summary(self)
    51     def data(self): return self.data_
    52 
---> 53     def summary(self): return model_summary(self.model, [3,self.data.sz,self.data.sz])
    54 
    55     def __repr__(self): return self.model.__repr__()

~/fastai/courses/dl1/fastai/model.py in model_summary(m, input_size)
   161         x = [to_gpu(Variable(torch.rand(1,*in_size))) for in_size in input_size]
   162     else: x = [to_gpu(Variable(torch.rand(1,*input_size)))]
--> 163     m(*x)
   164 
   165     for h in hooks: h.remove()

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
   323         for hook in self._forward_pre_hooks.values():
   324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
   326         for hook in self._forward_hooks.values():
   327             hook_result = hook(self, input, result)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
    65     def forward(self, input):
    66         for module in self._modules.values():
---> 67             input = module(input)
    68         return input
    69 

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
   323         for hook in self._forward_pre_hooks.values():
   324             hook(self, input)
--> 325         result = self.forward(*input, **kwargs)
   326         for hook in self._forward_hooks.values():
   327             hook_result = hook(self, input, result)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py in forward(self, input)
    35         return F.batch_norm(
    36             input, self.running_mean, self.running_var, self.weight, self.bias,
---> 37             self.training, self.momentum, self.eps)
    38 
    39     def __repr__(self):

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/nn/functional.py in batch_norm(input, running_mean, running_var, weight, bias, training, momentum, eps)
  1009         size = list(input.size())
  1010         if reduce(mul, size[2:], size[0]) == 1:
-> 1011             raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
  1012     f = torch._C._functions.BatchNorm(running_mean, running_var, training, momentum, eps, torch.backends.cudnn.enabled)
  1013     return f(input, weight, bias)

ValueError: Expected more than 1 value per channel when training, got input size [1, 1024]

Is anybody else experiencing this issue?

Thank you!


(Aditya) #2

Hi,
Could the issue be related to the PyTorch version?
Downgrading to 0.2 should bring things back to normal…


(Hiromi Suenaga) #3

That is it! I can’t say I understand what colesbury is saying :frowning: I will poke around and see if I can figure out.

Thanks for taking a look :slight_smile:


(Aditya) #4

It will fail. Don't train on batches of size 1 if you use feature-wise batch normalization. (Inference is fine with batch size 1.) Skip over the left-over batch.

Batch normalization computes:

y = (x - mean(x)) / (std(x) + eps)

If you have one sample per batch then mean(x) = x, and the output will be entirely zero (ignoring the bias). You can't use that for learning.
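To make the point above concrete, here is a minimal pure-Python sketch of the normalization step. It is not PyTorch's implementation: the function name `batch_norm_1d` is made up for illustration, the learnable scale and bias are left out, and it uses sqrt(var + eps) as in the paper rather than the std(x) + eps shorthand above.

```python
import math

def batch_norm_1d(batch, eps=1e-5):
    """Normalize each feature column of `batch` (a list of equal-length rows)
    using that column's own batch mean and biased variance.
    Hypothetical helper for illustration; no learnable scale or bias."""
    n = len(batch)
    n_features = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(n_features)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(n_features)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(n_features)]
            for row in batch]

# With a single sample, each value equals its own batch mean,
# so every normalized entry is 0.0 regardless of the input.
single = batch_norm_1d([[0.3, -1.2, 5.0]])

# With two samples the output still carries information about the inputs.
pair = batch_norm_1d([[0.3, -1.2, 5.0], [0.9, 0.4, 1.0]])
```

Because the single-sample output is all zeros whatever the input was, nothing useful can be learned from it, which is presumably why PyTorch raises an error in training mode instead of continuing silently.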

I have read the BatchNorm paper, and it makes sense after that…

Pleasure…

Batch Norm Paper Link…

https://arxiv.org/abs/1502.03167


(Stephan Rasp) #5

I also ran into this error recently when the last batch in my training was of size one. So basically

n_samples % bs = 1

I simply changed the random seed of my train/valid split to get around the problem, but how could this be fixed properly?
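A rough way to spot this condition before training starts is sketched below. The function names are hypothetical and not part of fastai, and whether a leftover batch of one actually triggers the error still depends on the model and the data loader.

```python
def last_batch_size(n_samples, bs):
    """Size of the final batch when n_samples are split into batches of bs."""
    remainder = n_samples % bs
    return bs if remainder == 0 else remainder

def has_singleton_last_batch(n_samples, bs):
    """True if the final batch would contain exactly one sample."""
    return n_samples > 0 and last_batch_size(n_samples, bs) == 1

# Example: 65 samples with a batch size of 64 leave a final batch of one.
risky = has_singleton_last_batch(65, 64)
```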


(Soumya) #6

Has anyone solved this issue yet?

I tried calling learn.predict() and then learn.predict_array(), which solves the issue, but it doesn't make much sense to me why this is happening.


(Hiromi Suenaga) #7

I’ve been adding a check that training data size % batch size is not 1 before I start training. Maybe we could put something in the fastai library so that, if the last batch has only 1 item in it and the model has batch norm, that batch is thrown away. But that seems rather disruptive, so maybe we should just be cautious.
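The "throw away that data" idea could look roughly like the sketch below. It is written against plain Python lists rather than fastai's data loaders, and `batches_without_singleton` is a made-up name, so treat it as an illustration of the behaviour rather than a proposed patch.

```python
def batches_without_singleton(items, bs):
    """Yield consecutive batches of up to bs items, skipping a batch that
    would hold a single item (only possible for the final batch when bs > 1)."""
    for start in range(0, len(items), bs):
        batch = items[start:start + bs]
        if bs > 1 and len(batch) == 1:
            continue  # drop the leftover single sample
        yield batch

# Five items with a batch size of 4: the lone fifth item is skipped.
kept = list(batches_without_singleton(list(range(5)), 4))
```

PyTorch's own `DataLoader` has a `drop_last` option that drops any incomplete final batch, which is a blunter version of the same idea.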


(Stephan Rasp) #8

I will look later today at the batch norm paper/definition. Maybe we can then include a check as you suggested and create a merge request.


(Hiromi Suenaga) #9

The other day, I checked everything with a sample data set and all was well. At the end of the day, I kicked off training with the bigger set, which just happened to have a last batch with a single item in it. I woke up the next day to a failed training :weary:


(Stephan Rasp) #10

This seems like something we should fix :smiley:


(Stephan Rasp) #11

I looked at the issue tonight, but in the process of creating a minimal example I stumbled onto another issue for which I created a GitHub issue: https://github.com/fastai/fastai/issues/240

Once this is fixed, I will look at the bs=1 problem.


#12

Hi hiromi, hi everybody,
it works if you do this:

learn.model.eval()
learn.summary()

Somehow, you need to set the model in evaluation mode to make the summary method work.

By the way, thanks to @ramesh the predict_array method now works correctly too. The reason is that the module should be set in evaluation mode when making predictions, because it changes the behavior of certain modules (e.g. BatchNorm).


(Hiromi Suenaga) #13

Hello :slight_smile:

Yes, .eval() turns off Dropout and makes BatchNorm use its running statistics instead of per-batch statistics, so that when you are running on the validation set or test set, you get a better result (at that point, you are not concerned with avoiding overfitting). For training, however, we want BatchNorm to normalize with the batch statistics. I initially came across this issue when printing out the summary, but its root cause can also make your training fail.

If @raspstephan doesn’t get it first, I can look into creating a PR to at least check the final batch size so that you do not have to wait until the end of the epoch to see the failure. Hope that clears some stuff :slight_smile:


(Stephan Rasp) #14

Hi, I will look at it again tonight. I tried creating a minimal example with the ImageDataLoader.from_array() function. I created a training set with 65 samples and a batch size of 64, but training worked fine!?

If you have time, maybe you could try to create a minimal example that produces the error.


(Hiromi Suenaga) #15

Certainly! I’m almost done with what I’m working on right now, so I will create a notebook with a minimal reproducible example. It’s certainly possible somebody else got to it since the last time I looked at it.


#16

Yes, you are right, in fact I was planning to use learn.model.train() thereafter to set the model in training mode, so I would just use eval() to print the summary. But yeah, I agree it’s not the smartest way to do that :slight_smile:
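One way to make the eval-then-train dance less error-prone is a small context manager that restores whatever mode the model was in. This is only a sketch: `eval_mode` is a hypothetical helper, and it assumes the model behaves like a PyTorch `nn.Module`, which exposes a `training` flag plus `eval()` and `train(mode)`.

```python
from contextlib import contextmanager

@contextmanager
def eval_mode(model):
    """Temporarily switch `model` to evaluation mode, then restore the
    mode it was in before, even if the body raises."""
    was_training = model.training
    model.eval()
    try:
        yield model
    finally:
        model.train(was_training)

# Intended usage with the learner from this thread (not run here):
# with eval_mode(learn.model):
#     learn.summary()
```

The `finally` block means the model goes back to training mode even if the summary call itself fails.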


(Hiromi Suenaga) #17

Here is a batch size of 4 with a training dataset of size 5:

So it’s definitely reproducible, but maybe it is good to be aware of how to work around the issue. Because once we start creating our own models, we would need to know…


(Theodoros Galanos) #18

Hi,

I was wondering if this was ever dealt with. I was getting the same error here and bypassed it by leaving some data out of the problem, but that feels a bit inefficient.

Would a crude solution be to replicate, only for training, a few rows / images of the training dataset in order to have a complete final batch? It should at least be better than deleting input data.
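A sketch of that replication idea, on a plain list of training items, might look like this. `pad_to_full_batches` is a hypothetical name, and the duplicated items are drawn at random from the existing ones, so those samples end up slightly over-weighted during training.

```python
import random

def pad_to_full_batches(items, bs, seed=0):
    """Return a new list containing `items` plus enough randomly re-drawn
    duplicates to make its length a multiple of bs."""
    items = list(items)
    remainder = len(items) % bs
    if not items or remainder == 0:
        return items
    rng = random.Random(seed)  # seeded so the padding is reproducible
    extra = [rng.choice(items) for _ in range(bs - remainder)]
    return items + extra

# Five items with a batch size of 4 are padded to eight, i.e. two full batches.
padded = pad_to_full_batches(list(range(5)), 4)
```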

Kind regards,
Theodore.


(Hiromi Suenaga) #19

The issue only happens when the last batch has only 1 element. I usually adjust the batch size so that the remainder is anything but one, but your solution also works :slight_smile: