Understanding code - error Expected more than 1 value per channel when training


(Alessa Bandrabur) #1

I stopped using AWS and moved to my personal laptop, since I am only running tests to understand the code.
I struggled for a while with the error Expected more than 1 value per channel when training, got input size [1, 1024] and tried to figure out what I am doing wrong. I have seen that this error is raised by the forward function, which calls F.batch_norm. I interpret it as meaning that it expects an image-shaped input, for example [1, 3, 224, 224], or [1, 256, 14, 14] where 256 is the number of filters and 14x14 is the size of the feature map. So it works fine until the input is flattened into a vector.

size(input): [1, 64, 112, 112]
size(input): [1, 64, 56, 56]
.....
size(input): [1, 512, 7, 7]
size(input): [1, 512, 7, 7]
size(input): [1, 1024]
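
To make this concrete, the error can be reproduced in isolation: a BatchNorm layer in training mode needs more than one value per channel to compute batch statistics, so a flattened input with batch size 1 fails. A minimal sketch in plain PyTorch (recent versions; on 0.2 you would wrap the tensor in a Variable), with 1024 chosen only to match the trace above:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1024)    # one feature per "channel" after flattening
bn.train()                   # training mode: batch statistics are computed
x = torch.randn(1, 1024)     # a single sample -> only one value per channel
# bn(x)                      # ValueError: Expected more than 1 value per channel when training
bn.eval()                    # eval mode: running statistics are used instead
out = bn(x)                  # the same single sample now passes through fine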

Trying to figure out the issue, I replicated the same setup on AWS, where I get no error running the exact same code. Any help or orientation tip is extremely appreciated.

On the left is the code from AWS, and on the right is the same code on my PC, plus the error.

P.S. Even the model looks the same.


(Lucas Goulart Vazquez) #2

Since the same code runs with no problems on AWS, there’s a chance that you need to update PyTorch?


(Jeremy Howard) #3

Or else that the data is different on the two computers?


(Vitaliy) #4

I’m having the same strange issue. What I’ve figured out so far is that it probably has something to do with the dataset size and the batch size (though I haven’t figured out all the details yet).

When I train on a sample dataset of ~800 images, it does not reproduce with batch size 32, but it does reproduce with batch size 1.

When I train on the full dataset of ~100K images, it does reproduce with batch size 32 but (if I recall correctly) does not reproduce with batch size 64.

So I would suggest trying a few different batch sizes to mitigate the issue until the root cause is figured out and fixed; see the quick check sketched below.
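
For example, a quick check of which batch sizes leave a final batch with a single element (a small sketch; n_images is a hypothetical placeholder for the number of items in your training set):

n_images = 801                    # hypothetical dataset size
for bs in (1, 16, 32, 64):
    remainder = n_images % bs     # size of the last, incomplete batch (0 means it divides evenly)
    risky = (bs == 1) or (remainder == 1)
    print(bs, remainder, 'risky' if risky else 'ok')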

Update: I’ve just noticed you asked the question more than a month ago. @alessa, did you figure out what the issue was?


(Alessa Bandrabur) #5

import torch
print(torch.__version__)
0.2.0.4


(Alessa Bandrabur) #6

exactly the same data


(Alessa Bandrabur) #7

I had let it go, because it was too abstract and I had no clue how to fix the issue.

I came back to it because of your post, and I realized that whenever I set precompute=True I get the same error both on AWS and on my PC, which says running mean should contain 3 elements not 4096.

[Update] Jeremy explains the reason for this error in the topic “How do we use our model against a specific image?”:

You need to set precompute=False before you do that prediction, since you’re passing in an image, not a precomputed activation.
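
For context, a rough sketch of how precompute is toggled with the fastai 0.7 ConvLearner API used in the course notebooks (arch and data as in the snippets below; the exact calls are from memory, so treat this as an assumption rather than a verified recipe):

learn = ConvLearner.pretrained(arch, data, precompute=True)  # the head trains on cached activations
learn.fit(1e-2, 2)
learn.precompute = False    # switch back so the model accepts raw images again
# only now does feeding a single image into the network make sense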


(Alessa Bandrabur) #8

But when I set precompute=False, I get the same error only on my local machine:

ValueError: Expected more than 1 value per channel when training, got input size [1, 4096]


(Alessa Bandrabur) #9

It has no relationship with the batch size; I modified it several times and nothing changed.

But I figured out something: these lines of code give me the error "ValueError: Expected more than 1 value per channel when training, got input size [1, 4096]":

data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(arch, data)

x,y = next(iter(data.val_dl))
x,y = x[None,0], y[None,0]
m  = learn.models.model
py = m(Variable(x.cuda())); py

But as soon as I call learn.predict() first, I have no error anymore:

data = ImageClassifierData.from_paths(PATH, tfms=tfms, bs=bs)
learn = ConvLearner.pretrained(arch, data)

learn.predict()

x,y = next(iter(data.val_dl))
x,y = x[None,0], y[None,0]
m  = learn.models.model
py = m(Variable(x.cuda())); py

Variable containing:
-0.9635 -2.2739 -0.6625
[torch.cuda.FloatTensor of size 1x3 (GPU 0)]
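
That matches the BatchNorm behaviour described earlier: learn.predict() runs the model in eval mode, and the model apparently stays in that mode afterwards, so the manual forward pass on a single image no longer trips the training-mode check. A sketch of the direct alternative, reusing learn and x from the snippet above (this is my reading of the behaviour, not something I have confirmed in the library source):

from torch.autograd import Variable

m = learn.models.model
m.eval()                      # BatchNorm now uses running statistics instead of batch statistics
py = m(Variable(x.cuda()))    # a single image passes through without the ValueError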


(Vitaliy) #10

Ok, I’ve finally figured out what causes the issue.

As stated here, you will get this error whenever the size of one of your batches equals 1.
It has something to do with the inner workings of BatchNorm layer training (inference with a batch size of 1 is possible, according to the person who closed the issue).

So the error occurs either if your batch size equals 1, or if the size of your dataset modulo the batch size equals 1, which makes the last batch of your data contain a single element. The simplest solution is just to remove one data point from your training dataset; a DataLoader-level alternative is sketched below.
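
Another option, instead of removing a data point: plain PyTorch data loaders can drop the incomplete final batch. Whether the fastai wrappers of that era expose this flag I’m not sure, so this is a sketch at the raw DataLoader level (train_dataset is a hypothetical Dataset object):

from torch.utils.data import DataLoader

# drop_last=True discards the final batch whenever len(dataset) % batch_size != 0,
# so a stray batch of size 1 never reaches the BatchNorm layers during training
train_dl = DataLoader(train_dataset, batch_size=32, shuffle=True, drop_last=True)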

Hope it helps someone :slight_smile:


(Phil) #12

@alessa - I am having the same experience. Running learn.predict() somehow eliminates this mysterious error.