

It looks like you’re missing the transforms (“tfms=”) field in your ImageClassifierData.from_arrays function call. Try:

data = ImageClassifierData.from_arrays(PATH, trn=(X_train, y_train), val=(X_val, y_val), tfms=tfms_from_model(arch, sz), bs=bs, classes=labels, num_workers=4, test=test)

My understanding is that even if you’re not doing any data augmentation, you still need to include tfms= for the library to properly reshape, resize, and normalize your inputs to what PyTorch expects.

If this doesn’t resolve it, you may also need to triple that last dimension so that it has size 3 (i.e., three channels). You can do this by using np.stack to stack your array onto itself 3 times along the appropriate axis, e.g.:

array_3 = np.stack([array_1, array_1, array_1], axis=3)
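As a minimal sketch of that stacking step (the array name `gray` and the batch of 5 random 28x28 images are made up for illustration), this is what it looks like on a grayscale batch with no channel axis. If your array already has a trailing axis of size 1, np.repeat along that axis gives the same result:

```python
import numpy as np

# Hypothetical grayscale batch: 5 images of 28x28 pixels, no channel axis.
gray = np.random.rand(5, 28, 28).astype(np.float32)

# Copy the single channel three times along a new last axis.
rgb_like = np.stack([gray, gray, gray], axis=3)
print(rgb_like.shape)

# Alternative when the array already has shape (n, rows, cols, 1).
gray_1ch = gray[..., np.newaxis]
rgb_like_2 = np.repeat(gray_1ch, 3, axis=3)
print(rgb_like_2.shape)
```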

You’re probably also getting the “dims of x and y do not match” AssertionError because you have triple the number of labels in your y_train vs X_train samples (and the same for y_val and X_val).

The trn and val dimensions to put into ImageClassifierData.from_arrays should probably look like:

X_train: (11200, 28, 28, 3)
X_val: (2800, 28, 28, 3)
y_train: (11200,)
y_val: (2800,)
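A quick way to check this before building the data object is to compare the shapes yourself. This is only a sketch: `check_xy` is a hypothetical helper, not part of fastai, and the arrays are small stand-ins for the real data:

```python
import numpy as np

def check_xy(x, y):
    """Return True if x is (n, rows, cols, channels) and y is (n,)."""
    return x.ndim == 4 and y.ndim == 1 and x.shape[0] == y.shape[0]

# Small stand-in arrays with the expected layout.
X_train = np.zeros((112, 28, 28, 3), dtype=np.float32)
y_train = np.zeros(112, dtype=np.int64)
print(check_xy(X_train, y_train))

# Labels accidentally tripled along with the channels: the counts no longer match.
y_tripled = np.concatenate([y_train, y_train, y_train])
print(check_xy(X_train, y_tripled))
```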

EDIT: Worth mentioning that the RuntimeError you’re seeing is specifically saying that pytorch expected an input in the shape of [n, 3, x, y]. In other words, your 4-dimensional tensor is expected to have 3 channels (in the second axis) before your x, y data (in the 3rd and 4th axes). Fastai, via the transform, should roll your input axes from [n, x, y, num_channels] to the correct [n, num_channels, x, y] shape. If you’re still getting a similar error after the above changes, I guess you could try manually rolling your channel axis into the correct position and using that as the input:

X_train: (11200, 3, 28, 28)
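If you do end up moving the channel axis by hand, np.transpose (or np.moveaxis) does it in one line. A small sketch with a made-up batch of 8 images:

```python
import numpy as np

# Hypothetical channels-last batch: (n, rows, cols, channels).
x_nhwc = np.random.rand(8, 28, 28, 3).astype(np.float32)

# Reorder the axes to channels-first: (n, channels, rows, cols).
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))
print(x_nchw.shape)

# np.moveaxis expresses the same thing as "move axis 3 to position 1".
x_nchw_2 = np.moveaxis(x_nhwc, 3, 1)
print(x_nchw_2.shape)
```

Note that np.reshape would not be the right tool here: it keeps the element order and only changes the shape, so the pixels would end up scrambled across channels.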

Thanks so much for your help Dave! I really appreciate it.

Your final comment about the input expecting a shape of [batch size, channels, height, width] is what the issue is. Also, I was re-watching one of the lessons, and Jeremy confirms that PyTorch’s tensor dims are a different order from other libraries that typically put channels at the back (i.e., [batch size, height, width, channels]), so I need to reshape my array accordingly.

Thanks again!



@daveluo, could you please tell me what trn in from_arrays() looks like? As far as I understand, it’s a tuple of two arrays: the second is an array of labels, and the first is a matrix in which each row corresponds to a label in the second array and each column represents one image. But what is an image here? Is it a buffer, a path to the image (judging by the numbers in your post, no), or something else?


Hi @dortonway,

Here’s what my ImageClassifierData.from_arrays() function call looks like:

data = ImageClassifierData.from_arrays(path=RESULTS, trn=(x_train, y_train), val=(x_holdout, y_holdout), tfms=tfms_from_model(arch, sz), bs=bs, classes=2)

You’re right that trn= should be a tuple of 2 ndarrays (x_train, y_train):

  1. x_train is a 4-dimensional array in the shape (# samples, # rows, # columns, # channels) where (# rows, # columns) is the 2D pixel representation of each image in size sz x sz in a single channel.
  2. y_train is a 1-dimensional array storing the respective labels for x_train in shape (# samples,)

For example, in my dataset, x_train.shape, y_train.shape shows

((1176, 75, 75, 3), (1176,))

meaning 1176 training samples of 75x75-pixel images in 3 channels, with 1176 corresponding y labels.

The 75x75 part of the 1st sample in the 1st channel (x_train[0,:,:,0]) is the standard deviation-normalized 2D array of the original (or resized to) 75x75 pixel image and looks like:

array([[ 0.42137, -0.43078, -0.4308 , ...,  0.93337,  0.51248, -0.25854],
       [ 0.16545, -0.1627 ,  0.26667, ...,  0.73933,  0.56651, -0.25854],
       [-0.5351 , -0.16272,  0.30614, ..., -0.67258,  0.01582, -0.28268],
       [-0.64376, -1.06065, -1.26056, ...,  0.56625,  0.85387,  0.40087],
       [-0.38035, -0.6163 , -0.56212, ...,  1.20078,  1.75155,  1.17225],
       [ 0.12359, -0.6163 , -0.25752, ...,  0.78885,  1.69286,  1.55917]], dtype=float32)

When displayed with plt.imshow(x_train[0,:,:,0]), it looks like:

(this is a synthetic-aperture radar (SAR) satellite image of a possible iceberg or ship from the Kaggle statoil/c-core iceberg classifier challenge)

Hope that helps clear things up!


@mmr how did you solve this problem?

This is the code I use:

bs = 16
sz = 75
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_top_down, max_zoom=1.05)
data = ImageClassifierData.from_arrays(PATH, trn = (x_train, y_train), val = (x_val, y_val), tfms = tfms , bs = bs, classes = 2, test = X_test)
learn = ConvLearner.pretrained(arch, data, precompute=True)

And I get a similar error, a KeyError:


Jeremy mentioned that we have a pandas dataframe and not a numpy array, but in the data object we create, we pass in the arrays we get from train_test_split. So where does the dataframe come into the picture?

@sahilk1610 - There was an issue with the way I was reading in the input data. I’m sorry, it was some time ago, and this is purely from my recollection.

If I have a dataset with images of different sizes, instead of X_train.shape == (11200, 28, 28, 3) I will have something like (11200,) and the function doesn’t work (even with tfms set).

I guess I will have to resize the images before calling the function, but shouldn’t it do that for me automatically?

Yeah, I just had to resize it myself.
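For anyone else hitting this, here is a rough sketch of resizing everything to a common size before stacking into one 4-dimensional array. `resize_nearest` is a hypothetical helper that does plain nearest-neighbor sampling in numpy just to keep the example self-contained; in practice you would probably use an image library with better interpolation:

```python
import numpy as np

def resize_nearest(img, size):
    """Resize an (H, W, C) array to (size, size, C) by nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]

# Hypothetical images of different sizes, each (H, W, 3).
images = [np.random.rand(30, 40, 3), np.random.rand(50, 20, 3)]

# After resizing, the images can be stacked into a single array.
X = np.stack([resize_nearest(im, 28) for im in images]).astype(np.float32)
print(X.shape)
```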

I wonder how we can use


just like ImageClassifierData.from_csv?

I have X-ray datasets and am trying to predict some motor scores (continuous data), but the images are in DICOM format, so I have to use pydicom to convert them to numpy arrays.
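The conversion step I have in mind looks roughly like the sketch below. It assumes pydicom is installed and that each file holds a single 2D grayscale image. The helper names, and the choice to scale pixels to [0, 1] and repeat them across 3 channels, are my own assumptions rather than anything fastai requires:

```python
import numpy as np

def load_dicom_pixels(path):
    """Read one DICOM file and return its pixel data as a float32 array."""
    import pydicom  # imported lazily; assumes pydicom is installed
    ds = pydicom.dcmread(path)
    return ds.pixel_array.astype(np.float32)

def to_unit_range(img):
    """Scale pixel values to [0, 1]; a constant image maps to all zeros."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=np.float32)
    return ((img - lo) / (hi - lo)).astype(np.float32)

def to_three_channels(img):
    """Turn a (rows, cols) image into (rows, cols, 3) by repeating it."""
    return np.stack([img, img, img], axis=-1)

# Stand-in for load_dicom_pixels(path) so the example runs without a file.
fake_xray = np.array([[0, 2048], [1024, 4096]], dtype=np.float32)
x = to_three_channels(to_unit_range(fake_xray))
print(x.shape)
```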