I am confused about the labels the model should see at its output.
After a long period of training I got semi-useful predictions, but the model seems to have trouble getting any better, and I believe I must have made a mistake.
BTW - I am training it on a version of PASCAL VOC 2010.
So, the output end of the model summary looks like this:
Layer (type)                      Output Shape          Param #  Connected to
convolution2d_99 (Convolution2D)  (None, 256, 256, 22)  5654     merge_101
reshape_1 (Reshape)               (None, 65536, 22)     0        convolution2d_99
activation_98 (Activation)        (None, 65536, 22)     0        reshape_1
Total params: 9,427,990
Trainable params: 9,324,918
Non-trainable params: 103,072
In my case there are 22 classes.
So, the data segm_generator() spits out is (x, y), with y being the labels of shape (n_samples, rows*cols, 1). The classes are encoded into the image as unique integers ranging from 0 to 21, and the model will train on this.
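For concreteness, here is a minimal sketch of what I mean by integer-encoded targets (numpy only; the sizes are placeholders for my actual batches):

import numpy as np

n_samples, rows, cols, n_classes = 4, 256, 256, 22

# One class index (0..21) per pixel, flattened to match
# the model's (None, rows*cols, 1) target shape:
y_int = np.random.randint(0, n_classes, size=(n_samples, rows, cols))
y_int = y_int.reshape(n_samples, rows * cols, 1)
print(y_int.shape)  # (4, 65536, 1)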
However, this is nothing like what the Activation layer produces: for each pixel it outputs a probability between 0 and 1 for each of the 22 classes.
So, here is my confusion: even though activation_98 has output shape (None, 256*256, 22), the model happily starts fitting on targets of shape (None, 256*256, 1). Can this be explained by numpy broadcasting?
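To make that hypothesis concrete: numpy would indeed broadcast a trailing dimension of 1 against the 22 channels, which is why I suspected it (a toy check, not the actual loss computation):

import numpy as np

probs  = np.random.rand(4, 65536, 22)             # like activation_98's output
labels = np.random.randint(0, 22, (4, 65536, 1))  # like my integer targets

# Elementwise ops broadcast the trailing 1 across the 22 channels:
diff = probs - labels
print(diff.shape)  # (4, 65536, 22)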
However, if I now swap in a segm_generator that outputs one-hot encoded labels of shape (None, 256*256, 22), the model won't train. It throws a ValueError:
ValueError: Error when checking model target: expected activation_98 to have shape (None, 65536, 1) but got array with shape (4, 65536, 22)
Why is that?
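For reference, this is roughly how I build the one-hot targets (a sketch with placeholder sizes; to_categorical is the standard Keras utility):

import numpy as np
from keras.utils.np_utils import to_categorical  # keras 1.x import path

n_samples, rows, cols, n_classes = 4, 256, 256, 22

# Same integer labels as before, expanded so each pixel
# becomes a length-22 one-hot vector:
y_int = np.random.randint(0, n_classes, size=(n_samples, rows * cols))
y_onehot = to_categorical(y_int.ravel(), n_classes)
y_onehot = y_onehot.reshape(n_samples, rows * cols, n_classes)
print(y_onehot.shape)  # (4, 65536, 22) -- matches activation_98, yet fit() rejects it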
A related question: in @jeremy's tiramisu-keras.ipynb notebook there is a whole section on "Convert labels", which I deemed necessary for pretty display only, i.e. for encoding RGB values in a class dictionary like this:
[((64, 128, 64), 'Animal'),
((192, 0, 128), 'Archway'),
((0, 128, 192), 'Bicyclist'),
((0, 128, 64), 'Bridge'),
((128, 0, 0), 'Building')]
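For instance, this is how I would use such a mapping for display (a minimal sketch; palette and pred are my own placeholder names):

import numpy as np

class_colors = [((64, 128, 64), 'Animal'),
                ((192, 0, 128), 'Archway'),
                ((0, 128, 192), 'Bicyclist'),
                ((0, 128, 64), 'Bridge'),
                ((128, 0, 0), 'Building')]

# Lookup table: class index -> RGB triple
palette = np.array([rgb for rgb, name in class_colors], dtype=np.uint8)

# pred: (rows, cols) integer class indices, e.g. an argmax over the channels
pred = np.random.randint(0, len(class_colors), size=(256, 256))

# Colour image for plotting: every pixel gets its class colour
rgb_image = palette[pred]
print(rgb_image.shape)  # (256, 256, 3)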
Is it true that this is for display only?