Lesson 14 wiki

(Rachel Thomas) #1

Lesson resources

More information

Part 2 early release videos now available!
Combine one-hot encodings?
Pre-release part 2 videos
(Jeremy Howard) #2

I just added videos, notebooks, and data.

(Karthik Kannan) #3

Hey @jeremy , the video is only 1hr 15 minutes long. The video runs only up until the break and ends there. :slight_smile:

(Jeremy Howard) #4

Wow that’s odd. I have another copy so will upload shortly.

(Jeremy Howard) #5

OK I’ve uploaded the full video now. Sorry about that!

(Gidi Shperber) #6

the rossman_exp.py file here seems inaccessible…

(garima.agarwal) #7

Me too getting an error with the _exp.py file.

(lin.crampton) #8

Can’t find rossman_exp.py either. Even looked for rossmann_exp.py (with 2 n’s), couldn’t find that.

(Jeremy Howard) #9

Sorry about that! I’ve added that file to part2 github now.

(lin.crampton) #10

transcript is in the file:

in the same directory with all the other transcripts from this course:

please let me know if there are any corrections

and please let me know if there is somewhere else that these could reside …

(Jeremy Howard) #11

When I put the MOOC online I’ll make these into youtube captions. :slight_smile:

(Constantin) #12

I am working on the tiramisu-keras notebook, trying to reproduce it.

Sorry if I missed something, but I can’t find the file label_colors.txt

label_codes,label_names = zip(*[
    parse_code(l) for l in open(PATH+"label_colors.txt")])

It didn’t come with the above GitHub link to CamVid apparently. I’d be grateful for pointers.

BTW - Nice work on the ```segm_generator````class! Learning a lot from it. And it is useful, too.

(Jeremy Howard) #13

Oops sorry! It’s here https://github.com/mostafaizz/camvid

I’m glad! I was pretty pleased with it, I must say :slight_smile:

(Benedikt S) #14

@jeremy How can we edit/send a push request to the notebooks.

I worked on the rossmann notebook and found a bug:

In Create Models. there are two cells with following content

  1. Cell
    map_train = split_cols(cat_map_train) + [contin_map_train]
    map_valid = split_cols(cat_map_valid) + [contin_map_valid]

  2. Cell
    map_train = split_cols(cat_map_train) + split_cols(contin_map_train)
    map_valid = split_cols(cat_map_valid) + split_cols(contin_map_valid)

If I execute it in this order, the 2nd cell will not fit with the format of the model later:
model = Model([inp for inp,emb in embs] + [contin_inp], x)

I need to execute only the first cell and then the format/shape of the arrays fits to the configuration of the model.
The difference is:

  1. Cell: The map is an array with # of emb + 1 array, which contains for each continoues variable one array
  2. Cell: The map is an array with # of emb + # of continous variables

When I used the 2nd Cell, the input for the model is too large because the continoues variables are not packed in one array.

Image shape Does not Match - TIRAMISU
(Constantin) #15

I am confused about the labels the model should see at its output.
After a long period of training I got semi-useful predictions, but the model seems to have trouble to get any better and I believe I must habe made a mistake.
BTW - I am training it on a version of PASCAL VOC 2010.

So, the model output looks like

convolution2d_99 (Convolution2D) (None, 256, 256, 22)  5654        merge_101[0][0]                  
reshape_1 (Reshape)              (None, 65536, 22)     0           convolution2d_99[0][0]           
activation_98 (Activation)       (None, 65536, 22)     0           reshape_1[0][0]                  
Total params: 9,427,990
Trainable params: 9,324,918
Non-trainable params: 103,072

In my case - 22 classes.
So, the data the segm_generator() spits out is (x, y) with y being the labels of shape (n_samples, rows*cols, 1). So, the classes are encoded into the image as unique integers ranging from 0…21. The model will train.
However, this is nothing like what the Activation layer produces, because it would rather output probabilities for each pixel being of class c between 0 and 1.

So, here my confusion: Even though activation_98 has dims (None, 256256, 22) the model starts fitting with an input of shape (None, 256256, 1) – can this be explained by numpy broadcasting?
However, if I now insert a to_categorical() into segm_generator it’ll output (None, 256*256, 22) one-hot encoded, but the model won’t train. It’ll throw a ValueError:

ValueError: Error when checking model target: expected activation_98 to have shape (None, 65536, 1) but got array with shape (4, 65536, 22)

Why is that?

A related question: In @jeremy’s tiramisu-keras.ipynb notebook there is a whole section on “Convert labels” which I deemed to be necessary for pretty display only, by encoding RGB values in a class dictionary like this.

[((64, 128, 64), 'Animal'),
 ((192, 0, 128), 'Archway'),
 ((0, 128, 192), 'Bicyclist'),
 ((0, 128, 64), 'Bridge'),
 ((128, 0, 0), 'Building')]

Is that true that it is display only?

(Jeremy Howard) #16

Thanks for checking - they’re on github now so can send a PR. Although @bckenstler is heavily changing them for the next couple of weeks.

Although, in general, I don’t expect the notebooks to be things you can just hit shift-enter repeatedly and expect it to work.

(Jeremy Howard) #17

That’s because we use sparse_categorical_crossentropy loss, which handles this automatically.

Yeah there’s a couple of different versions of the data I found - not all parts are needed for the dataset I ended up using.

(Constantin) #18

Is anyone else working on the tiramisu semantic segmentation example?

I have been working on training the model on PASCAL VOC 2010 data for a while. I am currently unsure if my model just needs way longer to train or if there is a bug.

Could someone qualitatively judge if the below results make sense?

The input image looks e.g. something like:

The labels look like this (here four labels). Note: This dataset uses one void class which is the outline of the objects.

Now, after some 170 epochs (plus minus in different runs) I get a segmentation for the four classes in this picture like:

Top left should be background, top right airplane, bottom left people, bottom right void class (outlines).

Now, while you can say that the model learned something, the result isn’t what it should be. Also, I am starting to wonder if it is a good idea to include the void class at all. Clearly, the model tries to approximate it with a generic edge detector.
The man in the bottom left is almost lost, but he does have poor contrast.

So, essentially I am wondering - is it worth putting more training time in or is it clear (gut-feeling) that there must be a bug?
What makes me a little nervous is that val_acc (and val_loss, not shown) hover around the same value and never seem to improve a lot.

Normalizations I did: Mean subtraction and division by 255.
Labels dataset: At first tried with sparse_cat_crossentropy

However, I started wondering if this is right. With above normalization images scale between [-0.5, 0.5], but the labels are (if one-hot encoded) [0, 1]. At first sight this should still work, but might cause unecessary stress for the model to stretch. So, I am currently trying just division by 255, no mean subtraction.

Really appreciate your opinion.

Some metrics of a typcial training run.

(Constantin) #19

Going back to CamVid this is what I got:

The model is mostly right and does an amazing job on the tree (high contrast). It mostly makes errors where objects are eclipsed (e.g. lady with shopping cart and pole) or low contrast (car in the background).
I got about 90.5 % acc on the training set and 89% acc on the (hold-out) test set after 375 epochs.

I noticed a few more things:

  • Size matters. With crops of 64x64 px I couldn’t get more than ~ 77% acc. With 192x192 crops I got 87% acc on the test set. I attribute this to the context and what percentage of the total area of the object is actually contained in the crop. The above results were obtained on using 320x224 px.
  • A matrix output and target learns faster than an unrolled vector - context again (unrolling looses some spatial relations)? Do it by Rehape, Activation, Reshape or by using hidim_softmax().
def hidim_softmax(x, axis=-1):
    """Softmax activation function.
    # Arguments
        x : Tensor.
        axis: Integer, axis along which the softmax normalization is applied.
    # Returns
        Tensor, output of softmax transformation.
    # Raises
        ValueError: In case `dim(x) == 1`.
    ndim = K.ndim(x)
    if ndim == 2:
        return K.softmax(x)
    elif ndim > 2:
        e = K.exp(x - K.max(x, axis=axis, keepdims=True))
        s = K.sum(e, axis=axis, keepdims=True)
        return e / s
        raise ValueError('Cannot apply hidim_softmax to a tensor that is 1D')

The latter is contained in keras 2.0 apparently (I am still on 1.2.2).

(Constantin) #20

Below a modified version of create_tiramisu() found in tiramisu-keras.ipynb. Use either possibility and note you need to modify segm_generator() as well. You need to feed (None, rows, cols, classes) instead of (None, rows*cols, classes).

def create_tiramisu_hd(nb_classes, img_input, nb_dense_block=6, 
    growth_rate=16, nb_filter=48, nb_layers_per_block=5, p=None, wd=0):

    if type(nb_layers_per_block) is list or type(nb_layers_per_block) is tuple:
        nb_layers = list(nb_layers_per_block)
    else: nb_layers = [nb_layers_per_block] * nb_dense_block

    x = conv(img_input, nb_filter, 3, wd, 0)
    skips,added = down_path(x, nb_layers, growth_rate, p, wd)
    x = up_path(added, reverse(skips[:-1]), reverse(nb_layers[:-1]), growth_rate, p, wd)
    x = conv(x, nb_classes, 1, wd, 0)
    _,r,c,f = x.get_shape().as_list()
    #x = Reshape((-1, nb_classes))(x) # for Activation('softmax')
    #x = Activation('softmax')(x) # 
    #x = Reshape((r, c, f))(x)
    x = Lambda(hidim_softmax)(x)
    return x