Lesson 14 wiki

rachel · April 11, 2017, 12:54am

Lesson resources

Video
Lesson 14 video timeline
Rossmann data
Camvid (see Camvid directory)
Camvid, additional files (e.g. label_color.txt)

More information

Interview with 3rd place winners (entity embeddings)
Arxiv paper written by 3rd place winners
The new book club starting May 1st

jeremy · April 11, 2017, 6:58pm

I just added videos, notebooks, and data.

karthik_k314 · April 11, 2017, 9:19pm

Hey @jeremy , the video is only 1hr 15 minutes long. The video runs only up until the break and ends there.

jeremy · April 11, 2017, 9:52pm

Wow that’s odd. I have another copy so will upload shortly.

jeremy · April 12, 2017, 1:02am

OK I’ve uploaded the full video now. Sorry about that!

shgidi · April 12, 2017, 7:29am

the rossman_exp.py file here seems inaccessible…

garima.agarwal · April 17, 2017, 2:50am

Me too getting an error with the _exp.py file.

lin.crampton · April 17, 2017, 4:49pm

Can’t find rossman_exp.py either. Even looked for rossmann_exp.py (with 2 n’s), couldn’t find that.

jeremy · April 17, 2017, 7:52pm

Sorry about that! I’ve added that file to part2 github now.

lin.crampton · April 18, 2017, 1:44am

transcript is in the file:
https://drive.google.com/open?id=0BxXRvbqKucuNWEFZbUZwb2RIbFk

in the same directory with all the other transcripts from this course:
https://drive.google.com/open?id=0BxXRvbqKucuNcXdyZ0pYSW5UTzA

please let me know if there are any corrections

and please let me know if there is somewhere else that these could reside …

jeremy · April 19, 2017, 1:11am

When I put the MOOC online I’ll make these into youtube captions.

iNLyze · April 19, 2017, 11:30am

I am working on the tiramisu-keras notebook, trying to reproduce it.

Sorry if I missed something, but I can’t find the file label_colors.txt

label_codes,label_names = zip(*[
    parse_code(l) for l in open(PATH+"label_colors.txt")])

It didn’t come with the above GitHub link to CamVid apparently. I’d be grateful for pointers.

BTW - Nice work on the ```segm_generator````class! Learning a lot from it. And it is useful, too.

jeremy · April 19, 2017, 2:43pm

Oops sorry! It’s here https://github.com/mostafaizz/camvid

I’m glad! I was pretty pleased with it, I must say

benediktschifferer · April 23, 2017, 12:09pm

@jeremy How can we edit/send a push request to the notebooks.

I worked on the rossmann notebook and found a bug:

In Create Models. there are two cells with following content

Cell
map_train = split_cols(cat_map_train) + [contin_map_train]
map_valid = split_cols(cat_map_valid) + [contin_map_valid]
Cell
map_train = split_cols(cat_map_train) + split_cols(contin_map_train)
map_valid = split_cols(cat_map_valid) + split_cols(contin_map_valid)

If I execute it in this order, the 2nd cell will not fit with the format of the model later:
model = Model([inp for inp,emb in embs] + [contin_inp], x)

I need to execute only the first cell and then the format/shape of the arrays fits to the configuration of the model.
The difference is:

Cell: The map is an array with # of emb + 1 array, which contains for each continoues variable one array
Cell: The map is an array with # of emb + # of continous variables

When I used the 2nd Cell, the input for the model is too large because the continoues variables are not packed in one array.

iNLyze · April 23, 2017, 11:18pm

I am confused about the labels the model should see at its output.
After a long period of training I got semi-useful predictions, but the model seems to have trouble to get any better and I believe I must habe made a mistake.
BTW - I am training it on a version of PASCAL VOC 2010.

So, the model output looks like

convolution2d_99 (Convolution2D) (None, 256, 256, 22)  5654        merge_101[0][0]                  
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 65536, 22)     0           convolution2d_99[0][0]           
____________________________________________________________________________________________________
activation_98 (Activation)       (None, 65536, 22)     0           reshape_1[0][0]                  
====================================================================================================
Total params: 9,427,990
Trainable params: 9,324,918
Non-trainable params: 103,072

In my case - 22 classes.
So, the data the segm_generator() spits out is (x, y) with y being the labels of shape (n_samples, rows*cols, 1). So, the classes are encoded into the image as unique integers ranging from 0…21. The model will train.
However, this is nothing like what the Activation layer produces, because it would rather output probabilities for each pixel being of class c between 0 and 1.

So, here my confusion: Even though activation_98 has dims (None, 256256, 22) the model starts fitting with an input of shape (None, 256256, 1) – can this be explained by numpy broadcasting?
However, if I now insert a to_categorical() into segm_generator it’ll output (None, 256*256, 22) one-hot encoded, but the model won’t train. It’ll throw a ValueError:

ValueError: Error when checking model target: expected activation_98 to have shape (None, 65536, 1) but got array with shape (4, 65536, 22)

Why is that?

A related question: In @jeremy’s tiramisu-keras.ipynb notebook there is a whole section on “Convert labels” which I deemed to be necessary for pretty display only, by encoding RGB values in a class dictionary like this.

[((64, 128, 64), 'Animal'),
 ((192, 0, 128), 'Archway'),
 ((0, 128, 192), 'Bicyclist'),
 ((0, 128, 64), 'Bridge'),
 ((128, 0, 0), 'Building')]

Is that true that it is display only?

jeremy · April 24, 2017, 4:44pm

Thanks for checking - they’re on github now so can send a PR. Although @bckenstler is heavily changing them for the next couple of weeks.

Although, in general, I don’t expect the notebooks to be things you can just hit shift-enter repeatedly and expect it to work.

jeremy · April 24, 2017, 4:46pm

That’s because we use sparse_categorical_crossentropy loss, which handles this automatically.

Yeah there’s a couple of different versions of the data I found - not all parts are needed for the dataset I ended up using.

iNLyze · April 26, 2017, 11:31pm

Is anyone else working on the tiramisu semantic segmentation example?

I have been working on training the model on PASCAL VOC 2010 data for a while. I am currently unsure if my model just needs way longer to train or if there is a bug.

Could someone qualitatively judge if the below results make sense?

The input image looks e.g. something like:

The labels look like this (here four labels). Note: This dataset uses one void class which is the outline of the objects.

Now, after some 170 epochs (plus minus in different runs) I get a segmentation for the four classes in this picture like:

Top left should be background, top right airplane, bottom left people, bottom right void class (outlines).

Now, while you can say that the model learned something, the result isn’t what it should be. Also, I am starting to wonder if it is a good idea to include the void class at all. Clearly, the model tries to approximate it with a generic edge detector.
The man in the bottom left is almost lost, but he does have poor contrast.

So, essentially I am wondering - is it worth putting more training time in or is it clear (gut-feeling) that there must be a bug?
What makes me a little nervous is that val_acc (and val_loss, not shown) hover around the same value and never seem to improve a lot.

Normalizations I did: Mean subtraction and division by 255.
Labels dataset: At first tried with sparse_cat_crossentropy

However, I started wondering if this is right. With above normalization images scale between [-0.5, 0.5], but the labels are (if one-hot encoded) [0, 1]. At first sight this should still work, but might cause unecessary stress for the model to stretch. So, I am currently trying just division by 255, no mean subtraction.

Really appreciate your opinion.

Some metrics of a typcial training run.

iNLyze · May 4, 2017, 7:23pm

Going back to CamVid this is what I got:

The model is mostly right and does an amazing job on the tree (high contrast). It mostly makes errors where objects are eclipsed (e.g. lady with shopping cart and pole) or low contrast (car in the background).
I got about 90.5 % acc on the training set and 89% acc on the (hold-out) test set after 375 epochs.

I noticed a few more things:

Size matters. With crops of 64x64 px I couldn’t get more than ~ 77% acc. With 192x192 crops I got 87% acc on the test set. I attribute this to the context and what percentage of the total area of the object is actually contained in the crop. The above results were obtained on using 320x224 px.
A matrix output and target learns faster than an unrolled vector - context again (unrolling looses some spatial relations)? Do it by Rehape, Activation, Reshape or by using hidim_softmax().

def hidim_softmax(x, axis=-1):
    """Softmax activation function.
    # Arguments
        x : Tensor.
        axis: Integer, axis along which the softmax normalization is applied.
    # Returns
        Tensor, output of softmax transformation.
    # Raises
        ValueError: In case `dim(x) == 1`.
    """
    ndim = K.ndim(x)
    if ndim == 2:
        return K.softmax(x)
    elif ndim > 2:
        e = K.exp(x - K.max(x, axis=axis, keepdims=True))
        s = K.sum(e, axis=axis, keepdims=True)
        return e / s
    else:
        raise ValueError('Cannot apply hidim_softmax to a tensor that is 1D')

The latter is contained in keras 2.0 apparently (I am still on 1.2.2).

iNLyze · May 4, 2017, 10:27pm

Below a modified version of create_tiramisu() found in tiramisu-keras.ipynb. Use either possibility and note you need to modify segm_generator() as well. You need to feed (None, rows, cols, classes) instead of (None, rows*cols, classes).

def create_tiramisu_hd(nb_classes, img_input, nb_dense_block=6, 
    growth_rate=16, nb_filter=48, nb_layers_per_block=5, p=None, wd=0):

    if type(nb_layers_per_block) is list or type(nb_layers_per_block) is tuple:
        nb_layers = list(nb_layers_per_block)
    else: nb_layers = [nb_layers_per_block] * nb_dense_block

    x = conv(img_input, nb_filter, 3, wd, 0)
    skips,added = down_path(x, nb_layers, growth_rate, p, wd)
    x = up_path(added, reverse(skips[:-1]), reverse(nb_layers[:-1]), growth_rate, p, wd)
    
    x = conv(x, nb_classes, 1, wd, 0)
    _,r,c,f = x.get_shape().as_list()
    #x = Reshape((-1, nb_classes))(x) # for Activation('softmax')
    #x = Activation('softmax')(x) # 
    #x = Reshape((r, c, f))(x)
    x = Lambda(hidim_softmax)(x)
    return x