One Hundred Layers Tiramisu

Fine-tuning stopped early (using a max patience of 50 based on validation loss). Accuracy on the test set did not improve. But perhaps I should train longer and not use the validation set for early stopping? The training set error does look like it was still coming down… Perhaps I can also exclude the random horizontal flipping during fine-tuning…
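
For reference, the early-stopping setup described above would look roughly like this in Keras (the monitored metric and patience are taken from the post; everything else is illustrative):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=50)
# model.fit(X_train, y_train, validation_data=(X_val, y_val), callbacks=[early_stop])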

Metrics - Epoch 970 (Loss, Error)
Validation – 0.178, 7.17
Train – 0.183, 8.90
Test – 0.441, 13.4

What are good next steps? I was going to try running it on Pascal VOC or MSCOCO.

Pascal VOC and MSCOCO would be interesting!

Can you loan me some GPUs? :wink:

Win some!

Facebook has a recent blog post where they’ve open-sourced some of their research on segmentation, treating it as a pixel-level classification problem. It’s not exactly what you’re looking for, but digging around their source code might stimulate some ideas.

@brendan I figured out why we’re not replicating the paper’s accuracy. It’s because they remove the ‘void’ category from their accuracy measurement. Once I remove that, I can get around 89%. Still not quite as good as the paper, but as good as @kelvin was getting.
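
For reference, a minimal sketch of measuring pixel accuracy with the void category excluded, assuming integer label maps and a known void class id (the names here are illustrative):

import numpy as np

def pixel_accuracy(preds, labels, void_id):
    mask = labels != void_id                      # drop void pixels from the measurement
    return (preds[mask] == labels[mask]).mean()   # accuracy over the remaining pixels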

Interesting! I think the real test is to benchmark performance on MSCOCO or Pascal VOC.

What does the +80 next to DB refer to in the photo taken from Lecture 14? Does a Dense Block add 80 filters each time?

Looks to me like the number of filters added by the respective dense block. TD is the transition down, so no new filters there. How many filters are added is controlled through the growth factor parameter.

Yep! If the growth factor is 16, each dense layer appends 16 filters to the volume it receives and passes it on to the next layer. In the example above, each Dense Block has 4 dense layers.

@brendan but how does that make 80 then?

Apologies! In that particular example each Dense Block had 5 dense layers, so 5 × 16 = 80 filters get added. But the number of dense layers per block is a parameter you can play with: 4, 5, 10, etc.
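
A minimal Keras-style sketch of that bookkeeping, where a block with a growth factor of 16 and 5 dense layers adds the +80 filters from the diagram (the layer arguments are illustrative, not the exact ones from the lesson notebook):

from keras.layers import BatchNormalization, Activation, Conv2D, Concatenate

def dense_block(x, n_layers=5, growth_rate=16):
    for _ in range(n_layers):
        y = BatchNormalization()(x)
        y = Activation('relu')(y)
        y = Conv2D(growth_rate, (3, 3), padding='same')(y)  # each dense layer produces growth_rate new maps
        x = Concatenate()([x, y])                            # append them to the running volume
    return x  # channels_in + n_layers * growth_rate channels, e.g. +80 for 5 layers at 16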

@brendan thank you for the clarification :blush:

@brendan Hi, is there any new progress? I am running your pytorch-tiramisu and would like to know whether you managed to replicate the paper’s results. Thank you!

Haven’t touched it since class. I was not able to replicate the authors’ results, although we got close. Jeremy suggested it may have to do with including/excluding the background class.

Hello guys,

I was trying to convert this Tiramisu model into a protobuf file to run it on Android.

import tensorflow as tf
from keras import backend as K
from tensorflow.python.tools import freeze_graph, optimize_for_inference_lib

def export_model(saver, model, input_node_names, output_node_name):
    MODEL_NAME = 'tiramisu'

    # Write the graph definition and a checkpoint for the current Keras session
    tf.train.write_graph(K.get_session().graph_def, 'out',
        MODEL_NAME + '_graph.pbtxt')
    saver.save(K.get_session(), 'out/' + MODEL_NAME + '.chkp')

    # Freeze the checkpointed variables into constants
    freeze_graph.freeze_graph('out/' + MODEL_NAME + '_graph.pbtxt', None,
        False, 'out/' + MODEL_NAME + '.chkp', output_node_name,
        "save/restore_all", "save/Const:0",
        'out/frozen_' + MODEL_NAME + '.pb', True, "")

    # Reload the frozen graph and strip training-only nodes for inference
    input_graph_def = tf.GraphDef()
    with tf.gfile.Open('out/frozen_' + MODEL_NAME + '.pb', "rb") as f:
        input_graph_def.ParseFromString(f.read())

    output_graph_def = optimize_for_inference_lib.optimize_for_inference(
            input_graph_def, input_node_names, [output_node_name],
            tf.float32.as_datatype_enum)

    with tf.gfile.FastGFile('out/opt_' + MODEL_NAME + '.pb', "wb") as f:
        f.write(output_graph_def.SerializeToString())

    print("graph saved!")

I tried calling this function:

export_model(tf.train.Saver(), model, [model.input.name], model.output.name)

It gives the error below:
INFO:tensorflow:Restoring parameters from out/tiramisu.chkp

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-80-ce4abb242ea5> in <module>()
----> 1 export_model(tf.train.Saver(), model, [model.input.name], model.output.name)

<ipython-input-71-81c8aebbb95e> in export_model(saver, model, input_node_names, output_node_name)
      5     saver.save(K.get_session(), 'out/' + MODEL_NAME + '.chkp')
      6 
----> 7     freeze_graph.freeze_graph('out/' + MODEL_NAME + '_graph.pbtxt', None,         False, 'out/' + MODEL_NAME + '.chkp', output_node_name,         "save/restore_all", "save/Const:0",         'out/frozen_' + MODEL_NAME + '.pb', True, "")
      8 
      9     input_graph_def = tf.GraphDef()

/usr/local/lib/python3.5/dist-packages/tensorflow/python/tools/freeze_graph.py in freeze_graph(input_graph, input_saver, input_binary, input_checkpoint, output_node_names, restore_op_name, filename_tensor_name, output_graph, clear_devices, initializer_nodes, variable_names_blacklist)
    177       clear_devices,
    178       initializer_nodes,
--> 179       variable_names_blacklist)
    180 
    181 

/usr/local/lib/python3.5/dist-packages/tensorflow/python/tools/freeze_graph.py in freeze_graph_with_def_protos(***failed resolving arguments***)
    114         input_graph_def,
    115         output_node_names.split(","),
--> 116         variable_names_blacklist=variable_names_blacklist)
    117 
    118   with gfile.GFile(output_graph, "wb") as f:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/graph_util_impl.py in convert_variables_to_constants(sess, input_graph_def, output_node_names, variable_names_whitelist, variable_names_blacklist)
    202   # This graph only includes the nodes needed to evaluate the output nodes, and
    203   # removes unneeded nodes like those involved in saving and assignment.
--> 204   inference_graph = extract_sub_graph(input_graph_def, output_node_names)
    205 
    206   found_variables = {}

/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/graph_util_impl.py in extract_sub_graph(graph_def, dest_nodes)
    139 
    140   for d in dest_nodes:
--> 141     assert d in name_to_node_map, "%s is not in graph" % d
    142 
    143   nodes_to_keep = set()

AssertionError: truediv:0 is not in graph

Below is the model.summary() output:

https://pastebin.com/72gUrRr0

Can someone suggest what might be going wrong?
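
One thing worth checking, as a guess rather than a confirmed fix: freeze_graph and extract_sub_graph work with operation (node) names, while Keras’s model.output.name is a tensor name carrying a ':0' suffix, which matches the "truediv:0 is not in graph" assertion above. Stripping the suffix before calling export_model might get past it:

output_node_name = model.output.name.split(':')[0]        # e.g. 'truediv:0' -> 'truediv'
input_node_names = [model.input.name.split(':')[0]]
export_model(tf.train.Saver(), model, input_node_names, output_node_name)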

Hey, guys

When I try to replicate Jeremy’s results, I keep getting this at the beginning of training. Is that normal?

Epoch 1/100
232s - loss: nan - acc: 2.0441e-06 - val_loss: nan - val_acc: 2.9938e-04
Epoch 2/100
232s - loss: nan - acc: 2.4274e-06 - val_loss: nan - val_acc: 2.9938e-04

Uh, figured out it was a class mismatch problem.
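
A quick sanity check along those lines, assuming integer class-id label maps fed into a categorical cross-entropy head (variable names are illustrative):

import numpy as np

n_classes = model.output_shape[-1]
assert labels.min() >= 0 and labels.max() < n_classes, \
    "label ids outside [0, n_classes) usually show up as nan loss from the very first batch"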

Hi everyone, I know this topic is kind of stale but I’ll ask anyway.
I’m trying to replicate the Tiramisu paper and got stuck on the Transition Up part of the model. I don’t quite understand how the authors get a reduction in the number of filters on their way back to full resolution. How do you go from 1072 to 800 on the second upsample block if dense blocks only add features? Do I use a 1x1 conv to do feature reduction (not part of the described model)?
This is the part of the paper where they explain it (page 4), but I don’t get it. Can someone point me to the code that does that? Or explain it with a lot of hand waving? Thanks!

Since the upsampling path increases the feature maps spatial resolution, the linear growth in the number of features would be too memory demanding, especially for the full resolution features in the pre-softmax layer. In order to overcome this limitation, the input of a dense block is not concatenated with its output. Thus, the transposed convolution is applied only to the feature maps obtained by the last dense block and not to all feature maps concatenated so far.
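
In other words, in the upsampling path only the feature maps created inside the previous dense block (n_layers × growth factor of them) go through the transposed convolution; the block’s input is left behind, and the result is concatenated with the skip connection from the downsampling path. A rough Keras-style sketch of one such step, with the layer arguments as illustrative assumptions rather than the paper’s exact configuration:

from keras.layers import Conv2DTranspose, Concatenate

def transition_up(new_features, skip, n_filters):
    # new_features holds only the maps produced inside the previous dense block,
    # i.e. n_layers * growth_rate channels, not the full concatenation so far
    x = Conv2DTranspose(n_filters, (3, 3), strides=(2, 2), padding='same')(new_features)
    # the next dense block then sees skip_channels + n_filters channels,
    # so the count stays bounded instead of growing with every upsampling step
    return Concatenate()([x, skip])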

Hi Brendan,

I’ve trained up your model with my custom data, and it’s working nicely. Now I’m trying to get it converted to coreml using onnx-coreml, but I’m stuck on an unsupported operation error:

TypeError: Error while converting op of type: Slice. Error message: Only single axis Slice is supported now

EDIT: To provide more context in figuring out where this is failing, here is the log of the conversion (seems like it’s in TransitionUp?.. perhaps center_crop()?):

1/224: Converting Node Type Conv
2/224: Converting Node Type BatchNormalization
3/224: Converting Node Type Relu
4/224: Converting Node Type Conv
5/224: Converting Node Type Concat
6/224: Converting Node Type BatchNormalization
7/224: Converting Node Type Relu
8/224: Converting Node Type Conv
9/224: Converting Node Type Concat
10/224: Converting Node Type BatchNormalization
11/224: Converting Node Type Relu
12/224: Converting Node Type Conv
13/224: Converting Node Type Concat
14/224: Converting Node Type BatchNormalization
15/224: Converting Node Type Relu
16/224: Converting Node Type Conv
17/224: Converting Node Type Concat
18/224: Converting Node Type BatchNormalization
19/224: Converting Node Type Relu
20/224: Converting Node Type Conv
21/224: Converting Node Type MaxPool
22/224: Converting Node Type BatchNormalization
23/224: Converting Node Type Relu
24/224: Converting Node Type Conv
25/224: Converting Node Type Concat
26/224: Converting Node Type BatchNormalization
27/224: Converting Node Type Relu
28/224: Converting Node Type Conv
29/224: Converting Node Type Concat
30/224: Converting Node Type BatchNormalization
31/224: Converting Node Type Relu
32/224: Converting Node Type Conv
33/224: Converting Node Type Concat
34/224: Converting Node Type BatchNormalization
35/224: Converting Node Type Relu
36/224: Converting Node Type Conv
37/224: Converting Node Type Concat
38/224: Converting Node Type BatchNormalization
39/224: Converting Node Type Relu
40/224: Converting Node Type Conv
41/224: Converting Node Type MaxPool
42/224: Converting Node Type BatchNormalization
43/224: Converting Node Type Relu
44/224: Converting Node Type Conv
45/224: Converting Node Type Concat
46/224: Converting Node Type BatchNormalization
47/224: Converting Node Type Relu
48/224: Converting Node Type Conv
49/224: Converting Node Type Concat
50/224: Converting Node Type BatchNormalization
51/224: Converting Node Type Relu
52/224: Converting Node Type Conv
53/224: Converting Node Type Concat
54/224: Converting Node Type BatchNormalization
55/224: Converting Node Type Relu
56/224: Converting Node Type Conv
57/224: Converting Node Type Concat
58/224: Converting Node Type BatchNormalization
59/224: Converting Node Type Relu
60/224: Converting Node Type Conv
61/224: Converting Node Type MaxPool
62/224: Converting Node Type BatchNormalization
63/224: Converting Node Type Relu
64/224: Converting Node Type Conv
65/224: Converting Node Type Concat
66/224: Converting Node Type BatchNormalization
67/224: Converting Node Type Relu
68/224: Converting Node Type Conv
69/224: Converting Node Type Concat
70/224: Converting Node Type BatchNormalization
71/224: Converting Node Type Relu
72/224: Converting Node Type Conv
73/224: Converting Node Type Concat
74/224: Converting Node Type BatchNormalization
75/224: Converting Node Type Relu
76/224: Converting Node Type Conv
77/224: Converting Node Type Concat
78/224: Converting Node Type BatchNormalization
79/224: Converting Node Type Relu
80/224: Converting Node Type Conv
81/224: Converting Node Type Concat
82/224: Converting Node Type BatchNormalization
83/224: Converting Node Type Relu
84/224: Converting Node Type Conv
85/224: Converting Node Type MaxPool
86/224: Converting Node Type BatchNormalization
87/224: Converting Node Type Relu
88/224: Converting Node Type Conv
89/224: Converting Node Type Concat
90/224: Converting Node Type BatchNormalization
91/224: Converting Node Type Relu
92/224: Converting Node Type Conv
93/224: Converting Node Type Concat
94/224: Converting Node Type BatchNormalization
95/224: Converting Node Type Relu
96/224: Converting Node Type Conv
97/224: Converting Node Type Concat
98/224: Converting Node Type BatchNormalization
99/224: Converting Node Type Relu
100/224: Converting Node Type Conv
101/224: Converting Node Type Concat
102/224: Converting Node Type BatchNormalization
103/224: Converting Node Type Relu
104/224: Converting Node Type Conv
105/224: Converting Node Type Concat
106/224: Converting Node Type BatchNormalization
107/224: Converting Node Type Relu
108/224: Converting Node Type Conv
109/224: Converting Node Type MaxPool
110/224: Converting Node Type BatchNormalization
111/224: Converting Node Type Relu
112/224: Converting Node Type Conv
113/224: Converting Node Type Concat
114/224: Converting Node Type ConvTranspose
115/224: Converting Node Type Add
116/224: Converting Node Type Slice

On GitHub it was recommended that I “generate two concatenated slicing operations, one working on a dimension at a time” but I’m not sure where (or how, honestly) to do that. Any advice greatly appreciated!
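
A rough sketch of what that recommendation could look like if the multi-axis Slice comes from a center_crop-style helper in TransitionUp (the signature below is an assumption, not the repo’s exact code). Cropping height and width in two separate indexing steps should export as two single-axis Slice ops, which onnx-coreml can handle, instead of one Slice over two axes:

def center_crop(layer, max_height, max_width):
    _, _, h, w = layer.size()
    top = (h - max_height) // 2
    left = (w - max_width) // 2
    layer = layer[:, :, top:top + max_height, :]    # Slice over the height axis only
    layer = layer[:, :, :, left:left + max_width]   # then Slice over the width axis only
    return layer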