One Hundred Layers Tiramisu

aifish · April 2, 2017, 7:14am

Great! Here is the same code using the trick you showed:

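import os
from keras.utils.visualize_util import model_to_dot  # Keras 1.x path; Keras 2 moved it to keras.utils.vis_utils
from IPython.lib.display import FileLink

def visualizeModel(input_model, path, filename='model.svg'):
    # Build a pydot graph of the model, showing layer names and tensor shapes
    model_dot = model_to_dot(input_model, show_shapes=True, show_layer_names=True)
    out_file = os.path.join(path, filename)
    model_dot.write(out_file, prog='dot', format='svg')
    # Return a clickable link to the SVG in the notebook
    return FileLink(out_file)

# `model` and `path` are defined earlier in the notebook
visualizeModel(model, path, 'model.svg')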

jeremy · April 2, 2017, 8:30am

I trained my model for 60 epochs before getting the results you see there. I used SGD(0.1, 0.9, nesterov=True), i.e. a learning rate of 0.1 and momentum of 0.9.
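For reference, the update that Keras's SGD applies when nesterov=True can be sketched in plain Python as below. This is a minimal sketch on a toy 1-D problem, f(x) = x**2, assuming the Keras-style formulation (velocity v = momentum*v - lr*grad, then p = p + momentum*v - lr*grad); the function names are hypothetical and the defaults mirror the SGD(0.1, 0.9, nesterov=True) settings above.

```python
def nesterov_sgd_step(param, velocity, grad, lr=0.1, momentum=0.9):
    """One Nesterov-momentum step in the Keras-style formulation."""
    velocity = momentum * velocity - lr * grad           # update the velocity
    param = param + momentum * velocity - lr * grad      # look-ahead parameter update
    return param, velocity


def minimize_quadratic(x0, steps=100, lr=0.1, momentum=0.9):
    """Minimize the toy function f(x) = x**2 (gradient 2x) starting from x0."""
    x, v = x0, 0.0
    for _ in range(steps):
        grad = 2.0 * x
        x, v = nesterov_sgd_step(x, v, grad, lr, momentum)
    return x


print(minimize_quadratic(5.0))  # should be very close to the minimum at 0
```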

aifish · April 2, 2017, 2:16pm

Oh I see, I need to train it for much longer. How do you decide when to use SGD with momentum, when to use RMSprop, and when to use Adam?

jeremy · April 2, 2017, 7:46pm

I just used the params they used in the paper

brendan · April 4, 2017, 2:06am

Finally eating Tiramisu

Let’s see how it trains. The authors mention 750 epochs in their GitHub repo, with a patience of 150. At least we’ve got something.
https://github.com/bfortuner/pytorch_tiramisu/blob/master/tiramisu-pytorch.ipynb

brendan · April 4, 2017, 4:53pm

Amazingly, I think the loss is still going down? Does the jaggedness suggest I should lower the learning rate? I’m using vanilla RMSProp, which I thought had its own logic for tuning learning rates… The authors mention they used RMSProp with a learning rate of .001 and an exponential decay of .995. Unlike other frameworks, PyTorch’s RMSProp doesn’t take lr decay as a parameter:
http://pytorch.org/docs/optim.html#torch.optim.RMSprop
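RMSProp’s adaptive per-parameter scaling is separate from a global learning-rate schedule, so the .995 decay has to be applied on top of it. A minimal sketch, assuming the decay is applied once per epoch: compute base_lr * 0.995**epoch and write it into each of the optimizer’s param_groups (PyTorch optimizers expose these as a list of dicts with an 'lr' key). FakeOptimizer is a hypothetical stand-in so the snippet runs without PyTorch, and the helper names are also hypothetical.

```python
def exponential_lr(base_lr, gamma, epoch):
    """Learning rate after `epoch` epochs of exponential decay."""
    return base_lr * gamma ** epoch


def set_learning_rate(optimizer, lr):
    """Overwrite the learning rate of every parameter group.

    Works on any object with a PyTorch-style `param_groups` list of dicts,
    such as a torch.optim.RMSprop instance.
    """
    for group in optimizer.param_groups:
        group['lr'] = lr


class FakeOptimizer:
    """Hypothetical stand-in with the same `param_groups` shape as a torch optimizer."""

    def __init__(self, lr):
        self.param_groups = [{'lr': lr}]


optimizer = FakeOptimizer(lr=1e-3)
for epoch in range(3):
    set_learning_rate(optimizer, exponential_lr(1e-3, 0.995, epoch))
    # ... train for one epoch here ...
print(optimizer.param_groups[0]['lr'])
```

With a real model you would pass the torch.optim.RMSprop instance instead. Newer PyTorch releases also ship torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.995), which multiplies the learning rate by gamma each time scheduler.step() is called (once per epoch here).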

The model is FC-DenseNet67. The ‘test’ set here is the 103-image validation set. My lowest error on the validation set is around 8.5%, but my error on the test set is ~15.8%. Strange. In the paper, the authors report a “global accuracy” of 90.8 for FC-DenseNet67 on CamVid. They also report IoU, but I haven’t implemented that yet.
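Global accuracy and mean IoU can both be computed from the same pixel-level confusion matrix, which also shows why they differ: global accuracy is dominated by large classes such as road and sky, while mean IoU weights every class equally and penalizes both false positives and false negatives. A minimal NumPy sketch follows; the function names are hypothetical, and it assumes void/unlabeled pixels carry a single label that you pass as ignore_index (whether that matches the CamVid loader in use is an assumption).

```python
import numpy as np


def confusion_matrix(target, pred, n_classes, ignore_index=None):
    """Pixel-level confusion matrix: rows = true class, columns = predicted class.

    Labels (other than ignore_index) must be integers in [0, n_classes).
    """
    target = np.asarray(target).ravel()
    pred = np.asarray(pred).ravel()
    if ignore_index is not None:
        keep = target != ignore_index      # drop void pixels before counting
        target, pred = target[keep], pred[keep]
    idx = target * n_classes + pred
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)


def global_accuracy(cm):
    """Fraction of all counted pixels that were labeled correctly."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN).

    Classes absent from both target and prediction are skipped.
    """
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    valid = denom > 0
    return np.mean(tp[valid] / denom[valid])


target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 0, 1], [0, 2, 1]])
cm = confusion_matrix(target, pred, n_classes=3)
print(global_accuracy(cm), mean_iou(cm))
```

On this toy example, global accuracy is 4/6 while mean IoU is only 0.5, because the mistakes hurt the IoU of the small classes more than they hurt the overall pixel count.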

According to the authors, the next step is to fine-tune the model on full-sized images.

[Figure: target annotation and predicted annotation]

The predictions on the test set aren’t bad, but they don’t match the authors’ results. My validation error is below 10%, but the test error of about 15% is probably why.

Let me see what happens when I finetune…

Current Questions/Things To Explore:

Implement IoU metric
Add a patience check to stop training early after 150 epochs with no improvement
The paper says: “For batch normalization, we use current batch statistics at training, validation and test time.” Do I need to implement this?
RMSProp with .995 exponential decay. How is this handled in PyTorch?
In PyTorch, it’s not enough to save the weights if you want to resume training later; it looks like we need to save the optimizer state as well. I made this mistake, and my “pretrained” weights performed poorly when I resumed training.
Can we try Upsampling2d instead of ConvTranspose2d?
Is RMSProp the best optimizer for segmentation?
What do the authors mean when they say it would be interesting to “pretrain” FC-DenseNet? Change the model’s final layer to handle classification and train it on ImageNet?
Preloaded class weights - According to this, people often adjust their class weights (how much each class contributes to the total error/loss) to account for class imbalance. Some classes are underrepresented in CamVid. I found a copy of these weights and I’m using them. It would be useful to confirm that I’m doing the right thing.
I’m using Negative Log Likelihood since PyTorch had a nice 2D version. The authors use Cross Entropy loss, but I believe in this case they end up being the same thing? Need to confirm.

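import torch.nn as nn
criterion = nn.NLLLoss2d(weight=camvid.class_weight.cuda()).cuda()  # expects log-softmax outputs; camvid.class_weight holds the per-class weights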

Pascal VOC - It looks like the authors tried training on this dataset but couldn’t get it to converge: “We made a first attempt on PascalVOC and it did not converge until we use Adam optimizer but it ended with poor results.” That doesn’t sound promising. Do you think they were just in a rush to publish?
Other users have struggled to reproduce the results in the paper. For example, here.
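On the NLL vs. cross entropy question: PyTorch’s nn.CrossEntropyLoss is documented as LogSoftmax followed by NLLLoss, so NLLLoss gives the same value as long as the network’s last layer is a log-softmax. The NumPy sketch below checks this numerically on flattened pixels (a 2D loss is the same computation after reshaping (N, C, H, W) to (N*H*W, C)), including class weights averaged the way NLLLoss does by default: the sum of w[y] * loss divided by the sum of w[y]. The logits, targets, and weights are made up for illustration, and the function names are hypothetical.

```python
import numpy as np


def log_softmax(logits, axis=1):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def weighted_nll(log_probs, target, weight):
    """Weighted NLL averaged like NLLLoss's default: sum(w[y] * -log p[y]) / sum(w[y])."""
    n = np.arange(len(target))
    w = weight[target]
    return -(w * log_probs[n, target]).sum() / w.sum()


def weighted_cross_entropy(logits, target, weight):
    """Weighted cross entropy computed directly from softmax probabilities."""
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    n = np.arange(len(target))
    w = weight[target]
    return -(w * np.log(probs[n, target])).sum() / w.sum()


rng = np.random.default_rng(0)
logits = rng.normal(size=(6, 4))           # 6 flattened pixels, 4 classes
target = np.array([0, 1, 3, 3, 2, 0])      # true class of each pixel
weight = np.array([0.5, 1.0, 2.0, 4.0])    # hypothetical class weights
print(weighted_nll(log_softmax(logits), target, weight),
      weighted_cross_entropy(logits, target, weight))
```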

jeremy · April 4, 2017, 8:13pm

Ugh - that’s a worry… especially since that’s in the original authors’ repo. I guess we should run the Lasagne code ourselves.

Pre-training simply means taking a model that was trained on ImageNet (which I believe PyTorch has) and using those weights in the downward path. Then (I assume) you’d train just the upward path, and then fine-tune the whole lot.

brendan · April 4, 2017, 8:36pm

I think @kelvin did. I’m not sure whether the reported acc is on the test or validation set, though.
https://github.com/bfortuner/pytorch_tiramisu/blob/master/benchmarks/tiramisu.log

jeremy · April 4, 2017, 10:03pm

@kelvin that’s great that you included the log in github - thanks! Is this DenseNet103? I assume the first few columns are train, and last few are test? If so, then the best test result is:

loss = 0.24795 | jacc = 0.77115 | acc = 0.9440

But table 3 in the paper shows best IoU of 66.9 and accuracy of 91.5. Are these measuring different things? Are you able to match the paper’s results?

kelvin · April 4, 2017, 10:23pm

Yes it’s DenseNet103. The column sets are train and val (not test).

I can re-run it and print out test periodically if that would be helpful.

jeremy · April 4, 2017, 10:25pm

You would only need to run the best model (assuming that you saved the weights) - no need to re-train.

Yes, it would be great to confirm that the lasagne code can replicate the paper’s findings!

brendan · April 4, 2017, 10:56pm

So I did about 100 epochs of fine-tuning on full-sized images.

Epoch, Loss, Error for the Validation Set towards the end of fine-tuning:
762,0.21971764021060047,7.808109411764708
763,0.21415826298442542,7.631791764705883
764,0.2103811400193794,7.758129803921568
765,0.22203320849175548,7.704945882352942
766, 0.209, 7.48

Test Set Results
Loss: 0.4357, Error: 13.2225, Accuracy: 86.77

So fine-tuning reduced the test error by about 2.6 percentage points (from ~15.8% to 13.2%), but we’re still above the authors’ reported 9.2% error.

Some thoughts:

I manually stopped both the initial training and fine-tuning stages early (I was afraid of overfitting, but mostly I was impatient). I think my accuracy would improve with longer training; I was just eager to test the images.
I didn’t use random horizontal flip augmentation during fine-tuning. The authors don’t say whether they kept using it during fine-tuning (they also say they used vertical flips, but I thought I saw horizontal flips in their code).
I think we’re on the right track and what the authors claim is plausible. I’ll just have to do some more experimenting. Next step is to refactor my code to handle the FCDenseNet103.
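For the flip augmentation, the important detail is that the image and its annotation are flipped together using the same random draw; otherwise the labels no longer line up with the pixels. A minimal NumPy sketch, assuming images are (H, W, C) arrays and masks are (H, W) arrays of class indices; random_hflip is a hypothetical helper, and flipping axis 0 instead would give a vertical flip.

```python
import numpy as np


def random_hflip(image, mask, rng, p=0.5):
    """Flip an image and its mask left-right together with probability p.

    image: (H, W, C) array; mask: (H, W) array of class indices.
    A single random draw decides for both, so the labels stay aligned.
    """
    if rng.random() < p:
        image = image[:, ::-1]   # reverse the width axis
        mask = mask[:, ::-1]
    # Copy into contiguous memory: torch.from_numpy rejects negative-stride views.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = np.zeros((360, 480, 3), dtype=np.uint8)   # CamVid-sized dummy image
mask = np.zeros((360, 480), dtype=np.int64)
image, mask = random_hflip(image, mask, rng)
```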

[Figures: image, target, and predicted annotation examples; an example from the authors’ paper; our result for the same image (target, then prediction)]

kelvin · April 4, 2017, 11:23pm

This is where I got to the last time I trained the model (well short of the 750 epochs):
Epoch 0 took 81+25 sec. loss = 0.79147 | jacc = 0.44204 | acc = 0.77681 || loss = 0.47174 | jacc = 0.57472 | acc = 0.87108 (BEST)
Average cost test = 0.65018 | jacc test = 0.47502 | acc_test = 0.80233

I’ll run it for a bit longer to see where it gets to.

kelvin · April 5, 2017, 12:14am

Here’s a current snapshot:

Epoch 0 took 82+25 sec. loss = 0.75248 | jacc = 0.47201 | acc = 0.79553 || loss = 0.63229 | jacc = 0.52208 | acc = 0.81888 (BEST)
Epoch 0 : [test : 100%]Average cost test = 0.87415 | jacc test = 0.41467 | acc_test = 0.76544
Epoch 1 took 83+25 sec. loss = 0.51775 | jacc = 0.56866 | acc = 0.85952 || loss = 0.75316 | jacc = 0.51155 | acc = 0.75854
Epoch 2 took 87+27 sec. loss = 0.45953 | jacc = 0.61521 | acc = 0.88111 || loss = 0.43686 | jacc = 0.59526 | acc = 0.88277 (BEST)
Epoch 2 : [test : 100%]Average cost test = 0.57786 | jacc test = 0.52581 | acc_test = 0.84360
Epoch 3 took 84+27 sec. loss = 0.41342 | jacc = 0.63034 | acc = 0.89817 || loss = 0.48276 | jacc = 0.59196 | acc = 0.87180
Epoch 4 took 84+27 sec. loss = 0.40582 | jacc = 0.65231 | acc = 0.90079 || loss = 0.35644 | jacc = 0.66893 | acc = 0.91684 (BEST)
Epoch 4 : [test : 100%]Average cost test = 0.57283 | jacc test = 0.54999 | acc_test = 0.85695
Epoch 5 took 84+27 sec. loss = 0.36374 | jacc = 0.68196 | acc = 0.91503 || loss = 0.33620 | jacc = 0.69676 | acc = 0.92339 (BEST)
Epoch 5 : [test : 100%]Average cost test = 0.64616 | jacc test = 0.53554 | acc_test = 0.83135
Epoch 6 took 84+27 sec. loss = 0.37066 | jacc = 0.68089 | acc = 0.91587 || loss = 0.40339 | jacc = 0.67144 | acc = 0.90153
Epoch 7 took 84+27 sec. loss = 0.34489 | jacc = 0.70390 | acc = 0.92323 || loss = 0.45925 | jacc = 0.64149 | acc = 0.88778
Epoch 8 took 84+27 sec. loss = 0.35062 | jacc = 0.69947 | acc = 0.92295 || loss = 0.38000 | jacc = 0.69408 | acc = 0.91329
Epoch 9 took 84+27 sec. loss = 0.36848 | jacc = 0.70261 | acc = 0.91982 || loss = 0.39515 | jacc = 0.65911 | acc = 0.90052
Epoch 10 took 85+27 sec. loss = 0.34513 | jacc = 0.71192 | acc = 0.92491 || loss = 0.62396 | jacc = 0.56636 | acc = 0.83963
Epoch 11 took 84+27 sec. loss = 0.32650 | jacc = 0.71233 | acc = 0.93219 || loss = 0.64778 | jacc = 0.63972 | acc = 0.84534
Epoch 12 took 88+27 sec. loss = 0.32518 | jacc = 0.72835 | acc = 0.93249 || loss = 0.36002 | jacc = 0.71821 | acc = 0.92149 (BEST)
Epoch 12 : [test : 100%]Average cost test = 0.58999 | jacc test = 0.57274 | acc_test = 0.85886

thunderingtyphoons · April 6, 2017, 5:04am

Does anyone know what the state of the art is for unsupervised segmentation? Semantic segmentation is not required (just foreground/background classification is sufficient). Or are there any easy-to-implement methods?

jeremy · April 7, 2017, 6:02pm

Can you describe more what you mean by ‘unsupervised’? What’s the application? Are you able to create any labels at all?

thunderingtyphoons · April 7, 2017, 6:21pm

There are no labels for the objects at all, but we would like to generate patches for the objects, so that we can identify exactly what is an object and where it is in the picture. So, the notion of what is an object is fuzzy – but in general, it is something which is distinct from the background.

jeremy · April 7, 2017, 6:26pm

Can you label a few, or a few dozen? Can you have someone segment the background for a few images?

What kinds of pictures - regular color photos? What kinds of objects - regular imagenet-style objects?

jeremy · April 8, 2017, 2:59pm

Kelvin, it looks like you weren’t able to get above ~0.85 test accuracy? Did you manage to improve on that at all?

@brendan what’s the best test accuracy you’ve got?

brendan · April 8, 2017, 3:52pm

For the FCDenseNet67, my best accuracy was 87.6% on the test set (I used the validation set during training) after about 600 training epochs and 600 fine-tuning epochs. But I cut training short.

I’m retraining with FCDenseNet-103, this time using learning rate decay and early stopping with max patience. I just completed 874 epochs of training on this new net and achieved an accuracy of 86.6% (but max patience was not triggered, so I could have continued training).
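Early stopping with a patience counter can be sketched as a small class that tracks the best validation value and how many epochs have passed without beating it. This is a minimal sketch, assuming lower is better (validation loss or error); EarlyStopping is a hypothetical name, not a PyTorch API. In practice you would also save the weights (and, per the note earlier in the thread, the optimizer state) whenever step() records a new best.

```python
class EarlyStopping:
    """Signal a stop once the monitored value hasn't improved for `patience` epochs.

    Assumes lower is better (e.g. validation loss or error).
    """

    def __init__(self, patience=150):
        self.patience = patience
        self.best = float('inf')
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, value, epoch):
        """Record this epoch's value; return True if training should stop."""
        if value < self.best:
            self.best = value
            self.best_epoch = epoch
            self.bad_epochs = 0          # improvement: reset the counter
        else:
            self.bad_epochs += 1         # no improvement this epoch
        return self.bad_epochs >= self.patience


# Toy validation errors: improving until epoch 2, then plateauing.
val_errors = [10.0, 9.0, 8.5, 8.7, 8.6, 8.9, 9.1]
stopper = EarlyStopping(patience=3)
for epoch, err in enumerate(val_errors):
    if stopper.step(err, epoch):
        print(f"stopping at epoch {epoch}; best {stopper.best} at epoch {stopper.best_epoch}")
        break
```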

I’m about to start fine-tuning on the full-sized images (including random horizontal flips) with a max patience of 50. I’ll send you the test scores later today, or you can follow my training at my cool website.

[Figures: image, target, and prediction examples]