Lesson 7 discussion

I’m pretty sure there were some negatives in the bounding box data. You can double check easily enough…

Very cool lesson.
One problem I ran into (in the Larger Size section) was the step of precomputing the conv features of the training data with the pre-saved VGG16-BN weights to speed things up.
The issue is that the output of get_data on the training set, trn = get_data(path+'train', (360,640)), will not fit in memory for a big training set.
I resolved this by keeping the batches and using predict_generator.
However, the same issue arises when you use predict_generator on a training set that is too big, since all of the predicted features still have to fit in memory.
I worked around it by setting the conv layers to non-trainable and adding the extra layers from get_lrg_layers() as trainable.
However, this takes a long time, since even when the layers are non-trainable the forward pass still has to run through them on every epoch.
Does anyone know a workaround?

In part 2 of the course you’ll learn how to use BcolzArrayIterator to handle this. In hindsight, we should have taught that in part 1!
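
For anyone hitting this before getting to part 2, here is a rough sketch of the same idea using plain bcolz (conv_model and batches are hypothetical names here — the frozen conv part of the network and a shuffle=False generator from get_batches): predict one batch at a time and append the features to an on-disk array, so the full feature set never has to sit in RAM.

import bcolz
import numpy as np

def save_conv_features(conv_model, batches, fname):
    # Append each batch of predicted features to an on-disk bcolz array.
    arr = None
    for i in range(int(np.ceil(batches.n / float(batches.batch_size)))):
        imgs, labels = next(batches)
        feats = conv_model.predict_on_batch(imgs)
        if arr is None:
            arr = bcolz.carray(feats, rootdir=fname, mode='w')
        else:
            arr.append(feats)
    arr.flush()
    return arr

The saved array can then be reopened with bcolz.open(fname) and fed to training (or to the BcolzArrayIterator from part 2) without loading everything at once.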


Hi @james_goldfarb, in the notebook the network only branches at the last layer, to predict two things (the bounding-box coordinates and the class probabilities). Since batchnorm is used after every dense layer, it makes sense to use it here as well.

And regarding activations, I guess we can use relu activations if we don’t want negative values in our predictions.
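
For reference, here's a minimal sketch of that kind of two-headed model (the layer sizes and loss weights are made up, not the notebook's exact values): the bounding-box head is left linear precisely so it can produce negative values, while the class head uses softmax.

from keras.layers import Input, Flatten, Dense, Dropout, BatchNormalization
from keras.models import Model

inp = Input(shape=(512, 14, 14))          # precomputed conv features (made-up shape)
x = Flatten()(inp)
x = Dense(512, activation='relu')(x)
x = BatchNormalization()(x)
x = Dropout(0.5)(x)

x_bb = Dense(4, name='bb')(x)                              # linear: bbox coords may be negative
x_class = Dense(8, activation='softmax', name='class')(x)  # class probabilities

model = Model(inp, [x_bb, x_class])
model.compile(optimizer='adam',
              loss=['mse', 'categorical_crossentropy'],
              loss_weights=[0.001, 1.0])   # scale down the large mse so it doesn't dominate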

Hi all, I was wondering if there is any way to build a multi-input fully convolutional network.
One idea I have is to concatenate duplicated filters to the output of a convolutional layer. For example, the output of a convolutional layer is (number_of_filters x height x width). In the fisheries competition we had to use the image sizes, and let's suppose there are 10 unique image sizes, so after one-hot encoding each image_size feature will be of length 10. Now duplicate this across (height x width), and you have a tensor of shape (10 x height x width). Concatenating this with the output of the convolutional layer gives a tensor of shape ((number_of_filters + 10) x height x width).

The downside of the above approach is inefficient computation, since there are duplicate filters. Is there any clever way to make use of both a fully convolutional network and the image sizes?
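
A rough sketch of what I mean, assuming the Keras 2 functional API and channels-first ordering (the shapes are made up):

from keras.layers import Input, Conv2D, Lambda, concatenate
from keras.models import Model
import keras.backend as K

n_sizes, n_filt, h, w = 10, 512, 22, 40      # made-up shapes

img_feat = Input(shape=(n_filt, h, w))       # conv features, channels-first
size_feat = Input(shape=(n_sizes,))          # one-hot encoded image size

# Broadcast the one-hot vector over the spatial grid: (batch, 10) -> (batch, 10, h, w)
tiled = Lambda(lambda t: K.tile(K.reshape(t, (-1, n_sizes, 1, 1)), (1, 1, h, w)),
               output_shape=(n_sizes, h, w))(size_feat)

# Concatenate on the channel axis and keep convolving as usual
x = concatenate([img_feat, tiled], axis=1)
x = Conv2D(8, (3, 3), padding='same', activation='relu',
           data_format='channels_first')(x)

model = Model([img_feat, size_feat], x)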

Thank You

There seems to be a typo in the Lesson 7 get_cm(imp, label) function. It takes imp as an input, but uses inp in the calculation. Fortunately this does not cause an error because inp is defined before the function is called.

I have successfully precalculated the outputs of the last conv layer of resnet50, and I could create a separate sequential model for the fully connected layers. However, is it possible to create the fully connected layer model directly from resnet50? The splitting technique used for sequential models does not work for this.

In the first 10 minutes of the lecture, Jeremy finetuned a ResNet model and changed its input size to 400 by 400.

I’m wondering why that made such a good improvement in classification accuracy, and how someone can decide intuitively whether or not this would help.

Also, I looked through the ResNet code and found that changing the input size doesn’t change any parameters except for the last dense layer, which gets replaced when finetuning anyway.

So I’m wondering whether the same approach is needed when finetuning VGG or any other model that contains many dense layers, whose weights we’ll freeze.

Thank you :))

When adding the metadata for fisheries, the one-hot encoded image sizes are then normalised. It is in the wiki, the notebook, and the notes. Why is this necessary?


I imagine it’s because the image pixels themselves are normalized as well. Putting both inputs on as similar a scale as possible makes the model as easy as possible to train.

It is mentioned in earlier classes that normalising data leads to faster training. However, I was surprised to see it applied here to categorical data, as that did not seem to make sense… yet from experimenting, the normalised data starts out with a much lower loss than if I don’t include the normalisation.

I guess the message is to always normalise, even categorical data.

Why does the “fully convolutional network” use border_mode=“same”? I do not recall any other model using this.

Padding SAME is not that uncommon. For example, AlexNet and VGG [1,2] both use this configuration. The alternative (padding = VALID) means the padding is 0 [3]. It makes the model harder to reason about, because the height and width of your feature maps shrink after every layer, even with a stride of 1. In addition, you learn less from what’s happening at the borders of the image.

[1] https://github.com/tensorflow/tensorflow/blob/084d29e67a72e369958c18ae6abfe2752fcddcbf/tensorflow/contrib/slim/python/slim/nets/vgg.py#L70

[3] https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/convolution
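
A quick way to see the difference (a minimal sketch with the Keras 2 API and the default channels-last ordering):

from keras.layers import Input, Conv2D
from keras.models import Model

inp = Input(shape=(224, 224, 3))
same = Model(inp, Conv2D(64, (3, 3), padding='same')(inp))
valid = Model(inp, Conv2D(64, (3, 3), padding='valid')(inp))

print(same.output_shape)    # (None, 224, 224, 64) -- spatial size preserved
print(valid.output_shape)   # (None, 222, 222, 64) -- shrinks by kernel_size - 1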

Is there any rationale for when to use same or valid? Or is it just trial and error?

I am experiencing the same problem. Code here

After making the same change in the pseudo-labels section, I ran into this error:

ValueError: Error when checking : expected input_3 to have shape (None, 512, 14, 14) but got array with shape (1000, 512, 22, 40)

What is input_3 referring to? It seems like several people are getting this error, so at least I know I’m not alone.

Edit: input_3 refers to the model’s input, and the shape coming from conv_test_feat, (1000, 512, 22, 40), is not what it is expecting. Using lrg_model works, and I am moving forward with:

preds = lrg_model.predict(conv_test_feat, batch_size=batch_size*2)

I’ve seen this asked a few times without a good answer, but can someone explain WHY we normalize the one-hot encoded image sizes in the section on incorporating data leakage/meta-data in models?

size_mean, size_std = train_size_labels.mean(axis=0), train_size_labels.std(axis=0)
train_size_labels = (train_size_labels - size_mean) / size_std
val_size_labels = (val_size_labels - size_mean) / size_std

As these were one-hot encoded, I was surprised to see them being normalized rather than just used as is. Plus, we apply a BatchNormalization layer in the model as well so it seems redundant.

I am wondering how resizing the images to fit the VGG16 input affects the bounding-box annotations, which were made on the original image sizes. Would the model be able to apply the resizing to the bounding-box annotations as well, or somehow understand the translation?

Thanks!
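
The annotations are just numbers, so they don't get resized automatically; the usual approach is to scale the box coordinates by the same factors the image was resized by. A rough sketch (hypothetical names, assuming boxes stored as (x, y, width, height) in pixels of the original image):

def scale_bbox(bb, orig_size, new_size=(224, 224)):
    # Scale each coordinate by the ratio between the new and the original image size.
    orig_h, orig_w = orig_size
    new_h, new_w = new_size
    fx, fy = float(new_w) / orig_w, float(new_h) / orig_h
    x, y, w, h = bb
    return (x * fx, y * fy, w * fx, h * fy)

# e.g. a box annotated on a 720x1280 image, mapped onto the 224x224 network input
print(scale_bbox((300, 100, 200, 150), orig_size=(720, 1280)))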

My guess is that it would work just fine without the normalization. Have you tried? It was likely a minor decision Jeremy made on the fly, and either way would have worked fine.

Hello,

I tried following the instructions to use resnet50.py and I get an error on the first line:

from resnet50 import Resnet50
rn0 = Resnet50(include_top=False).model

I get a ValueError about a negative dimension:
ValueError: Negative dimension size caused by subtracting 3 from 2 for ‘max_pooling2d_2/MaxPool’ (op: ‘MaxPool’) with input shapes: [?,2,112,64].

Anyone else experienced this?
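
Not sure if this is the cause, but that input shape ([?, 2, 112, 64]) looks like a channels-first array being fed through a model built with TensorFlow’s channels-last default. If you’re on the TF backend, it may be worth setting the image ordering explicitly (or editing ~/.keras/keras.json) before building the model:

import keras.backend as K

# The course notebooks and resnet50.py assume channels-first ("th") ordering;
# the TensorFlow backend defaults to channels-last.
K.set_image_data_format('channels_first')   # Keras 2
# K.set_image_dim_ordering('th')            # equivalent call in older Keras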

I’m running TF as the backend (I might as well, because Theano is now dead)

and I’ve been having trouble running the ResNet examples from Lesson 7 (particularly the multi-target classifier and transfer learning).

Has anyone been able to run the ResNet example either:

  1. on Jeremy’s code (lesson 7 video), but with TF backend?

Problem here: calling

model = Resnet50(include_top=False)

gets this error, which I haven’t been able to google:

ValueError: It seems that you are using the Keras 2 and you are passing both kernel_size and strides as integer positional arguments. For safety reasons, this is disallowed. Pass strides as a keyword argument instead.

Note: I am using an updated resnet50.py file from one of the students’ GitHub repos that was modified for the Keras 2.0 changes.


OR


  2. ResNet using Keras’ built-in ResNet50 from the keras.applications module?

Problem encountered here:

from keras.models import Sequential
from keras.applications import ResNet50
from keras.applications import imagenet_utils
from keras.applications.inception_v3 import preprocess_input
from keras.utils import get_file
from keras.preprocessing.image import img_to_array, load_img
import numpy as np
import cv2
import json
from utils import *

model = ResNet50(weights="imagenet")
[model.layers.pop() for _ in range(3)]  # I had to do this to mimic Jeremy's resnet model

However, in Jeremy’s model the output_shape was (2048, 7, 7),

whereas this model’s output_shape is (1000,).

What’s the discrepancy?
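
As far as I know, popping entries from model.layers doesn’t actually change a functional model’s outputs, so the keras.applications model still predicts the 1000 ImageNet classes. To get the conv feature map instead, the usual route is include_top=False; a minimal sketch (not Jeremy’s exact code):

from keras.applications import ResNet50

# include_top=False drops the classification head, so the output is the final
# conv feature map rather than 1000 class probabilities.
conv_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=(224, 224, 3))   # channels-last with the TF backend

# Recent Keras versions give (None, 7, 7, 2048); older ones kept an average-pool
# inside the model, giving (None, 1, 1, 2048).
print(conv_model.output_shape)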