Lesson 1 discussion

Here is Andrew Ng’s talk on this exact question, which I recommend checking out. It might be more relevant after you go over lessons 2 and 3, where we talk about other techniques like regularization, data augmentation, etc.
Here is my summary of the workflow:


I put question marks next to the ones which were mentioned in the talk but which I am not sure about yet.
So for now (lesson 1), we are mostly dealing only with “bias”, so we can try steps 1 (bigger model) and 2 (train longer, i.e. more epochs).
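In the lesson 1 notebook, “train longer” just means running more epochs with the Vgg16 wrapper; a minimal sketch, assuming the course’s vgg16.py and directory layout (the batch size and epoch count here are arbitrary):

from vgg16 import Vgg16

path = 'data/dogscats/'                    # data directory as in the lesson 1 notebook
vgg = Vgg16()
batches = vgg.get_batches(path + 'train', batch_size=64)
val_batches = vgg.get_batches(path + 'valid', batch_size=64)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=3)  # "train longer" = raise nb_epoch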

Nope. They would not change.

I think the results would change slightly, as the initial weights would be different (random). @jeremy may correct me.

This is an interesting question. I do not believe I have reached an optimum yet; I tried up to around 20–30 epochs. I am also interested in a more general form of the question: “How do we know that running more epochs would not help?” I did see results going up and down, so the gradient is not a sure sign.


Thank you for the response!

Hi everyone,

I’m following this course as part of the MOOC. I was able to build a vgg16 model from scratch and finetune it without the batch normalization layer. When I run the evaluator on the validation set, I get a categorical_crossentropy of 0.06 and an accuracy of 98.2% after training for 2 epochs, as shown in the screenshot below.

But when I submit my results to Kaggle, the loss is very poor: my public score is something like 11.42062. I generated my submission file to closely match the sample_submission file. Once the model is created, I perform the following steps to generate the test predictions. I strongly believe I’m doing something wrong here that causes me to submit wrong answers.

batch_size = 125
class_mode = None
img_size_expected = (224, 224)   # VGG16 expects 224x224 inputs
test_batches = get_batches('test1', class_mode=class_mode, target_size=img_size_expected,
                           batch_size=batch_size, shuffle=False)

def predict_labels(model, test_batches, batch_size):
    # Run the model over every test batch and collect the predictions.
    # nb_sample / batch_size is integer division here, so batch_size should
    # divide the number of test images evenly (e.g. 12500 / 125 = 100 batches).
    final_preds = []
    for i in range(test_batches.nb_sample / batch_size):
        imgs = next(test_batches)
        preds = model.predict(imgs, batch_size=batch_size, verbose=1)
        final_preds = final_preds + preds.tolist()
    return final_preds

final_preds = predict_labels(model=vgg16_model,test_batches=test_batches,batch_size=batch_size)

test_predictions = np.asarray(final_preds)
test_labels = test_predictions[:, 1]            # column 1 = probability of "dog"
test_labels_list = test_labels.tolist()
test_image_ids = test_batches.filenames

# Pair each numeric image id (parsed from filenames like "<subdir>/1234.jpg")
# with its dog probability
test_id_labels = []
for img_id, img_label in zip(test_image_ids, test_labels_list):
    test_id_labels.append([int(img_id.split('/')[1].split('.')[0]), img_label])

# Sort the rows by image id so the submission matches the sample_submission ordering
test_id_labels_array = np.array(test_id_labels)
test_id_labels_array_sorted = test_id_labels_array[test_id_labels_array[:, 0].argsort()]
test_id_labels_sorted = test_id_labels_array_sorted.tolist()

# Round the probabilities and cast the ids back to int for the final rows
test_id_labels_sorted_rightFormat = []
for item in test_id_labels_sorted:
    test_id_labels_sorted_rightFormat.append((int(item[0]), round(item[1], 1)))

Once I save the above list in CSV format, I get a submission file as shown below.
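(For reference, the saving step itself is roughly the following sketch; the column names id and label are taken from sample_submission.csv, and the output file name is arbitrary.)

import csv

# Write the (image_id, probability) rows built above as a Kaggle submission file
with open('submission.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'label'])
    writer.writerows(test_id_labels_sorted_rightFormat)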

I’ve verified the code and am stuck. Can someone help me get the submission file into the right format for Kaggle?

Best Regards,
Guru Medasani


Watch the video where @jeremy explains why you get punished for using 1’s and 0’s (100% probability). He capped the results to .98 and 0.05 or something
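In case it helps, a minimal sketch of that clipping before writing the submission (the exact bounds below are my guess at what the lesson uses; anything like 0.02/0.98 behaves similarly):

import numpy as np

# Log loss punishes confident wrong answers extremely hard, so cap the
# predicted probabilities away from 0 and 1 before submitting.
raw_preds = np.array([0.0, 1.0, 0.73])      # example dog probabilities
clipped = np.clip(raw_preds, 0.05, 0.95)
print(clipped)                               # [0.05  0.95  0.73]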


Thanks @mclasson. I saw this np.clip() on the probabilities in the second lesson after your comment.

They will be used in later lessons.

I see, thanks Jeremy. Updating the libraries solved the problem.

I’m curious about what the test1 folder data is for. I’m assuming it’s for further validation. Can someone please share how one would use it and what it’s for?

@jeremy and @mclasson - Thanks for the clipping tip. I was able to apply clipping and get into the top 12% on the competition. I’m going to keep trying to see if my model can get any better with data augmentation. So far this is the best Deep Learning MOOC out there.

That’s where the kaggle competition data for submission is.

Hello Everybody,

First of all, I would like to thank Rachel and Jeremy for creating this fantastic course and this knowledge hub.

I am trying to improve my performance on the Dogs vs. Cats problem. For that, I thought it would be a good idea not only to train the last layer, but to train a bigger part of the network.

I have tried several approaches:

  • Train the whole network (CONV + FC)
  • Train only the fully connected block (FC)
  • Train the last 4096 FC layer

The more layers I make trainable, the worse the error (both train and validation) the net achieves. With the approaches above, I get a validation accuracy of 0.5, 0.5 and 0.96 respectively. If I only train the last layer (as Jeremy does in the course) I get 0.986. My idea was that the model would fine-tune the weights of the original VGG, yielding a better solution, but this logic seems to be wrong and I would like to know the reason behind it…

Thank you in advance,
Iván

Hello Mr. Howard, Ms. Thomas, and fast.ai members!

Thank you for setting up this wonderful community to help us learn about Deep Learning!

I noticed that http://www.platform.ai/files/ is no longer available and I cannot find dogscats.zip on https://github.com/fastai/courses.

Where can I download the dogscats.zip data set now that the page is no longer available?

Thanks!

You can find it on the lesson 1 page, at the bottom of ‘Necessary files’: http://wiki.fast.ai/index.php/Lesson_1_Notes

Wow, didn’t expect such a quick reply!

ecase (Elizabeth), you’re a life saver! Thank you very much, :smiley:

@james_goldfarb I have the same problem as @martin: I would like not to have to recompute the model for the Cats and Dogs Redux data, but instead use the model already computed in the lesson 1 notebook.

I tried

vgg = Vgg16()
vgg.load_model(path + 'Data/dogscats/models/dogscats.h5')

it says "Vgg16 instance has no attribute 'load_model'"

I also tried

vgg = model.load_model(path + 'Data/dogscats/models/dogscats.h5')

it says "NameError: name 'model' is not defined"

I also tried

vgg = load_model(path + 'Data/dogscats/models/dogscats.h5')

it says "NameError: name 'load_model' is not defined"

@martin Did you solve the problem since then?

@Ptilulu You need to understand a few concepts to do this.

If you look in vgg16.py you can search for model and weights.
“model” is your convolutional neural network.

The Model is defined:

model = self.model = Sequential()
model.add(Lambda(vgg_preprocess, input_shape=(3,224,224)))

self.ConvBlock(2, 64)
self.ConvBlock(2, 128)
self.ConvBlock(3, 256)
self.ConvBlock(3, 512)
self.ConvBlock(3, 512)
# ... followed by the fully connected blocks

The next step is to train the model. This yields the model "weights". Jeremy is using a pretrained model. The weights are saved in fname = 'vgg16.h5'. We do not see this, but at some point in the past someone saved these weights using model.save_weights('vgg16.h5').

He then loads the weights with

model.load_weights(get_file(fname, self.FILE_PATH+fname, cache_subdir='models'))

You can do the same in lesson1.py.

You can train your model with

vgg.fit(batches, val_batches, nb_epoch=1)

and then save the weights with

vgg.save_weights('myfile.h5')

If you want to train further at another time, use

vgg = Vgg16()
vgg.load_weights('myfile.h5')
vgg.fit(batches, val_batches, nb_epoch=1)


In practice, this is used all the time.
We train the model and save the weights; then, to actually use the model (which is the purpose of all this work), we load the weights and call model.predict() to predict whether an image is a cat or a dog.
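A minimal sketch of that last step, going through the underlying Keras model directly (the file name, path and batch size are just placeholders):

vgg = Vgg16()
vgg.model.load_weights('myfile.h5')    # weights saved earlier

# Predict on a batch of unlabelled images from the test directory
test_batches = vgg.get_batches(path + 'test1/', shuffle=False,
                               batch_size=4, class_mode=None)
imgs = next(test_batches)
probs = vgg.model.predict(imgs)        # one row of class probabilities per image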

During model training it is also very useful to save the weights (and the loss, as well as other metrics) for each epoch, since overfitting (lower accuracy) can occur with too much training.
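One way to do that is with a standard Keras callback (a sketch, not code from the course; the file name pattern and epoch count are arbitrary):

from keras.callbacks import ModelCheckpoint

# Save the weights after every epoch, with the epoch number and validation loss
# in the file name, so you can go back to whichever epoch performed best.
checkpoint = ModelCheckpoint('weights.{epoch:02d}-{val_loss:.2f}.h5',
                             save_weights_only=True)
vgg.model.fit_generator(batches, samples_per_epoch=batches.nb_sample,
                        nb_epoch=5, validation_data=val_batches,
                        nb_val_samples=val_batches.nb_sample,
                        callbacks=[checkpoint])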


model.save(fname) saves more than just the weights (the architecture, the weights and the optimizer state). See the Keras FAQ.
https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model

Start with model.save_weights and model.load_weights.
Their use can be seen in vgg16.py.

For datasets that are less similar to imagenet (i.e. the dataset used for the pretrained weights) you’ll need to retrain more layers. Explaining and understanding this is the basis of much of this course. Hopefully by the time you’re done with lesson 7 you’ll have a good understanding of this issue - but trying to explain it now in a brief forum reply may be getting ahead of ourselves! :wink:


@ivallesp Imagenet is a much bigger dataset, from which the convolution layers have already extracted enough structural and spatial information. So you can leave that part alone; @jeremy’s lectures do cover training the fully connected block, and that was able to achieve higher accuracies. This is useful as it lets you try different dropout values in the fully connected block, learning rates, optimizers, etc.
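A rough sketch of what that looks like in Keras, assuming the course’s Vgg16 wrapper (this roughly mirrors what finetune() does in vgg16.py; the learning rate is arbitrary):

from keras.layers import Dense
from keras.optimizers import Adam

model = vgg.model                          # the pretrained VGG16 Keras model

model.pop()                                # drop the 1000-way ImageNet softmax
for layer in model.layers:
    layer.trainable = False                # freeze the already-trained layers
model.add(Dense(2, activation='softmax'))  # only this new cats/dogs layer trains

model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])

To fine-tune part of the fully connected block as well, set trainable = True on those Dense layers before compiling.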

Here is a comment from Andrej Karpathy on a similar topic, about using convnets in practice.

In practice: use whatever works best on ImageNet. If you’re feeling a bit of a fatigue in thinking about the architectural decisions, you’ll be pleased to know that in 90% or more of applications you should not have to worry about these. I like to summarize this point as “don’t be a hero”: Instead of rolling your own architecture for a problem, you should look at whatever architecture currently works best on ImageNet, download a pretrained model and finetune it on your data. You should rarely ever have to train a ConvNet from scratch or design one from scratch. I also made this point at the Deep Learning school.

Here is the link to the above comment. http://cs231n.github.io/convolutional-networks/

Hope this helps.

Best Regards,
Guru Medasani


Why is the 3rd element in the tuple produced by vgg.predict(imgs, True) indexed by the FULL set of categories instead of the constrained set of simply dogs and cats?

Is there a way to change this?

@wgpubs You can execute vgg.classes = ["cat", "dog"] to make the class names correspond to this dataset.
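If I’m reading vgg16.py correctly (paraphrasing from memory, so double-check the source), the third element is built by looking the predicted indices up in self.classes, which is why overriding vgg.classes changes what comes back:

import numpy as np

# Rough paraphrase of Vgg16.predict(imgs, details=True) from vgg16.py
def predict(self, imgs, details=False):
    all_preds = self.model.predict(imgs)            # probabilities per class
    idxs = np.argmax(all_preds, axis=1)             # most likely class index per image
    preds = np.array([all_preds[i, idxs[i]] for i in range(len(idxs))])
    classes = [self.classes[idx] for idx in idxs]   # names looked up in self.classes
    return preds, idxs, classes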
