Statefarm kaggle comp

Hi Jeremy, I am trying to run the full Kaggle Statefarm dataset through the model you used for the sample. It is taking 360 seconds per epoch on a p2 instance, while your IPython notebook shows only 114 seconds. What do you think I am doing wrong? I am using the exact same code from the IPython notebook (but not the AMI provided with this course; could that be the problem)?

Here is my code (for creating train and val split) and the model I used to train: https://github.com/singlasahil14/kaggle-statefarm

Hi Sarno, I am trying to finetune the VGG model without any data augmentation. The best accuracy I could get is 68.94%, while in Jeremy's IPython notebook the accuracy was around 79%. What do you think I am doing wrong?
Here is my code for creating the training and validation split, and the code for training the model (copied mostly from the IPython notebook): https://github.com/singlasahil14/kaggle-statefarm

Reposting from ‘Lesson 3 discussion’ as this seems a more appropriate place.

Hi,
I was thinking about statefarm problem and had some thoughts and questions around it.

a. The Statefarm problem seems to have less variety in the types of objects present in the images, i.e. each image contains a human and some objects inside a car. Given that VGG16 was designed and trained to detect a much wider set of objects, it feels like overkill to use all the convolution layers of VGG16 as-is for the Statefarm problem. Any comments/insights on this?

b. Statefarm image categorization is determined mainly by the relative positions of objects with respect to each other, i.e. ‘hands on the steering wheel’ implies an undistracted driver vs. ‘hands on something else’ implies a distracted driver. How can the fact that the actual objects matter less and their relative positions matter more be factored into the model architecture? Any insights/guidelines on this?

c. Unrelated to Statefarm, what are the relative tradeoffs of using a larger convolution kernel (e.g. 5x5 instead of 3x3)?

Thanks,
Ajith


I’ve been using a finetuned version of VGG19 for StateFarm as a starting point to build up better models. I ran the test function (same as in vgg16.py) and it seems to have been running for quite a while now. I understand that this is 79,000 images and it will take a bit, but it’s been going for close to 40 minutes now.

Anyone facing similar issues? Thanks!

Hi there,

so did you manage to get the weights?

Are they here: http://www.platform.ai/models/ ?

thanks!

Ajith,

I had thought about the same questions, here are my findings:

TL;DR: the winning solution uses VGG16 and manually crops certain parts of the image for the CNNs to focus on.

  1. The Statefarm dataset is relatively small, so reusing the convolution layers from VGG should help the model avoid overfitting.

Visualizing what VGG+Keras is looking for:

Excellent visuals (Zeiler/Fergus style):

  2. It’s quite clear that certain parts of the image are more important than others, and the leading results certainly had many creative methods for thinking about this.

This discussion showed a competitor’s method for displaying a very cool heatmap of what the CNNs were focusing on:
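
If you want to try something similar yourself, a simple Zeiler/Fergus-style trick is an occlusion heatmap. This is just a sketch I put together (not the competitor's actual method); model is any trained Keras model and img is a single preprocessed channels-first image:

import numpy as np

def occlusion_heatmap(model, img, cls, patch=32, stride=16):
    # Slide a blank square over the image and record how much the predicted
    # probability of class `cls` drops; big drops mark important regions.
    _, h, w = img.shape                          # channels-first, e.g. (3, 224, 224)
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    base = model.predict(img[None])[0, cls]      # probability on the unoccluded image
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = img.copy()
            occluded[:, y:y + patch, x:x + patch] = 0.   # roughly the mean colour after VGG preprocessing
            heat[i, j] = base - model.predict(occluded[None])[0, cls]
    return heat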

The 1st place solution uses VGG16 and combines 3 images: a crop around the head, a crop around where the hands may be, and the original image itself.

The 10th place solution involved finding the area around the driver’s body and cropping the image to that region.



Statefarm was a super cool competition to work on, and I feel like it was a quantum leap from the dogs-and-cats competition in terms of deep learning understanding. Hope my findings were helpful; the fact that the winning solution used something elegant and in the spirit of deep learning was certainly very encouraging!

best,

Jerry


Hi Jerry,
Thanks for this reply. This is very informative.

Regards,
Ajith

Lesson 7 shows how to do this. :slight_smile:

Note also that the winning method uses k nearest neighbors. I haven’t tried it yet but I suspect this is a critical piece of the solution.
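
If you want to experiment with the idea, one rough approach (just a sketch, not necessarily what the winners did) is to average each test image's predictions with those of its nearest neighbours in conv-feature space; test_feat and test_preds below are placeholder names for precomputed conv features and test predictions:

import numpy as np
from sklearn.neighbors import NearestNeighbors

flat = test_feat.reshape(len(test_feat), -1)   # flatten conv features per image
nn = NearestNeighbors(n_neighbors=10).fit(flat)
_, idx = nn.kneighbors(flat)                   # each image's 10 nearest neighbours
smoothed = test_preds[idx].mean(axis=1)        # average the neighbours' predictions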

My best State Farm score is 0.715.

Would like to share my model and how I trained it.

Maybe someone will find it useful.

In general I used the conv layers of the trained VGG16 model as input to a new, untrained stack of simple BatchNorm/Dense/Dropout layers.
It was trained in 3 steps, and after each step I submitted my test predictions to Kaggle.

  1. Created an augmented batch generator and ran it through the VGG16 conv layers; the output predictions are the training set for my untrained model (see the sketch after this list). I spent A LOT of time realizing that the augmented batches must not be generated with shuffle=True, or else the labels will not line up correctly.
  2. Repeated the prediction and training with 5x augmented data.
  3. Predicted the test images and used the predictions for pseudo-labeling. Overfitting-wise, this was the best training run.
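
Step 1 looked roughly like this (a sketch, assuming the get_batches/onehot helpers from the course utils.py, a conv-only model called conv_model, and path/batch_size defined elsewhere):

from keras.preprocessing import image

# Augmentation generator; shuffle=False keeps the features aligned with the labels.
gen = image.ImageDataGenerator(rotation_range=15, width_shift_range=0.1,
                               height_shift_range=0.1, shear_range=0.1)
batches = get_batches(path + 'train', gen, shuffle=False, batch_size=batch_size)
trn_labels = onehot(batches.classes)                      # same (unshuffled) order as the features
conv_feat = conv_model.predict_generator(batches, batches.nb_sample)
bn_model.fit(conv_feat, trn_labels, batch_size=batch_size, nb_epoch=1)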

I also have two main questions/thoughts after this training:

  1. There is a big difference between my validation loss and the actual computed competition loss.
    In my training I reached a val loss of 0.3778, a score I can only dream about on the State Farm test set.
    I believe my validation set was built correctly, and I separated different drivers for training and validation.
    I wonder what the right way is to close this gap between my val loss and the test loss.

  2. During training I was also overfitting a lot (except when using pseudo-labeling).
    My question is: when overfitting, is it useful to continue training a model whose training accuracy is already very high, say 0.99 and up?
    Is it possible to improve val loss/acc when the model's training accuracy is that high?

Trouble with training final_model (combined bn_model and conv_model) on StateFarm dataset

I am having trouble with the training of the final_model, which combines the Vgg convolution layers (fixed at their original weights) with the new fully connected layers (including Dropout(p=0.6) and Batchnorm layers).

My bn_model trains quickly to a validation accuracy of ~80% (training accuracy of ~99%). However, when I load these bn_model weights into the final_model in order to train it further, the final_model performance worsens during training. In the first epoch the training accuracy drops to ~10%, with validation accuracy dropping to 53% in the first epoch and down to ~20% by the fourth epoch (training accuracy continuing to hover around 10%).

These are the steps I followed:

  1. Created a ‘static’ dataset (using get_batches with shuffle=False).
  2. Split off the convolution layers from the Vgg16 model and compiled them with the RMSprop optimizer with lr=0.0001. Set the layers to .trainable=False.
  3. Constructed the bn_model (with the layers as per the Lesson 3 notebook, i.e. 3 dense layers, 2 dropout layers, 2 batchnorm layers) and compiled it with the RMSprop optimizer with lr=0.0001.
  4. Generated the feature inputs (training and validation) for the bn_model with conv_model.predict.
  5. Trained the bn_model. Within one epoch the training accuracy went to 88.6% with the validation accuracy at 79%.
  6. Added the bn layers to the conv_model to create the final_model (roughly as sketched after this list). Loaded the bn_weights into the bn layers of the final_model. Compiled with the RMSprop optimizer with lr=0.0001.
  7. Ran the final_model with the train_batches and val_batches created during step 1. The model accuracy starts out at that of the bn_model, but then quickly drops off to a training accuracy of ~10% in the first epoch.
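
Roughly, steps 2 and 6 look like this (a sketch only, not my exact code; vgg and the trained bn_model are assumed to exist already, and the layer split follows the Lesson 3 notebook):

from keras.models import Sequential
from keras.layers import Convolution2D
from keras.optimizers import RMSprop

# Step 2: split off the VGG conv layers and freeze them.
layers = vgg.model.layers
last_conv_idx = [i for i, l in enumerate(layers) if type(l) is Convolution2D][-1]
conv_layers = layers[:last_conv_idx + 1]
conv_model = Sequential(conv_layers)            # used to precompute features (step 4)
for layer in conv_model.layers:
    layer.trainable = False

# Step 6: stack the trained bn_model layers on top of the frozen conv layers.
final_model = conv_model                        # the combined model shares the frozen conv layers
for layer in bn_model.layers:
    final_model.add(layer)
final_model.compile(optimizer=RMSprop(lr=1e-4),
                    loss='categorical_crossentropy', metrics=['accuracy'])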

I have done the following steps to try to isolate the problem:

  1. I used .evaluate() on both bn_model and final_model with the data used for training and confirmed that they get the same results. This hopefully rules out any discrepancy between the feature data created with the conv_model for the bn_model and the ‘raw’ image data used for the final_model training.
  2. I compared the weights of the convolution layers of the final_model and conv_model before and after training to confirm that the weights were originally loaded properly and that .trainable=False ensured that these layer weights are not adjusted from the original Vgg16 weights (roughly as sketched after this list).
  3. I did the same comparison of the fully connected layers of the final_model with the layers of the bn_model to confirm that the weights were the same prior to training the final_model.
  4. I also did a further round of training on the bn_model to ensure that it did not show the same behaviour as the final_model. With further training the bn_model continued to improve its training accuracy (with the validation accuracy oscillating between ~60% and 80%).
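
The comparison in step 2 was roughly like this (sketch, not the exact code):

import numpy as np

# Snapshot the conv weights, train briefly, then check that nothing changed.
before = [np.copy(w) for l in conv_model.layers for w in l.get_weights()]
final_model.fit_generator(batches, samples_per_epoch=batches.nb_sample, nb_epoch=1,
                          validation_data=val_batches, nb_val_samples=val_batches.nb_sample)
after = [w for l in conv_model.layers for w in l.get_weights()]
print(all(np.allclose(b, a) for b, a in zip(before, after)))    # expect True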

I’m having no luck uploading screenshots of my model, but hopefully the description above is clear enough. I will try again later.

Any ideas of what might be causing this to happen?
Any pointers of further trouble shooting I can do?

Many thanks
Rauten

A model.summary() of your model would be helpful, as well as a Gist link to your code.
Also read about model.built, which can be set to True/False - maybe it will help.
Not sure, but make sure that after compiling the combined model, the layers that need to be non-trainable are indeed non-trainable.
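
For example, something like this (just a sketch) makes it easy to see which layers will actually be updated:

for i, layer in enumerate(final_model.layers):
    print(i, type(layer).__name__, layer.trainable)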

BTW, why are you looking to combine the models in the first place? Predict with conv_model and use the output as input to bn_model…

RE: Trouble with training final_model on StateFarm dataset
@idano
Thanks for your reply.

Herewith the screenshots that I couldn’t get to load yesterday.

  1. Screenshot of bn_model training

  2. After training the bn_model, I copied the weights of the fc layers of the bn_model to the final_model and then trained the final_model on the same data. Here is a screenshot of the training outputs:

  3. This is how I loaded the batches for training and generated the training data for the bn_model.

I am not familiar with Gist, but attempted to create a Gist with my notebook. Please let me know if this worked:

Thanks
Rauten

@idano
Here is a summary of my final model:



@jeremy - I can’t find the part of the lecture, notes, or notebook that talks about how to combine batches together. Can you direct me to it? Thanks

There’s a get_data function in utils.py nowadays…


hrr… not sure how the get_data function can solve the MemoryError that arises when concatenating augmented data with the original data. After more digging, it seems like other people are saving their data with bcolz and then iterating through the batches like this:

X = bcolz.open(path + 'train_convlayer_features.bc', mode='r')
y = bcolz.open(path + 'train_labels.bc', mode='r')
trn_batches = BcolzArrayIterator(X, y, batch_size=X.chunklen * batch_size, shuffle=True)

model.fit_generator(generator=trn_batches, samples_per_epoch=trn_batches.N, nb_epoch=1)
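
The arrays get saved to disk beforehand with something like the save_array helper from the course utils.py (sketch; conv_feat and trn_labels stand in for whatever arrays you precomputed):

import bcolz

def save_array(fname, arr):
    # write the array to disk in bcolz's chunked, compressed format
    c = bcolz.carray(arr, rootdir=fname, mode='w')
    c.flush()

save_array(path + 'train_convlayer_features.bc', conv_feat)
save_array(path + 'train_labels.bc', trn_labels)

IIRC BcolzArrayIterator also wants the batch size to be a multiple of X.chunklen, which is why the snippet above multiplies by it.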

I tried the opposite of pseudolabelling: I kicked out all training data that ‘confused’ the network. I figured that means pictures that are a) mislabelled or b) weird in another way (for example: a picture of a guy labelled as ‘driving normally’ who has his hand next to his head to adjust his glasses looks a lot like he’s on the phone).

I predicted classes for all training pictures and removed those where the highest class ‘probability’ was below 0.9, which removes about 8% of the training pictures. I then trained the same model from scratch after restarting the kernel.
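
The filtering itself is just something like this (a sketch; see my notebook linked below for the real code):

import numpy as np

# Predict on the (unshuffled) training batches and keep only confident images.
preds = model.predict_generator(trn_batches, trn_batches.nb_sample)
keep = np.max(preds, axis=1) >= 0.9
kept_files = [f for f, k in zip(trn_batches.filenames, keep) if k]
print('keeping %d of %d images' % (len(kept_files), len(keep)))
# then copy the kept files into a fresh training directory and retrain from scratch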

I get a slightly better loss on the test set:

  • Before removing pictures:
    Private score: 0.94359
    Public score: 1.18213

  • After removing pictures:
    Private score: 0.92506
    Public score: 1.11814

Is this difference small enough to be due to random initialization of weights? Is it normal to test your training set to remove noise?

I put the notebook here: https://github.com/philippbayer/cats_dogs_redux/blob/master/Statefarm.ipynb


@Philipp - I like the way you’re thinking about this. The good news is that there’s a terrific paper that talks about how to deal with just that issue: http://arxiv.org/abs/1412.6596 . The ‘soft loss’ they discuss is similar to your idea - but it’s done in a more dynamic way during training.
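
For reference, their ‘soft’ bootstrapping objective blends the given label with the model’s own prediction before taking the cross-entropy. A rough Keras-style sketch (beta is a hyperparameter, around 0.95 in the paper):

from keras import backend as K

def soft_bootstrap_loss(y_true, y_pred, beta=0.95):
    # Blend the (possibly noisy) label with the model's current prediction,
    # then take cross-entropy against that blended target.
    y_pred = K.clip(y_pred, 1e-7, 1. - 1e-7)
    target = beta * y_true + (1. - beta) * y_pred
    return -K.sum(target * K.log(y_pred), axis=-1)

# usage (sketch): model.compile(Adam(1e-5), loss=soft_bootstrap_loss, metrics=['accuracy'])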


Does the dense layer output dimension affect the result?

Hi guys, I’m working on Statefarm using the VGG16 conv_model to precompute the conv features, and then defining the bn_model just like Jeremy’s notebook.

When I first ran the cell, the result was terrible: the val acc was always less than 9%. But when I changed the dense layer output dimension from 128 to 100, the val acc suddenly popped up to 14%. Does the dense layer output dimension affect the result? If so, how does it affect it?

Btw, I used the same architecture, with p=0.5, Dense(100), lr=1e-6; after running 72 epochs, I got the following result:


How does that look? I think it’s overfitting, right? I’m working on the data augmentation; let me see if it helps.

Edit 1:
I used data augmentation, but I could only add augmented data up to 3 times the size of the training set. @jeremy said not to concatenate the augmented data to the conv features directly, otherwise it will raise a memory error, and that he’d introduce a new method for concatenating the data. But I can’t find this method…

Anyway, after I tripled the data, I set the dropout param to 0.8, ran 4 epochs, and got this result:

Edit 2:
I submitted the result to Kaggle, and… what??? The Kaggle loss is different from my val_loss? Any suggestions?

I have been running statefarm-sample.ipynb, where a model is built up using 1500 sample images from the main training set. By the end of the notebook the model was hitting 50% accuracy when it was run by the previous user; that is, I can see the old results in the notebook before I run it again on my copy of the data. When I run the notebook I get very high accuracy. For example, where Jeremy gets 50%, I get much higher:

1500/1500 [==============================] - 26s - loss: 1.2677 - acc: 0.9307 - val_loss: 1.4245 - val_acc: 0.8360

With this block:

model = Sequential([
        BatchNormalization(axis=1, input_shape=(3,224,224)),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, batches.nb_sample, nb_epoch=2, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)
model.optimizer.lr = 0.01
model.fit_generator(batches, batches.nb_sample, nb_epoch=5, validation_data=val_batches, 
                 nb_val_samples=val_batches.nb_sample)

I get:
1500/1500 [==============================] - 25s - loss: 0.1288 - acc: 0.9940 - val_loss: 0.4701 - val_acc: 0.9050

Much higher than expected, so something’s wrong; these results are too good. How is this possible?
