Statefarm Kaggle comp

garima.agarwal · November 24, 2016, 9:17pm

I ran a couple more epochs… and accuracy got even better.

Train on 20624 samples, validate on 2000 samples
Epoch 1/2
20624/20624 [==============================] - 8s - loss: 0.0156 - acc: 0.9952 - val_loss: 0.0193 - val_acc: 0.9950
Epoch 2/2
20624/20624 [==============================] - 8s - loss: 0.0138 - acc: 0.9963 - val_loss: 0.0083 - val_acc: 0.9970

But my submission got worse, dropping to 414 on the leaderboard (score: 0.71507).
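When comparing these numbers, keep in mind that the State Farm leaderboard scores multi-class log loss (lower is better), not accuracy. A few confidently wrong predictions can therefore hurt the score badly even when accuracy looks high. The minimal sketch below computes that metric. It assumes the usual Kaggle treatment of clipping probabilities to [1e-15, 1 - 1e-15] and rescaling each row to sum to 1. The function name is illustrative.

```python
import numpy as np

def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log probability assigned to the true class.

    y_true: integer class labels, shape (n,)
    probs:  predicted probabilities, shape (n, n_classes)
    Assumes clipping to [eps, 1 - eps] plus row renormalisation.
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1 - eps)
    # Rescale each row after clipping so it still sums to 1.
    probs = probs / probs.sum(axis=1, keepdims=True)
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), y_true]))
```

A uniform guess over 10 classes scores ln(10), about 2.30. A single fully confident wrong answer costs roughly 34.5 on its own, which is why overconfident models are punished.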

jeremy · November 25, 2016, 12:14am

That means you’re up to the really interesting bit of this competition! See this reply for a previous response to this issue.

jeremy · November 25, 2016, 12:16am

You shouldn’t load the data again immediately after saving it - as your note mentions, this is redundant! You only need to use get_data the first time, which is slow. After that you can just use load_array. But you never need to use both.
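The "compute once, then only load" pattern can be wrapped in one helper so it is impossible to do both by accident. This is a sketch only: the course's save_array/load_array helpers used bcolz, whereas this version swaps in NumPy's np.save/np.load. `load_or_compute` is a hypothetical name.

```python
import os
import numpy as np

def load_or_compute(fname, compute_fn):
    """Return the cached array at `fname`, computing and saving it only if missing.

    `fname` should end in ".npy", because np.save appends that suffix otherwise.
    """
    if os.path.exists(fname):
        return np.load(fname)   # fast path: every run after the first
    arr = compute_fn()          # slow path: e.g. a get_data(...) call
    np.save(fname, arr)
    return arr
```

Usage would look like `trn = load_or_compute(path + 'train_data.npy', lambda: get_data(path + 'train'))`, where the file name is only an example.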

garima.agarwal · November 25, 2016, 12:53am

Does that mean the validation set should be created by moving the files from training, not copying them?
I have heard that argument about different drivers before, but I don't exactly understand why it's relevant or what I need to do.

And if that is the issue, my images in valid are indeed different from those in train.

garima.agarwal · November 25, 2016, 2:02am

Never mind… I realized that even though they are different images, they are not separated by driver…

So the drivers_imgs_list.csv might be the key to ensuring that the validation set doesn't contain any images of drivers who also appear in the training set…
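A driver-level split can be sketched as follows: read the driver list, pick whole drivers for validation, and assign every image of those drivers to valid. This assumes the CSV has columns named `subject` (driver id), `classname` and `img`. `split_drivers` and its parameters are hypothetical.

```python
import csv
import random

def split_drivers(csv_path, n_valid_drivers, seed=0):
    """Assign whole drivers to train or valid using the driver list CSV.

    Assumes columns 'subject' (driver id), 'classname' and 'img'.
    Returns (train_rows, valid_rows), each a list of (subject, classname, img).
    """
    with open(csv_path, newline="") as f:
        rows = [(r["subject"], r["classname"], r["img"]) for r in csv.DictReader(f)]
    drivers = sorted({subject for subject, _, _ in rows})
    # Choose validation drivers reproducibly; no driver ends up on both sides.
    valid_drivers = set(random.Random(seed).sample(drivers, n_valid_drivers))
    train_rows = [r for r in rows if r[0] not in valid_drivers]
    valid_rows = [r for r in rows if r[0] in valid_drivers]
    return train_rows, valid_rows
```

Splitting by driver rather than by image matters because the test set is made up of images of people the model has never seen. A validation set that shares drivers with training will therefore look far better than the leaderboard.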

jeremy · November 25, 2016, 3:44am

Exactly!

(and yes - it’s always critical that you move, not copy, to your validation set)
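Here is a sketch of the "move, not copy" step with a quick check afterwards that no file sits in both folders. It assumes the `train/<class>/<img>` layout the thread uses. The row format and both function names are hypothetical. The check catches accidental copies only; driver overlap has to be prevented when choosing which images to move.

```python
import os
import shutil

def move_to_valid(valid_rows, train_dir, valid_dir):
    """Move (not copy) each validation image out of train_dir into valid_dir.

    valid_rows: iterable of (subject, classname, img) tuples (hypothetical format).
    """
    for _, classname, img in valid_rows:
        dst_dir = os.path.join(valid_dir, classname)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.move(os.path.join(train_dir, classname, img), os.path.join(dst_dir, img))

def overlapping_files(train_dir, valid_dir):
    """Return relative paths present in both directories (should be empty)."""
    def rel_files(root):
        return {os.path.relpath(os.path.join(d, f), root)
                for d, _, files in os.walk(root) for f in files}
    return rel_files(train_dir) & rel_files(valid_dir)
```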

ethan · November 25, 2016, 4:39am

Perhaps because of cuDNN-Theano compatibility?

ethan · November 25, 2016, 4:40am

Ah okay, that makes sense! Thanks.

garima.agarwal · November 25, 2016, 4:41am

I got the image breakdown sorted out, and I think I'm getting more realistic values now.

Finally, my training set numbers are looking better, but validation still sucks.

Epoch 18/20
20570/20570 [==============================] - 9s - loss: 0.4897 - acc: 0.8335 - val_loss: 6.4907 - val_acc: 0.1295
Epoch 19/20
20570/20570 [==============================] - 9s - loss: 0.4465 - acc: 0.8464 - val_loss: 6.2933 - val_acc: 0.1130
Epoch 20/20
20570/20570 [==============================] - 9s - loss: 0.4197 - acc: 0.8579 - val_loss: 6.6927 - val_acc: 0.1154

I guess I am overfitting now… I will introduce dropout after this run of 50 epochs and see if it helps.
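To make it clear what a dropout setting does, here is a NumPy sketch of the common "inverted dropout" formulation. It is not Keras's code. Keras's `Dropout(rate)` argument is the fraction of units dropped, so `Dropout(0.8)` discards 80% of them during training. The function name and signature are hypothetical.

```python
import numpy as np

def dropout(x, rate, training, rng=None):
    """Inverted dropout: zero a fraction `rate` of units while training.

    Survivors are scaled by 1 / (1 - rate) so the expected activation is
    unchanged, which lets inference use the layer as a no-op.
    """
    if not training or rate == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate   # True for units that survive
    return x * keep / (1.0 - rate)
```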

jeremy · November 25, 2016, 4:44am

Nothing to worry about - you can ignore that warning.

ethan · November 25, 2016, 4:53am

Oh, Garima asked me to “share the result of the cell where you said cuda.use('gpu0')” because I am getting a memory error whenever using load_array (not that I need to use it).

garima.agarwal · November 25, 2016, 5:00am

Not making much progress with the validation set.

I reduced the LR to 0.0001 and also tried dropout values of 0.3, 0.5, and 0.8.

Training accuracy gradually grows, but validation accuracy is stuck at 10-15%.

Any suggestions?

Data augmentation?

Something weird: when I try to print bn_model.summary() I get this error:

Problem occurred during compilation with the command line below:
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=46080 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/ubuntu/anaconda2/include/python2.7 -I/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/gof -fvisibility=hidden -o /home/ubuntu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/tmp88Zg7u/8c1d220b216df041ef92ce887e8de31e.so /home/ubuntu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/tmp88Zg7u/mod.cpp -L/home/ubuntu/anaconda2/lib -lpython2.7
ERROR (theano.gof.cmodule): [Errno 12] Cannot allocate memory

If it can't print the summary, how can it execute the model?

ethan · November 25, 2016, 6:23am

Does anyone know where this error comes from? I looked it up, and apparently it has something to do with the GPU (http://stackoverflow.com/questions/24402213/theano-test-optimization-failure-due-to-constant-folding-on-ubuntu).

Basically, it gives me this error and then keeps running until the page crashes, and then I have to kill the page.

The error is an “Optimization failure due to constant_folding”.
Thanks!

jeremy · November 25, 2016, 12:50pm

Looks like you’re out of memory. Maybe you are using load_array() rather than standard batches, and you don’t have enough memory for that?
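A quick back-of-the-envelope check shows why holding a whole image set in RAM can run out of memory. This sketch assumes float32 values (4 bytes) and the 3×224×224 input shape used in this thread. The real peak can be higher, because concatenating batches into one array briefly holds both the pieces and the result. `array_bytes` is a hypothetical helper.

```python
def array_bytes(n_images, channels=3, height=224, width=224, bytes_per_value=4):
    """Bytes needed to hold n_images as one dense float32 array (N, C, H, W)."""
    return n_images * channels * height * width * bytes_per_value

train_bytes = array_bytes(20624)   # training-set size from the first log in this thread
print(f"{train_bytes:,} bytes ~= {train_bytes / 1e9:.1f} GB")
# prints: 12,417,957,888 bytes ~= 12.4 GB
```

By contrast, a batch generator only needs about `array_bytes(batch_size)` for the images at any one time.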

Perhaps your validation set has a problem? Check visually that the labels correspond to the validation set images correctly.
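One label problem that is easy to check mechanically: as far as I know, Keras's flow_from_directory derives label indices from the alphabetically sorted class sub-folder names. If valid is missing a class folder, or has an extra one, the labels after it shift silently. This sketch, with hypothetical helper names, compares the class folders of the two directories and flags empty validation classes. It does not replace looking at the images.

```python
import os

def class_counts(root):
    """Map each class sub-directory of `root` (sorted by name) to its file count."""
    return {d: len(os.listdir(os.path.join(root, d)))
            for d in sorted(os.listdir(root))
            if os.path.isdir(os.path.join(root, d))}

def check_label_folders(train_dir, valid_dir):
    """Raise if train and valid do not share exactly the same non-empty class folders."""
    train, valid = class_counts(train_dir), class_counts(valid_dir)
    if list(train) != list(valid):
        raise ValueError(f"class folders differ: {list(train)} vs {list(valid)}")
    empty = [c for c, n in valid.items() if n == 0]
    if empty:
        raise ValueError(f"empty validation classes: {empty}")
    return train, valid
```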

Try going back over the video where I show my solution to statefarm and see if the simplest decent model I show there works equally well for you. If it doesn’t, then there must be a validation set problem.

jeremy · November 25, 2016, 12:53pm

I haven’t seen that before… Have you tried rebooting your instance? Have you checked that ‘batches’ contains the data you expect it to?

vshets · November 26, 2016, 3:13am

My p2 instance keeps restarting at this point:
test_data = get_data(path)

Unfortunately I don't believe I can use batches here, because I am using model.predict_proba(), which expects the input to be a NumPy array rather than a generator. Thoughts?

jeremy · November 26, 2016, 3:26am

You can still use batches - use .flow() rather than .flow_from_directory(). But you should probably be using predict_generator(), not predict_proba() - just set class_mode=None when you create your batches if you want raw probabilities.
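The reason a generator avoids the crash: predictions are made one batch at a time, and only the small (n_images, n_classes) output is kept in full. Here is a framework-free sketch of that idea. It is not Keras's predict_generator implementation, and `iter_batches`, `load_batch` and `predict_fn` are hypothetical stand-ins. Keeping the batch order fixed matters if the predictions are later matched to file names.

```python
import numpy as np

def iter_batches(n_items, batch_size, load_batch):
    """Yield inputs one batch at a time; load_batch(start, stop) is a hypothetical loader."""
    for start in range(0, n_items, batch_size):
        yield load_batch(start, min(start + batch_size, n_items))

def predict_in_batches(predict_fn, batches):
    """Run predict_fn on each batch and stack the (small) outputs in order.

    Only one batch of images is in memory at a time; just the
    (n_items, n_classes) probability matrix is kept in full.
    """
    return np.concatenate([predict_fn(b) for b in batches], axis=0)
```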

garima.agarwal · November 26, 2016, 5:09am

I replaced the model with one of my own:

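from keras.models import Sequential
from keras.layers import (BatchNormalization, Convolution2D, Dense,
                          Dropout, Flatten, MaxPooling2D)

def conv1(batches):
    # `batches` is not used here; the model is compiled and fit elsewhere.
    # Keras 1 API with Theano (channels-first) ordering, hence axis=1 for BN.
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(128, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),
        Dense(10, activation='softmax')
    ])
    return model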

My results are getting better now, but I am still only at 40% on my validation set.
Epoch 1/2
20570/20570 [==============================] - 348s - loss: 1.3970 - acc: 0.5540 - val_loss: 3.6192 - val_acc: 0.1490
Epoch 2/2
20570/20570 [==============================] - 296s - loss: 0.6758 - acc: 0.7859 - val_loss: 2.7510 - val_acc: 0.3082
Epoch 1/5
20570/20570 [==============================] - 302s - loss: 0.4530 - acc: 0.8659 - val_loss: 2.4068 - val_acc: 0.3661
Epoch 2/5
20570/20570 [==============================] - 297s - loss: 0.3547 - acc: 0.8947 - val_loss: 2.6552 - val_acc: 0.3462
Epoch 3/5
20570/20570 [==============================] - 299s - loss: 0.2779 - acc: 0.9200 - val_loss: 2.3945 - val_acc: 0.3496
Epoch 4/5
20570/20570 [==============================] - 299s - loss: 0.2358 - acc: 0.9334 - val_loss: 2.4213 - val_acc: 0.3851
Epoch 5/5
20570/20570 [==============================] - 300s - loss: 0.2149 - acc: 0.9379 - val_loss: 2.3568 - val_acc: 0.4075

Btw, I did use data augmentation for the validation set as well.

from keras.preprocessing import image

# get_batches, path and batch_size come from the course notebook/utils.
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                                 shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
val_batches = get_batches(path+'valid', gen_t, batch_size=batch_size)

After 12 epochs my accuracy seems to be going down.
If I were to add another layer to the model, or add BN or dropout, I would have to throw away all the weights we have learnt so far… is that right?
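On the mechanics only: Keras layers expose get_weights() and set_weights(), so weights can be carried over for every layer whose shape is unchanged, and only the new or resized layers start fresh. This framework-free sketch assumes both models' weights are available as dicts mapping a layer name to a list of NumPy arrays. The names and the dict format are hypothetical. Whether the copied weights are still a good starting point after inserting BN is a separate question.

```python
import numpy as np

def transfer_weights(old_weights, new_weights):
    """Copy weights from an old model into a new one wherever a layer matches.

    Both arguments map layer name -> list of numpy arrays (the kind of list a
    Keras layer's get_weights() returns). A layer is copied only if it exists in
    both models with identical array shapes; everything else keeps its fresh
    initialisation. Returns the updated dict and the names that were copied.
    """
    result, copied = {}, []
    for name, arrays in new_weights.items():
        old = old_weights.get(name)
        if old is not None and [a.shape for a in old] == [a.shape for a in arrays]:
            result[name] = [a.copy() for a in old]
            copied.append(name)
        else:
            result[name] = arrays
    return result, copied
```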

garima.agarwal · November 26, 2016, 7:13am

After another hour of training I am up to

Epoch 1/1
20570/20570 [==============================] - 306s - loss: 0.1995 - acc: 0.9378 - val_loss: 1.9990 - val_acc: 0.4971

I don't think this is going to cut it.
I am still training with Adam as the optimizer. Here is a gist, if anything stands out as wrong.

Appreciate any comments.

Thanks
Garima

jeremy · November 26, 2016, 7:22pm

Looking good! Maybe try Dense(100)? And then find an amount of dropout that lets you train for more epochs.
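Working out the layer sizes of conv1 shows where the first dense layer's size matters. This sketch assumes Keras 1 defaults: 'valid' 3×3 convolutions and 2×2 max-pooling with stride 2. Under those assumptions, the flattened feature vector has 128 × 26 × 26 = 86,528 values. Dense(200) therefore has about 17.3M parameters, and Dense(100) roughly half that. The helper names are hypothetical.

```python
def conv_out(n, k=3):
    """Side length after a 'valid' k x k convolution."""
    return n - k + 1

def pool_out(n, p=2):
    """Side length after p x p max-pooling with stride p."""
    return n // p

def flatten_size(side=224, filters=(32, 64, 128)):
    """Length of the Flatten() output for conv1's three conv+pool blocks."""
    for _ in filters:
        side = pool_out(conv_out(side))
    return filters[-1] * side * side

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out
```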

If you’re not going to use pre-trained VGG, you’ll need to use lots of data augmentation to get around the data shortage.
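For intuition about what two of the gen_t settings do to each training image, here is a NumPy sketch of a random horizontal shift and a random per-channel intensity shift on a channels-first image. The defaults mirror width_shift_range=0.1 and channel_shift_range=20. This is not Keras's implementation. In particular, Keras fills shifted-in pixels with its fill_mode (default 'nearest'), whereas this sketch zero-fills for simplicity. All function names are hypothetical.

```python
import numpy as np

def random_width_shift(img, max_frac, rng):
    """Shift a (C, H, W) image horizontally by up to max_frac of its width, zero-filling the gap."""
    _, _, w = img.shape
    limit = int(max_frac * w)
    shift = int(rng.integers(-limit, limit + 1))
    out = np.zeros_like(img)
    if shift > 0:
        out[:, :, shift:] = img[:, :, :w - shift]
    elif shift < 0:
        out[:, :, :w + shift] = img[:, :, -shift:]
    else:
        out[:] = img
    return out

def random_channel_shift(img, intensity, rng):
    """Add one random offset in [-intensity, intensity] per channel, clipped to the image's range."""
    lo, hi = img.min(), img.max()
    offsets = rng.uniform(-intensity, intensity, size=(img.shape[0], 1, 1))
    return np.clip(img + offsets, lo, hi)

def augment(img, rng, max_frac=0.1, intensity=20.0):
    """Apply a random width shift, then a random channel shift."""
    return random_channel_shift(random_width_shift(img, max_frac, rng), intensity, rng)
```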