Statefarm Kaggle comp


(garima.agarwal) #62

I ran a couple more epochs… and accuracy got even better.

Train on 20624 samples, validate on 2000 samples
Epoch 1/2
20624/20624 [==============================] - 8s - loss: 0.0156 - acc: 0.9952 - val_loss: 0.0193 - val_acc: 0.9950
Epoch 2/2
20624/20624 [==============================] - 8s - loss: 0.0138 - acc: 0.9963 - val_loss: 0.0083 - val_acc: 0.9970

But my submission got worse, down to 414 (score: 0.71507)
:frowning:


(Jeremy Howard (Admin)) #63

That means you’re up to the really interesting bit of this competition! :slight_smile: See this reply for a previous response to this issue.


(Jeremy Howard (Admin)) #64

You shouldn’t load the data again immediately after saving it - as your note mentions, this is redundant! You only need to use get_data the first time, which is slow. After that you can just use load_array. But you never need to use both.
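A minimal sketch of the "slow load once, then reuse the saved copy" pattern described above. It swaps in `np.save`/`np.load` as a stand-in for the course's bcolz-based `save_array`/`load_array`; `load_or_build` and `slow_loader` are hypothetical names used only for illustration.

```python
import os
import numpy as np

def load_or_build(cache_path, slow_loader):
    """Return the cached array if it exists; otherwise build it once and save it.

    cache_path: file path for the cache (hypothetical location).
    slow_loader: zero-argument callable standing in for get_data(...).
    """
    if os.path.exists(cache_path):
        # Fast path: later sessions only ever load the saved array.
        return np.load(cache_path)
    # Slow path: run the expensive loader exactly once, then persist the result.
    arr = slow_loader()
    # Writing through a file object stops np.save from appending ".npy" to the name.
    with open(cache_path, 'wb') as f:
        np.save(f, arr)
    return arr
```

In a notebook, `slow_loader` would be something like `lambda: get_data(path+'train')`; the cache filename is up to you.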


(garima.agarwal) #65

Does that mean the validation set should be created by moving the files out of training rather than copying them?
I have heard that argument about different drivers before, but I don't exactly understand why it's relevant and what I need to do.

And if that is the case, my data set in valid is indeed different from what is in train.


(garima.agarwal) #66

Never mind… I figured out that even if they are different images, they are not segregated by driver…

So drivers_imgs_list.csv might be the key to ensuring that the validation set doesn't have any images of drivers from the training set…
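A sketch of a driver-level split. It assumes each row of the driver CSV gives a driver id (`subject`), a class folder (`classname`) and a file name (`img`); those column names and the helpers `read_driver_rows` and `split_by_driver` are assumptions for illustration. The property that matters is that every image of a held-out driver goes to validation, so no driver appears in both sets.

```python
import csv
import random
from collections import defaultdict

def read_driver_rows(csv_path):
    """Read (subject, classname, img) tuples; the column names are assumed."""
    with open(csv_path, newline='') as f:
        return [(r['subject'], r['classname'], r['img']) for r in csv.DictReader(f)]

def split_by_driver(rows, n_valid_drivers, seed=0):
    """Return (train_rows, valid_rows) such that no subject appears in both."""
    by_driver = defaultdict(list)
    for subject, classname, img in rows:
        by_driver[subject].append((subject, classname, img))
    drivers = sorted(by_driver)
    # Pick whole drivers for validation, never individual images.
    valid_drivers = set(random.Random(seed).sample(drivers, n_valid_drivers))
    train_rows, valid_rows = [], []
    for d in drivers:
        (valid_rows if d in valid_drivers else train_rows).extend(by_driver[d])
    return train_rows, valid_rows
```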


(Jeremy Howard (Admin)) #67

Exactly! :slight_smile:

(and yes - it’s always critical that you move, not copy, to your validation set)
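A sketch of the move step, assuming the `train/<class>/<img>` and `valid/<class>/<img>` folder layout that `flow_from_directory` reads; `move_to_valid` is a hypothetical helper. `shutil.move` removes each file from `train`, so the same image can never be counted in both sets, which a copy would allow.

```python
import os
import shutil

def move_to_valid(data_dir, rows):
    """Move (not copy) each (subject, classname, img) from train/ to valid/."""
    for _subject, classname, img in rows:
        src = os.path.join(data_dir, 'train', classname, img)
        dst_dir = os.path.join(data_dir, 'valid', classname)
        os.makedirs(dst_dir, exist_ok=True)
        # After this call the file exists only under valid/.
        shutil.move(src, os.path.join(dst_dir, img))
```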


(ethan) #68

Perhaps because of cuDNN-Theano compatibility?


(ethan) #69

Ah okay, that makes sense! Thanks.


(garima.agarwal) #70

I got the image breakdown sorted out, and I think I'm getting more realistic values now.

Finally, my training set numbers are looking better, but validation still sucks:

Epoch 18/20
20570/20570 [==============================] - 9s - loss: 0.4897 - acc: 0.8335 - val_loss: 6.4907 - val_acc: 0.1295
Epoch 19/20
20570/20570 [==============================] - 9s - loss: 0.4465 - acc: 0.8464 - val_loss: 6.2933 - val_acc: 0.1130
Epoch 20/20
20570/20570 [==============================] - 9s - loss: 0.4197 - acc: 0.8579 - val_loss: 6.6927 - val_acc: 0.1154

I guess I am overfitting now… I will introduce dropout after this run of 50 epochs and see if it helps.
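One quick check on numbers like these is to compare `val_acc` with chance. A sketch that parses a Keras progress line in the format shown in the logs and reports the train/validation gap plus validation accuracy as a multiple of chance; `parse_keras_log_line` and `summarize` are hypothetical helpers, and the default of 10 classes is an assumption matching this competition.

```python
import re

def parse_keras_log_line(line):
    """Extract metric name/value pairs such as loss, acc, val_loss, val_acc."""
    return {k: float(v) for k, v in re.findall(r'(\w+): ([0-9.]+)', line)}

def summarize(metrics, n_classes=10):
    """Return (train/val accuracy gap, val_acc as a multiple of chance)."""
    chance = 1.0 / n_classes
    return metrics['acc'] - metrics['val_acc'], metrics['val_acc'] / chance
```

For the epoch 20 line above this gives a gap of about 0.74 and a validation accuracy of only about 1.15× chance. Validation accuracy that close to chance while training accuracy climbs is worth checking against the validation set itself, not only against dropout settings.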


(Jeremy Howard (Admin)) #71

Nothing to worry about - you can ignore that warning.


(ethan) #72

Oh, Garima asked me to “share the result of the cell where you said cuda.use('gpu0')” because I am getting a memory error whenever I use load_array (not that I need to use it).


(garima.agarwal) #73

Not making much progress with the validation set.

I reduced the LR to 0.0001 and also tried dropout of 0.3, 0.5 and 0.8.

Training accuracy gradually grows, but validation accuracy is stuck at 10-15%.

Any suggestions?

Data augmentation?

Something weird: when I tried to print bn_model.summary(), I got this error:

Problem occurred during compilation with the command line below:
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=broadwell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mrtm -mhle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-pcommit -mno-mwaitx --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=46080 -mtune=broadwell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/core/include -I/home/ubuntu/anaconda2/include/python2.7 -I/home/ubuntu/anaconda2/lib/python2.7/site-packages/theano/gof -fvisibility=hidden -o /home/ubuntu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/tmp88Zg7u/8c1d220b216df041ef92ce887e8de31e.so /home/ubuntu/.theano/compiledir_Linux-4.4--generic-x86_64-with-debian-stretch-sid-x86_64-2.7.12-64/tmp88Zg7u/mod.cpp -L/home/ubuntu/anaconda2/lib -lpython2.7
ERROR (theano.gof.cmodule): [Errno 12] Cannot allocate memory

If it can't print the summary, how can it execute the model?


(ethan) #74

Does anyone know where this error comes from? I looked it up, and apparently it has something to do with the GPU (http://stackoverflow.com/questions/24402213/theano-test-optimization-failure-due-to-constant-folding-on-ubuntu).

Code:

Error: basically, it gives me this error, then keeps running until the page crashes, and I have to kill the page.

GPU Command:

The error is an “Optimization failure due to constant_folding”.
Thanks!


(Jeremy Howard (Admin)) #75

Looks like you’re out of memory. Maybe you are using load_array() rather than standard batches, and you don’t have enough memory for that?
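A back-of-the-envelope check of why loading everything at once can run out of memory. This sketch assumes images resized to 3×224×224 and stored as float32 (4 bytes per value, the Keras default float type); the sample count comes from the training logs in this thread, and `array_bytes` is a hypothetical helper.

```python
def array_bytes(n_images, channels=3, height=224, width=224, bytes_per_value=4):
    """Bytes needed to hold n_images as one dense array (float32 by default)."""
    return n_images * channels * height * width * bytes_per_value

train_bytes = array_bytes(20570)
print(f"{train_bytes / 1e9:.1f} GB")
```

That is about 12.4 GB for the 20,570 training images alone, and concatenating a list of batches into one array can briefly need roughly twice that. The same arithmetic applies to the test folder, which is why streaming batches from disk is the safer default.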

Perhaps your validation set has a problem? Check visually that the labels match the validation set images.
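Alongside eyeballing images, one programmatic check is that `train` and `valid` contain the same class folders. When no class list is passed, `flow_from_directory` assigns label indices from the sorted subdirectory names, so a missing or misspelled folder in `valid` silently shifts every label after it. A sketch; `class_indices_for` and `compare_class_dirs` are hypothetical helpers that mimic that sorted-folder rule.

```python
import os

def class_indices_for(directory):
    """Mimic the sorted-subfolder rule: folder name -> integer label."""
    classes = sorted(d for d in os.listdir(directory)
                     if os.path.isdir(os.path.join(directory, d)))
    return {name: i for i, name in enumerate(classes)}

def compare_class_dirs(train_dir, valid_dir):
    """Return folder names whose label index differs (or is missing) between the two."""
    t, v = class_indices_for(train_dir), class_indices_for(valid_dir)
    return sorted(name for name in set(t) | set(v) if t.get(name) != v.get(name))
```

An empty list means both directories would get the same label mapping.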

Try going back over the video where I show my solution to statefarm and see if the simplest decent model I show there works equally well for you. If it doesn’t, then there must be a validation set problem.


(Jeremy Howard (Admin)) #76

I haven’t seen that before… Have you tried rebooting your instance? Have you checked that ‘batches’ contains the data you expect it to?


(vedshetty) #77

My p2 instance keeps restarting at this point:
test_data = get_data(path)

Unfortunately I do not believe I can use batches here, because I am using model.predict_proba(), which expects the input to be a numpy array and not a generator. Thoughts?


(Jeremy Howard (Admin)) #78

You can still use batches - use .flow() rather than .flow_from_directory(). But you should probably be using predict_generator(), not predict_proba() - just set class_mode=None when you create your batches if you want raw probabilities.
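The benefit of `predict_generator()` here is that only one batch of images sits in memory at a time. A framework-agnostic sketch of the same idea, where `load_image` and `predict_batch` are hypothetical stand-ins for reading one test image and for the model's batch prediction. Keeping the file list in a fixed order (for Keras directory batches that means `shuffle=False`) is what lets each row of probabilities be matched back to its file name for a submission.

```python
import numpy as np

def predict_in_batches(filenames, load_image, predict_batch, batch_size=64):
    """Return an (N, n_classes) array of predictions in filename order.

    Only batch_size images are loaded at once, unlike loading the whole folder.
    """
    outputs = []
    for start in range(0, len(filenames), batch_size):
        chunk = filenames[start:start + batch_size]
        batch = np.stack([load_image(f) for f in chunk])
        outputs.append(predict_batch(batch))
    return np.concatenate(outputs, axis=0)
```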


(garima.agarwal) #79

I changed my model to one of my own:

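from keras.models import Sequential
from keras.layers import (BatchNormalization, Convolution2D, Dense, Dropout,
                          Flatten, MaxPooling2D)

def conv1(batches):
    # 'batches' is not used in the body shown here.
    # axis=1 normalizes over channels for Theano-style (channels, rows, cols) input.
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 224, 224)),
        Convolution2D(32, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(64, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Convolution2D(128, 3, 3, activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D(),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),
        Dense(10, activation='softmax')  # one probability per class
    ])

    return model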
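A worked check of the tensor sizes in the model above, assuming Keras 1 defaults: `Convolution2D` with `border_mode='valid'` (a 3×3 filter trims one pixel from each edge) and `MaxPooling2D()` with a 2×2 pool that halves the size, rounding down. BatchNormalization does not change shapes. The helper names are hypothetical.

```python
def conv_valid(size, kernel=3):
    # A 'valid' convolution drops kernel - 1 pixels along each spatial axis.
    return size - kernel + 1

def pool2(size):
    # A 2x2 max pool with stride 2 halves the size, rounding down.
    return size // 2

def trace_shapes(size=224, filters=(32, 64, 128)):
    """Return (channels, height, width) after each conv + pool block, channels-first."""
    shapes = []
    for f in filters:
        size = pool2(conv_valid(size))
        shapes.append((f, size, size))
    return shapes

shapes = trace_shapes()
flat = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]
print(shapes, flat)
```

This gives (32, 111, 111), (64, 54, 54) and (128, 26, 26), so `Flatten()` hands 86,528 values to the first `Dense` layer.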

My results are getting better now, but I am still only at 40% on my validation set.
Epoch 1/2
20570/20570 [==============================] - 348s - loss: 1.3970 - acc: 0.5540 - val_loss: 3.6192 - val_acc: 0.1490
Epoch 2/2
20570/20570 [==============================] - 296s - loss: 0.6758 - acc: 0.7859 - val_loss: 2.7510 - val_acc: 0.3082
Epoch 1/5
20570/20570 [==============================] - 302s - loss: 0.4530 - acc: 0.8659 - val_loss: 2.4068 - val_acc: 0.3661
Epoch 2/5
20570/20570 [==============================] - 297s - loss: 0.3547 - acc: 0.8947 - val_loss: 2.6552 - val_acc: 0.3462
Epoch 3/5
20570/20570 [==============================] - 299s - loss: 0.2779 - acc: 0.9200 - val_loss: 2.3945 - val_acc: 0.3496
Epoch 4/5
20570/20570 [==============================] - 299s - loss: 0.2358 - acc: 0.9334 - val_loss: 2.4213 - val_acc: 0.3851
Epoch 5/5
20570/20570 [==============================] - 300s - loss: 0.2149 - acc: 0.9379 - val_loss: 2.3568 - val_acc: 0.4075

Btw, I used data augmentation for the validation set as well.

from keras.preprocessing import image
# get_batches, path and batch_size come from the course notebook and utils.py.
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05,
                                 shear_range=0.1, channel_shift_range=20,
                                 width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
val_batches = get_batches(path+'valid', gen_t, batch_size=batch_size)
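A sketch of a random shift in the spirit of `width_shift_range`/`height_shift_range`, applied only to training batches. Leaving validation images unaugmented is a common choice, so that validation scores are measured on images like the ones being predicted. `random_shift` and `maybe_augment` are hypothetical helpers, and the wrap-around behaviour of `np.roll` is a simplification, not how Keras fills shifted-in pixels.

```python
import numpy as np

def random_shift(img, rng, max_frac_w=0.1, max_frac_h=0.05):
    """Shift a (channels, height, width) image by a random fraction of its size.

    Uses np.roll, so pixels wrap around the border (a simplification).
    """
    _, h, w = img.shape
    lim_h, lim_w = int(max_frac_h * h), int(max_frac_w * w)
    dy = int(rng.integers(-lim_h, lim_h + 1))
    dx = int(rng.integers(-lim_w, lim_w + 1))
    return np.roll(img, shift=(dy, dx), axis=(1, 2))

def maybe_augment(batch, augment, rng=None):
    """Augment training batches; pass validation batches through untouched."""
    if not augment:
        return batch
    rng = np.random.default_rng() if rng is None else rng
    return np.stack([random_shift(x, rng) for x in batch])
```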

After 12 epochs my accuracy seems to be going down.
If I were to add another layer to the model, or add BN or dropout, I would have to throw away all the weights learnt so far… is that right?

:frowning:


(garima.agarwal) #80

After another hour of training I am up to

Epoch 1/1
20570/20570 [==============================] - 306s - loss: 0.1995 - acc: 0.9378 - val_loss: 1.9990 - val_acc: 0.4971

I don't think this is going to cut it.
I am still training with Adam as the optimizer. Here is the gist, in case anything stands out as wrong.

Appreciate any comments.

Thanks
Garima


(Jeremy Howard (Admin)) #81

Looking good! Maybe try Dense(100)? And then find an amount of dropout that you can train for more epochs.
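One way to see what the Dense(100) change does: in this model almost all of the weights sit in the first fully connected layer. A sketch of the arithmetic, assuming `Flatten()` outputs 128 × 26 × 26 values (three 'valid' 3×3 convolutions, each followed by 2×2 pooling, on a 224×224 input); `dense_params` is a hypothetical helper.

```python
def dense_params(n_in, n_units):
    # One weight per input-unit pair plus one bias per unit.
    return n_in * n_units + n_units

flat = 128 * 26 * 26  # 86,528 inputs to the first Dense layer
for units in (200, 100):
    print(units, dense_params(flat, units))
```

Dense(200) holds 17,305,800 parameters and Dense(100) holds 8,652,900, so the change removes roughly 8.65 million parameters and leaves the model less capacity to fit the training set.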

If you’re not going to use pre-trained VGG, you’ll need to use lots of data augmentation to get around the data shortage.