Some great stuff here! I am going to start on this contest next… and am seriously considering building a new deep learning box around that new GTX 1080 Ti for contests like this one where the pretrained ImageNet data doesn’t provide as much help. I can see right now that my little GTX 960 is going to be tied up for days…
Since my last post I haven’t gotten back to this contest yet. I recently finished off lesson 3 and I decided to stop spending money on AWS and build my own machine. Waiting for all the parts to arrive now and in the meantime, just reading as much as possible.
Awesome, nice work getting in the top 31%. I’m planning on trying a few different techniques just to experiment and learn. Thinking of 3 tests:
- Use VGG and fine-tune the dense layers (pre-trained weights)
- Use the VGG conv layers, add a new dense model on top, and retrain the last conv block
- Build a model from scratch and train it on the data provided
Using this post for reference:
I’ll aim to overfit at the beginning and then, based on the best result from the above, reduce overfitting using the points from lesson 3.
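The first two tests could be sketched in Keras roughly like this (a hypothetical sketch: layer sizes and the 3-class output are guesses, and `weights=None` is used here only to skip the ImageNet download, whereas the real experiment would use `weights='imagenet'`):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Test 1: freeze all VGG conv layers, train only new dense layers on top
base = VGG16(weights=None, include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax'),  # 3 classes assumed
])

# Test 2: additionally unfreeze the last conv block for retraining
for layer in base.layers:
    if layer.name.startswith('block5'):
        layer.trainable = True
```

Test 3 (from scratch) would replace `base` entirely with your own conv stack.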
I have a doubt about lesson 3: after overfitting, do you discard the weights, or keep them and add more dropout? Say I always use batch norm…
And I read that you don’t have to muck with the weights when you change dropout, since the scaling is handled already. Is that correct?
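For what it’s worth, modern Keras uses “inverted” dropout: surviving activations are scaled up by 1/(1-p) during training, so the expected activation is unchanged and the weights need no manual rescaling when you change p. A pure-Python sketch of the idea:

```python
import random

random.seed(0)
p = 0.5                  # dropout rate
n = 100_000
activations = [1.0] * n

# inverted dropout: zero a fraction p of units, scale survivors by 1 / (1 - p)
dropped = [a / (1 - p) if random.random() >= p else 0.0 for a in activations]

mean = sum(dropped) / n  # stays close to 1.0, so test time needs no adjustment
```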
The additional data is horrible, and there are so many variations; some of it is greenish or yellowish pigmented stuff, etc.
And I think I lucked into that 0.90 loss; I forgot to save the weights.
Oh no! Sorry you lost your weights! I’m still nowhere near 0.90 loss. Do you remember if you retrained any convolutional layers? Also, do you remember roughly how many epochs it took to get to 0.90? e.g. 10 epochs, 100 epochs, etc.
About 70 epochs, trying different stuff from lesson 3. And I bet it’s pure dumb luck.
I started from scratch again, but it’s frustrating: now I can get the model to 73% accuracy on training and validation (with only the data from train.zip), but the loss is way higher, around 1.4.
Scratching my head: how do loss and accuracy go up at once?
Let’s say you have a label [0 0 1] and your prediction is [0.05 0.05 0.9]: correct and confident, so high accuracy and low loss. Later you predict [0.3 0.3 0.4]: still counted as correct, but the loss is much higher because the confidence dropped. Accuracy only checks the argmax, while cross-entropy loss scores the probability you gave the true class, so across a dataset both can rise together.
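The gap comes from accuracy only checking the argmax, while cross-entropy scores the probability assigned to the true class, so a prediction can stay correct while its loss balloons. A stdlib example (the numbers are made up):

```python
import math

def cross_entropy(label_idx, probs):
    # negative log of the probability given to the true class
    return -math.log(probs[label_idx])

# true class is index 2
confident = [0.05, 0.05, 0.90]   # correct and confident
hesitant  = [0.30, 0.30, 0.40]   # still correct, but barely

cross_entropy(2, confident)  # ≈ 0.105
cross_entropy(2, hesitant)   # ≈ 0.916 — same accuracy, much higher loss
```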
I used some insane augmentation, maybe that resulted in too much noise.
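For reference, a minimal numpy sketch of the kind of random flip/rotation augmentation involved (not the actual settings used; in the lessons this is typically done with Keras’s ImageDataGenerator):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img):
    """Randomly flip and rotate a square HxWxC image by multiples of 90 degrees."""
    if rng.random() < 0.5:
        img = np.fliplr(img)                     # horizontal flip
    img = np.rot90(img, k=int(rng.integers(4)))  # 0/90/180/270 rotation
    return img

img = rng.random((64, 64, 3))
aug = augment(img)  # same shape, transformed content
```

Piling on more aggressive transforms (heavy shears, big color shifts) adds label-preserving noise, which can hurt when the dataset is tiny.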
Whatever I do, it overfits because there is so little data.
I am at lesson 4, but I peeked into the lesson 7 bounding boxes and am planning to use that with the boxes someone shared on Kaggle.
By the way, how is it going for you? Did you try using the additional dataset?
I am exhausted; for the past week I tried applying everything from lessons 1, 2, and 3 to this. I even peeked at the bounding boxes in lesson 7. Everything overfits on the train.zip data, and the validation set is so small it doesn’t give a reliable loss. I have a validation loss of 0.88, but when I upload to Kaggle with clipping it’s 1.4+. I don’t know how to evaluate results if validation itself doesn’t give proper loss values.
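For context, the clipping mentioned here is the usual guard against Kaggle’s log loss punishing confident mistakes: bound each probability away from 0 and 1, then renormalize. A stdlib sketch (the 0.02/0.98 bounds are arbitrary):

```python
def clip_preds(probs, lo=0.02, hi=0.98):
    """Clip each probability, then renormalize so the row sums to 1."""
    clipped = [min(max(p, lo), hi) for p in probs]
    total = sum(clipped)
    return [p / total for p in clipped]

row = [0.999, 0.0005, 0.0005]  # very confident prediction
safe = clip_preds(row)          # still confident, but a wrong guess costs less
```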
The additional data has images in different colors and of bad quality; I don’t know whether it can be used effectively.
Maybe I should leave this and come back after learning more techniques?
Nope, not yet. I’m struggling to get my code running on the Colfax cluster; so far it is either slow or buggy. Locally I am running into exploding gradients. But I’ll keep trying.
I think that is what most others did with the dogs vs cats competition, do a lesson and then try to score higher on the competition, then do another lesson, etc. etc.
That competition has higher quality data and it’s easier to understand the results yourself. So this one is harder.
Just finishing off lesson 4 too and working some of the new techniques into my model. Still, the “guessing” aspect is a bit frustrating; it isn’t supposed to be guessing but a mix of knowledge and learned experience.
Most of the time I get something that looks like this (after 30-ish epochs, and then it just plateaus):
loss: 1.3991 - acc: 0.3731 - val_loss: 1.2429 - val_acc: 0.5333
Then I get a “Now what?” scenario: is it just a matter of more epochs and patience, or do I re-evaluate my model architecture?
My plan now is to spend the weekend with a hot new model, take her for a spin and hope it’s the beginning of a great new relationship (along with 200+ epochs)!
But after that, onto lesson 4 readings and then lesson 5.
Could you get the model to overfit first? If not, I think you have to re-evaluate the model architecture.
Overfitting will let you know that the model is learning something from the training set at the very least.
Then comes the experimentation with data augmentation, dropout, pseudo-labeling, etc.
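Pseudo-labeling, in a nutshell, means predicting on unlabeled data and feeding the predictions back in as training labels; one common variant keeps only confident predictions. A toy stdlib sketch (the threshold and data are made up):

```python
def pseudo_label(unlabeled_preds, threshold=0.9):
    """Keep samples where the model is confident, using the argmax as the label."""
    labeled = []
    for i, probs in enumerate(unlabeled_preds):
        conf = max(probs)
        if conf >= threshold:
            labeled.append((i, probs.index(conf)))
    return labeled

preds = [[0.95, 0.03, 0.02],   # confident -> pseudo-label 0
         [0.40, 0.35, 0.25]]   # uncertain -> skipped
pseudo_label(preds)  # [(0, 0)]
```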
I’ve been able to overfit on the sample training data (>0.98) with val_loss 0.97 and val_acc 0.52 (the sample data is about 200 images each for training and validation). No data augmentation.
I’m now introducing data augmentation to try to get a better val_acc on the sample data first and then move to the entire training dataset with dropout and maybe pseudo labeling, where needed.
I’m now at a 1.06 loss on the leaderboard with a simple two-layer convolutional network with dropout, after 10 epochs. I know it will probably get better with more epochs, but I want to play with the network some more first, e.g. adding data augmentation. I took 25% of the training data as a validation set, and that gives me scores comparable to the leaderboard.
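A 25% validation split like that can be as simple as shuffling the file list and slicing (the filenames here are hypothetical):

```python
import random

random.seed(1)
files = [f"img_{i}.jpg" for i in range(1000)]  # hypothetical training files
random.shuffle(files)

split = int(len(files) * 0.75)
train, valid = files[:split], files[split:]    # 750 train / 250 validation
```

Shuffling before slicing matters: if the files are grouped by class, a plain slice would put whole classes in validation only.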
I moved to floydhub since I don’t have a capable machine myself, and that is working well. The network is overfitting, but I find that good because it is finally learning.
I am at a 0.88 loss on the leaderboard with the following architecture and configs:
3 Conv-Blocks + 3 Dense layers with dropout
2/3 - 1/3 train/validation split
Trained for 25 epochs with Adam (but it started to overfit from the 21st)
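A sketch of what that architecture might look like in Keras (filter counts, dense sizes, input size, and the 3-class output are all guesses; only “3 conv blocks + 3 dense layers with dropout, trained with Adam” comes from the post):

```python
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    # 3 conv blocks
    layers.Conv2D(32, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation='relu', padding='same'),
    layers.MaxPooling2D(),
    # 3 dense layers with dropout
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax'),  # 3 classes assumed
])
model.compile(optimizer=optimizers.Adam(),
              loss='categorical_crossentropy', metrics=['accuracy'])
```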
I have tried several variations of the VGG-without-the-last-X-layers approach, and every one of them gets the training loss to zero, but the validation loss bottoms out around 1.0 - 1.2 in just a few epochs and then starts climbing again.
Hi, how much easier is it to use floydhub compared to AWS? … Just curious. Thanks.
What I like about floydhub is that you run a script and then get a result. You don’t have to turn instances on and off, install software, worry about public IP addresses, etc. Your script runs, it costs money while running, and when it’s finished it doesn’t cost money anymore. You do pay for storage, but that is very cheap.
That being said, you can’t modify the environment; you have to use the ones they provide. There seem to be some strange errors with the timestamps in the logs, and startup time is also long. I haven’t tried Jupyter notebooks, but they support those too, and that will probably work better.
There are also some missing features, like being able to easily delete multiple jobs, so it feels a bit beta.
Above all you get the first 100 hours free. And it’s very easy if you follow their step-by-step tutorial.
I still want to try Google Cloud, but I find their interface very complicated. They provide 1000 APIs and services, and I don’t know which ones I need.
You could always add a shutdown command to your AWS scripts.