As far as I understand, even that is not possible, correct? We can only give parameters for how much we want to zoom, shrink, etc. Since there is this variability, we cannot use/precompute the boxes.
But, we can inspect the new values after they have been generated at random.
Probably better to crop/resize in advance using the bounding boxes, rather than using data augmentation.
Sure! (I showed lots of techniques in lesson 7, but didn’t combine them all together since it was an ongoing competition and that seemed unfair to the other competitors.)
My baseline using just VGG scored 1.18. Adding data augmentation or pseudo-labeling makes my test score significantly worse, up to 1.5 in some cases.
I suspect that data augmentation isn’t helping if my model is already overfitting to the training/val dataset, but I am surprised that pseudo-labeling isn’t helping.
What are some things people tried?
One thing I found to be helpful was applying augmentation at test time. Something like:
```
# pseudo-code
create a final_preds container
for a few loops:
    generate test data (sometimes with aug, sometimes without)
    generate predicted class probabilities
    add predictions to the final_preds container
divide final_preds by the number of loops
```
This achieves an average over multiple augmented test images.
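A minimal runnable sketch of that loop, assuming the model is exposed as a `predict` function returning class probabilities. The dummy two-class predictor and the horizontal-flip augmentation here are stand-ins for your real model and augmentation pipeline:

```python
import numpy as np

def predict(batch):
    # Stand-in for model.predict(): softmax over mean pixel intensity.
    m = batch.mean(axis=(1, 2, 3))
    logits = np.stack([m, -m], axis=1)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def augment(batch, rng):
    # Simple test-time augmentation: random horizontal flip per image.
    flip = rng.random(len(batch)) < 0.5
    out = batch.copy()
    out[flip] = out[flip][:, :, ::-1, :]
    return out

def tta_predict(batch, n_loops=5, seed=0):
    rng = np.random.default_rng(seed)
    final_preds = np.zeros((len(batch), 2))
    for i in range(n_loops):
        # First pass uses the raw images, later passes use augmented copies.
        data = batch if i == 0 else augment(batch, rng)
        final_preds += predict(data)
    return final_preds / n_loops

test_batch = np.random.default_rng(1).random((4, 8, 8, 3))
probs = tta_predict(test_batch)
print(probs.shape)  # (4, 2); each row still sums to 1
```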
Just submitted my first entry: a simple baseline model with a Dense(512) and then softmax, with batchnorm in both (no augmentation, dropout, etc. yet). Ranked in the top 53%, with a loss of 1.25. Now to test with a CNN. It surprises me that something so simple can get that high.
If I understand this correctly, you are doing?
- Expand test set with data augmentation
- Pseudo labeling on expanded test set
- Ensemble results
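The pseudo-labeling step in that list could be sketched as follows. The stand-in predictor and the 0.9 confidence threshold are hypothetical choices, not anything prescribed in the thread:

```python
import numpy as np

def pseudo_label(train_x, train_y, test_x, predict, threshold=0.9):
    """Add confidently-predicted test images to the training set.
    predict: function returning an (n, n_classes) probability array."""
    probs = predict(test_x)
    conf = probs.max(axis=1)
    keep = conf >= threshold                   # only trust confident predictions
    new_x = np.concatenate([train_x, test_x[keep]])
    new_y = np.concatenate([train_y, probs[keep].argmax(axis=1)])
    return new_x, new_y

# Stand-in predictor: a sigmoid on the feature sum, fairly confident either way.
def predict(x):
    p1 = 1 / (1 + np.exp(-4 * (x.sum(axis=1) - 1)))
    return np.stack([1 - p1, p1], axis=1)

train_x = np.array([[0.1, 0.2], [0.9, 0.9]])
train_y = np.array([0, 1])
test_x = np.array([[0.0, 0.1], [1.5, 1.5]])
new_x, new_y = pseudo_label(train_x, train_y, test_x, predict)
print(len(new_x), new_y.tolist())  # both test images were confident enough to keep
```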
That’s a cool idea to try to smooth out any test-set-specific differences vs. the train set.
How was your score on the leaderboard?
Oddly enough, I tried training a convnet on the dataset, but failed to outperform my simple two-layer non-convolutional neural net. I learned in the process that dropout works better in the dense layers than in the convolutional layers, and that if you do use it in conv layers, it belongs in the later ones rather than the early ones, since the features in the first layers are more generic. I’m now trying with VGG16bn, but even after a learning schedule involving over 20 iterations, and ensembling 4 VGGs, I cannot beat it. In fact, my score is substantially higher! I’m getting around 1.7 with the ensemble, while the individual components get a validation accuracy of 0.916.
What could I try next? With data augmentation, I get training accuracies around 0.8 and validation accuracies as described above.
I thought that perhaps, given that there isn’t much data in each individual category, data augmentation isn’t producing good batches for me, so to speak. So I disabled data augmentation and trained an ensemble of 2 models with the same schedule as above. Doing this, my models overfit: training accuracies rise to 0.98, and validation accuracies rise to 0.95. I don’t have more submissions remaining today, so I’ll see what happens tomorrow when I submit this attempt.
Try clipping; for the Kaggle score it helps a lot, but unfortunately it has nothing to do with improving the “real life” score.
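Clipping here means pulling predicted probabilities away from 0 and 1, which caps the log-loss penalty for confidently wrong predictions. A numpy sketch; the 0.02/0.98 bounds are a hypothetical choice worth tuning on your validation set:

```python
import numpy as np

def logloss(y_true, probs):
    # Multi-class log loss: mean negative log probability of the true class.
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

def clip_preds(probs, lo=0.02, hi=0.98):
    # Pull probabilities away from 0/1, then renormalize each row to sum to 1.
    clipped = np.clip(probs, lo, hi)
    return clipped / clipped.sum(axis=1, keepdims=True)

y_true = np.array([0, 1])
probs = np.array([[0.99, 0.01],   # confidently right
                  [0.99, 0.01]])  # confidently wrong
print(logloss(y_true, probs))              # large, dominated by the one mistake
print(logloss(y_true, clip_preds(probs)))  # noticeably smaller after clipping
```

The confidently-correct prediction gets slightly worse, but the confidently-wrong one improves much more, so the average log loss drops.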
Hi, my latest conclusion from this competition is that if you use the methods suggested above, you overfit massively on the ships; in other words, you predict the ships and not the fish.
Given that the second stage will include mostly new ships (according to an announcement in the forum), I believe the current leaderboard is useless.
You can witness the overfitting yourself if you try taking one ship out as a validation set (ships are quite correlated with image sizes): you’ll see horrible results, a log loss of around 3-4.
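The ship-held-out split described there is a group-wise split: every image from the validation ships is excluded from training. A stdlib sketch; the `boat_id` labels are hypothetical and would in practice be inferred, e.g. from image size:

```python
import random

def boat_split(filenames, boat_ids, val_boats, seed=0):
    """Group-wise split: hold out every image from the given boats,
    so validation boats never appear in the training set."""
    train, val = [], []
    for fname, boat in zip(filenames, boat_ids):
        (val if boat in val_boats else train).append(fname)
    random.Random(seed).shuffle(train)
    return train, val

files = ["img0.jpg", "img1.jpg", "img2.jpg", "img3.jpg"]
boats = ["boat_a", "boat_b", "boat_a", "boat_c"]
train, val = boat_split(files, boats, val_boats={"boat_b"})
print(val)  # ['img1.jpg'] -- all of boat_b held out
```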
After reading this post and the other posts on the Kaggle forum, it looks like many of the competitors train on the whole image. Is this approach able to classify fish on the boat? It sounds more like “boat classification” than “fish classification” to me.
I have an overfitting problem too; imbalance + small data are really hard to deal with.
Thinking about “boat classification”: it might be worth looking at a confusion matrix for just the fish vs. no-fish classes, to see how good or bad the model is at finding fish at all.
I think my models were usually decent at that.
The biggest sources of confusion were the different types of tuna and class imbalance, IIRC.
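That fish vs. no-fish check amounts to collapsing all fish classes into one and counting a 2x2 confusion matrix. A numpy sketch; the class list mirrors the competition's categories, with `NoF` as the no-fish class:

```python
import numpy as np

# Class order is an assumption; NoF is the "no fish" class.
CLASSES = ["ALB", "BET", "DOL", "LAG", "NoF", "OTHER", "SHARK", "YFT"]
NOF = CLASSES.index("NoF")

def fish_vs_nof_confusion(y_true, y_pred):
    """2x2 confusion matrix after collapsing all fish classes into one.
    Rows = actual (fish, no-fish); columns = predicted (fish, no-fish)."""
    t = (np.asarray(y_true) == NOF).astype(int)  # 0 = fish, 1 = no fish
    p = (np.asarray(y_pred) == NOF).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    np.add.at(cm, (t, p), 1)                     # count each (actual, predicted) pair
    return cm

y_true = [0, 4, 4, 7, 4]   # three NoF images, two fish
y_pred = [0, 4, 4, 7, 0]   # one NoF image mistaken for a fish
cm = fish_vs_nof_confusion(y_true, y_pred)
print(cm)
```

The off-diagonal cells are the interesting ones: fish predicted as NoF (missed fish) and NoF predicted as fish (false alarms).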
There’s a lot of talk in the forums about using a localization network like YOLO. I wanted to understand the primary difference between using a localization network and the bounding-box approach in Lesson 7. Both yield the same result, right? One uses human-made annotations and the other learns the boxes itself.
I got my biggest improvement today using an ensemble approach:
- Training a number of models with data-augmentation on the training data
- Also apply data-augmentation on test dataset
- Get predictions using each model and average the predictions for each test image.
This got me to 1.03 on the LB, from a baseline of 1.18.
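The final averaging step is just a mean over each model's probability matrix. A sketch with stand-in prediction arrays in place of real model outputs:

```python
import numpy as np

def ensemble_average(model_preds):
    """Average class probabilities from several models.
    model_preds: list of (n_images, n_classes) probability arrays."""
    stacked = np.stack(model_preds)   # (n_models, n_images, n_classes)
    return stacked.mean(axis=0)       # rows still sum to 1

# Stand-in predictions from three models on two test images.
preds = [np.array([[0.9, 0.1], [0.2, 0.8]]),
         np.array([[0.7, 0.3], [0.4, 0.6]]),
         np.array([[0.8, 0.2], [0.3, 0.7]])]
avg = ensemble_average(preds)
print(avg)  # [[0.8, 0.2], [0.3, 0.7]]
```

Averaging probabilities (rather than, say, majority-voting the argmax) keeps the output usable directly for a log-loss submission.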
Feels awesome that it works, but the ensembling part is unsatisfying. Anyone else feel that way?
Credit to this guy for his notebooks:
Right - in general, if you have bounding boxes provided, it’s best to use them! Something like YOLO is only needed when you don’t have them, and it can’t be as accurate (since it has less information to use).
@twairball Did you use VGG16 for any of your models? What was the training and validation set accuracy?
I’ve gotten my best result (1.06) by just fine-tuning the last two dense layers, with no data augmentation, pseudo-labeling, or any other tricks. I was overfitting pretty badly, with training/validation loss at 0.03/0.16 …
My next plans involve concentrating on the fish more, but I’m not yet sure how to really do it…
This however (and of course Jeremy + Rachel’s lessons and all the great information on these forums!!) has given me some ideas of how to proceed:
@torkku I used vgg16bn for these ensembles.
I trained with 512 nodes in the dense layers at lr=1e-4 for around 20 epochs each; from my testing with single models, this seemed to give the best baseline result. With data augmentation my training losses were always higher than my validation loss; val loss was in the 0.15-0.20 range.
I’ve also tried training a single ResNet50 model; it doesn’t have large dense layers like VGG does, but the results were worse than the VGG models.
I would be keen to see how you are fine tuning the dense layers if you care to share?
I’ve found the hardest part of this competition is that the results on the validation set mean almost nothing; they do not reflect the true situation when you apply the model to the test set. The data itself is leaky because there are many images similar to each other.
I guess the best way to solve this problem is to gather more data (but this is hard to do). Does anyone have better suggestions?
If we already use a pretrained network like VGG or ResNet trained on ImageNet, is it meaningful to reuse the fish images from ImageNet as an extra dataset? Thanks