R-CNNs have had loads of real-world successes, so that would be a good place to start. However, no-one (that I know of) really believes that they’re the right way in the long run.
By the looks of your recent rise in the leaderboard, I’m guessing you found an object localization method that worked, @jeremy? I’m trying to get the Single Shot Multi-Box Detector running…
Actually I haven’t even really used localization yet! I’ll show tomorrow what I’ve done to get into the top 20. You’d be surprised at how little it took… I think there’s a lot of room to improve on the top of the leaderboard still.
i have to say, thank you for this course!
i’m quite impressed. after just lesson 3 and following the techniques in the notebooks, i was able to rank in the top 20% on my first test attempt within 4 hours of starting.
i used the pre-trained vgg network and only trained a new set of fc layers with batch normalization and dropout to get a 1.10708 score. excited to try to optimize from here and move on to lesson 4.
i’ve been trying a few different models and trying to optimize my initial score. the scores i’m getting seem to be better than my initial guess, but i haven’t been able to increase my submission score.
i’m confused. is my validation set not a good sample of the data?
my initial results that scored a 1.07542:
loss: 0.6488 - acc: 0.7787 - val_loss: 0.2963 - val_acc: 0.9088
my latest results, i thought i was getting better but scored a 1.14936:
loss: 0.3868 - acc: 0.8679 - val_loss: 0.2010 - val_acc: 0.9464
any idea what i could be doing wrong? feels like the validation set is incorrect…
would my next step be trying to create a validation set with a log loss similar to the submission score?
Correct, the validation set isn’t reflecting the test set correctly. We don’t know why either; it’s an unsolved problem at this stage. You’ll have to try to figure out what the sneaky competition admins did to create their test set!
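For anyone wanting to compare validation numbers with the leaderboard on the same footing, here is a minimal sketch of multi-class log loss in plain Python. The function name and the clip value `eps` are assumptions for illustration, not something taken from the course notebooks. It mainly shows how heavily a confident wrong prediction is penalized, which is one reason a model can look good on accuracy and still score badly.

```python
import math

def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices, one per sample.
    y_prob: list of per-class probability lists, one per sample.
    eps:    clip value (an assumption here) that keeps log(0) from occurring.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip each probability, then renormalise the row so it sums to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in probs]
        norm = sum(clipped)
        total += -math.log(clipped[label] / norm)
    return total / len(y_true)
```

A perfectly confident correct row contributes roughly zero, a 50/50 guess over two classes contributes `log(2)`, and a perfectly confident wrong row contributes a very large value, so clipping predictions away from 0 and 1 before submitting is a cheap thing to try.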
Lesson 7 discussion
I’m using the lesson 4 approach to learn more about the Fisheries Monitoring challenge, i.e. I started with a simple linear model, then added a hidden layer with batchnorm and L1 regularization. Then, before moving to convolutional models, I did some data augmentation.
However, even with this simple model, I’m hitting close to 100% training accuracy on sample data (500 training samples, 1000 validation samples), and validation accuracy keeps improving (for example goes pretty easily to 80%). However, if I do predictions and submit to Kaggle, my score is really bad.
Is this expected?
Not sure what is happening with your sample, but the dataset is quite tricky in that only a small number of boats are present (this is discussed in lesson 7 in detail). If your sample does not contain enough “diversity” of boats, the model might not have a clue what to predict when the test set gives it a boat it hasn’t seen much… Hope this helps (it is just a random guess…)
Yes, this would be expected. You have a bit of overfitting because you are missing potential solutions for 1, 3, and 4 below:
1. Add more data
2. Use data augmentation
3. Use architectures that generalize well
4. Reduce architecture complexity
For the first one, why are you using only 500 images to train? Secondly, why not use VGG to benefit from pretrained networks? Lastly, as part of the above, why not include dropout?
The top percent are likely using localization on top of the above. To give a rough idea, my current ranking is around 100 and my validation accuracy was around 98.5%.
Yes, I’m going to use all the bells and whistles later, but decided to follow the lesson 4 notebook, to learn more how Jeremy approaches new DL problems.
I’m surprised that a simple hidden-layer model can overfit even a data-augmented training set so well. It must mainly use environment cues? And if the test set has images from different ships, like @Gelu74 mentioned, that explains it, right?
How do you investigate/debug this kind of issue? Plot images from the training and validation sets and inspect them manually?
@tmu there is a script on the kaggle website that “detects” the fishing boat each image belongs to. Also, each boat has a different image size, so you could plot the image sizes of your valid and train sets and compare them to the test set.
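A rough sketch of that size comparison, assuming you have already collected a `(width, height)` tuple per image for each split (for example from Pillow's `Image.open(path).size`). The helper names are hypothetical and this is not the Kaggle script mentioned above.

```python
from collections import Counter

def size_distribution(sizes):
    """Map each (width, height) to its share of the images in a split."""
    counts = Counter(sizes)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}

def unseen_sizes(reference_sizes, other_sizes):
    """Sizes that occur in other_sizes but never in reference_sizes."""
    return set(other_sizes) - set(reference_sizes)
```

If `unseen_sizes(train_sizes, test_sizes)` is non-empty, or the shares differ a lot between splits, that would be a hint that the test set contains boats the training sample never covered.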
@sml0820: Thanks for sharing some suggestions. I’d love to hear more about your approach to getting into the top hundred. Are you currently using the pre-trained VGG model with dropout? Are you using data augmentation, and if so, what settings are you using? I explored many of the options for data augmentation and built a new training image set with these settings:
aug_gen = image.ImageDataGenerator(rotation_range=15, width_shift_range=.2, height_shift_range=.2, zoom_range=.5, channel_shift_range=.5, horizontal_flip=True, vertical_flip=True)
My current best score (test loss ~ 1.2) is a fresh (non-vgg) CNN based on the materials presented in Lesson 4 and trained on an augmented data set of 6400 images produced using the settings above.
Also curious if you’ve incorporated the extra info @Gelu74 mentioned (image size). I’ve shied away from this since it’s likely they’ll resolve this in the next test set, but curious if you’ve used that data to improve your model.
I’ve managed to significantly improve my leaderboard score by ensembling models. I trained three models separately (VGG, Inception, & Resnet) and then generated predictions from each model and averaged them. This has gotten me into the .89 range on the kaggle leader board.
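The averaging step described above can be sketched as follows. This is a minimal, dependency-free version with a hypothetical function name; it assumes each model's predictions are a list of per-sample probability rows with the classes in the same order. With NumPy arrays the same thing is a mean over the model axis.

```python
def average_predictions(model_probs):
    """Average class probabilities across models.

    model_probs: one entry per model; each entry is a list of per-sample
    probability rows, and all entries share the same shape.
    """
    n_models = len(model_probs)
    averaged = []
    # zip(*model_probs) groups the rows that belong to the same sample.
    for rows in zip(*model_probs):
        # zip(*rows) groups the probabilities that belong to the same class.
        averaged.append([sum(ps) / n_models for ps in zip(*rows)])
    return averaged
```

Because each input row sums to 1, the averaged row does too, so the result can be written straight into a submission file.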
Based on these results, I’m now working on doing the merging in the network, so I have a model that takes an image, feeds it through resnet and vgg separately, then merges the outputs of those models into one flattened vector. From there I’ll feed it through a few fully connected layers and then predict. We’ll see how this goes.
Good job! I also use ensembling to improve my score. But I think if you wanna be on the top of the LB you have to build a model or an algorithm that detects and crops the fish accurately?
In video 7 @jeremy built two successful models: VGG + dense layers targeting bounding boxes, and a full CNN. I’m wondering if these two approaches can be combined. Can we build an FCN that also takes advantage of annotated bounding boxes?
If you are planning for data augmentation, how can you localize the fish?
Using data augmentation, how can you localize the fish?
You’d compute the new bounding box, possibly by writing a subclass of the ImageDataGenerator…
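As an illustration of what "compute the new bounding box" means for the simplest augmentations, here is a sketch for flips and shifts. It assumes boxes are stored as `(x, y, width, height)` in pixels with the origin at the top-left corner, which may not match your annotation format, and the function names are hypothetical rather than part of Keras. Rotation and zoom need more work, since the box has to be recomputed from its transformed corners.

```python
def flip_box_horizontal(box, img_w):
    """Mirror a box left-to-right inside an image of width img_w."""
    x, y, w, h = box
    return (img_w - x - w, y, w, h)

def flip_box_vertical(box, img_h):
    """Mirror a box top-to-bottom inside an image of height img_h."""
    x, y, w, h = box
    return (x, img_h - y - h, w, h)

def shift_box(box, dx, dy):
    """Translate a box by dx pixels horizontally and dy pixels vertically."""
    x, y, w, h = box
    return (x + dx, y + dy, w, h)
```

A generator subclass would apply the same random flip or shift to the image and to its box, so the two stay in sync.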