Oh, I see. Thank you so much! I will try the include_top=False
method now.
BTW, I submitted averaged results today; the LB score is better than any individual submission. Just in case someone else wants to know.
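For anyone curious, a minimal sketch of that kind of submission averaging (assuming each model's predictions are already loaded as NumPy arrays of class probabilities; the array names and values here are made up for illustration):

```python
import numpy as np

# Hypothetical per-model predictions: rows are test images,
# columns are class probabilities (Type_1, Type_2, Type_3).
preds_vgg = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3]])
preds_resnet = np.array([[0.5, 0.3, 0.2],
                         [0.2, 0.5, 0.3]])

# Simple arithmetic mean of the probabilities.
avg = np.mean([preds_vgg, preds_resnet], axis=0)

# Re-normalise so each row still sums to 1 (a no-op here,
# but safe if the inputs were clipped beforehand).
avg = avg / avg.sum(axis=1, keepdims=True)
```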
Hello @rteja1113,
Could you share how/where you set up 'image size 448x448' during the precomputing of vgg16bn, while the standard is 224x224?
And why 448x448 and not more, since most pictures in MobileODT seem to have a minimum of 2448 either in width or height ?
That is according to this EDA post on Kaggle.
https://www.kaggle.com/philschmidt/cervix-eda-model-selection
Many thanks,
Eric
Hi @EricPB, I just did Vgg16BN(size=(448,448), include_top=False)
No particular reason why I picked 448. Someone in the forums mentioned that they were using 448. Surprisingly, I was getting comparable results even when I used 128.
Many thanks !
Unfortunately my structure might be different than yours; I use the one from @Jeremy's Statefarm Full notebook.
When I enter your input, such as:

```python
# Import our class
import vgg16bn_p3; reload(vgg16bn_p3)
from vgg16bn_p3 import Vgg16BN

# Grab VGG16 and find the last convolutional layer
vgg = Vgg16BN(size=(512,512), include_top=False)
model = vgg.model
last_conv_idx = [i for i,l in enumerate(model.layers) if type(l) is Convolution2D][-1]
conv_layers = model.layers[:last_conv_idx+1]

# Build a new model that includes everything up to that last convolutional layer
conv_model = Sequential(conv_layers)

# Predict the outputs of that model by calculating the activations
# of that last convolutional layer
conv_feat = conv_model.predict_generator(batches, int(np.ceil(batches.samples/batch_size)), workers=3)
```
It triggers a ValueError:
ValueError: Error when checking : expected lambda_3_input to have shape (None, 3, 512, 512) but got array with shape (64, 3, 224, 224)
E.
Hi @EricPB, you need to mention the size in get_batches() as well.
batches = get_batches(your_directory, target_size=(512, 512))
Indeed, setting the image input to 512x512 didn't improve the score but worsened it, from 1.45 to 1.85.
@shushi2000 / @Christina / @rashudo / @1729 : did it improve the score for you ?
Here’s my full notebook’s code with context:
Eric
Eric, I don’t have an answer for your question unfortunately. My computer had some issues, so I couldn’t run more experiments - and couldn’t finish the stage 2 competition. Too bad.
Hi Eric,
When the competition is over I will come back with more info but on this competition I found that small image sizes work better than large. Why? Maybe because the network has to learn the general structure of the cervix and this is already visible on low resolution, while high resolution images have a lot of distractions making it harder to learn. But I haven’t verified that. It could be that some more advanced network or approach can benefit from the higher resolution, but my networks couldn’t.
Hi @EricPB, I tried size 128, but I didn't use any pre-trained network. I just used a basic convnet, i.e. conv-batchnorm-maxpool blocks with an increasing number of filters, like in VGG. That seems to give the best performance for me.
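A minimal sketch of that kind of architecture in Keras — note the filter counts, block depth, and the 3-class softmax head are my own guesses for illustration, not the poster's exact network:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, BatchNormalization,
                                     MaxPooling2D, Flatten, Dense)

# VGG-style blocks: conv -> batchnorm -> maxpool,
# doubling the number of filters in each block.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same',
                 input_shape=(128, 128, 3)))
model.add(BatchNormalization())
model.add(MaxPooling2D())
for filters in (64, 128, 256):
    model.add(Conv2D(filters, (3, 3), activation='relu', padding='same'))
    model.add(BatchNormalization())
    model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(3, activation='softmax'))  # 3 cervix types
```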
Hi Eric,
I haven’t done anything with Stage 2 yet but this is the notebook for my Stage 1 submission, if you are curious.
As you will see, I used an image size of 150x150 and trained a simple ConvNet from scratch. This yielded a 0.88 loss. BTW, I tried using VGG in many different ways but couldn’t get my loss below 1.00.
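For reference, the competition metric (multi-class log loss) can be computed with NumPy as below; the toy predictions and labels are made up for illustration:

```python
import numpy as np

def multiclass_log_loss(y_true, y_pred, eps=1e-15):
    """Average negative log-probability assigned to the true class."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    # Pick out the predicted probability of each sample's true class.
    probs = y_pred[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(probs))

# Toy example: 2 samples, 3 classes.
y_true = np.array([0, 2])
y_pred = np.array([[0.8, 0.1, 0.1],
                   [0.2, 0.3, 0.5]])
loss = multiclass_log_loss(y_true, y_pred)
```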
Having a modest go at Kaggle Planet competition re-using pre-trained models (Keras and Jeremy’s). Thank you all for sharing your experience here.
I finished 116th in this competition. Thanks Jeremy & Rachel!
Some things I learned:
Then some techniques I did not use but should have
I finished 86th in this one.
My experience:
1) My initial approach was to use pre-trained networks and average predictions. I was using VGG and ResNet and couldn't get below 10%. (Perhaps I should have re-trained more layers of the pre-trained network, like the 6th-place solution.)
2) Then I used this kernel https://www.kaggle.com/chattob/cervix-segmentation-gmm to extract ROIs. Unfortunately, that kernel does the ROI extraction using labels, so it's not possible to extract ROIs for test images. Then I looked at the winning solution of the Statefarm distracted-driver competition and how he/she used both cropped and original images as the training set to force the network to focus on a certain region. (Using SSD, U-Net, or R-CNN would perhaps have been a better approach in this competition.)
3) I used cropped+original images in the training set and a simple VGG-style convnet at 128x128. That seems to give the best performance for me. A single model gave me around 0.75 on the public LB, and averaging predictions from 8 such models, including test-time augmentation, took me to roughly 0.6 on the public LB.
4) I didn't submit a clipped version of my predictions in the stage 2 final. Yesterday I found that the clipped version would have taken me to 66th position, but it was too late.
5) I didn't have luck with pseudo-labeling and knowledge transfer. Perhaps I should have invested more time in it.
Probably the most important aspect of this competition was creating a good validation set. A lot of people, including myself, ended up overfitting because of a poor validation set.
Using bbox annotations and cropping didn't give me the best performance, but the training was a lot more stable compared to other approaches.
I shouldn't have depended so much on the additional data set. There is a lot of overlap between the additional data and the stage 1 test set.
I couldn't get good performance using stacking, maybe because my predictions were too correlated.
Some of the top-50 teams used at least 20 models in their final submissions. Perhaps trying different architectures like Inception, ResNet, etc. would have improved my score a little.
The key takeaway for me is that there is still a lot to learn. Part 1 is just the tip of the iceberg. Completing Part 2 would have given me an extra edge, as a lot of recent stuff like attentional models, segmentation, etc. is explored there.
All in all, thanks a lot to Jeremy & Rachel for putting this course up. I was able to get my first bronze on Kaggle using material from this course.
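On the validation-set point above: with overlapping or near-duplicate images across the data sets, a random per-image split can leak information between train and validation. One safer pattern is to hold out whole groups (e.g. all images from the same patient). A minimal pure-Python sketch — the filenames and group IDs are hypothetical; in practice they would come from the metadata:

```python
import random

# Hypothetical mapping of image filename -> patient/group id.
images = {
    'img_001.jpg': 'patient_A', 'img_002.jpg': 'patient_A',
    'img_003.jpg': 'patient_B', 'img_004.jpg': 'patient_C',
    'img_005.jpg': 'patient_C', 'img_006.jpg': 'patient_D',
}

def group_split(images, val_fraction=0.25, seed=42):
    """Hold out whole groups so near-duplicates never straddle the split."""
    groups = sorted(set(images.values()))
    random.Random(seed).shuffle(groups)
    n_val = max(1, int(len(groups) * val_fraction))
    val_groups = set(groups[:n_val])
    train = [f for f, g in images.items() if g not in val_groups]
    val = [f for f, g in images.items() if g in val_groups]
    return train, val

train, val = group_split(images)
```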
The 1st place team only had 2 entries… How’s that possible?
I think it’s due to the Stage 2 which reset the LeaderBoard, only displaying new submissions posted after.
If you click to expand the LB, only 261 teams are listed while there were 848 teams in total.
Heard anything?
Hi, it looks like your reply isn't complete… Although I got to talk to Floydhub. Things are a bit different there.
If you click on the username (“rashudo”), the full message becomes visible.