Dog Breed Identification challenge

No reason to get 0.47 when you have 0.16

Sorry if this is a simple question, but how would one “train” a model on predictions instead of the images?

My bad =D

First, predict on your training images. You will get 1000 probabilities per image; then use your favorite machine learning package (for example, sklearn) to fit a multiclass logistic regression on them.
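For concreteness, here is a minimal sketch of that second step (the file names and array shapes are hypothetical, not from this thread):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: the pretrained network's 1000 ImageNet probabilities for
# the competition's train and test images, plus the breed labels.
train_probs = np.load('train_imagenet_probs.npy')   # shape (n_train, 1000)
train_breeds = np.load('train_breeds.npy')          # shape (n_train,)
test_probs = np.load('test_imagenet_probs.npy')     # shape (n_test, 1000)

# Multiclass logistic regression on top of the 1000 probabilities.
clf = LogisticRegression(max_iter=1000)
clf.fit(train_probs, train_breeds)
breed_probs = clf.predict_proba(test_probs)         # one column per breed, for the submission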

6 Likes

Hey… Congrats on the score! Could you give a hint on how to train with all the training data? I currently trained by splitting the data between training and validation, so I'm interested to see how much it improves without splitting the data.

Thanks!

Is it as simple as moving just one image to validation? Will try that in a bit!

I don't think the current fastai code accepts 0 images for the validation set. So for now you can just use val_idxs = [0], which will put one image in the validation set and the rest in the training set. It should produce results similar to training with all images, as it's very close to what you want!
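Something like this, assuming the usual lesson setup (PATH, tfms and bs already defined):

val_idxs = [0]  # keep a single image in the validation set
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)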

6 Likes

I have been able to achieve a score of 0.385, mostly by adjusting the learning rate. I also adjusted precompute and the cycle_len parameter to obtain the score. The cycle_mult parameter did not seem to have much influence here.
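For reference, a sketch of where those knobs live in the fastai (0.7) API from the lessons, assuming arch and data are already defined as in the notebooks; the values here are illustrative, not the ones that produced the 0.385 score:

learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(1e-2, 2)                              # warm up on precomputed activations
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)   # cycle_len / cycle_mult control the SGDR restarts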

I am not sure how to code model averaging / ensembling at this time, although I see there are posts about it in this topic.

What are some of the models you have in your ensemble? Just FYI, my best ensemble only has 3 models at the moment.

Though @jamesrequa already replied, here’s mine.
I split into training and validation with the following code:
val_idxs = get_cv_idxs(n, val_pct=1e-4)
With such a small val_pct, this leaves just a single index in val_idxs.


Hope this may help :smile:

6 Likes

Is this concept the same as what the winner of the Kaggle Planet competition describes here as “ridge regression”?
"to predict the final clear probability (from the resnet-101 model alone), I have a specific clear ridge regression model that takes in the resnet-101 model’s predictions of all 17 labels."

No, the planet winner is describing an ensembling approach, whereas @yinterian is simply describing a way to use a pretrained network.

2 Likes

Hi @sermakarevich

I have been trying to get my head around what you are doing! When you do 5-fold CV, are you training a model from each architecture 5 times, each with a randomly resampled set of images (i.e. a different train/test split each time from the full dataset, as in classical CV)?

Then, are you taking an average of the final CV scores of each of your models as the score for your ensemble?

If this is the case, how do you apply that to get an output for the Kaggle competition? I have never entered one, so this may sound like a dumb question, but I thought you would have to get an output by passing data into a model to get a set of predictions / matches. How could you do this with a bunch of models?

Or, perhaps you are doing something on a lower level and creating a new model by assembling / chaining together components from each of the architectures?

Finally, if by some fluke I have described something accurately here, even if you do have a bunch of predictions from a group of models, when applying these to a (real-world) problem, how do you choose the “correct prediction” for an item of an unknown class? Is it again the average?

Apart from solving a possible over-fitting problem, and gaining confidence about the robustness of your approach since it is an ensemble, how is this approach better than just choosing the very best individual architecture?

Those are inception, inceptionresnet and resnet with different image sizes.

OK, you might want to try resnext; that was one of my best ones.

3 Likes

Hi @Chris_Palmer.

With sklearn.model_selection.StratifiedShuffleSplit and data.from_csv
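A minimal sketch of that combination, assuming the competition's labels.csv layout (id, breed columns) and the usual PATH / tfms / bs from the lessons:

import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

labels = pd.read_csv(f'{PATH}labels.csv')
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(sss.split(labels['id'], labels['breed']))  # stratified by breed

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)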

That's almost right. At each CV step you make two predictions: one for the validation set and one for the test set. The test set predictions you just average; the train (validation-fold) predictions you concatenate. These out-of-fold (OOF) train predictions you can use to decide how to blend/average your test predictions: a simple average, median, weighted average, or, as @yinterian recommended, ridge or logistic regression on top of the predictions. This is also typically called stacking: using model outputs as inputs to the next level of models.
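A rough sketch of that scheme with hypothetical array names (a simple sklearn model stands in for the CNN at the first level):

import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

features = np.load('train_features.npy')       # shape (n_train, d) first-level inputs
y = np.load('train_labels.npy')                # shape (n_train,)
test_features = np.load('test_features.npy')   # shape (n_test, d)
n_classes = len(np.unique(y))

oof_preds = np.zeros((len(y), n_classes))
test_preds = np.zeros((len(test_features), n_classes))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(features, y):
    model = LogisticRegression(max_iter=1000)                        # stand-in for the CNN
    model.fit(features[train_idx], y[train_idx])
    oof_preds[val_idx] = model.predict_proba(features[val_idx])      # concatenate the OOF predictions
    test_preds += model.predict_proba(test_features) / skf.n_splits  # average the test predictions

# Second level ("stacking"): blend using the OOF predictions as inputs.
stacker = LogisticRegression(max_iter=1000)
stacker.fit(oof_preds, y)
final_test_preds = stacker.predict_proba(test_preds)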

Take a look at these two articles/posts:

@jamesrequa thanks man, I will try it. So many parameters, it's easy to get lost when I don't have enough intuition :wink:

@jeremy @yinterian I think I got the reason why the ImageNet network with its original layers might perform better than a fine-tuned one: it observed more dogs when it was trained, since the ImageNet competitions used a different train/test split. Is that right?

6 Likes

Great links! Thanks

My ensemble also has 3 models from resnext, inceptionresnet, and inception, just fyi.

1 Like

Hello @jamesrequa,

Sorry if the following question has already been asked in the forum.
You wrote:

Does it mean there is no way to pass the test folder name AFTER training the model with learn()?

I ask because I trained my model without test_name (by default, test_name=None), using the following code to create the data object:
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
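For reference, the same call with the test folder passed at creation time (assuming the competition's usual 'test' directory), which is what test_name is for:

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs, test_name='test')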

@jeremy can you please share your approach to fine-tuning a model (maybe you can cover this in the next lecture)? I mean not the steps from lesson 1, but things like:

  • how do you choose the best dropout
  • the number of nodes in the fc layers
  • the number of fc layers
  • the number of steps with each learning rate
  • how do you deal with run-to-run variation from randomness
  • etc.
2 Likes