Dog Breed Identification challenge

If I got it right, there’s a default validation-percentage parameter, val_pct, in get_cv_idxs() that’s set to 0.2 – so I’m using the default 20% validation set size.
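For reference, a minimal sketch of what that default looks like in use (assuming fastai 0.7’s get_cv_idxs, whose signature I believe is get_cv_idxs(n, cv_idx=0, val_pct=0.2, seed=42)):

n = len(list(open(f'{PATH}labels.csv'))) - 1  # number of labelled rows, minus the header
val_idxs = get_cv_idxs(n)                     # default val_pct=0.2, i.e. ~20% of indices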

@jeremy got it; I’m trying to make sure I’m pointed in the right general direction before I start charging into oblivion :smiley:

So far I’m down to a trn/val loss of 2.547…/2.434 after 4 epochs at lrs=1e-1 … so… seems like the right direction. We’ll see.

@Antti yup, discovered that last week :slight_smile: I think there’s a set_data method that lets you add it in later, which is very useful.
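A hedged sketch of what I mean, reusing the from_csv call from later in this thread: if (as I understand it) the thing you want to add later is the test set, you can build a new data object that includes test_name and attach it without losing your trained weights.

tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv',
                                    test_name='test', val_idxs=val_idxs,
                                    suffix='.jpg', tfms=tfms, bs=bs)
learn.set_data(data)  # model weights are kept; only the data object is replaced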

One note: is it returning the batch size instead of the image size? I just noticed it’s giving me my batch size (64) even after I resized to 300.

Is it possible you are passing the batch size instead of the image size? Something like this:
tfms = tfms_from_model(arch, bs, aug_tfms=transforms_side_on, max_zoom=1.1)

instead of
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

Hi @jamesrequa

Regarding this, after training with validation, is the correct process as follows:

val_idxs = [0]
data = get_data(sz, bs)

# I am currently at precompute=False, as that was the state of my model after training with validation.

# Do I need to repeat all steps, including my "warm up" with precompute=True?

learn.precompute = True
learn.fit( ... my parameters from the precompute=True phase ... )

learn.precompute = False
learn.fit( ... my parameters from the precompute=False phase ... )

log_preds, y = learn.TTA(is_test=True)

@Chris_Palmer Everything looks good except… I think you can skip precompute=True. Just follow all of the same steps you did before, starting with precompute=False, and use those same parameters throughout.

Just to clarify a bit more why you should skip precompute=True: your activations were previously generated from your earlier training/validation split. If you set it to True now, with a different training set (it contains the former validation images) and a validation set of only one image, you would have to generate all the activations over again anyway, so the ones you saved previously can’t be re-used in this case.
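Putting that together, a minimal sketch of the full-data retraining pass (the learning rate and schedule below are illustrative placeholders, not a recommendation):

val_idxs = [0]                    # effectively train on all the data
learn.set_data(get_data(sz, bs))  # same learner, new train/val split
learn.precompute = False          # skip the precompute=True warm-up
learn.fit(lr, 3, cycle_len=1, cycle_mult=2)  # repeat your own earlier schedule here
log_preds, y = learn.TTA(is_test=True)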

Also, just a reminder to convert your log predictions into probabilities:
preds = np.exp(log_preds)

Thanks James :slight_smile:

And getting these into a format to submit to Kaggle - how is that done? I guess there is a lot of information available about it - I think Jeremy has recently posted something - but it is a step after running preds = np.exp(log_preds)?

Yep, I think this process has been covered a few times in this thread and in other threads on the forum, but here are links to two of my posts with some references that should help you out! In particular, the one Jeremy posted at the top of the Lesson 3 Wiki is a line-by-line code example of submitting to Kaggle for the dog breed comp.
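For a rough idea of the shape of that step, here is a hedged sketch of building the submission CSV. It assumes pandas is available as pd, that data.test_ds.fnames holds paths like "test/<id>.jpg", and that your fastai version’s TTA returns an already-averaged array of log-probabilities (some versions return one array per augmentation that you need to average first):

log_preds, y = learn.TTA(is_test=True)
preds = np.exp(log_preds)                     # log-probabilities -> probabilities
ids = [f[5:-4] for f in data.test_ds.fnames]  # strip "test/" prefix and ".jpg" suffix
df = pd.DataFrame(preds, columns=data.classes)
df.insert(0, 'id', ids)
df.to_csv('submission.csv', index=False)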

Hi @jamesrequa

Thanks for all of your help on this, and yes, it’s the Dog Breed competition! I am very curious to see where I can get to, so I can test my understanding of Jeremy’s and others’ instructions!

I have just submitted this “trained on all the data” model and it does not do very well - a score of 0.22755. I am not sure that I can trust the way I have gone about it, and as there is so much digging around to get a clear understanding of this, it’s no wonder :expressionless:

I think I will just go back to the drawing board and try to follow the regular approach to see where it gets me, because I was certainly getting a better score while building the model, but all of this extra work with the full data set has left me in the dark.

Is there any way to test how well we are doing on the test set before we submit to Kaggle?

Running through my learning again after setting val_idxs = [0] (i.e. just one validation file), I am still getting validation errors and predictions that look like the previous ones from when I had a full validation set - is this expected? Is this validation against my one and only validation file?

Also, should I be looking at the error rate and accuracy data and choosing a place to stop the learning? I can see that after the second run I am increasingly overfitting and my accuracy is getting worse, so going all the way through 3 runs may not be desirable. It’s very hard to know what is really happening without a validation set!

# step 1: fit(..., 7)
[ 0.       0.22371  0.22195  0.94088]                        
[ 1.       0.20968  0.2259   0.93747]                        
[ 2.       0.20207  0.22398  0.93844]                        
[ 3.       0.20532  0.22566  0.93939]                        
[ 4.       0.18854  0.22653  0.93698]                        
[ 5.       0.20381  0.22526  0.94088]                        
[ 6.       0.21357  0.22947  0.93597]

# step 2 -- should I have stopped after this? fit(..., 3, cycle_len=2, cycle_mult=2)
[ 0.       0.17768  0.22844  0.93844]                        
[ 1.       0.17088  0.23041  0.93695]                        
[ 2.       0.16777  0.23185  0.93796]                        
[ 6.       0.17352  0.23387  0.93698]                        
[ 7.       0.16513  0.22885  0.93646]                        
[ 8.       0.16994  0.23512  0.93792]                        
[ 9.       0.16108  0.23063  0.93991]                        
[ 10.        0.15742   0.23026   0.93939]                    
[ 11.        0.14899   0.22877   0.93991]                    
[ 12.        0.14532   0.23005   0.94137]                    
[ 13.        0.16061   0.22951   0.9404 ]   

# step 3 -- carry on, even though extreme overfitting??? fit(..., 3, cycle_len=1, cycle_mult=2)
[ 0.       0.16628  0.23203  0.93503]                        
[ 1.       0.15619  0.23206  0.93646]                        
[ 2.       0.14303  0.23088  0.93548]                        
[ 3.       0.15428  0.23497  0.93796]                        
[ 4.       0.15449  0.23107  0.93841]                        
[ 5.       0.1584   0.23028  0.93841]                        
[ 6.       0.14592  0.2302   0.93942] 

How do I get predictions on the training set for the Dog Breed competition, to create a linear model?
I created a model as shown in the CIFAR-10 post.

def get_data(sz, bs):
    tfms = tfms_from_model(m, sz)
    data = ImageClassifierData.from_csv(PATH, "train", f'{PATH}labels.csv', test_name="test",
                                        val_idxs=val_idxs, tfms=tfms, suffix=".jpg", bs=bs)
    return data if sz > 300 else data.resize(340, "tmp")  # pre-resize smaller images to speed up loading

m = resnet101(True)
bmodel = BasicModel(m.cuda(), name='resnet101')

bs = 58
sz = 299
data = get_data(sz, bs)
learn = ConvLearner(data, bmodel)

I tried train_features = learn.predict(data.trn_dl), but it returns an array with shape (10357, 1000). I also tried train_features = learn.predict(data.trn_ds), and it has the same shape. data.trn_y has shape (8178,), which is what I expected, since 20% of the data is in the validation set.

I’m using:
probs = model.predict_dl(data.trn_dl)


Thank you - learn.predict_dl(data.trn_dl) worked.
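In case it helps anyone following along: the width-1000 output comes from the unmodified resnet101 ImageNet head (1000 classes), so what predict_dl collects here are the raw model outputs for each training image. A minimal sketch of the working call:

train_features = learn.predict_dl(data.trn_dl)  # raw outputs, one row per image, 1000 columns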

Hi @binga

This error is coming from my call to metrics.log_loss. I believe it’s because something is wrong with the y I am passing as a parameter.

If I switch the parameters around and call metrics.log_loss(probs, y), then I get a different error, “ValueError: Multioutput target data is not supported with label binarization” - so I don’t think the parameters are in an incorrect order.

Actually, y is just an array of zeros which can’t be right, so maybe when we use log_preds, y = learn.TTA(is_test=True) then the y we are getting back is incorrect.

I believe that my y is just zeros because I have used val_idxs = [0] - in order to train with the entire data set.

In summary, I believe the problem it reports (“Please provide the true labels explicitly through the labels argument”) means that I need to get my labels some other way - how would this be done?

Ah, is that the case? Apologies for missing that part. Now the error makes sense. Since you have set val_idxs = [0], the validation set has only a single record, so trivially all of its records have the same label. Sklearn enforces a constraint of at least two different labels, which is the error you see.

Anyway, if you’re training on all the data, this number will not be significant to your analysis, right? Why would you want to calculate it anyway?
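That said, if you ever did need it on such a tiny single-class validation set, sklearn’s log_loss accepts the label set explicitly - a minimal sketch, assuming probs has one column per breed in data.classes order and y holds integer class indices:

from sklearn import metrics
loss = metrics.log_loss(y, probs, labels=list(range(len(data.classes))))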

Edit: I’m on my phone right now; I’ll get back to this a little later.

I’m trying to use kaggle-cli on Crestle. After kg config I encounter this error. Does anyone have a suggestion for what I should do to resolve this issue?

Does anyone know how I can resolve this issue?

Did you try pip install lxml?

Yes, it says the requirement is already satisfied.

Sounds like some conflict of versions; I do not know how to solve such issues. Maybe try re-installing kaggle-cli?

Did you update kaggle-cli?

Even when I try to reinstall kaggle-cli, it clearly says the lxml requirement is already satisfied…
and then the same error is thrown at me again.