# Dog Breed Identification challenge

**rikiya**(Rikiya Yamashita) #66

@sermakarevich Great, thanks!

I am afraid of asking this kind of basic question, but what did you average (weights, probs, or…)?, and how did you average them in practice?

**sermakarevich**(sergii makarevych) #67

Yep, that's a really scary question. In practice it looks like this:

- I have 5 predictions for the test set because I do 5-fold CV
- I average the CV predictions for the test set for each model, so at the end I have a single test-set prediction for each training config (model / image size)
- through CV I get train-set predictions as well. This lets me check how I should average predictions from different models (you might end up with just a mean, a median, weights, or blending with an additional model) and better understand each model's accuracy, since the whole train set gives a better estimate than a single validation set
- finally, I average the test predictions from the different models
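The two averaging steps above can be sketched with plain NumPy. The array shapes, fold counts, and random probabilities below are purely illustrative:

```python
import numpy as np

# Hypothetical predictions: 3 models x 5 CV folds, each predicting
# class probabilities for 4 test images over 2 classes.
rng = np.random.default_rng(0)
n_folds, n_models, n_test, n_classes = 5, 3, 4, 2

# per_model_fold_preds[m] has shape (n_folds, n_test, n_classes)
per_model_fold_preds = [
    rng.dirichlet(np.ones(n_classes), size=(n_folds, n_test))
    for _ in range(n_models)
]

# Step 1: average the 5 fold predictions -> one test prediction per model
per_model_preds = [p.mean(axis=0) for p in per_model_fold_preds]

# Step 2: average across models -> final ensemble prediction
ensemble = np.mean(per_model_preds, axis=0)

# Probabilities still sum to 1 for every test image after averaging
assert np.allclose(ensemble.sum(axis=1), 1.0)
```

Averaging probabilities keeps each row a valid distribution, which is why the two mean steps can be applied one after the other.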

**thiago**#68

Amazing! Thanks @sermakarevich!

I’m in 11th place, and I’ll try your approach.

How do you do a 5-fold CV with the fastai lib?

**sermakarevich**(sergii makarevych) #69

fastai students gonna rock it

It turned out to be pretty easy: generate the fold indexes with sklearn's `StratifiedKFold` and pass them to the `ImageClassifierData.from_csv` method. You just need to define the `val_idxs` parameter.
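A minimal sketch of that setup, assuming a made-up label list. The fastai call is left as a comment because it needs the competition data on disk; only the index generation runs here:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels standing in for the breed column of labels.csv
labels = np.array(['beagle', 'pug', 'beagle', 'pug', 'beagle',
                   'pug', 'beagle', 'pug', 'beagle', 'pug'])

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = []
for train_idxs, val_idxs in skf.split(np.zeros(len(labels)), labels):
    folds.append(val_idxs)
    # data = ImageClassifierData.from_csv(PATH, 'train', label_csv,
    #                                     val_idxs=val_idxs, tfms=tfms)
    # ... fit a learner on `data` for this fold ...

# Every sample lands in a validation fold exactly once across the 5 folds
all_val = np.concatenate(folds)
assert sorted(all_val.tolist()) == list(range(len(labels)))
```

`StratifiedKFold.split` only needs the labels to stratify on, so the first argument can be a dummy array of the right length.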

**rikiya**(Rikiya Yamashita) #70

Thanks @sermakarevich! Just a quick question.

You mean you did CV for test set rather than training/validation?

That’s pretty cool, thanks!

BTW, are there any standard methods of ensembling multiple models or architectures? I mean regarding “weights or probabilities” and “mean or median”.

**sermakarevich**(sergii makarevych) #71

- split the train set into 5 parts with
`sklearn StratifiedKFold`

- 4 parts are used as the train-1 set and 1 is used as the valid-1 set
- this is done by the `StratifiedKFold.split` method, which returns indexes for the train-1 set (80% of the original train set) and indexes for the valid-1 set (20% of the original train set)
- tune a model
- do TTA predictions for the test set and for valid-1 (20% of the train set)
- iterate through this 5 times

@jamesrequa knows this better than me. I used two different ways:

- just avg(sum(all predictions))
- features extracted from the convolutional layers of different models are stacked together, and only then do I feed them into an FC layer.
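The second way can be sketched like this. The feature dimensions, the random weight matrix standing in for a trained FC head, and the 120-class output are all assumptions for illustration:

```python
import numpy as np

# Hypothetical conv features from two models for the same 4 images
feats_a = np.random.default_rng(1).normal(size=(4, 512))   # e.g. one backbone
feats_b = np.random.default_rng(2).normal(size=(4, 2048))  # e.g. another backbone

# Stack along the feature axis so one FC layer sees both models' features
stacked = np.concatenate([feats_a, feats_b], axis=1)

# A random matrix standing in for a trained FC head (120 dog breeds)
W = np.random.default_rng(3).normal(size=(2560, 120))
logits = stacked @ W
```

In practice the FC head would be trained on the stacked train-set features; the point is only that concatenation along the feature axis lets one classifier combine several backbones.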

How do you integrate sklearn `StratifiedShuffleSplit` with fastai?

**rikiya**(Rikiya Yamashita) #72

I’m asking because, in the dog breed competition, when I tried to ensemble three models by simply averaging their probabilities, each model’s log loss was around 0.21, but the ensemble’s log loss jumped up to around 13. That’s why I asked. Thanks!

**sermakarevich**(sergii makarevych) #73

Check your row and column ordering. 13 is definitely an error.
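A quick sketch of why misordered rows blow up the score: the same good predictions scored with shuffled rows, simulating two prediction files whose ids are sorted differently. All numbers here are synthetic:

```python
import numpy as np
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
n, k = 100, 10
y = rng.integers(0, k, size=n)

# A decent model: probability 0.91 on the true class, 0.01 elsewhere
probs = np.full((n, k), 0.01)
probs[np.arange(n), y] = 0.91

aligned = log_loss(y, probs, labels=list(range(k)))

# Same predictions, rows shuffled -- almost every row now "predicts"
# the wrong image, so the score explodes
misaligned = log_loss(y, probs[rng.permutation(n)], labels=list(range(k)))

assert aligned < 0.2
assert misaligned > 1.0
```

So before averaging files from different runs, align them on the id column rather than trusting row order.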

**rikiya**(Rikiya Yamashita) #74

Thanks a lot, now it’s super clear

And, sure, I was wrong somewhere in the averaging process. I’ll try again

**bushaev**(Vitaly Bushaev) #77

why not get predictions for the test set by training on the whole dataset instead of doing CV?

**sermakarevich**(sergii makarevych) #78

No reason not to. You only need to know how to optimise a model without a validation set. With CV you can:

- better understand accuracy
- get predictions for the train set
- get a mini ensemble for the test set.

This mini ensemble gives a 0.02 log loss improvement on test vs train (which is about 10%).

**wgpubs**(WG) #79

I’m assuming you mean a *new* model for each iteration, correct?

… and thanks for the detailed and nice writeup on using K-Fold CV!

**A_TF57**(Ankit Goila) #80

How do I submit my results to Kaggle?

I ran some tests and built a decent classifier for my first submission, but it’s not clear to me how to get those predictions into a csv file for submitting.

**KevinB**(Kevin Bird) #81

Look at the last few lines of this Kernel for an example of that:

https://www.kaggle.com/orangutan/keras-vgg19-starter

One step they don’t do, though, is:

```
sub.to_csv(path+"filename.csv", index=False)
```
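For context, here is a hedged sketch of building such a `sub` DataFrame from scratch with pandas; the ids, class names, and probabilities below are made up:

```python
import pandas as pd

# Hypothetical toy values; in the competition `probs` would be the
# (n_test, 120) ensemble predictions and `classes` the breed names
probs = [[0.9, 0.1], [0.2, 0.8]]
test_ids = ["img_001", "img_002"]
classes = ["beagle", "pug"]

sub = pd.DataFrame(probs, columns=classes)
sub.insert(0, "id", test_ids)              # id column first, as Kaggle expects
sub.to_csv("submission.csv", index=False)  # index=False drops the row index

print(open("submission.csv").readline().strip())  # id,beagle,pug
```

`index=False` matters: without it pandas writes an extra unnamed index column that Kaggle's checker rejects.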


**thiago**#84

Thanks @sermakarevich! I’ve got 8th place just by taking the mean of some good models. =D

**sermakarevich**(sergii makarevych) #85

Congrats @thiago and @rikiya! @jeremy, places 8-11 are fastai students. I assume the first 6-7 are cheaters, so… good start