K-Folds vs One Model?

Hi everyone, this may seem like a very basic question, but why and when should you use K-Fold cross-validation vs a single model trained on the entire dataset? And why is it generally better?

If anyone has articles they specifically like that detail this I would appreciate that too!

Thanks :slight_smile:


At the end of this video they argue that while training a single model is the fastest method, a full cross-validation will lead to the most accurate results, since eventually you iterate over the entirety of the data and the resulting ensemble of fold models is the best in terms of accuracy. Agree? Thoughts?

The other note with k-fold is that you iterate over the entire set in RF, so for a NN we would still absolutely need a separate test set, as usual, to grade on.
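Roughly, the process looks like this outside of fastai (a generic scikit-learn sketch just to show the shape of it; the dataset and model choices here are made up, not the notebook):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = load_digits(return_X_y=True)
# Held-out test set the folds never see; every fold's model is graded on it
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
test_probs = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X_trainval, y_trainval), 1):
    model = RandomForestClassifier(n_estimators=100, random_state=fold)
    model.fit(X_trainval[train_idx], y_trainval[train_idx])
    val_acc = model.score(X_trainval[val_idx], y_trainval[val_idx])
    print(f'fold {fold}: valid accuracy {val_acc:.3f}')
    test_probs.append(model.predict_proba(X_test))

# Ensemble the fold models by averaging their predicted probabilities on the test set
ensemble_pred = np.mean(test_probs, axis=0).argmax(axis=1)
print('ensemble test accuracy:', (ensemble_pred == y_test).mean())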

I’ll have a notebook up with that full process implemented. Still want thoughts or opinions :slight_smile:

Here is a notebook for implementing K-Fold correctly for neural networks and the fastai library!

notebook


Hey @muellerzr, very nice work! To answer your question: CV has multiple uses in addition to getting more reliable estimates of the metrics. I recommend this article since I find it very clear.

In addition, would you mind clarifying a couple of lines? I would like to make sure I fully understand your approach and do the same for images.

Related to tabular data (I'm not that familiar with it): why is this necessary?

processor=data_init.processor

Related to the CV approach: I do not understand why you are bringing the test set into the loop. To me this part should not be there. Can you explain what you are doing?


Thanks @mgloria! Once I've completely trained a model, I then swap the test set in only to validate and get my results. The model is never trained on it.

As for the processor, this is so the same transforms are applied to the test set as they were to our training set. Pass it in the same way when building your ImageList and it will run fine. (Though I'm unsure if it's even needed there; I needed it for the categorical embeddings so that the model gets what it's expecting.)

Does this answer it? :slight_smile:

Thanks @muellerzr

Very clever regarding the processors! I was not aware of it. Regarding the CV, let me rephrase to make sure I got it (it may be useful to some other folks too): in your particular example, the test set has labels (this is not usually the case in e.g. Kaggle competitions). The fact that it has labels allows you to trick the library a bit, in that you treat the test_set.train as test_set.valid and then pass it to learn.data.valid_dl. This way you also have CV metrics on the predictions of your model. Correct?

Correct! I’ve posted about it a few times; I found the trick a few months back and it’s become invaluable in research and small-scale testing :slight_smile:
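In code the swap is essentially just this (a sketch, assuming the labeled test set was built into its own DataBunch the same way as the training data):

# `test_data` is a DataBunch built from the labeled test items,
# so its validation DataLoader holds the whole test set.
learn.data.valid_dl = test_data.valid_dl  # swap the test set in as the "validation" set
loss, acc = learn.validate()              # metrics are now computed on the held-out test set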

Alright, let us now discuss the use of it further. I usually train for more than one epoch, monitor the validation loss with callbacks, and if I start observing overfitting I stop the training. You are doing only one epoch here, which is most likely not enough to train the model. What was your plan? :nerd_face:

It was just a quick example. If you’re worried about that, set the number of epochs higher beforehand and add in the EarlyStoppingCallback to monitor it, then when training is done just swap the test set in and grab your final results as usual :slight_smile:

Just like that? I am thinking about it, but it is not so clear to me how it will work out with the folds. I would like to stop the training if, e.g., I observe overfitting in the fold-averaged validation loss, not in a single fold as the callback offers me.

Think of it like this:

We have ten separate learners we train the exact same way. So when we create each learner, add in an early stopping callback, and when we call fit_one_cycle just run it for more epochs.
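Something along these lines (a sketch; the monitor, patience, and epoch count are just placeholder values):

from functools import partial
from fastai.vision import *                      # cnn_learner, models, accuracy
from fastai.callbacks import EarlyStoppingCallback

# `data_fold` is the DataBunch for the current fold, built as usual
learn = cnn_learner(data_fold, models.resnet18, metrics=accuracy,
                    callback_fns=[partial(EarlyStoppingCallback, monitor='valid_loss',
                                          min_delta=0.01, patience=3)])
learn.fit_one_cycle(20)                          # plenty of epochs; the callback stops early if needed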

The problem I see with it is that the callback could stop the training earlier in some folds than in others, so at the end you could not really average, since the per-epoch vectors would have different lengths.

Hmm. I’d say run a small-scale test to see what roughly works and then go from there. In terms of epochs and overfitting, that only matters if we want the number of epochs to be the exact same across folds, and we’re expecting some variation anyway.

Good idea. I would be up for some experimentation if you are in, since I find it an interesting question. What do you say? I could set it up in Colab and then ping you.

Sure! Go ahead and report back what you find :slight_smile:


Hi @muellerzr, as promised, here is the link to my Colab. As a start, I did two interesting things:

  1. Adapted cross-validation code to work for image problems (couple of tricks needed)

  2. Adapted cross-validation code to work for more than one epoch

Could you take a look at it? It works, but maybe some bits can be rewritten in a better form. Ensuring the experiments are reproducible is also important, as would be a way to run them faster (I tried to make the training set small, but it still takes a while). Finally, you could try the callbacks (I am unfortunately travelling next week, so I will not get the chance until the week after). It would be great if you start before!


Thanks @mgloria! I took a look at the link; all looked swell, except I accidentally started modifying it without making a copy :sad: my apologies.

I loved the method, it looked perfect. I am currently trying to figure out a way we can do this modularly with any type of training (though this is harder and may not be worth it, but I’m trying at least lol).

Here is what I have so far:

import itertools
import pandas as pd
from sklearn.model_selection import StratifiedKFold
from fastai.vision import *   # DataBunch, ImageList, Learner, cnn_learner, models, accuracy, get_transforms

class StratifiedFit:
  def __init__(self, train_data:DataBunch, test_data:DataBunch, n_folds:int=10, epochs:int=1,
               bs:int=64, callback_fns:list=None):
    self.path, train = train_data.path, train_data.train_ds
    self.bs, self.num_epochs, self.callback_fns = bs, epochs, callback_fns
    self.train_items, self.val_items = train.x.items, train.y.items
    self.processor = test_data.processor
    
    self.skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=1)
    self.val_loss, self.acc, self.final_results = self.__make_dfs__(n_folds)
    self.test_data = test_data.valid_dl
    
    self.fit()
    
  def fit(self):
    "Generate each fold and send it for training"
    for i, (_, val_idx) in enumerate(self.skf.split(self.train_items, self.val_items)):
      data_fold = (ImageList.from_folder(self.path)
                  .split_by_idx(val_idx)
                  .label_from_folder()
                  .transform(get_transforms(), size=112)
                  .databunch(bs=self.bs))
      learn = cnn_learner(data_fold, models.resnet18, metrics=accuracy, callback_fns=self.callback_fns)
      self.__train__(learn, self.num_epochs, i)
  
  def __chain__(self, o): return itertools.chain.from_iterable(o)
  
  def _ret_results(self): return self.val_loss, self.acc, self.final_results
  
  def __make_dfs__(self, n_folds:int):
    "Make three DataFrames: per-epoch accuracy and validation loss, plus the overall results"
    cols = [f'fold_{f}' for f in range(1, n_folds+1)]
    val_loss = pd.DataFrame(columns=cols, index=range(1, self.num_epochs+1))  # one row per epoch
    acc = val_loss.copy()
    final_results = pd.DataFrame(columns=['fold', 'valid_accuracy', 'test_accuracy'])
    return val_loss, acc, final_results
  
  def __train__(self, learn:Learner, n_epochs:int, fold:int):
    "Train a fold, then record its validation and test results"
    print(f'fold_{fold+1}:')
    learn.fit_one_cycle(n_epochs)
    print('\n')
    accs = [m.item() for m in self.__chain__(learn.recorder.metrics)]
    self.acc.iloc[:, fold] = accs
    self.val_loss.iloc[:, fold] = learn.recorder.val_losses
    
    test_loss, test_acc = self.__test__(learn)  # learn.validate() returns [loss, metric]
    self.final_results.loc[fold] = [fold+1, accs[-1], float(test_acc)]
    
  def __test__(self, learn:Learner):
    "Swap the held-out test set in as the validation set and grade the fold on it"
    learn.data.valid_dl = self.test_data
    return learn.validate()

If you want to include callbacks here, you can now pass them in via callback_fns.
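For example, something like this (a sketch; `train_data` and `test_data` stand in for whatever DataBunches you built beforehand):

from functools import partial
from fastai.callbacks import EarlyStoppingCallback

sf = StratifiedFit(train_data, test_data, n_folds=5, epochs=10,
                   callback_fns=[partial(EarlyStoppingCallback, monitor='valid_loss', patience=2)])
val_loss, acc, final_results = sf._ret_results()  # per-epoch losses/accuracies and the per-fold summary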

I have a few more ideas to simplify this, so it’s nowhere near done.

*It may not quite work; I don’t have time to revisit and modify this until tonight.

Awesome @muellerzr! Could you add an example of how to call it, e.g. for our image problem? I always get a bit confused when classes are used… I need to work on that :nerd_face:

In any case, I realized the code breaks if more than one metric is passed, e.g. accuracy and F1, which is a very realistic situation. Hence, I modified it in the following way and now it works.

In addition, it concerns me a bit that the results are not reproducible despite my trying to set a seed (see also my experiment at the end of the Colab notebook).

Have you tried whether the callbacks work? I somehow feel we should first work on reproducibility and callbacks to customize the training before moving to a modular example, which, by the way, I think is a terrific idea!
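For reference, the kind of seed-setting I mean is roughly the following (a sketch with a made-up helper name, not exactly what is in my notebook; even with all of these, CUDA ops and the DataLoader workers can still introduce some nondeterminism):

import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    "Seed the Python, NumPy and PyTorch RNGs (hypothetical helper, not part of fastai)."
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True  # trade some speed for determinism
    torch.backends.cudnn.benchmark = False

seed_everything(42)  # call before building the DataBunch and the learner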