How do you integrate sklearn StratifiedShuffleSplit with fastai?

(Feras) #1

I'm curious how you would do some kind of k-fold cross-validation with the fastai library, using either the from_path or from_csv methods.

Also, is there a wrapper around Learner that integrates with the sklearn classifier API, for example to use it with sklearn's ensemble features or with the sklearn ecosystem in general, similar to the Keras wrapper?


(Ramesh Sampath) #2

I am not aware of any wrappers, but this thread discusses doing K-fold: Dog Breed Identification challenge

(Feras) #3


(Feras) #4

Here is the full code snippet showing how I did it. If someone knows a better way, I'm happy to hear it.

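import numpy as np
from sklearn.model_selection import StratifiedKFold
from fastai.conv_learner import *  # fastai 0.7-style import: tfms_from_model, ImageClassifierData, ConvLearner

# PATH and label_csv are assumed to be defined earlier
def get_data(sz, f_model, transforms, val_idxs, bs=64):
    # Build the transforms for this architecture and image size
    tfms = tfms_from_model(f_model, sz, aug_tfms=transforms, max_zoom=1.1)
    return ImageClassifierData.from_csv(PATH, 'newtrain', label_csv, val_idxs=val_idxs,
                                        test_name='test', tfms=tfms, bs=bs)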

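# Load the full dataset once (only row 0 held out) to recover the labels for splitting
data = get_data(sz, f_model, transforms, val_idxs=[0])
# Row 0 sits in the validation set, so put its label back in front to match the CSV row order
y = np.concatenate([data.val_y, data.trn_y])
skf = StratifiedKFold(n_splits=4, random_state=seed, shuffle=True)
splits = skf.split(np.zeros(len(y)), y)
datas = []
for train_index, val_index in splits:
    datas.append(get_data(sz, f_model, transforms, val_idxs=val_index))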

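learn = ConvLearner.pretrained(f_model, datas[0], precompute=False)

# Loop through each fold and train for a bit
for data in datas:
    learn.set_data(data)
    # lr: learning rate, assumed to be chosen beforehand (its value is not given in this post)
    learn.fit(lr, 3, cycle_len=1, cycle_mult=2)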
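
Since the thread title asks about StratifiedShuffleSplit specifically, here is a minimal standalone sketch of how the two sklearn splitters produce validation index arrays that could be passed as val_idxs. It only uses sklearn and numpy. The toy labels and the variable names (y, kfold_val_idxs, shuffle_val_idxs) are made up for illustration; in practice y would be the label column from the CSV.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, StratifiedShuffleSplit

# Toy labels standing in for the CSV label column (hypothetical data)
y = np.array([0] * 8 + [1] * 4)
X = np.zeros(len(y))  # the splitters only use X for its length here

# K-fold: every row ends up in exactly one validation fold
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
kfold_val_idxs = [val_idx for _, val_idx in skf.split(X, y)]

# Shuffle split: independent random splits, so validation sets may overlap
sss = StratifiedShuffleSplit(n_splits=4, test_size=0.25, random_state=42)
shuffle_val_idxs = [val_idx for _, val_idx in sss.split(X, y)]

# Show the per-class counts in each validation set
for fold, val_idx in enumerate(kfold_val_idxs):
    print("k-fold", fold, sorted(val_idx), np.bincount(y[val_idx]))
for fold, val_idx in enumerate(shuffle_val_idxs):
    print("shuffle", fold, sorted(val_idx), np.bincount(y[val_idx]))
```

Either list can be looped over in the same way as the splits above. The difference is that StratifiedKFold covers each row exactly once across the folds, while StratifiedShuffleSplit draws a fresh stratified sample each time.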