Sorry, guys, if this question has already been raised (and answered). Does the library contain stratified sampling methods? As far as I can see, ImageDataBunch automatically splits the data into training and validation sets using the random_split function, as in this method:
class ImageDataBunch(DataBunch):
    ...
    @classmethod
    def from_lists(cls, path:PathOrStr, fnames:FilePathList, labels:Collection[str], valid_pct:int=0.2, test:str=None, **kwargs):
        classes = uniqueify(labels)
        train,valid = random_split(valid_pct, fnames, labels)
        datasets = [ImageClassificationDataset(*train, classes),
                    ImageClassificationDataset(*valid, classes)]
        if test: datasets.append(ImageClassificationDataset.from_single_folder(Path(path)/test, classes=classes))
        return cls.create(*datasets, path=path, **kwargs)
And the random_split function uses a uniform distribution to separate observations, so the class proportions in the validation set are not guaranteed to match those of the full dataset. Is there a method similar to the StratifiedShuffleSplit class from scikit-learn, for example to split an imbalanced dataset? Or is it better to use scikit-learn itself to prepare the data before feeding samples into the data bunch?
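To make the second option concrete, here is a minimal sketch of what I mean by preparing the split with scikit-learn beforehand. It uses StratifiedShuffleSplit from sklearn.model_selection; the helper name stratified_split, the seed argument, and the toy file names and labels are my own assumptions for illustration and are not part of the library.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit


def stratified_split(fnames, labels, valid_pct=0.2, seed=42):
    """Hypothetical helper (not part of the library): split file names and
    labels so that each class keeps roughly the same share in both parts."""
    fnames, labels = np.asarray(fnames), np.asarray(labels)
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=valid_pct,
                                      random_state=seed)
    # split() only needs the number of samples from X, so a placeholder
    # array is enough; the stratification itself is driven by the labels.
    train_idx, valid_idx = next(splitter.split(np.zeros(len(labels)), labels))
    train = (fnames[train_idx], labels[train_idx])
    valid = (fnames[valid_idx], labels[valid_idx])
    return train, valid


# Toy imbalanced dataset (made-up names): 90 "cat" images and 10 "dog" images.
fnames = [f"img_{i}.jpg" for i in range(100)]
labels = ["cat"] * 90 + ["dog"] * 10

train, valid = stratified_split(fnames, labels, valid_pct=0.2)
print(len(train[0]), len(valid[0]))
print(list(valid[1]).count("cat"), list(valid[1]).count("dog"))
```

The returned train and valid are (fnames, labels) pairs, which looks like the same shape that random_split produces in the method above, so I assume they could be passed to ImageClassificationDataset in the same way, but I have not verified that.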
Hope the question makes sense.