Currently, fastai has valid_pct, which decides the percentage of the data to use as the validation set.
Would it be possible to add another parameter, say train_pct, which defines the percentage of the data to use as the training set?
The reason is that when the dataset is huge, I want to train my model on a small subset of the data first.
We have a RandomSubsetSplitter, which can do that:
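For example, a minimal sketch that trains on a random 8% of the items and validates on a random 1%. The dataset layout here (image folders with parent-folder labels) is just an assumption for illustration; the relevant part is the splitter argument:

from fastai.vision.all import *

# RandomSubsetSplitter(train_sz, valid_sz, seed=None) keeps random fractions
# of the items for the training and validation splits
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=RandomSubsetSplitter(train_sz=0.08, valid_sz=0.01, seed=42),
                   get_y=parent_label,
                   item_tfms=Resize(224))
dls = dblock.dataloaders(path)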
cool, thank you
@muellerzr a follow-up question:
For data read from a CSV, is it possible to do this as well? I mean, to randomly select, say, 8% of the train data and 1% of the valid data to train on?
The following code is from 06_multicat:
df = pd.read_csv(path / 'train.csv')

def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train, valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x,
                   get_y=get_y,
                   item_tfms=RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
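One way to do this, as a minimal sketch: subset_splitter below is a hypothetical variant of the notebook's splitter (not a fastai function) that uses random.sample to keep only a fraction of each index list before handing the split to the DataBlock. get_x and get_y are assumed to be defined as in 06_multicat:

import random

def subset_splitter(df, train_pct=0.08, valid_pct=0.01, seed=42):
    # Same is_valid-based split as before, but keep only a random
    # fraction of each index list
    random.seed(seed)
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    train = random.sample(train, int(len(train) * train_pct))
    valid = random.sample(valid, int(len(valid) * valid_pct))
    return train, valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=subset_splitter,
                   get_x=get_x,
                   get_y=get_y,
                   item_tfms=RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)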