I tend to work with a small subset of the data first, then increase the dataset size. The data block API offers `use_partial_data` to select a subset of the data (independent of train and valid). However, I want to use the full valid ds and only train on a smaller train ds. With the current API you can achieve this by combining `use_partial_data`, some manual math, and `random_split_by_pct`, but this is a bit cumbersome.
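For example, to train on 8% of the full dataset while keeping the usual 20% for validation, the manual math works out roughly like this (a sketch; the commented chain just shows where the numbers would go):

```python
# Desired fractions of the FULL dataset
train_frac, valid_frac = 0.08, 0.20

# use_partial_data keeps a fraction of the whole dataset...
partial_pct = train_frac + valid_frac      # 0.28 of the full dataset

# ...and random_split_by_pct then splits THAT subset, so the valid
# percentage has to be rescaled relative to the subset:
valid_pct = valid_frac / partial_pct       # ~0.714 of the subset

# data = (ImageList.from_csv(...)
#         .use_partial_data(partial_pct)
#         .random_split_by_pct(valid_pct))
```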
I wrote a little function `split_subsets` which allows you to define the size of the train and valid ds independently. It looks something like this:
```python
# do the normal split like random_split_by_pct
ImageList.from_csv(...).split_subsets(train_size=.8, valid_size=.2)

# only work with a tenth of the train ds but keep the valid ds size
ImageList.from_csv(...).split_subsets(train_size=.08, valid_size=.2)
```
If you’re interested I can put together a PR. I’m also open to suggestions (especially regarding the name of the function).
Here is the current implementation:
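The core idea is just a random permutation with two cuts taken from opposite ends; here is a standalone sketch with NumPy (illustrative only, not the actual patch — in fastai it would live on `ItemList` and delegate to an index-based splitter such as `split_by_idxs`):

```python
import numpy as np

def split_subsets(n, train_size, valid_size, seed=None):
    """Return disjoint train/valid index arrays of sizes
    int(train_size * n) and int(valid_size * n), drawn from n items.
    Standalone sketch of the splitting logic."""
    assert 0 < train_size < 1 and 0 < valid_size < 1
    assert train_size + valid_size <= 1.0
    rng = np.random.RandomState(seed)
    idx = rng.permutation(n)
    train_cut = int(train_size * n)
    valid_cut = int(valid_size * n)
    # train indices come from the front of the permutation,
    # valid indices from the back, so they never overlap
    return idx[:train_cut], idx[-valid_cut:]

# train on a tenth of the data, validate on the usual 20%
train_idx, valid_idx = split_subsets(1000, train_size=0.08, valid_size=0.2)
```

Because `train_size + valid_size` may be less than 1, the remaining items are simply dropped, which is exactly what you want when prototyping on a subset.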