Running my first learn.fit_one_cycle(x) on a pathology dataset takes quite some time.
The whole dataset is about 6 GB, and one epoch takes at least 10 minutes.
To speed up the initial feedback, is there a method available to train on a subset of the whole dataset?
Just to speed up the process in the beginning? Or is this not a good approach?
Definitely the right approach, because it lets you try different solutions quickly and choose the best one. Jeremy has done this since the first version of the course (v1/2016).
With a big dataset like the one you're using, I usually start assessing different models on a number of samples that lets you run an epoch in a minute at most (usually around 5-10% of the total samples).
Be sure to sample the original dataset "properly" when extracting the smaller version (e.g. for classification, randomly shuffle the data before sampling and verify that the class distribution is similar to the original one). In my experience, using the whole dataset usually improves accuracy by around 10-20%.
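
As a rough illustration of the sampling advice above, here is a minimal sketch in plain Python that shuffles within each class and keeps the same fraction of every class, so the subset's label distribution stays close to the original. The function names (stratified_subset, class_shares), the 10% fraction, and the toy "benign"/"malignant" labels are all assumptions made up for this example; they are not part of fastai or of the dataset discussed here.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(labels, fraction=0.1, seed=42):
    """Return sorted indices of a random subset that keeps class proportions.

    Hypothetical helper: `labels` is a list with one class label per sample.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        rng.shuffle(indices)  # random shuffle before sampling
        keep = max(1, round(len(indices) * fraction))  # at least one per class
        chosen.extend(indices[:keep])
    return sorted(chosen)


def class_shares(labels):
    """Return the share of each class, for comparing distributions."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


# Toy labels standing in for the real dataset (assumed, not real data).
labels = ["benign"] * 800 + ["malignant"] * 200
subset = stratified_subset(labels, fraction=0.1)
subset_labels = [labels[i] for i in subset]

print("full:  ", class_shares(labels))
print("subset:", class_shares(subset_labels))
```

The returned indices can then be used to pick the matching files or rows before building the training data, and the two printed distributions give a quick check that the subset is representative.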