This is definitely the right approach, because it lets you try different solutions quickly and choose the best one. Jeremy has been doing this since the first version of the course (v1/2016).
With a big dataset like the one you’re using, I usually start assessing different models with a number of samples that lets you run an epoch in at most a minute (usually around 5-10% of the total samples).
Be sure to sample the original dataset “properly” when extracting the smaller version (i.e. for classification, randomly shuffle the data before sampling and verify that the class distribution is similar to the original one). In my experience, switching to the whole dataset afterwards usually improves the accuracy by around 10-20%.
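For classification, a quick way to do that “proper” sampling is a stratified sample, so each class keeps its original proportion in the subset. A minimal sketch with pandas (the `label` column and the 10% fraction are just assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy dataset with an imbalanced "label" column.
df = pd.DataFrame({
    "feature": range(1000),
    "label": ["cat"] * 700 + ["dog"] * 300,
})

# Stratified ~10% sample: sample within each class separately,
# so the class proportions match the full dataset.
sample = df.groupby("label", group_keys=False).sample(frac=0.1, random_state=42)

# Verify the distribution is similar to the original.
print(df["label"].value_counts(normalize=True))
print(sample["label"].value_counts(normalize=True))
```

Shuffling happens implicitly here because `.sample()` draws rows at random within each group; comparing the normalized `value_counts` is the sanity check mentioned above.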