Data sizes in planet video

Sorry to resurrect this. I have a further question.
For argument's sake, let's assume the original image size is 512 and the pretrained model's input size is 256.

The basic flow is:
1 data = get_data(64)
2 data = data.resize(int(sz*1.3), 'tmp')
3 learn = ConvLearner.pretrained(f_model, data, metrics=metrics)
4 learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

6 learn.set_data(get_data(128))
7 learn.freeze()
8 learn.fit(lr, 3, cycle_len=1, cycle_mult=2)

Here is my understanding from previous threads:

  • The model has a constant input dimension. It ALWAYS takes a fixed size, e.g. 256. The transform has to make a 64-pixel image fit the model size of 256, presumably by interpolation?
  • Line 1 uses sz to set up on-the-fly transforms from 64 to 256 so that the data fits the model. Getting from the original 512 to sz=64 is an on-the-fly centre crop, which throws away most of the image. Getting from 64 to the model size of 256 requires on-the-fly interpolation (very expensive?).
  • Line 2 resizes the original data all at once, also with a centre crop, so it does not mitigate the loss of data. It helps in that the cropping is no longer done on the fly, which saves a little time. But the interpolation is still there(?).
  • Line 3 puts the data object into the learner.
  • Line 6 redoes get_data for sz 128, which just sets up the transforms. There is no subsequent data resize. It places the new data object in the learner, together with transforms that crop to 128 and then interpolate on the fly. No speedup here, presumably because the improvement would be marginal.
    And so on.
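
To make the two operations I am asking about concrete, here is a minimal NumPy sketch of what I imagine happens to one image. This is not fastai's code: `centre_crop` and `upsample_nearest` are hypothetical helpers I wrote for illustration, and nearest-neighbour is only the simplest stand-in for whatever interpolation (if any) is really applied.

```python
import numpy as np

def centre_crop(img, size):
    """Keep the central size x size window of an H x W (x C) array."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def upsample_nearest(img, size):
    """Nearest-neighbour resize to size x size (assumed stand-in for interpolation)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row index for each output row
    cols = np.arange(size) * w // size  # source column index for each output column
    return img[rows][:, cols]

original = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for one 512x512 image
cropped = centre_crop(original, 64)                 # my assumed "512 -> sz=64" step
model_input = upsample_nearest(cropped, 256)        # my assumed "64 -> 256" step

# Fraction of the original pixels that survive the centre crop
kept_fraction = (cropped.shape[0] * cropped.shape[1]) / (original.shape[0] * original.shape[1])
print(cropped.shape, model_input.shape, kept_fraction)
```

If my understanding is right, the crop keeps only 64²/512² = 1/64 of the original pixels, which is the "throwing away lots of image" I mention above.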

The purpose of the sequence 64->128->256 is to gradually learn weights. The sequence is shown by experiment to lead to better/faster training.

Is this correct? I suspect not as I see the “interpolation” taking lots of time.

If wrong, can you please correct me or else point to a definitive description of what happens? I suspect it’ll help many people.