Hi!
I think your understanding of presizing is correct. As described in the book, it's a specific technique to avoid "spurious empty zones, degrade data, or both" when applying various data augmentation transforms.
So exactly as you wrote: we first resize the images to a larger dimension (item_tfms). This happens to each image individually, on the CPU, before it is copied to the GPU. Then the augmentation transforms are applied to a whole batch in one go on the GPU (batch_tfms).
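To make the ordering concrete, here is a toy sketch in plain Python (no fastai, and the transforms are just stand-in string tags) of the two-stage presizing pipeline: the item transform runs once per image before batching, and the batch transform runs once on the assembled batch. In fastai itself this corresponds to something like `item_tfms=Resize(460)` and `batch_tfms=aug_transforms(size=224, min_scale=0.75)`, with the batch step running on the GPU.

```python
def item_resize(img, size=460):
    # Stand-in for the per-image item_tfms step (e.g. Resize(460)),
    # applied to each image individually on the CPU.
    return f"{img}-resized{size}"

def batch_augment(batch, size=224):
    # Stand-in for the per-batch batch_tfms step (e.g. aug_transforms),
    # applied to the whole batch in one go (on the GPU in fastai).
    return [f"{img}-augmented{size}" for img in batch]

images = ["cat.jpg", "dog.jpg"]
resized = [item_resize(img) for img in images]  # stage 1: per image
batch = batch_augment(resized)                  # stage 2: per batch
print(batch)
# ['cat.jpg-resized460-augmented224', 'dog.jpg-resized460-augmented224']
```

The point is purely the order of operations: resize-to-larger happens per image first, so the later augmentations have spare pixels to work with and don't create empty zones.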
Regarding TTA, my understanding is that it's a method for squeezing extra accuracy out of an already trained model, comparable in principle to ensemble techniques. Instead of getting a prediction for just one image, you take predictions for multiple slightly different versions of that image (different because you apply data augmentation, e.g. cropping, to them) and then take the average (or maximum) of those predictions. Hopefully this aggregated prediction is more accurate than a single prediction would be.
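Here's a toy sketch of that averaging idea in plain Python. The model and its per-augmentation scores are made up (hard-coded numbers, just for illustration); the real thing in fastai is wrapped up for you as `learn.tta()`.

```python
def predict(img):
    # Stand-in for a trained model: maps an (augmented) image to
    # class probabilities. The numbers here are invented.
    fake_scores = {
        "cat.jpg":      [0.6, 0.4],
        "cat.jpg-flip": [0.8, 0.2],
        "cat.jpg-crop": [0.7, 0.3],
    }
    return fake_scores[img]

def tta_predict(img, augments):
    # Predict on several augmented copies, then average elementwise.
    preds = [predict(aug(img)) for aug in augments]
    n = len(preds)
    return [sum(p[i] for p in preds) / n for i in range(len(preds[0]))]

augments = [lambda x: x, lambda x: x + "-flip", lambda x: x + "-crop"]
avg = tta_predict("cat.jpg", augments)  # approximately [0.7, 0.3]
```

Swapping the `sum(...)/n` for a `max(...)` over the copies would give the maximum-aggregation variant mentioned above.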
I think @dhruv.metha also has a really nice explanation of TTA:
Hope that helps!