Test Time Augmentation, I believe, happens whenever you ask the model to make predictions on test (or production) data, so that it gets a better chance to predict correctly.
So it's not something you do when you move your model and weights to production, but after you have already done that, when a request comes in for image classification.
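To make the idea concrete, here is a minimal TTA sketch: predict on a few augmented copies of the input at inference time and average the resulting probabilities. The `predict` function here is a hypothetical stand-in for a trained classifier (a fixed linear model plus softmax), not a real model.

```python
import numpy as np

def predict(image):
    # Hypothetical stand-in for a trained classifier: a fixed linear
    # model over the flattened image, followed by softmax (3 classes).
    w = np.linspace(-1, 1, image.size * 3).reshape(3, image.size)
    logits = w @ image.ravel()
    e = np.exp(logits - logits.max())
    return e / e.sum()

def tta_predict(image):
    """Test Time Augmentation: predict on the original image plus a few
    simple flips, then average the probabilities."""
    augmented = [image,
                 np.fliplr(image),
                 np.flipud(image),
                 np.fliplr(np.flipud(image))]
    preds = np.stack([predict(a) for a in augmented])
    return preds.mean(axis=0)  # averaged probabilities, still sum to 1
```

The flips here are just an illustration; in practice you would reuse whatever augmentations were applied during training.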
I think it's not easy to apply image-style data augmentation to time-series data, and adding noise would defeat the purpose of augmentation. However, if the data is seasonal, we could do something like average the current value with the values at t - 52, t + 52, and so on, for a weekly dataset. This is just an idea, as I still have to evaluate it.
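The seasonal-averaging idea above could be sketched like this (a hypothetical illustration of the post's suggestion, not an established augmentation method): each point is averaged with its value one season earlier, and points with no earlier season are kept as-is.

```python
import numpy as np

def seasonal_augment(series, period=52):
    """Augmented copy of a seasonal series: average each point with its
    value one season earlier (t - period).  The first `period` points
    have no earlier season, so they are left unchanged.
    Sketch of the idea from the post above; untested as an augmentation."""
    series = np.asarray(series, dtype=float)
    aug = series.copy()
    aug[period:] = 0.5 * (series[period:] + series[:-period])
    return aug
```

For a weekly series, `period=52` pairs each week with the same week of the previous year; the same function could be reused with other periods (e.g. 12 for monthly data).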
For multi-class classification, we use the softmax function to convert the raw scores into pseudo-probabilities (so that the class probabilities add up to one for each image).
So the operative word there is pseudo: softmax does not give you true probabilities in the sense of probability theory.
With cycle_mult=2 we have 3 cycles, and the number of epochs in each cycle doubles. That's why we get 7 epochs = 1 + 2 + 4.
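The arithmetic can be written out directly (a small sketch of how `cycle_len` and `cycle_mult` combine in fast.ai's `fit`): each cycle is `cycle_mult` times longer than the previous one.

```python
def cycle_lengths(cycle_len=1, cycle_mult=2, n_cycles=3):
    """Epochs per SGDR cycle: cycle i runs cycle_len * cycle_mult**i
    epochs, as with fit(..., n_cycles, cycle_len=1, cycle_mult=2)."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

cycle_lengths()  # [1, 2, 4], i.e. 7 epochs in total
```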
My understanding is that it doesn’t preserve all the activations, only those of the last frozen layer, including for augmented images. The model then uses those activations directly. This eliminates recomputation for the same data, since frozen layers always produce the same activations; it just speeds up computation when training for multiple epochs.
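A minimal sketch of why caching works, assuming a hypothetical frozen backbone (its weights never change, so its output for a given input is identical every epoch): compute the activations once, then let every epoch of head training read the cache.

```python
import numpy as np

def frozen_features(x):
    # Hypothetical frozen backbone: fixed weights, so the same input
    # always yields the same activations.
    w = np.linspace(-1, 1, x.shape[1] * 4).reshape(x.shape[1], 4)
    return np.maximum(x @ w, 0)  # ReLU activations

X = np.random.default_rng(0).normal(size=(100, 8))

# Precompute once, before training the head...
cached = frozen_features(X)

# ...then every epoch reads the cache instead of re-running the backbone.
for epoch in range(3):
    activations = cached  # no recomputation of frozen layers
```

With an unfrozen backbone this caching would be wrong, since its activations change after every weight update; that is why the speed-up only applies while the layers stay frozen.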
I have seen that too: validation loss lower than training loss. My hypothesis was that the data in the validation set (in this case) happened to be very well learned by the model, hence it could do well - better than on the larger training set. If we shuffled the train/validation split differently, we might see different results.
How do you pick the learning rate and other hyper-parameters when the dataset is very large? For example, in the Kaggle Cdiscount image classification challenge the training set is 12 million images, and one epoch takes 22 hrs on my moderately sized GPU.