When the image is read in and converted to a Tensor, it's scaled to the range 0-1, as you can see below. So the data is in 0-1 whether you use imagenet_stats or not.
But when you apply the ImageNet stats ([0.485, 0.456, 0.406] for the channel means, [0.229, 0.224, 0.225] for the standard deviations) that the model was trained on, each channel is shifted and scaled so the inputs end up with roughly mean 0 and standard deviation 1.
But regardless, the network will still learn with or without normalizing to the ImageNet stats, since the inputs are in a similar range (0-1 or -1 to 1) either way. Hope this helps.
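A minimal sketch of the two steps in plain PyTorch (the tensor here is a stand-in for a real image; fastai/torchvision do the equivalent internally):

```python
import torch

# ImageNet per-channel means and standard deviations (imagenet_stats)
imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

# Fake 8-bit RGB image (values 0-255), standing in for a loaded photo
torch.manual_seed(0)
img = torch.randint(0, 256, (3, 64, 64), dtype=torch.uint8)

# Step 1: ToTensor-style scaling, uint8 0-255 -> float 0-1
x = img.float() / 255.0

# Step 2: normalize with ImageNet stats, (x - mean) / std per channel
y = (x - imagenet_mean) / imagenet_std
```

After step 1 the values sit in 0-1; after step 2 they are centered around 0, which is why both versions are in a range the network can learn from.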
IMO, because Train Loss > Val Loss in both experiments above, the model is "underfit" at this point, and the training loss is still going down each epoch. It needs to run for a few more epochs before we can conclude that one is better than the other.