Why the gap between validation and (unclipped) test loss?

Why are validation losses consistently lower than the test set loss (before clipping)? Put another way: why is clipping the test set predictions necessary to bring the test loss more in line with the validation loss? I'm seeing this in my own experiments and in others' results (mainly on Dogs vs. Cats Redux so far). For example, in Jeremy's lesson 2 video at 30:00, you can see the gap between the validation and test loss. In some scenarios the gap could be attributed to hyperparameter overfitting via cross-validation, but it seems to persist even when no hyperparameter tuning is done. Thanks!
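For context on the clipping part of the question, here's a minimal sketch of why clipping helps log loss (the numbers are made up for illustration, not from any actual experiment): a single confidently wrong prediction contributes a huge `-log(p)` term, and clipping probabilities away from 0 and 1 caps that penalty.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    # Binary cross-entropy, averaged over samples; eps guards against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Hypothetical predictions: four confident correct ones, one confident wrong one
y_true = np.array([1, 1, 1, 1, 0])
y_pred = np.array([0.99, 0.99, 0.99, 0.99, 0.999])  # last prediction is wrong

raw = log_loss(y_true, y_pred)                       # dominated by the one mistake
clipped = log_loss(y_true, np.clip(y_pred, 0.05, 0.95))  # mistake's penalty is capped

print(f"raw: {raw:.3f}, clipped: {clipped:.3f}")  # clipped loss is much lower
```

The single wrong prediction at 0.999 contributes `-log(0.001) ≈ 6.9` to the raw loss, while after clipping to 0.95 it contributes only `-log(0.05) ≈ 3.0`, so the averaged loss drops substantially even though four correct predictions got slightly worse.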