Chapter 6 of the course has an example of tuning a threshold for accuracy function in Multi-Label classification. It says:
"changing the threshold in this case results in a smooth curve, so we're clearly not picking some inappropriate outlier."
Why does a smooth relationship make it appropriate to tune a hyperparameter on the validation set?
The concern with using the validation set for hyperparameter tuning is the possibility of finding particular values that work well on the validation set but do not generalise to a new test set. This can happen if the metric curve is very bumpy: you might pick a value sitting in a narrow spike of the validation curve, and the metric at that value could be much worse on the test set.
If the relationship is smooth, you effectively have continuity: even if the test set differs slightly from the validation set, the metric will not change dramatically, so the chosen value remains a good one.
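To make this concrete, here is a minimal sketch (not the book's exact code) of sweeping a threshold for multi-label accuracy on synthetic validation data; the data, shapes, and `accuracy_multi` helper are all made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical validation set: 1000 samples, 10 labels.
# targets is a binary label matrix; probs are predicted probabilities
# built to correlate with the targets so accuracy depends on the threshold.
targets = (rng.random((1000, 10)) < 0.3).astype(int)
probs = np.clip(targets * 0.6 + rng.random((1000, 10)) * 0.7, 0.0, 1.0)

def accuracy_multi(probs, targets, thresh):
    """Fraction of individual label predictions that match after thresholding."""
    preds = (probs > thresh).astype(int)
    return (preds == targets).mean()

# Sweep a grid of thresholds and pick the best one.
thresholds = np.linspace(0.05, 0.95, 19)
accs = np.array([accuracy_multi(probs, targets, t) for t in thresholds])
best = thresholds[int(np.argmax(accs))]
print(f"best threshold: {best:.2f}, accuracy: {accs.max():.3f}")
```

If you plot `accs` against `thresholds`, you get a smooth hump: accuracies at neighbouring thresholds are close to the maximum, so the chosen value is not a lucky outlier and should transfer to a test set.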
Hope it helps!