The point is that no driver should appear in both the validation set and the training set. Making sure the same image is not in both sets is not sufficient.
The idea is that if the same driver (with a different distraction) appears in both sets, it will be “easier” for the trained model to predict correctly for that driver in the validation set, even if the distraction is different, so the validation score will look better than it really is. If you separate the drivers completely between the two sets, you make sure the validation score reflects how well the trained model predicts for a driver it has never seen before.
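
To make this concrete, here is a minimal sketch of a driver-wise split in plain Python. The function name `split_by_driver`, the `(image_name, driver_id)` layout, the 20% validation fraction, and the toy file names and driver IDs are all assumptions for illustration, not something taken from the actual dataset.

```python
import random
from collections import defaultdict


def split_by_driver(samples, val_fraction=0.2, seed=0):
    """Split (image_name, driver_id) pairs so that no driver is in both sets.

    Hypothetical helper: whole drivers, not individual images, are assigned
    to the validation set.
    """
    # Group image names by the driver they show.
    by_driver = defaultdict(list)
    for image_name, driver_id in samples:
        by_driver[driver_id].append(image_name)

    # Shuffle the driver IDs (not the images) reproducibly.
    drivers = sorted(by_driver)
    random.Random(seed).shuffle(drivers)

    # Hold out a fraction of the drivers, at least one.
    n_val = max(1, round(len(drivers) * val_fraction))
    val_drivers = set(drivers[:n_val])

    train = [(img, d) for img, d in samples if d not in val_drivers]
    val = [(img, d) for img, d in samples if d in val_drivers]
    return train, val


# Toy data (made up): 5 drivers with 4 images each.
samples = [
    (f"img_{driver}_{i}.jpg", driver)
    for driver in ("p002", "p012", "p014", "p015", "p016")
    for i in range(4)
]

train, val = split_by_driver(samples)
train_drivers = {d for _, d in train}
val_drivers = {d for _, d in val}

# No driver is shared between the two sets.
print(train_drivers & val_drivers)
```

If you use scikit-learn, `GroupKFold` or `GroupShuffleSplit` with the driver ID passed as `groups` should give the same kind of split without writing it by hand.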