I'm not sure what the best way to benchmark things would be, and I haven't been using VGG16 much; I usually use at least VGG19 or one of the Google architectures, with at least two trainable dense or convolutional layers added at the end.
Those usually run at ~240 s per epoch on the dogs vs. cats dataset.
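For reference, this is roughly the kind of setup I mean: a frozen pretrained base with a couple of trainable dense layers on top, trained on a cats-vs-dogs directory. The paths, image size, and layer sizes here are just placeholders, not my exact script:

```python
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.preprocessing.image import ImageDataGenerator

# Pretrained convolutional base, frozen so only the new head trains.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False

# Two trainable dense layers on top for the binary cats-vs-dogs task.
x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# 'data/train' with cat/ and dog/ subfolders is a placeholder path.
train_gen = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
    'data/train', target_size=(224, 224), batch_size=32, class_mode='binary')

model.fit_generator(train_gen, steps_per_epoch=len(train_gen), epochs=5)
```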
Multiple GPUs haven't been a huge advantage for training individual models, but they do let me train two things at once, which is great. (I'm sure I could do better with a bit of optimization, but either there is enough overhead from splitting the data or the model across GPUs, or my CPU isn't preprocessing data fast enough to keep both GPUs busy.)
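For what it's worth, this is the kind of data-parallel split I'm referring to; the multi_gpu_model helper assumes a Keras version that ships it and isn't necessarily what I ran:

```python
from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model
from keras.utils import multi_gpu_model

# Same frozen-base model as in the sketch above.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
for layer in base.layers:
    layer.trainable = False
x = Dense(256, activation='relu')(Flatten()(base.output))
model = Model(inputs=base.input, outputs=Dense(1, activation='sigmoid')(x))

# Replicates the model on two GPUs and splits each batch between them;
# the per-batch scatter/gather is where the extra overhead comes from.
parallel = multi_gpu_model(model, gpus=2)
parallel.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# parallel.fit_generator(...) exactly as in the single-GPU sketch.
```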
I'm running my scripts in Docker containers (built from the Dockerfile and Makefile distributed with Keras), so it is very easy to pin a container to one GPU or the other.
For example, I can run a script in one container on the first GPU and host my notebook from another container on the second GPU.
All you have to do is set the NV_GPU environment variable before calling nvidia-docker, e.g. NV_GPU=0 nvidia-docker run ... to expose only the first GPU to that container.
We should probably come up with a standard benchmarking script for comparing builds; I'm sure other people have better setups than mine.
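As a starting point, even something as simple as a callback that records wall-clock seconds per epoch would probably be enough to compare builds; this is just a sketch:

```python
import time
from keras.callbacks import Callback

class EpochTimer(Callback):
    """Record wall-clock seconds per epoch so different builds can be compared."""

    def on_train_begin(self, logs=None):
        self.times = []

    def on_epoch_begin(self, epoch, logs=None):
        self._start = time.time()

    def on_epoch_end(self, epoch, logs=None):
        self.times.append(time.time() - self._start)

# Usage with the fine-tuning sketch above:
# timer = EpochTimer()
# model.fit_generator(train_gen, steps_per_epoch=len(train_gen),
#                     epochs=5, callbacks=[timer])
# print(timer.times)
```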