Impressed by the speed and performance of mixed precision training, I immediately tried to get my hands on it. Because Jeremy suggested that it only works with the latest Nvidia GPU driver, I installed the latest driver and updated CUDA 9 to CUDA 10 on my GCP V100 instance. Here is the screenshot from nvidia-smi.
However, when I run through the lesson3-camvid notebook with to_fp16() added to the learner, as Jeremy did in the lecture video, the kernel still gets killed after training starts. Memory is not the issue here: usage did not even reach 20% when the kernel died.
Thanks for the input. The issue here is not memory. Since half-precision floating point takes less memory than full precision, the V100 should not run out of memory, especially since it did not come close to full memory usage even when training in full precision. In fact, when the kernel gets killed, there is still an ample amount of memory left.
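As a quick sanity check on the memory point (independent of fastai or the GPU), half-precision floats really do occupy half the storage of single-precision, at the cost of some precision. Python's standard struct module can demonstrate both, since it supports the IEEE 754 half-precision format code "e":

```python
import struct

# Storage size in bytes: half precision ("e") vs single precision ("f").
fp16_bytes = struct.calcsize("e")  # 2 bytes
fp32_bytes = struct.calcsize("f")  # 4 bytes
print(fp16_bytes, fp32_bytes)  # → 2 4

# The trade-off: round-tripping a value through fp16 loses precision.
# 0.1 is not exactly representable; the nearest fp16 value is slightly off.
half_of_point_one = struct.unpack("e", struct.pack("e", 0.1))[0]
print(half_of_point_one)  # → 0.0999755859375
```

So the halved memory footprint is exactly why a kernel dying at under 20% memory usage points to something other than an out-of-memory condition.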