TL;DR: Don’t get suckered by AWS and Volta. The Amazon P3 instances (available in Oregon) feature the latest DL GPU tech at $3.06 an hour (2xlarge), but PyTorch, TF, et al. can’t fully utilize it yet.
I know we talk a lot about how AWS is behind the power curve, but it looks like they finally updated to the newest generation of NVIDIA Tesla products. These include tensor cores built to accelerate fp16 mixed-precision operations (common in DL contexts) at a claimed 120 TFLOPS.
However, all of the common frameworks default to fp32 for computation, where the V100 provides only 15 TFLOPS. (For comparison, the 1080 Ti, the consumer card normally recommended, caps out around 11 TFLOPS at fp32; its fp64 and fp16 throughput are intentionally hamstrung for product segmentation.)
In the V100’s favor, it offers 16 GB of RAM, so if you absolutely need more power and more memory (and can afford it), spin one of these up. It’s also considerably more powerful than the P2 instances (K80-based, roughly 5.5–8 TFLOPS at fp32).
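To put the numbers above side by side, here’s a quick back-of-the-envelope comparison of the peak throughput figures quoted in this post (marketing peaks; real-world performance will be lower):

```python
# Peak throughput figures as quoted above, in TFLOPS.
peak_tflops = {
    "V100 (fp16 tensor cores)": 120.0,
    "V100 (fp32)": 15.0,
    "GTX 1080 Ti (fp32)": 11.0,
    "K80 (fp32, upper bound)": 8.0,
}

# Express everything relative to the 1080 Ti, the usual consumer baseline.
baseline = peak_tflops["GTX 1080 Ti (fp32)"]
for gpu, tflops in peak_tflops.items():
    print(f"{gpu}: {tflops:.1f} TFLOPS ({tflops / baseline:.1f}x a 1080 Ti)")
```

The takeaway: at fp32 the V100 is only about 1.4x a 1080 Ti, and the headline 10x+ advantage only materializes if your framework can actually drive the tensor cores.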
Unfortunately, PyTorch does not currently offer a simple switch between fp32 and fp16 math, even though CUDA supports it (you can cast tensors to half precision with .half(), but the framework doesn’t yet exploit the tensor cores end to end). Perhaps in the near future:
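It’s also worth remembering that fp16 isn’t a free lunch even once the frameworks support it. A minimal sketch using NumPy’s float16 (which follows the same IEEE half-precision format as the GPU type) shows the reduced range and precision you’d be trading for speed:

```python
import numpy as np

# fp16 has ~3 decimal digits of precision and a max value of 65504,
# vs fp32's ~7 digits and ~3.4e38.
print(np.finfo(np.float16).max)   # 65504.0
print(np.finfo(np.float32).max)   # ~3.4e38

# Precision loss: near 1.0, fp16's spacing is ~0.001, so small
# increments simply disappear when you downcast.
x = np.float32(1.0001)
print(np.float16(x))              # 1.0 -- the 1e-4 is rounded away

# Range loss: sums that are trivial in fp32 overflow to infinity.
big = np.float16(60000) + np.float16(60000)
print(big)                        # inf
```

This is why mixed-precision training (fp16 math with fp32 accumulation, which is exactly what the tensor cores do in hardware) matters more than a naive global fp16 switch.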
V100 Volta Stats