GeForce RTX = half-speed FP32 accumulate (in mixed-precision mode)

crayoneater · December 20, 2018, 6:56am

Just learned that the GeForce RTX cards (not the Titan RTX) run FP32 accumulate at half speed when operating in FP16 mode. Ugh.

Does anyone know if this is enough of a bottleneck to make mixed precision training (MPT) not worth it? It seemed that way with the GTX 10-series, where FP16 itself was crippled at 1/64 speed. I don’t know enough about MPT to figure this out, and I can’t find any benchmarks. I was hoping to buy an RTX machine after finishing the fast.ai classes, thinking I’d have full MPT capability at my fingertips, but reality may prove different.

EDIT: There are some benchmarks here, but none of the cards tested have full “uncrippled” capability, so it’s hard to see what the difference would be with the uncrippled Titan RTX.

crayoneater · December 24, 2018, 9:19pm

Well, it appears the answer was staring me in the face the whole time. The Titan V card is actually not crippled, so comparing its FP16 vs FP32 results with those of the RTX 20-series cards, at least in the context of ResNet-50, suggests that the RTX 20-series cards lose roughly 15-20% performance on account of the half-speed FP32 accumulate in MPT. While less than ideal, the gain in speed is still substantial enough to make MPT worth it, even on a crippled card.

yaysummeriscoming · December 28, 2018, 9:53am

Interesting, thanks for sharing! I’m not sure I’m following you 100% though. An operation like convolution is done completely in FP16, isn’t it?

crayoneater · December 28, 2018, 6:51pm

I don’t really understand MPT (I’m just trying to plan ahead for the future - still on part 1, lesson 3!) but I think the accumulation step is generally done in FP32 to avoid losing accuracy. Hopefully someone with a better understanding can correct me if needed.
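
To make the accumulation point concrete, here is a minimal sketch in plain Python that emulates the numerics only. It multiplies FP16 inputs and adds the products into either an FP16 or an FP32 running sum. The helper names (`to_fp16`, `to_fp32`, `dot`) and the test values are assumptions chosen for illustration, not anything from a real GPU library, and the sketch says nothing about hardware throughput.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot(a, b, round_acc):
    """Dot product with FP16 inputs; round_acc sets the accumulator precision."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # product of two FP16 values
        acc = round_acc(acc + prod)     # running sum rounded after every add
    return acc

# Hypothetical inputs: many small terms, as in a long convolution or matmul sum.
n = 4096
a = [0.01] * n
b = [1.0] * n

fp16_sum = dot(a, b, to_fp16)  # accumulate in half precision
fp32_sum = dot(a, b, to_fp32)  # accumulate in single precision

print("FP16 accumulate:", fp16_sum)
print("FP32 accumulate:", fp32_sum)
```

With these inputs the FP16 running sum stops growing once each new term is smaller than half the spacing between adjacent FP16 values at that magnitude, while the FP32 running sum keeps tracking the true total. That loss is the reason mixed precision training typically keeps the accumulator in FP32 even when the multiplies use FP16 inputs.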