Hi,
I’m currently training a language model on Wikipedia with fast.ai v1. I’m using 50k tokens, just like I did with v0.7, and everything else is as similar as I could make it. The main difference is that I’m using a batch size of 64 where I previously used 32, but that should only speed things up, not slow them down. One epoch now takes an estimated 15h, whereas before it took about 2h40m. I feel like I must be missing something very obvious, so any help is appreciated. I’ve checked that the GPU is actually being used:
```
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device at 0x2aa1d1857b8>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name(0)
'Quadro P6000'
```
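The calls above only show that CUDA is available and which device is selected; they don’t prove the model’s weights were actually moved to the GPU. A minimal sanity check (the `nn.Linear` here is just a stand-in for the real language model) would be:

```python
import torch
import torch.nn as nn

# Stand-in model; in practice this would be the fast.ai learner's model.
model = nn.Linear(10, 2)

# Move it to the GPU if one is available, otherwise stay on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# If this prints "cuda", the weights really live on the GPU.
print(next(model.parameters()).device.type)
```

In fast.ai v1 the learner should do this automatically, so this is just to rule out the obvious.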
```
Tue Mar 12 14:33:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.96       Driver Version: 418.96       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        TCC/WDDM     | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P6000      WDDM   | 00000000:65:00.0  On |                  Off |
| 36%   71C    P0    78W / 250W |   6376MiB / 24576MiB |     83%      Default |
+-------------------------------+----------------------+----------------------+
```
Edit: here’s the `nvidia-smi dmon` output during training (the `sm` column goes up to ~90%, so the GPU itself seems busy):
```
# gpu   pwr gtemp mtemp    sm   mem   enc   dec  mclk  pclk
# Idx     W     C     C     %     %     %     %   MHz   MHz
    0   207    52     -    36    24     0     0  4513  1569
    0   220    51     -    66    43     0     0  4513  1632
    0    68    52     -    63    43     0     0  4513  1569
    0   185    53     -    43    27     0     0  4513  1569
    0   183    53     -    68    47     0     0  4513  1645
    0    69    54     -    52    39     0     0  4513  1569
    0   209    53     -    53    32     0     0  4513  1632
    0   135    53     -    78    48     0     0  4513  1544
    0   156    54     -    36    26     0     0  4513  1544
    0   136    54     -    74    50     0     0  4513  1632
    0    67    54     -    54    39     0     0  4513  1544
    0   203    55     -    41    22     0     0  4513  1556
    0   179    55     -    37    27     0     0  4513  1556
```
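Since utilization fluctuates between the 30s and 70s rather than staying pinned high, one thing worth measuring is whether each iteration is stalling while waiting for the next batch (a CPU/data-loading bottleneck) rather than on GPU compute. A rough, hypothetical way to split the two (the dataset and the matrix multiply below are placeholders, not the real training loop):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and loader standing in for the real text DataBunch.
data = TensorDataset(torch.randn(512, 100), torch.randint(0, 2, (512,)))
loader = DataLoader(data, batch_size=64, num_workers=0)

fetch_time = compute_time = 0.0
t0 = time.perf_counter()
for xb, yb in loader:
    t1 = time.perf_counter()
    fetch_time += t1 - t0          # time spent waiting on the DataLoader
    _ = (xb @ xb.t()).sum()        # stand-in for the forward/backward pass
    t0 = time.perf_counter()
    compute_time += t0 - t1        # time spent in actual computation

print(f"waiting for data: {fetch_time:.3f}s, computing: {compute_time:.3f}s")
```

If the “waiting for data” share dominates, the slowdown would be on the CPU side (tokenization, `num_workers` settings, etc.) rather than the GPU.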