I am currently getting a warning I have never encountered before when using the `to_fp16` function:
/home/me/anaconda3/envs/fastai-usr/lib/python3.7/site-packages/fastai/callbacks/fp16.py:97: UserWarning: You have a `loss_scale` factor that is too high, try to divide it by 2 (current value: 512).
warn(f"You have a `loss_scale` factor that is too high, try to divide it by 2 (current value: {self.loss_scale}).")
I am using fastai v1.0.45. I created a language model learner with language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3) and called to_fp16 on it. When I run lr_find, this warning prints constantly. Changing the loss_scale parameter of to_fp16 doesn't seem to have any effect on it: I set it as low as 0.125 and still got the warning.
The warning doesn't seem to affect the results of lr_find (I get results comparable to what similar code produced on the same data in previous versions of fastai), but I'm not sure whether it's something I should be concerned about.
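For reference, a minimal sketch of the setup described above (assuming data_lm is an already-built language-model DataBunch; the learner call, to_fp16, loss_scale, and lr_find are the ones mentioned in my post):

from fastai.text import *

# Minimal sketch; data_lm is assumed to be a prepared language-model DataBunch.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3).to_fp16()

# Passing a lower initial loss scale did not silence the warning either, e.g.:
# learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3).to_fp16(loss_scale=0.125)

learn.lr_find()          # the UserWarning above prints repeatedly during this call
learn.recorder.plot()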
I am seeing the same warning when running fit_one_cycle on a unet_learner, after 6 of 10 epochs.
/opt/anaconda3/lib/python3.7/site-packages/fastai/callbacks/fp16.py:97: UserWarning: You have a loss_scale factor that is too high, try to divide it by 2 (current value: 512).
warn(f"You have a loss_scale factor that is too high, try to divide it by 2 (current value: {self.loss_scale}).")
Learning rate and loss were both low before the warning appeared:
lr=7e-4
epoch  train_loss  valid_loss  dice      dice      time
0      0.105225    0.108841    0.700169  0.800594  05:27
6      0.094356    0.097013    0.704540  0.815099  05:23
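For context, a rough sketch of the training setup (the data object, the resnet34 backbone, and the metric wiring are my assumptions; the epoch count and learning rate are the ones from the run above):

from fastai.vision import *

# Rough sketch of the run above; `data` and the resnet34 backbone are assumptions.
learn = unet_learner(data, models.resnet34, metrics=[dice]).to_fp16()

# lr=7e-4 over 10 epochs, as in the table above; the loss_scale warning
# started appearing after 6 of the 10 epochs.
learn.fit_one_cycle(10, max_lr=7e-4)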
This is a GCP instance.
=== Software ===
python : 3.7.1
fastai : 1.0.47.post1
fastprogress : 0.1.20
torch : 1.0.1.post2
nvidia driver : 410.72
torch cuda : 10.0.130 / is available
torch cudnn : 7402 / is enabled
=== Hardware ===
nvidia gpus : 1
torch devices : 1
- gpu0 : 15079MB | Tesla T4
=== Environment ===
platform : Linux-4.9.0-8-amd64-x86_64-with-debian-9.8
distro : #1 SMP Debian 4.9.130-2 (2018-10-27)
conda env : base
python : /opt/anaconda3/bin/python
sys.path :
/opt/anaconda3/lib/python37.zip
/opt/anaconda3/lib/python3.7
/opt/anaconda3/lib/python3.7/lib-dynload
/opt/anaconda3/lib/python3.7/site-packages
Sat Mar 9 16:44:06 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.72       Driver Version: 410.72       CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   56C    P0    28W /  70W |   2553MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     17757      C   /opt/anaconda3/bin/python                   2543MiB |
+-----------------------------------------------------------------------------+
I get the warning occasionally too; what it often means is that the model's loss has diverged and you need to lower the learning rate. It doesn't look like that's what's happening in your case, though?
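For example (hypothetical numbers, just to illustrate the suggestion):

# Hypothetical illustration: check whether the loss actually blew up, and if
# so retry the same schedule with a smaller max_lr.
learn.recorder.plot_losses()
learn.fit_one_cycle(10, max_lr=1e-4)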
Once I increase the batch size past a certain point, I get the same warning while running lr_find on a u-net with to_fp16(). Any clues as to why this happens? I'm running fastai 1.0.48.