I’m doing fp16 training. After training completes I call
learn.show_results(figsize=(12,15), rows = 10)
and I got an error:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-cc1e87ad044f> in <module>()
----> 1 learn.show_results(figsize=(12,15), rows = 10)
~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in show_results(self, ds_type, rows, **kwargs)
278 ds = self.dl(ds_type).dataset
279 self.callbacks.append(RecordOnCPU())
--> 280 preds = self.pred_batch(ds_type)
281 *self.callbacks,rec_cpu = self.callbacks
282 x,y = rec_cpu.input,rec_cpu.target
~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in pred_batch(self, ds_type, batch)
236 cb_handler = CallbackHandler(self.callbacks)
237 cb_handler.on_batch_begin(xb,yb, train=False)
--> 238 preds = loss_batch(self.model.eval(), xb, yb, cb_handler=cb_handler)
239 return _loss_func2activ(self.loss_func)(preds[0])
240
~/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in loss_batch(model, xb, yb, loss_func, opt, cb_handler)
16 if not is_listy(xb): xb = [xb]
17 if not is_listy(yb): yb = [yb]
---> 18 out = model(*xb)
19 out = cb_handler.on_loss_begin(out)
20
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same
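The message means the model weights are still torch.cuda.HalfTensor while the incoming batch is torch.cuda.FloatTensor. The same mismatch reproduces in plain PyTorch, independent of fastai (a minimal sketch):

import torch
import torch.nn as nn

model = nn.Linear(4, 2).cuda().half()  # weights become torch.cuda.HalfTensor
x = torch.randn(1, 4).cuda()           # batch stays torch.cuda.FloatTensor
model(x)  # RuntimeError: Input type (torch.cuda.FloatTensor) and weight type
          # (torch.cuda.HalfTensor) should be the same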
From my understanding, the MixedPrecision(Callback) class adds the fp16 transform at the start of training and then removes it at the end of training:
class MixedPrecision(Callback):
    ...
    def on_train_begin(self, **kwargs:Any)->None:
        "Ensure everything is in half precision mode."
        self.learn.data.train_dl.add_tfm(to_half)
        if hasattr(self.learn.data, 'valid_dl') and self.learn.data.valid_dl is not None:
            self.learn.data.valid_dl.add_tfm(to_half)
        if hasattr(self.learn.data, 'test_dl') and self.learn.data.test_dl is not None:
            self.learn.data.test_dl.add_tfm(to_half)

    def on_train_end(self, **kwargs:Any)->None:
        "Remove half precision transforms added at `on_train_begin`."
        self.learn.data.train_dl.remove_tfm(to_half)
        if hasattr(self.learn.data, 'valid_dl') and self.learn.data.valid_dl is not None:
            self.learn.data.valid_dl.remove_tfm(to_half)
So after training there is no longer any transform converting batches to fp16, while the model weights are still in half precision, which explains the type mismatch. I’d like to fix that, if somebody can point me towards the best place to put a fix.
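In the meantime, two workarounds seem to get show_results working again (a sketch, assuming the learner was set up with to_fp16 and that to_half is importable from fastai.torch_core; use one or the other, not both):

from fastai.torch_core import to_half

# Option 1: put the model back in fp32, so it matches the fp32 batches
learn.model = learn.model.float()
learn.show_results(figsize=(12,15), rows=10)

# Option 2: re-add the fp16 transform that on_train_end removed, so
# validation batches are converted to half before reaching the model
learn.data.valid_dl.add_tfm(to_half)
learn.show_results(figsize=(12,15), rows=10)

As for a permanent fix, one option might be for on_train_end to also call self.learn.model.float(), so that model and dataloaders agree after training, but I’m not sure whether keeping the model in fp16 after training is intentional.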