Allow for more than one output for loss and metric

Thanks — (almost) just what I needed.

I took the example a bit further: I wanted a more flexible way of specifying, in the loss class itself, which components to track, and whether to track both the training and validation losses or only the validation loss.
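A minimal sketch of what that could look like — a combined loss that stores its components and carries tracking flags a callback can read. All names here (DualLoss, track_train, track_valid) are hypothetical, and plain floats stand in for tensor losses:

```python
class DualLoss:
    """Hypothetical combined loss that exposes its components.

    A callback can read `loss1`/`loss2` after each batch and use the
    `track_train`/`track_valid` flags to decide what to record.
    """
    def __init__(self, w1=1.0, w2=1.0, track_train=True, track_valid=True):
        self.w1, self.w2 = w1, w2
        self.track_train, self.track_valid = track_train, track_valid
        self.loss1 = self.loss2 = 0.0

    def __call__(self, pred1, targ1, pred2, targ2):
        # toy component losses; real code would call e.g. two nn losses here
        self.loss1 = abs(pred1 - targ1)
        self.loss2 = abs(pred2 - targ2)
        return self.w1 * self.loss1 + self.w2 * self.loss2
```

The weighted sum is what the optimizer sees; the stored components are only for reporting.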

Hello,

I have more or less the same issue as everyone in this topic.

I have one input with 4 channels and a segmentation problem, but I apply a Spatial Transformer Network to the 4th channel of the image and return both the displacement matrix and x, so I now have two outputs to optimize. I more or less copy/pasted the code from sgugger's last post with the notebook, but I now get this error:

/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:57: UserWarning: Traceback of forward call that caused the error:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 450, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 480, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 432, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2828, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-41-cdff26b0dafb>", line 5, in <module>
    learn.fit_one_cycle(2, 3e-3, wd=0.4, div_factor=10, pct_start=0.8)
  File "/usr/local/lib/python3.6/dist-packages/fastai/train.py", line 22, in fit_one_cycle
    learn.fit(cyc_len, max_lr, wd=wd, callbacks=callbacks)
  File "/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py", line 202, in fit
    fit(epochs, self, metrics=self.metrics, callbacks=self.callbacks+callbacks)
  File "/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py", line 101, in fit
    loss = loss_batch(learn.model, xb, yb, learn.loss_func, learn.opt, cb_handler)
  File "/usr/local/lib/python3.6/dist-packages/fastai/basic_train.py", line 26, in loss_batch
    out = model(*xb)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "<ipython-input-31-ac0d26232299>", line 13, in forward
    alpha, theta = self.stn_model(alpha)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "<ipython-input-29-36dfe738bfd8>", line 52, in forward
    x, theta = self.stn(x) # shape (2, 10, 3, 3)
  File "<ipython-input-29-36dfe738bfd8>", line 46, in stn
    x = F.grid_sample(x, grid)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2656, in grid_sample
    return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-41-cdff26b0dafb> in <module>()
      3 # fastai.callback.CallbackHandler.on_loss_begin = custom_on_loss_begin
      4 learn.callback_fns.append(HandleDualLoss)
----> 5 learn.fit_one_cycle(2, 3e-3, wd=0.4, div_factor=10, pct_start=0.8)

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8, 1, 256, 256]] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
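For anyone hitting the same RuntimeError: it usually means the forward pass wrote into a tensor (for example an in-place slice assignment inside stn) that autograd had saved for the backward pass. A minimal repro and fix, independent of fastai — cloning before the write is one common workaround, not necessarily the only one:

```python
import torch

# Reproduce the error: modify a tensor that backward still needs.
inp = torch.randn(2, 4, 8, 8, requires_grad=True)
x = inp * 1.0        # non-leaf tensor, like an activation inside forward()
y = x ** 2           # backward of ** needs x's *original* values
x[:, 3] = 0.0        # in-place write bumps x's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)         # "...modified by an inplace operation..."

# Fix: write into a clone so the tensor saved for backward stays untouched.
inp2 = torch.randn(2, 4, 8, 8, requires_grad=True)
x2 = inp2 * 1.0
y2 = x2 ** 2
x2 = x2.clone()      # the in-place write now targets a fresh tensor
x2[:, 3] = 0.0
(y2.sum() + x2.sum()).backward()   # succeeds
```

In the STN code above, that would mean cloning x (or the 4th channel) before any in-place modification in forward.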

The example sgugger posted in Oct '18 doesn’t work with the current version of fastai, so I modified it:

class HandleDualLoss(LearnerCallback):
    _order = -20  # needs to run before the Recorder

    def on_train_begin(self, **kwargs):
        self.learn.recorder.add_metric_names(['train_loss1', 'train_loss2', 'val_loss1', 'val_loss2'])

    def on_epoch_begin(self, **kwargs):
        # running sums, weighted by batch size, reset every epoch
        self.train_loss1, self.train_loss2, self.train_nums = 0., 0., 0
        self.val_loss1, self.val_loss2, self.val_nums = 0., 0., 0

    def on_batch_end(self, last_target, train, **kwargs):
        bs = last_target.size(0)
        # read the components through self.learn, not a global learn,
        # so the callback works regardless of the variable name outside
        loss1 = self.learn.loss_func.loss1.detach()
        loss2 = self.learn.loss_func.loss2.detach()
        if train:
            self.train_loss1 += bs * loss1
            self.train_loss2 += bs * loss2
            self.train_nums += bs
        else:
            self.val_loss1 += bs * loss1
            self.val_loss2 += bs * loss2
            self.val_nums += bs

    def on_epoch_end(self, last_metrics, **kwargs):
        # add_metrics comes from fastai.callback
        return add_metrics(last_metrics,
                           [self.train_loss1/self.train_nums, self.train_loss2/self.train_nums,
                            self.val_loss1/self.val_nums, self.val_loss2/self.val_nums])
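The per-epoch numbers the callback reports are just batch-size-weighted means. For clarity, here is that bookkeeping on its own, in plain Python with made-up loss values (no fastai required):

```python
class RunningLossPair:
    """Batch-size-weighted running mean of two loss components."""
    def __init__(self):
        self.loss1 = self.loss2 = 0.0
        self.nums = 0

    def update(self, bs, l1, l2):
        # accumulate sums weighted by batch size, as in on_batch_end
        self.loss1 += bs * l1
        self.loss2 += bs * l2
        self.nums += bs

    def averages(self):
        # divide by total sample count, as in on_epoch_end
        return self.loss1 / self.nums, self.loss2 / self.nums

r = RunningLossPair()
r.update(8, 0.5, 1.0)   # batch of 8
r.update(4, 1.1, 0.4)   # batch of 4
a1, a2 = r.averages()   # (8*0.5 + 4*1.1)/12 and (8*1.0 + 4*0.4)/12
```

Weighting by batch size matters because the last batch of an epoch is often smaller; an unweighted mean of per-batch losses would over-count it.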