Loading a UNET model into C++ using the new TorchScript feature of PyTorch 1.0

I am trying the new feature from Pytorch (dev version) which enables loading a Pytorch model in C++ without any Python dependencies. I am using the tracing method:

import torch
import torchvision

# An instance of your model.
model = ...  # a UNET model from fastai, which has hooks as required by UNET

# An example input you would normally provide to your model's forward() method.
example = torch.rand(1, 3, 224, 224)

# Use torch.jit.trace to generate a torch.jit.ScriptModule via tracing.
traced_script_module = torch.jit.trace(model, example)

However, I got a value error:

ValueError: Modules that have hooks assigned can't be compiled

A brief look at the error trace:

   1118         if orig._backward_hooks or orig._forward_hooks or orig._forward_pre_hooks:
-> 1119             raise ValueError("Modules that have hooks assigned can't be compiled")
   1120 
   1121         for name, submodule in orig._modules.items():

I am using it for a UNET model from fastai. I was wondering if anyone else has tried it and would like to share their experience of whether it worked. Also, does anyone know if this new feature comes with a limitation of only working for very simple models?

4 Likes

I’m not sure about this - I’ve asked on twitter; hopefully a Pytorch guru will be able to help here. The issue is that Unet relies on a forward hook for the cross connections, and hooks aren’t currently supported by that new pytorch feature.
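
For context, here is a minimal sketch of the kind of forward hook involved (illustrative only, not fastai’s actual implementation):

import torch
import torch.nn as nn

# Minimal sketch (not fastai's exact code): a forward hook stores an encoder
# activation so a later decoder stage can concatenate with it (the cross connection).
stored = {}

def save_activation(module, inp, out):
    stored['skip'] = out

encoder_layer = nn.Conv2d(3, 16, 3, padding=1)
handle = encoder_layer.register_forward_hook(save_activation)

x = torch.rand(1, 3, 224, 224)
y = encoder_layer(x)   # the hook fires here and fills stored['skip']
handle.remove()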

3 Likes

Will that be an issue for the library? Isn’t a lot of the functionality of v1 built around hooks?

2 Likes

I was wondering if we could try this feature with other fastai models as well. It’s a really cool feature, so it would be nice to find a workaround for it.

1 Like

[quote=“shbkan, post:1, topic:23810”]
ValueError: Modules that have hooks assigned can't be compiled
[/quote]

Hey,
has anyone already tried to use torch.jit.trace() or something similar on a fastai model?
I am trying to wrap my fastai ResNet. I am using the new fast.ai version 1, but I am getting some errors. So if someone has already been successful, I would like to know what I have to change to make it work.

I just used learn.model.cpu() to convert it and at least I did not get an error… I will try to make it work in C++ now.
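
For reference, a self-contained sketch of that flow, using a torchvision ResNet as a stand-in for learn.model (with a fastai Learner you would pass learn.model.cpu().eval() instead; the 224x224 input size is an assumption):

import torch
import torchvision

# torchvision's resnet34 stands in here for the fastai learner's model
model = torchvision.models.resnet34().eval()   # eval() so BatchNorm/Dropout are deterministic
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)
traced.save('resnet_traced.pt')   # arbitrary file name; this is the file torch::jit::load reads from C++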

1 Like

Any update on this issue if someone else has tried it?

Here is a snippet of text I found in the following book that seems to provide the answer:

PyTorch Deep Learning Hands-On
By Sherin Thomas, Sudhanshu Passi
April 2019

PyTorch allows you to make a TorchScript IR through two methods. The easiest is by tracing, just like ONNX. You can pass the model (even a function) to torch.jit.trace with a dummy input. PyTorch runs the dummy input through the model/function and traces the operations while it runs the input.

The traced functions (PyTorch operations) then can be converted to the optimized IR, which is also called a static single assignment IR. Like an ONNX graph, instructions in this graph also have primitive operators that A TENsor library (ATen, the backend of PyTorch) would understand.

This is really easy but comes with a cost. Tracing-based inference has the basic problem ONNX had: it can’t handle the model structure changes that are dependent on the data, that is, an if / else condition check or a loop (sequence data). For handling such cases, PyTorch introduced scripting mode.

Scripting mode can be enabled by using the torch.jit.script decorator for normal functions and torch.jit.script_method for methods on the PyTorch model. By this decorator, the content inside a function/method will be directly converted to TorchScript. Another important thing to remember while using torch.jit.script_method for model classes is about the parent class. Normally, we inherit from torch.nn.Module , but for making TorchScript, we inherit from torch.jit.ScriptModule . This helps PyTorch to avoid using pure Python methods, which can’t be converted to TorchScript. Right now, TorchScript doesn’t support all Python features, but it has all the necessary features to support data-dependent tensor operations.
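
To make that distinction concrete, here is a small illustration (the function below is just an example, not from the book):

import torch

# The `if` below depends on the data, so a trace only records the branch taken
# for the dummy input, while scripting keeps the control flow.
def branchy(x):
    if x.sum() > 0:
        return x + 10
    else:
        return x - 10

traced = torch.jit.trace(branchy, torch.ones(3))   # warns, bakes in the `+ 10` branch
scripted = torch.jit.script(branchy)               # keeps both branches

print(traced(-torch.ones(3)))    # tensor([9., 9., 9.]) - wrong branch baked in
print(scripted(-torch.ones(3)))  # tensor([-11., -11., -11.])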

1 Like

Just to let you know, I managed to compile and run a fastai DynamicUnet using torch.jit.
The model uses hooks, but it seems the behavior is deterministic and induced by the shape of the dummy input. So I just commented out PyTorch’s check to force the algorithm to go on, and obtained a nice functional traced module.

It’s the line raise ValueError("Modules that have hooks assigned can't be compiled") that should be removed.

( The link to my SO question just in case https://stackoverflow.com/questions/56242857/how-can-i-force-torch-jit-trace-to-compule-my-module-by-ignoring-hooks )

Well, maybe it’s dirty, but at least you can compile it and run it in production :slight_smile:

4 Likes

I’ve been following an awesome DeOldify notebook and got the same error -
ValueError("Modules that have hooks assigned can't be compiled")
The reason for this was the tensorboard callback, which I didn’t need at the time.

So, commenting out the
learn_gen.callback_fns.append(partial(ImageGenTensorboardWriter, base_dir=TENSORBOARD_PATH, name='GenPre')) line did the job for me.

Hope it can help someone who’s also trying this notebook’s approach.
Source: https://github.com/jantic/DeOldify

FYI, this has recently been fixed in PyTorch to allow tracing through forward passes. Tracing through backward passes will still throw the error, but that is only an issue for training, not inference, which I guess is what most people are trying to do when tracing UNETs.

This PR solves it. Not sure when it will land in an official release or nightly, but it’s now in master.

Hi Jeremycochoy

I followed your approach to compile the unet and got an error. Here is my code:

learn = unet_learner(data, models.resnet34, metrics=dice, wd=wd).to_fp16()

# after training
learn.to_fp32()    # put back to full floating point

trace_input = torch.ones(1,3,599,599).cuda()
jit_model = torch.jit.trace(learn.model, trace_input)
model_file='unit_jit.pth'
torch.jit.save(jit_model, f'models/{model_file}')

I got following warning

/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai/vision/models/unet.py:32: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can’t record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if ssh != up_out.shape[-2:]:

and this error

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/fastai/layers.py in forward(self, x)
    170         self.dense=dense
    171 
--> 172     def forward(self, x): return torch.cat([x,x.orig], dim=1) if self.dense else (x+x.orig)
    173 
    174 def res_block(nf, dense:bool=False, norm_type:Optional[NormType]=NormType.Batch, bottle:bool=False, **conv_kwargs):

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 599 and 600 in dimension 2 at /opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THC/generic/THCTensorMath.cu:71

the shape is 599x600 instead of 599x599.

Did you run into any issues when you compiled your model?

Thanks
Dong

I updated PyTorch to 1.2.0 and fastai to 1.0.57. There are some changes in the source code, so it is better to retrain all models.

The pull request that @eugeneware showed is still not in the conda pytorch-gpu package. Replace the __init__.py in …/anaconda3/envs/YOUR_ENV/lib/python3.7/site-packages/torch/jit with the version from the PR.

It works well for me to export to jit and save pth file.

I think one way to solve @dzhang’s problem is to change your tensor/input image size to a multiple of 32.
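
For example, something along these lines (padding is just one possible choice; it does change the effective input size, so it may not suit every transfer-learning setup):

import math
import torch
import torch.nn.functional as F

# Pad the dummy input (and later the real inputs) up to the next multiple of 32.
# 608 is the next multiple of 32 above 599.
x = torch.ones(1, 3, 599, 599)
h, w = x.shape[-2:]
new_h, new_w = math.ceil(h / 32) * 32, math.ceil(w / 32) * 32   # 608, 608
x_padded = F.pad(x, (0, new_w - w, 0, new_h - h), mode='reflect')
print(x_padded.shape)   # torch.Size([1, 3, 608, 608])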

Hi Vanlinhtnt

Unfortunately I can’t change the input size. I am using transfer learning; the original model was trained on 599x599 input.

Thanks
Dong

I was able to get torch.jit.trace to “work” with the latest nightly build of PyTorch, but it uses over 40 GB of memory to trace, and forward passes with the loaded model take a similar amount of memory. The DynamicUnet in plain fast.ai uses nowhere near that amount of RAM - does anyone have any idea how I can get that memory usage fixed?

If I had to guess, the DynamicUNet is splitting up the input into tiles and using the raw model like this bypasses that somehow - but if so, how do I get TorchScript to do that?

This is based on this warning:

lib/python3.6/site-packages/fastai/vision/models/unet.py:31: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if ssh != up_out.shape[-2:]:

Edit: Fixed, found the solution. The pickled learner was performing preprocessing (scaling by 8x) on the inputs when .predict was called, and I have to match that exactly when calling .trace() and when running .forward() on the new module. The nightly version of pytorch indeed lets this work on DynamicUnet.

The above warning is OK to ignore assuming inputs are always the same dimensions - need to double check that outputs still match exactly.
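
One way to do that double check (sketch only; model stands for the original module and traced for the output of torch.jit.trace, and running under no_grad also keeps memory down):

import torch

# `model` and `traced` are placeholders for the original module and its trace.
x = torch.rand(1, 3, 224, 224)   # use the same size the trace was made with
with torch.no_grad():
    ref = model(x)
    out = traced(x)
print(torch.allclose(ref, out, atol=1e-5))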

Hey @dzhang, I’m also facing the same issue. Did you find any working solution for this? I was able to plot training and validation losses and metrics, but could not plot histograms for weights and other stuff.
When we check whether the stored tensor is the same size as the upsampled one before concatenation (i.e., if ssh != up_out.shape[-2:]:), why does the trace want to trace that boolean? This gives the warning, the if statement is treated as a constant, and hence everyone is getting an error about dimension mismatch; I guess it has nothing to do with multiples of 32.
Is anyone else facing the same issue?

I am trying to use torch.jit.script with UNet. However, it is not working. If I use torch.jit.trace it works! Any idea?

For torch.jit.trace - be extra skeptical about it actually working. I’ve run into numerical issues that noticeably distort the colors in DeOldify. It’s also harder to make sure it’s hardware agnostic (running trace on CPU gets different results from GPU). So I’ve abandoned that and haven’t had those same issues with torch.jit.script.

For torch.jit.script, if you’re using DynamicUnet you’ll be out of luck - it won’t be able to deal with things like hooks. You’ll need to basically rewrite the Unet to be more friendly for TorchScript, something more like this: https://github.com/dana-kelley/DeOldify/blob/master/fasterai/generators.py

That’s the original DeOldify from two years ago. Notice that AbstractUnet doesn’t use hooks. That’s not the end of the story though - you can’t do the array access at runtime in the forward on the encoder either, so you have to extract those parts in the constructor of the Unet instead.
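
A rough sketch of that idea (names here are hypothetical, not DeOldify’s actual classes): slice the encoder into stages in the constructor and return the skip features explicitly from forward, instead of capturing them with hooks.

import torch
import torch.nn as nn
from typing import List

class EncoderWithSkips(nn.Module):
    # Hypothetical sketch: split a flat encoder into stages so each skip is a
    # stage boundary, then return the features explicitly instead of via hooks.
    def __init__(self, encoder: nn.Sequential, skip_idxs: List[int]):
        super().__init__()
        stages, prev = [], 0
        for i in skip_idxs + [len(encoder)]:
            stages.append(nn.Sequential(*list(encoder.children())[prev:i]))
            prev = i
        self.stages = nn.ModuleList(stages)

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)   # last element is the bottleneck, earlier ones are skips
        return feats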

And wait - that’s not all (LOL). You’ll also want to make the model backwards compatible with the fastai Learner code, so you’ll still want to present something like a Sequential model to the Learner. So what I’ve done is make a wrapper Module just for this purpose that pretends the model is sequential by implementing __getitem__. Like this:

class SequentialScriptUnet(Module):
    def __init__(self, encoder:nn.Module, nf_factor:int=4, leaky:float=0.01):
        self.core = ScriptUnet(encoder=encoder, nf_factor=nf_factor, leaky=leaky)

    def forward(self, x):
        return self.core(x)

    def __getitem__(self, i):
        switcher = {
            0: self.core.encoder,
            1: self.core.middle_conv,
            2: self.core.unetblock1,
            3: self.core.unetblock2,
            4: self.core.unetblock3,
            5: self.core.unetblock4,
            6: self.core.output
        }
        return switcher.get(i)

    def append(self, l):
        raise NotImplementedError('this aint right')

    def extend(self, l):
        raise NotImplementedError('this aint right')

    def insert(self, i, l):
        raise NotImplementedError('this aint right')

So basically, you’re creating a Unet model and training it from scratch just to get something that’s compatible with torch.jit.script at the end. You extract the self.core model as your actual TorchScript model.
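
In other words, something like this at the end (model here is a hypothetical trained SequentialScriptUnet):

import torch

# `model` stands for a trained SequentialScriptUnet from above
scripted = torch.jit.script(model.core)   # only the inner ScriptUnet is scripted
scripted.save('unet_scripted.pt')         # arbitrary file name, loadable with torch::jit::load from C++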

It’s super hacky. I know. But it turns out straight up torchscript (as opposed to tracing) is really restrictive, yet it’s the way to go if you want something that actually works.

Edit: I forgot to mention that torch.jit.script will also complain about x.orig being accessed in MergeLayer, so you’ll need to implement those concats/residuals on your own rather than using SequentialEx.
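
A minimal sketch of doing that merge explicitly (this is not fastai’s MergeLayer; the skip tensor is passed as a second argument instead of being read from x.orig):

import torch
import torch.nn as nn

class ExplicitMerge(nn.Module):
    # Hypothetical replacement for the x.orig pattern: the skip tensor comes in
    # as an explicit argument, which torch.jit.script can handle.
    def __init__(self, dense: bool = True):
        super().__init__()
        self.dense = dense

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        return torch.cat([x, skip], dim=1) if self.dense else (x + skip)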

5 Likes

I am playing with this right now; I should have searched the forums first (LOL).
What gains do you see on performance? Inference time? Memory usage? Training?
I will probably make a non-dynamic Unet once my model is finished, with the actual encoder.