[Solved] Using a fastai-trained model with plain Pytorch

Ok @simoneva, thanks for your reply.

Let’s see if we can go through the entire process.

The code you quoted above is a small subset of layers.py. If I’m not mistaken, that should be the bare minimum needed to add a fastai-style custom head.

So, you instantiated a standard resnet from torchvision.models, and then added the head by calling head(). Did you append the head to the end of the children list and then unpack all of it (as I did in the example below)?

Then, you just loaded the weights in .pkl with torch.load(), correct?

I tried something much more trivial, that is:

mymodel=learn.model
modules=list(mymodel.children())
my_r50=nn.Sequential(*modules)

I thought that just recreating the fastai model via nn.Sequential would give me a pure PyTorch model.
I was wrong.

OK, I created the learner as a standard torchvision model with a bespoke head, which lives in my own file model.py, so there are no references to fastai.

    learn = create_cnn(db,
                       arch=torchvision.models.vgg16_bn,
                       metrics=accuracy,
                       custom_head=model.head(nf),
                       callback_fns=[ShowGraph, partial(GradientClipping, clip=0.1), BnFreeze])

Did all the training using fastai. Then:

torch.save(learn.model, "local_vgg16_bn", pickle_module=dill)

When the model is loaded it looks for the head in model.py not fastai. Of course I could have copied the whole model from torchvision. In some ways that would be better if torchvision changes. However I had various models under test so was easier to do it this way.

The reason for not using the standard fastai head was that I had very long, thin images which were too small for the pooling layer. With hindsight I probably could have just resized the images and used the standard head. However, once I got it working, it was extra work to change.

Your children list includes the head, which probably contains a Flatten or adaptive pooling layer defined by fastai, and fastai is not present at inference time. Unpickling will look for fastai.layers, which is not installed. It is best not to add your own fastai.layers module to your project, as it would hide the real fastai. Hence the solution is to copy the offending layers to a separate module and use a bespoke head.

Ah ok! So the code listing above is your own model.py. I thought you were making reference to fastai’s model.py.

Well then, I’ll do some experiments and let you know! Thanks! :slight_smile:

Yes you have to find any fastai references and replace them with something that is available at prediction time.
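
One way to hunt for such references is to list every layer class that is defined outside torch and torchvision. The sketch below is only an illustration: the helper name `foreign_layer_classes` and the `CustomFlatten` stand-in are made up for this example, and the allowed prefixes are an assumption you may need to adjust.

```python
from torch import nn

def foreign_layer_classes(model, allowed=("torch.", "torchvision.")):
    """Return sorted qualified names of layer classes defined outside `allowed` packages."""
    found = set()
    for layer in model.modules():          # walks the model recursively
        cls = type(layer)
        if not cls.__module__.startswith(allowed):
            found.add(f"{cls.__module__}.{cls.__qualname__}")
    return sorted(found)

class CustomFlatten(nn.Module):            # hypothetical stand-in for a fastai layer
    def forward(self, x):
        return x.view(x.size(0), -1)

demo = nn.Sequential(nn.Conv2d(3, 4, 3), CustomFlatten(), nn.Linear(4, 2))
print(foreign_layer_classes(demo))         # lists only the custom class
```

Anything this reports has to be importable under the same module path on the inference machine, or be replaced by your own copy.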

One other thing to watch is if you are using other aspects of fastai such as data transformations or databunches that read from folders. Did not apply in my case as my data was already formatted for prediction and when I first wrote it fastai did not have a lot of those things.

I think going forward easiest is to just use fastai at prediction stage and remove spacy. If you can’t install it for some reason then include it in your package.

I can’t. It is not a docker container that I can concoct as I like. It is an inference machine upon which I can just use the stuff already installed.

Besides, you are teaching me things which are interesting per se. :slight_smile:

Look at what happens. I slightly changed your code to adjust it to my needs.


from torch import Tensor
from torch import nn
import logging as log

class Lambda(nn.Module):
    "An easy way to create a pytorch layer for a simple `func`."
    def __init__(self, func):
        "create a layer that simply calls `func` with `x`"
        super().__init__()
        self.func=func

    def forward(self, x): return self.func(x)

def Flatten(full:bool=False)->Tensor:
    "Flatten `x` to a single dimension, often used at the end of a model. `full` for rank-1 tensor"
    func = (lambda x: x.view(-1)) if full else (lambda x: x.view(x.size(0), -1))
    return Lambda(func)

def myhead(nf, nc):
    return \
        nn.Sequential(        
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.AdaptiveMaxPool2d(1)
                ),
            Flatten(),
            nn.BatchNorm1d(nf),
            nn.Linear(nf, 512),
            nn.ReLU(True),
            nn.BatchNorm1d(512),
            nn.Linear(512, nc),
        )

Note that I didn’t make it a python module. I just wrote that in a notebook cell, for experimenting. nc is the number of classes.

Then, I did:

import torchvision.models
mylearn=create_cnn(data,arch=torchvision.models.resnet50,
                   metrics=accuracy,
                   custom_head=myhead(4096, 3))

That created a resnet50 with a head identical to fastai’s.

Then:

modeltosave=mylearn.model
modeltosave.cpu()
torch.save(modeltosave, '/path/mymodel.pkl')

As you warned, it didn’t work: AttributeError: Can't pickle local object 'Flatten.<locals>.<lambda>'.
But fastai’s own Flatten serializes fine, and it looks identical to ours, so I cannot figure out why it doesn’t work for our Flatten (maybe @sgugger could answer this).

However, I installed dill, and then:

import dill
modeltosave=mylearn.model
modeltosave.cpu()
torch.save(modeltosave, '/path/mymodel.pkl', pickle_module=dill)

I received a warning: serialization.py:251: UserWarning: Couldn't retrieve source code for container of type Lambda. It won't be checked for correctness upon loading. "type " + obj.__name__ + ". It won't be checked "

And indeed, at inference time, it says:

path/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'Lambda' on <module '__main__' from 'predictor.py'>

Mmhh… It seems it cannot be deserialized.
Any suggestion? :thinking:

It doesn’t look the same at all - ours doesn’t use lambda!

Has to be in a module. When you unpickle it has to have the same name and be importable.
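
The reason is that pickle stores a class by reference (its module path and name), not its source code. A small standard-library illustration, using `collections.OrderedDict` purely as a stand-in for a model class:

```python
import pickle
from collections import OrderedDict

payload = pickle.dumps(OrderedDict(a=1))

# The byte stream names the defining module and the class; the class body is not stored.
print(b"collections" in payload, b"OrderedDict" in payload)

# Loading works only because `collections.OrderedDict` can be imported here too.
restored = pickle.loads(payload)
print(type(restored).__name__)
```

A class defined in a notebook cell is recorded as living in `__main__`, which is why the loader went looking for `Lambda` in predictor.py.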

Nope, the fastai Flatten layer doesn’t use a lambda function. If you want to pickle things, don’t use lambda functions :wink:
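
A quick way to see the difference with the standard pickle module: a lambda created inside a function has no importable name, so pickle refuses it. The helper names below are made up for the illustration.

```python
import pickle

def can_pickle(obj):
    """Return True if the standard pickle module can serialize `obj`."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

def make_flatten_lambda():
    # Same pattern as the lambda-based Flatten: the function object is local
    return lambda x: x

print(can_pickle(make_flatten_lambda()))   # the local lambda is rejected
print(can_pickle(len))                     # a named, importable function is accepted
```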

True. Thanks.
Sorry, I got a lot of .py files open, some of them with the same function names.

Got it, thanks! (Although I ask myself why…)

Thanks. As strange as it is, I’ll abide by it…!

The trouble with copying bits of fastai code is that you miss the frequent updates! Now I remember that dill allows you to pickle lambdas if you really want to… but I agree with sgugger that it is better to avoid them.

Actually, pickling objects is a bit thorny. A change in source code can prevent unpickling an older object. I am somewhat concerned that if you pickle a torchvision model there is a possibility that the torchvision source changes later. Safest is to just save the weights and replicate the model source.
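
A minimal sketch of that weights-only approach, with a toy model and an in-memory buffer standing in for a real network and a file on disk (`build_model` is a made-up name for this example):

```python
import io
import torch
from torch import nn

def build_model():
    # The model source lives in your own code, so it can be rebuilt anywhere
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

trained = build_model()

buffer = io.BytesIO()                      # stands in for a file on disk
torch.save(trained.state_dict(), buffer)   # only parameter names and tensors are stored
buffer.seek(0)

restored = build_model()                   # fresh instance built from source
restored.load_state_dict(torch.load(buffer))
print(torch.equal(trained[0].weight, restored[0].weight))
```

The saved file then contains no references to layer classes, so it does not depend on where those classes are defined.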

I wonder if there is any way to generate/save source code in raw text format directly from a model?

And indeed I had troubles with pickle even without the lambdas. A problem to solve another time. For now, I dropped pickle.

Here is how to take a model trained with fastai and use it to predict on an inference box where just plain Pytorch (and torchvision) is present. And without pickle.
An awkward setting, but maybe other people could need it in future.

  1. Build a network which is an exact replica of fastai’s. In the example below, it is a resnet101. I took Flatten() and AdaptiveConcatPool2d() from fastai, while myhead() is modeled upon good old @simoneva’s example.
    Note that it takes the number of features and the number of classes as its arguments. Moreover, you have to inspect the fastai model and manually fill in the number of nodes of the fully connected layers in the function’s body.
## The code below gives you Flatten and the double Adaptive Pooling (from fastai), plus
## a viable head. Mind that you got to fill the number of FC's nodes manually
import torch  # needed for torch.cat in AdaptiveConcatPool2d
from torch import Tensor
from torch import nn
import logging as log
from typing import Optional # required for "Optional[type]"

class Flatten(nn.Module):
    "Flatten `x` to a single dimension, often used at the end of a model. `full` for rank-1 tensor"
    def __init__(self, full:bool=False):
        super().__init__()
        self.full = full

    def forward(self, x):
        return x.view(-1) if self.full else x.view(x.size(0), -1)

class AdaptiveConcatPool2d(nn.Module):
    "Layer that concats `AdaptiveAvgPool2d` and `AdaptiveMaxPool2d`." # from pytorch
    def __init__(self, sz:Optional[int]=None): 
        "Output will be 2*sz or 2 if sz is None"
        super().__init__()
        self.output_size = sz or 1
        self.ap = nn.AdaptiveAvgPool2d(self.output_size)
        self.mp = nn.AdaptiveMaxPool2d(self.output_size)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
    
def myhead(nf, nc):
    return \
    nn.Sequential(        # the dropout is needed otherwise you cannot load the weights
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(nf),
            nn.Dropout(p=0.375),
            nn.Linear(nf, 512),
            nn.ReLU(True),
            nn.BatchNorm1d(512),
            nn.Dropout(p=0.75),
            nn.Linear(512, nc),
        )
  2. Now we build the network. Note that before attaching our head we need to cut off the last two layers of the torchvision model (its average pooling and fully connected layers, which fastai replaces with its own, more sophisticated head). Note also that everything is double-wrapped in nn.Sequential()s. The asterisk unpacks the list.
import torch
import torchvision

my_model=torchvision.models.resnet101() 
modules=list(my_model.children())
modules.pop(-1) 
modules.pop(-1) 
temp=nn.Sequential(nn.Sequential(*modules))
tempchildren=list(temp.children()) 
tempchildren.append(myhead(4096,2))
my_r101=nn.Sequential(*tempchildren)
  3. Ok, now the network is ready. Compare its [-1] and [-2] blocks (head and body, respectively) with fastai’s to check that all has gone in the right place.

  4. Open the fastai notebook you used for training and save the weights in the following unorthodox manner: torch.save({'state_dict': learner_name.model.state_dict()}, '/path/name.pth').
    This will save the state dictionary rather than the whole model, which is not a bad thing.

  5. Go back to your inference code and load these weights into the model we created in steps 1, 2 and 3.

model=my_r101
weighties = torch.load('/path/name.pth')
model.load_state_dict(weighties['state_dict'])
  6. Now you should be ready to predict, and what follows is just a stub:
from PIL import Image
import torchvision.transforms.functional as TTF
import torch
import numpy as np

img = '/your/image.jpg'
softmaxer = torch.nn.Softmax(dim=1)
model.eval()
image = Image.open(img)
x = TTF.to_tensor(image)
x.unsqueeze_(0)
print(x.shape)
raw_out = model(x)
out = softmaxer(raw_out)
print(out[0])

It should print the shape of the tensor it just processed and the class probabilities, e.g.:

torch.Size([1, 3, 365, 490])
tensor([0.8540, 0.1460], grad_fn=<SelectBackward>)
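
Before calling load_state_dict(), it can help to compare parameter names and shapes between the replica and the saved weights. The sketch below uses a hypothetical helper, `state_dict_mismatches`, and two toy models. It also illustrates why the Dropout layers must be kept in the head: nn.Sequential names parameters by position, so dropping a layer shifts the keys.

```python
from torch import nn

def state_dict_mismatches(model, state_dict):
    """Compare a model's own parameters with a saved state dict."""
    own = model.state_dict()
    missing = sorted(set(own) - set(state_dict))        # expected by the model, absent in the file
    unexpected = sorted(set(state_dict) - set(own))     # present in the file, unknown to the model
    wrong_shape = sorted(k for k in set(own) & set(state_dict)
                         if own[k].shape != state_dict[k].shape)
    return missing, unexpected, wrong_shape

with_dropout = nn.Sequential(nn.Linear(8, 4), nn.Dropout(0.5), nn.Linear(4, 2))
without_dropout = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 2))

saved = with_dropout.state_dict()
print(state_dict_mismatches(with_dropout, saved))       # all three lists are empty
print(state_dict_mismatches(without_dropout, saved))    # keys "1.*" vs "2.*" do not line up
```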

@jeremy and @sgugger it would be great if you could briefly check this post and tell me if I could have done something in a better way.

Also I wonder:

  • Is there some trick to avoid filling in the number of nodes and features by hand? Like inferring it from the final output of resnet’s body…

  • Do we have some alternative way to serialize a whole model, other than pickle?
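
On the first question, one possible trick (a sketch, not necessarily how fastai does it) is to push a dummy batch through the body and read the channel count from the output. The helper name `infer_body_channels` and the tiny convolutional stand-in body are assumptions for the example. With the concatenated average and max pooling, the head’s nf is twice that count.

```python
import torch
from torch import nn

def infer_body_channels(body, input_size=(3, 64, 64)):
    """Run a dummy batch through `body` and return the number of output channels."""
    was_training = body.training
    body.eval()                              # avoid updating BatchNorm statistics
    with torch.no_grad():
        out = body(torch.zeros(1, *input_size))
    body.train(was_training)                 # restore the previous mode
    return out.shape[1]

# Tiny stand-in for a convolutional body such as a truncated resnet
body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
                     nn.Conv2d(8, 16, 3, padding=1))

channels = infer_body_channels(body)
nf = 2 * channels                            # avg-pool and max-pool outputs are concatenated
print(channels, nf)
```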

Thanks!

Your post was very helpful in getting fastai model to run on pytorch. I trained a 6 class image classifier using fastai and exported the model. I followed your recipe to get it running on pure Pytorch. I find that the model performance in Pytorch is worse than in fastai. I suspect it to be due to some transform issue.
The databunch I created is:
data = ImageDataBunch.from_folder(dataPath,train='.',valid_pct=0.2,ds_tfms=get_transforms(do_flip=True,flip_vert=True),size=(640,480),num_workers=1).normalize(imagenet_stats)
While using the Pytorch model I am using
transforms.Compose([transforms.ToTensor(), normalize,])
Am I missing something??

Very strange… I always had a slight perf improvement with plain pytorch.

Yes, I think the transforms are the most probable culprit. Keep us posted!

In my previous post I had annotated the dataBunch definition I used for training with fastai.
For inference with Pytorch I use:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
tfms2=transforms.Compose([transforms.ToTensor(), normalize,])

imgPath = join(mypath,onlyfiles[i])
im = Image.open(imgPath)
tfImg=tfms2(im)
tfImg=tfImg[None,:,:,:]
raw=model(tfImg)
out=F.softmax(raw,dim=1)
val,ix=torch.max(out,1)
pred= classes[ix],val.item()

Do you see any discrepancy??

Hey balnazzar
Were you able to resolve this issue? I am facing the same one. Could you share the code with us?
Thanks in advance

Hi Samrat. Which problem exactly are you referring to?

Thanks!

I was also getting worse accuracy in PyTorch. Turns out I was not normalizing the input correctly using data.stats.
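
For anyone hitting the same accuracy gap, here is a minimal sketch of inference-time preprocessing in plain PyTorch. It assumes the model was trained on resized, normalized images. The function name `preprocess`, the bilinear resize and the default ImageNet statistics are assumptions, so substitute the size and stats actually used in training. It is also worth checking that model.eval() is called before predicting, so that dropout and batch norm run in inference mode.

```python
import torch
import torch.nn.functional as F

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def preprocess(img, size, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Turn a float CHW image in [0, 1] into a normalized 1xCxHxW batch."""
    x = img.unsqueeze(0)                                   # add the batch dimension
    x = F.interpolate(x, size=size, mode="bilinear", align_corners=False)
    mean_t = torch.tensor(mean).view(1, -1, 1, 1)
    std_t = torch.tensor(std).view(1, -1, 1, 1)
    return (x - mean_t) / std_t                            # per-channel normalization

img = torch.rand(3, 365, 490)              # random stand-in for a decoded image
batch = preprocess(img, size=(64, 48))     # (height, width); use your training size
print(batch.shape)
```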

I have a similar problem, except that the inference machine is an iPhone.
The coremltools module converts the model and generates a model that I can install on iOS, but I get the following error at runtime.

BN: Invalid K dimension 1024 / 1 / 1 [Exception from Layer: 73: input.106]

Something like this is very difficult to debug and fix (yes, I know it’s possible to replace that layer and re-train but that nullifies the benefits that Fastai offers in terms of making it easy to train).

Fastai is a great library and ease of use is great, but I wish it didn’t introduce these bespoke layers that are slightly different than what pytorch offers.
