[Solved] Using a fastai-trained model with plain PyTorch

Nope, the fastai Flatten layer doesn’t use a lambda function. If you want to pickle things, don’t use lambda functions :wink:

1 Like

True. Thanks.
Sorry, I have a lot of .py files open, some of them with the same function names.

Got it, thanks! (Although I ask myself why…)

Thanks. As strange as it is, I’ll abide by it…!

1 Like

The trouble with copying bits of fastai code is that you miss the frequent updates! Now I remember that dill allows you to pickle lambdas if you really want to… but I agree with sgugger that it is better to avoid them.
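
A quick illustration of the difference, in case it is useful (assuming dill is installed; the lambda here is just a throwaway example):

import pickle
import dill

f = lambda x: x * 2

try:
    pickle.dumps(f)   # pickle stores functions by importable name, which a lambda lacks
except (pickle.PicklingError, AttributeError) as e:
    print('pickle failed:', e)

g = dill.loads(dill.dumps(f))   # dill serializes the function body itself
print(g(21))                    # -> 42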

Actually, pickling objects is a bit thorny. A change in source code can prevent unpickling an older object. I am somewhat concerned that if you pickle a torchvision model, the torchvision source may change later. The safest approach is to just save the weights and replicate the model source.
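
A minimal sketch of that safer pattern (using resnet18 purely as a stand-in):

import torch
import torchvision

model = torchvision.models.resnet18()

# Fragile: torch.save(model, path) pickles the whole object graph, which can
# break if the torchvision source changes between saving and loading.
# Robust: save only the tensors...
torch.save(model.state_dict(), 'weights.pth')

# ...and rebuild the model from (current) source before loading them back.
model2 = torchvision.models.resnet18()
model2.load_state_dict(torch.load('weights.pth'))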

I wonder if there is any way to generate/save source code in raw text format directly from a model?

1 Like

And indeed I had trouble with pickle even without the lambdas. A problem to solve another time; for now, I dropped pickle.

Here is how to take a model trained with fastai and use it to predict on an inference box where just plain PyTorch (and torchvision) is present, and without pickle.
An awkward setting, but maybe other people will need it in the future.

  1. Build a network which is an exact replica of fastai’s. In the example below, it is a resnet101. I took Flatten() and AdaptiveConcatPool2d() from fastai, while myhead() is modeled upon good old @simoneva’s example.
    Note that it takes the number of features and the number of classes as its arguments. Moreover, you have to inspect the fastai model and manually fill in the number of nodes of the fully connected layers in the function’s body.
## The code below gives you Flatten and the double adaptive pooling (from fastai), plus
## a viable head. Mind that you have to fill in the number of the FC's nodes manually.
import torch                 # needed for torch.cat below
from torch import nn
from typing import Optional # required for "Optional[type]"

class Flatten(nn.Module):
    "Flatten `x` to a single dimension, often used at the end of a model. `full` for rank-1 tensor"
    def __init__(self, full:bool=False):
        super().__init__()
        self.full = full

    def forward(self, x):
        return x.view(-1) if self.full else x.view(x.size(0), -1)

class AdaptiveConcatPool2d(nn.Module):
    "Layer that concats `AdaptiveAvgPool2d` and `AdaptiveMaxPool2d`." # from pytorch
    def __init__(self, sz:Optional[int]=None): 
        "Output will be 2*sz or 2 if sz is None"
        super().__init__()
        self.output_size = sz or 1
        self.ap = nn.AdaptiveAvgPool2d(self.output_size)
        self.mp = nn.AdaptiveMaxPool2d(self.output_size)
    def forward(self, x): return torch.cat([self.mp(x), self.ap(x)], 1)
    
def myhead(nf, nc):
    # The dropout layers are needed; without them you cannot load the weights.
    return nn.Sequential(
            AdaptiveConcatPool2d(),
            Flatten(),
            nn.BatchNorm1d(nf),
            nn.Dropout(p=0.375),
            nn.Linear(nf, 512),
            nn.ReLU(True),
            nn.BatchNorm1d(512),
            nn.Dropout(p=0.75),
            nn.Linear(512, nc),
        )
  2. Now we build the network. Note that prior to gluing on our head, we need to cut off two unneeded layers (unneeded by fastai, which adds its more sophisticated head instead). Note also that all this stuff is double-enclosed in nn.Sequential()s. The asterisk unpacks the list.
import torch
import torchvision
from torch import nn

my_model = torchvision.models.resnet101()
modules = list(my_model.children())
modules.pop(-1)   # drop the final fully connected layer
modules.pop(-1)   # drop the average pooling layer
temp = nn.Sequential(nn.Sequential(*modules))
tempchildren = list(temp.children())
tempchildren.append(myhead(4096, 2))   # 4096 = 2 * 2048: the concat pooling doubles the body's features
my_r101 = nn.Sequential(*tempchildren)
  3. OK, now the network is ready. Compare its [-1] and [-2] blocks (head and body, respectively) with fastai’s to check that everything has gone to the right place.
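For instance (learn here stands for the Learner from your fastai training session, so the commented half belongs on the training box):

print(my_r101[-2])   # the body: the resnet trunk
print(my_r101[-1])   # the head we just attached

# On the fastai side, the blocks to compare against would be:
# print(learn.model[0])   # fastai body
# print(learn.model[1])   # fastai head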

  4. Open the fastai notebook you used for training and save the weights in the following unorthodox manner: torch.save({'state_dict': learner_name.model.state_dict()}, '/path/name.pth').
    This will save the state dictionary rather than the whole model, which is not a bad thing.

  5. Go back to your inference code and load these weights into the model we created in points 1, 2, and 3.

model = my_r101
weighties = torch.load('/path/name.pth')   # add map_location='cpu' if the inference box has no GPU
model.load_state_dict(weighties['state_dict'])
  6. Now you should be ready to predict; what follows is just a stub:
from PIL import Image
import torchvision.transforms.functional as TTF
import torch

img = '/your/image.jpg'
softmaxer = torch.nn.Softmax(dim=1)
model.eval()             # inference mode: fixes batchnorm stats and disables dropout
image = Image.open(img)
x = TTF.to_tensor(image)
x.unsqueeze_(0)          # add the batch dimension
print(x.shape)
raw_out = model(x)
out = softmaxer(raw_out)
print(out[0])

It should print the shape of the tensor it just processed and the array of the probabilities, e.g.:

torch.Size([1, 3, 365, 490])
tensor([0.8540, 0.1460], grad_fn=<SelectBackward>)

@jeremy and @sgugger it would be great if you could briefly check this post and tell me if I could have done something in a better way.

Also I wonder:

  • Is there some trick to avoid filling in the number of nodes and features by hand? Like inferring it from the final output of resnet’s body… (see the sketch after this list)

  • Do we have some alternative way to serialize a whole model, other than pickle?
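
On the first question, a minimal sketch of one way it could be inferred (my own assumption, not fastai’s actual mechanism): push a dummy batch through the body and read the channel count off its output.

import torch
import torchvision
from torch import nn

body = nn.Sequential(*list(torchvision.models.resnet101().children())[:-2])
body.eval()
with torch.no_grad():
    dummy = torch.zeros(1, 3, 64, 64)   # any small RGB input will do
    nf_body = body(dummy).shape[1]      # 2048 for resnet101
nf = 2 * nf_body                        # AdaptiveConcatPool2d concatenates avg and max pooling
print(nf)                               # -> 4096, the value hard-coded in myhead() above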

Thanks!

17 Likes

Your post was very helpful in getting a fastai model to run on PyTorch. I trained a 6-class image classifier using fastai and exported the model. I followed your recipe to get it running on pure PyTorch, but I find that the model’s performance in PyTorch is worse than in fastai. I suspect it is due to some transform issue.
The DataBunch I created is:
data = ImageDataBunch.from_folder(dataPath, train='.', valid_pct=0.2, ds_tfms=get_transforms(do_flip=True, flip_vert=True), size=(640,480), num_workers=1).normalize(imagenet_stats)
While using the PyTorch model I am using:
transforms.Compose([transforms.ToTensor(), normalize,])
Am I missing something?
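
(For comparison, a Compose mirroring the DataBunch above would also carry the resize and the ImageNet stats; this is my reconstruction, not a tested equivalent of the fastai pipeline:)

import torchvision.transforms as transforms

tfms = transforms.Compose([
    transforms.Resize((640, 480)),                     # mirrors fastai's size=(640, 480)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # imagenet_stats
                         std=[0.229, 0.224, 0.225]),
])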

2 Likes

Very strange… I always had a slight performance improvement with plain PyTorch.

Yes, I think the transforms are the most probable culprit. Keep us posted!

In my previous post I included the DataBunch definition I used for training with fastai.
For inference with PyTorch I use:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
tfms2 = transforms.Compose([transforms.ToTensor(), normalize,])

# mypath, onlyfiles, model and classes are defined earlier in the script
from os.path import join
from PIL import Image
import torch
import torch.nn.functional as F

imgPath = join(mypath, onlyfiles[i])
im = Image.open(imgPath)
tfImg = tfms2(im)
tfImg = tfImg[None, :, :, :]   # add the batch dimension
raw = model(tfImg)
out = F.softmax(raw, dim=1)
val, ix = torch.max(out, 1)
pred = classes[ix], val.item()

Do you see any discrepancy??

1 Like

Hey balnazzar,
were you able to resolve this issue? I am actually facing the same issue. Is it possible for you to share the code with us?
Thanks in advance

1 Like

Hi Samrat. What kind of problem exactly are you referring to?

Thanks!

I was also getting worse accuracy in PyTorch. It turned out I was not normalizing the input correctly using data.stats.
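
In case it helps, a sketch of what that fix might look like (assuming a fastai v1 DataBunch named data, whose data.stats holds the [mean, std] pair set by .normalize()):

import torchvision.transforms as transforms

mean, std = data.stats   # the exact stats the DataBunch normalized with
normalize = transforms.Normalize(mean=mean, std=std)
tfms = transforms.Compose([transforms.ToTensor(), normalize])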

I have a similar problem, except that the inference machine is an iPhone.
The coremltools module converts the model and generates one that I can install on iOS, but I get the following error at runtime:

BN: Invalid K dimension 1024 / 1 / 1 [Exception from Layer: 73: input.106]

Something like this is very difficult to debug and fix (yes, I know it’s possible to replace that layer and re-train, but that nullifies the benefits fastai offers in terms of making training easy).

Fastai is a great library and its ease of use is great, but I wish it didn’t introduce these bespoke layers that are slightly different from what PyTorch offers.

1 Like