Using a fastai object detection model for inference

I was looking for a way to do object detection with fastai. I found the RetinaNet notebook from part 2 of the course and ran it.

Everything was fine until I tried using it for inference. I wanted to apply it to video from my webcam (in separate code), but I stumbled upon an error. I tried adding the export, load, and predict lines to the same training notebook and got the same error, which is the one below:

Is it not possible to get predictions with learn.predict? If not, how do I get them?

PS: I understand that this is more suited to the part 2 2019 forums, and I apologize. I can't access them right now, and I am using this for my graduation project, so I can't really wait until the course is available on the internet. I thought part 1 was enough for object detection, since it covers segmentation. I was wrong.

Update: I tried the following:

learn = load_learner(path='.', file='model.pkl')
with torch.no_grad(): learn.model.eval()
input = Variable(image)
input =
z = learn(input)
show_preds(image, frame, z, detect_thresh=0.35,

I got an error:
TypeError: ‘Learner’ object is not callable

I also tried this:
learn = load_learner(path='.', file='model.pkl')
with torch.no_grad(): z = learn.model.eval()(image)
show_preds(image, frame, z, detect_thresh=0.35,

and got this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 480, 640] instead

which shows that in this case it's expecting a batch, not a single image. Is there a way to get around this?
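As a quick illustration of the missing batch dimension (a minimal sketch with a dummy tensor; `unsqueeze` is the standard PyTorch way to add it):

```python
import torch

# A single webcam frame as a tensor: [channels, height, width]
img = torch.rand(3, 480, 640)

# Conv layers expect [batch, channels, height, width], so add a batch axis.
batch = img.unsqueeze(0)
print(batch.shape)  # torch.Size([1, 3, 480, 640])
```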

Did you try:

with torch.no_grad():
    output = learn.model(img)

Still the same error (on that same line):
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 480, 640] instead

For anyone wanting to do this, I managed to get predictions like this:
defaults.device = torch.device('cuda')
encoder = create_body(models.resnet50, cut=-2)
model = RetinaNet(encoder, 21, final_bias=-4)
state_dict = torch.load('stage2-256.pth')
model.load_state_dict(state_dict['model'])  # apply the saved weights (the snippet as originally posted loaded the file but never applied it)
model = model.cuda()
model.eval()
with torch.no_grad():
    z = model(image.unsqueeze_(0).cuda())

Usually you'd use the CPU for inference, but I'm using a webcam stream so I need the GPU.
If you want CPU:
state_dict = torch.load('stage2-256.pth', map_location='cpu')
and remove all the .cuda() calls.
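The CPU variant of the steps above can be sketched end to end. This uses a tiny stand-in module instead of the notebook's RetinaNet, and assumes the checkpoint wraps the weights under a 'model' key the way learn.save does; adjust the key if your .pth was saved differently:

```python
import os
import tempfile
import torch
import torch.nn as nn

# Tiny hypothetical stand-in for the RetinaNet model, just to show
# the save/load round trip on the CPU.
model = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Simulate a checkpoint that stores the weights under a 'model' key
# (an assumption here, matching how fastai's learn.save wraps them).
ckpt_path = os.path.join(tempfile.gettempdir(), 'stage2-demo.pth')
torch.save({'model': model.state_dict()}, ckpt_path)

# map_location='cpu' remaps tensors saved on the GPU onto the CPU.
state = torch.load(ckpt_path, map_location='cpu')
model.load_state_dict(state['model'])

model.eval()
with torch.no_grad():
    frame = torch.rand(3, 480, 640)   # a fake webcam frame
    z = model(frame.unsqueeze(0))     # add the batch dimension
print(z.shape)  # torch.Size([1, 8, 480, 640])
```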

Hi Mohamed,
Would you happen to have an example notebook showing how to implement an object detector with fastai? I am basically looking to convert my multi-label classifier into a multi-object detector.


I have tried transfer learning for my model; when I try to load the previously trained model for my new data, it gives me this error:

/usr/local/lib/python3.7/dist-packages/fastai/ in load(self, file, device, strict, with_opt, purge, remove_module)
    271             model_state = state['model']
    272             if remove_module: model_state = remove_module_load(model_state)
--> 273             get_model(self.model).load_state_dict(model_state, strict=strict)
    274             if ifnone(with_opt,True):
    275                 if not hasattr(self, 'opt'): self.create_opt(defaults.lr, self.wd)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/ in load_state_dict(self, state_dict, strict)
   1496         if len(error_msgs) > 0:
   1497             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
-> 1498                                self.__class__.__name__, "\n\t".join(error_msgs)))
   1499         return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for RetinaNet:
    size mismatch for classifier.3.weight: copying a param with shape torch.Size([3, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([2, 128, 3, 3]).
    size mismatch for classifier.3.bias: copying a param with shape torch.Size([3]) from checkpoint, the shape in current model is torch.Size([2]).

The images in the data loader are not binary; they are 3-channel images, but learn.load still throws this error. The new data has 2 slide images, whereas the model was trained on 100 slides. I have loaded the new data like this:

do_flip = True
flip_vert = True 
max_rotate = 90 
max_zoom = 1.1 
max_lighting = 0.2
max_warp = 0.2
p_affine = 0.75 
p_lighting = 0.75 

tfms = get_transforms(do_flip=do_flip, flip_vert=flip_vert, max_rotate=max_rotate,
                      max_zoom=max_zoom, max_lighting=max_lighting,
                      max_warp=max_warp, p_affine=p_affine, p_lighting=p_lighting)
train, valid = ObjectItemListSlide(train_images) ,ObjectItemListSlide(valid_images)
item_list = ItemLists(".", train, valid)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate,num_workers=0).normalize()

Does anyone know how to solve this issue and what needs to be changed?
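The size mismatch comes from the final classifier layer, whose shape depends on the number of classes (3 in the checkpoint vs. 2 in the new model). One common workaround is to drop the mismatched parameters and load the rest with strict=False, letting the final layer re-initialise. A sketch with a hypothetical stand-in head, not the actual RetinaNet:

```python
import torch
import torch.nn as nn

# Hypothetical head mirroring the classifier in the traceback: the layer
# at index 3 has weight shape [n_classes, 128, 3, 3], which is what mismatches.
def make_head(n_classes):
    return nn.Sequential(
        nn.Conv2d(128, 128, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(128, 128, 3, padding=1),
        nn.Conv2d(128, n_classes, 3, padding=1),  # index 3 in the Sequential
    )

old_head = make_head(3)   # checkpoint was trained with 3 classes
new_head = make_head(2)   # new data only has 2 classes

saved = old_head.state_dict()
# Keep only the parameters whose shapes still match the new model.
filtered = {k: v for k, v in saved.items()
            if new_head.state_dict()[k].shape == v.shape}
result = new_head.load_state_dict(filtered, strict=False)

# The dropped final-layer weights stay randomly initialised and get fine-tuned.
print(result.missing_keys)  # ['3.weight', '3.bias']
```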