Using a fastai object detection model for inference

I was looking for a way to do object detection with fastai. I found the RetinaNet notebook from part 2 of the course and ran it.

Everything was fine until I tried using it for inference. I wanted to apply it to video from my webcam (in a separate script), but I stumbled upon an error. I tried adding the export, load and predict lines to the same training notebook and got the same error, which is the one below:


Is it not possible to get predictions by doing learn.predict? If so, how do I get them?
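For reference, the export/load/predict lines I added looked roughly like this (file and image names are just placeholders; this is the standard fastai v1 flow, which may not handle the RetinaNet output the way it does for classification):

from fastai.vision import *

# at the end of the training notebook
learn.export('model.pkl')                          # serialize the Learner (model + data config)

# in the inference script
learn = load_learner(path='.', file='model.pkl')
img = open_image('frame.jpg')                      # placeholder image
pred = learn.predict(img)                          # this is the call that raised the error for me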

PS: I understand that this is more suited to the Part 2 2019 forum of the course, and I apologize. I can't access that forum right now, and I am using this for my graduation project, so I can't really wait until the course is available on the internet. I thought Part 1 was enough for object detection, since it covers segmentation. I was wrong.

Update: I tried the following:

learn = load_learner(path='.', file='model.pkl')
with torch.no_grad(): learn.model.eval()
input = Variable(image)
input = input.to(device)
z = learn(input)
show_preds(image, frame, z, detect_thresh=0.35, classes=learn.data.classes)

I got this error:
TypeError: 'Learner' object is not callable

and I also tried this:
learn = load_learner(path='.', file='model.pkl')
with torch.no_grad(): z = learn.model.eval()(image)
show_preds(image, frame, z, detect_thresh=0.35, classes=learn.data.classes)

and got this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 480, 640] instead

which shows that in this case it's expecting a batch, not a single image. Is there a way to get around this?
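(To make the shape issue concrete, this is what I mean by a missing batch dimension; just a toy tensor, not the actual model:)

import torch

frame = torch.rand(3, 480, 640)    # a single webcam frame: channels x height x width
print(frame.shape)                 # torch.Size([3, 480, 640])   -- 3-dimensional, rejected by conv2d
batch = frame.unsqueeze(0)         # add a leading batch dimension
print(batch.shape)                 # torch.Size([1, 3, 480, 640]) -- the 4-dimensional input it wants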

Did you try:

with torch.no_grad():
    output = learn.model(img)

Still the same error (on that same line):
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 480, 640] instead

For anyone wanting to do this, I managed to get predictions like this:
defaults.device = torch.device('cuda')
encoder = create_body(models.resnet50, cut=-2)             # same ResNet-50 backbone as in the notebook
model = RetinaNet(encoder, 21, final_bias=-4)              # 21 = number of classes the model was trained on
state_dict = torch.load('stage2-256.pth')
model.load_state_dict(state_dict['model'], strict=False)   # learn.save() stores the weights under the 'model' key
model = model.cuda()
model.eval()
with torch.no_grad():
    z = model(image.unsqueeze_(0).cuda())                  # unsqueeze adds the batch dimension the conv layers expect

Usually you would use the CPU for inference, but I'm running on a webcam stream, so I need the GPU.
If you want CPU:
state_dict = torch.load('stage2-256.pth', map_location='cpu')
and remove all the .cuda() calls.
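In case it saves someone a step: the frame you get from OpenCV is a BGR uint8 array, so it has to be turned into a normalized float tensor before the unsqueeze. This is roughly what I do; the 256 size and the ImageNet stats are just what my model was trained with, so adjust them to your own transforms:

import cv2
import torch

cap = cv2.VideoCapture(0)                                   # webcam stream
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)    # ImageNet stats (assumption: change if you used others)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

ret, frame = cap.read()
frame = cv2.resize(frame, (256, 256))                       # match the training size (assumption)
frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)              # OpenCV reads BGR, the model was trained on RGB
image = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.
image = (image - mean) / std                                # same normalization as training

with torch.no_grad():
    z = model(image.unsqueeze(0))                           # CPU version; add .cuda() if you kept the GPU model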

Hi Mohamed,
Would you happen to have an example notebook showing how to implement an object detector with fastai? I am basically looking to convert my multi-label classifier into a multi-object detector.

Thanks