Hi, I'm developing a smart camera trap that can recognise humans and different types of animals, for use in anti-poaching and biodiversity projects. I already have a properly trained fast.ai model, and now I want to run it on a Raspberry Pi 4. I've created two working inference solutions (Python code below). Here are my findings:
- Inference with Fast.ai: 12 seconds per image
- Inference with PyTorch: 1.5 seconds per image
These results use exactly the same model, so as you can see there is a huge difference in performance. I'm probably doing something wrong in the PyTorch code; for example, the transformation/normalisation steps are just samples I grabbed from the internet. Still, can anybody explain why fast.ai inference is so much slower, and help me find the best trade-off between inference speed and accuracy? Thanks in advance for your help!
Fast.ai sample code:
from fastai.vision import *
import sys
import time
from PIL import ImageFile

# Load the exported learner and the input image
learn = load_learner('', 'camera_trap_model.pkl')
image = open_image(sys.argv[1])
image = image.resize(400)

# Time four consecutive predictions on the same image
for _ in range(4):
    start = time.time()
    res = learn.predict(image)
    print(res, time.time() - start)
Output on the Raspberry Pi 4:
(Category Elephant_African, tensor(10), tensor([...])) 12.383612632751465
(Category Elephant_African, tensor(10), tensor([...])) 12.633440971374512
(Category Elephant_African, tensor(10), tensor([...])) 12.447107553482056
(Category Elephant_African, tensor(10), tensor([...])) 12.01491665840149
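To see how much of the fast.ai time is framework overhead (per-call preprocessing inside `learn.predict`) versus the actual forward pass, you can time the underlying PyTorch network directly on an already-preprocessed batch. A minimal sketch of that timing pattern, using a tiny stand-in network since `camera_trap_model.pkl` isn't available here (on the Pi you'd use the real model instead):

```python
import time
import torch
import torch.nn as nn

# Tiny stand-in network; on the Pi you would use the model unpacked
# from camera_trap_model.pkl (or learn.model in fastai).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 28),  # 28 classes in the camera-trap label list
)
model.eval()

batch = torch.rand(1, 3, 224, 224)  # dummy preprocessed image batch

with torch.no_grad():  # inference only: skip autograd bookkeeping
    for _ in range(4):
        start = time.time()
        out = model(batch)
        print("forward pass:", round(time.time() - start, 4), "s")

print(out.shape)
```

If the bare forward pass is fast but `learn.predict` is slow, the gap is in the per-call transform and post-processing work, not in the model itself.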
PyTorch sample code:
import torch
import sys
from torchvision import models
from PIL import Image
from pprint import pprint
import time
# Load the fastai export and pull the bare PyTorch model out of it
state = torch.load('camera_trap_model.pkl', map_location='cpu')
model = state.pop('model')
model.eval()
classes = ['Bird',
'Blank',
'Buffalo_African',
'Cat_Golden',
'Chevrotain_Water',
'Chimpanzee',
'Civet_African_Palm',
'Duiker_Blue',
'Duiker_Red',
'Duiker_Yellow_Backed',
'Elephant_African',
'Genet',
'Gorilla',
'Guineafowl_Black',
'Guineafowl_Crested',
'Hog_Red_River',
'Human',
'Leopard_African',
'Mandrillus',
'Mongoose',
'Mongoose_Black_Footed',
'Monkey',
'Pangolin',
'Porcupine_Brush_Tailed',
'Rail_Nkulengu',
'Rat_Giant',
'Rodent',
'Squirrel']
from torchvision import transforms
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
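For what it's worth, those mean/std values are the standard ImageNet statistics, which fastai also applies by default (`imagenet_stats`) when fine-tuning a pretrained network, so they are likely the right ones for your model rather than random numbers from the internet. `transforms.Normalize` simply subtracts the mean and divides by the std per channel, which this small pure-torch sketch reproduces:

```python
import torch

mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

img = torch.full((3, 4, 4), 0.5)   # constant grey image in [0, 1]
out = (img - mean) / std           # what transforms.Normalize does per channel

print(out[0, 0, 0].item())  # (0.5 - 0.485) / 0.229 ≈ 0.0655
```

The important thing is that inference uses the same normalisation the model saw during training; mismatched statistics cost accuracy, not speed.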
start = time.time()
image = Image.open(sys.argv[1]).convert('RGB')  # drop alpha channel if present
print("loaded image", time.time() - start)
start = time.time()
img_t = transform(image)
print("transformed image", time.time() - start)
start = time.time()
batch_t = torch.unsqueeze(img_t, 0)
print("unsqueezed image", time.time() - start)
with torch.no_grad():  # inference only: skip autograd overhead
    for _ in range(4):
        start = time.time()
        out = model(batch_t)
        _, index = torch.max(out, 1)
        percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
        print(classes[index[0]], percentage[index[0]].item(), time.time() - start)
Output on the Raspberry Pi 4:
loaded image 0.016829729080200195
transformed image 0.39249253273010254
unsqueezed image 0.00021958351135253906
Elephant_African 93.3268051147461 1.7025566101074219
Elephant_African 93.3268051147461 1.545377254486084
Elephant_African 93.3268051147461 1.6150736808776855
Elephant_African 93.3268051147461 1.4821553230285645
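On the speed/accuracy trade-off: one option worth trying on the Pi is post-training quantization, which runs parts of the network in int8 instead of float32 in exchange for a small accuracy loss. A hedged sketch using PyTorch's dynamic quantization, again with a stand-in model since the real one isn't available here (note that dynamic quantization only converts `Linear`-type layers, so on a conv-heavy network like a ResNet the gain is modest; static quantization covers the convolutions but needs a calibration step):

```python
import torch
import torch.nn as nn

# Stand-in model; on the Pi this would be the network from
# camera_trap_model.pkl.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 28))
model.eval()

# Replace float Linear layers with int8 equivalents
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.no_grad():
    out = qmodel(torch.rand(1, 3, 224, 224))
print(out.shape)  # same output shape, smaller and faster Linear layers
```

It's worth benchmarking the quantized model against the float one on a held-out set of camera-trap images to see whether the accuracy drop is acceptable for your use case.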