Improve classification speed

Hello,

I am trying to use a CNN to classify camera frames recorded by a toy RC car as it drives. Streaming the video to the laptop (where the CNN runs) is fast and almost lag-free. Unfortunately, classification is really slow and introduces long lags, making the whole setup unusable.

My network is a simple one with this structure:

```python
learn = cnn_learner(data, models.resnet18, pretrained=True, metrics=accuracy)
```

To load the trained network for inference, I use load_learner() instead of the code above.
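For reference, the inference side currently looks roughly like this (the paths and file names are placeholders, and the timing is just so I can see the per-frame cost):

```python
import time
from fastai.vision import *

# load the exported learner (directory and file name are placeholders)
learn = load_learner('models/', 'export.pkl')

# classify a single captured frame and time the call
img = open_image('frame.jpg')
start = time.perf_counter()
pred_class, pred_idx, probs = learn.predict(img)
print(pred_class, f'{time.perf_counter() - start:.3f} s')
```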

What is a realistic latency you can achieve on a CPU per image/frame in inference mode?

Two questions to consider. First, how often do you really need to classify a frame? Try sampling the video feed less frequently and see whether the results are still usable. Second, have you considered using a smaller model? Larger ones take longer per forward pass.
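As a rough sketch of the first idea (untested; `classify` and `steer` are placeholders for your predict call and your control logic):

```python
import cv2

N = 5                                  # classify only every 5th frame
cap = cv2.VideoCapture(0)              # or the URL of your RC-car stream
last_pred = None
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % N == 0:
        last_pred = classify(frame)    # placeholder: your fastai predict call
    steer(last_pred)                   # placeholder: act on the latest prediction
    frame_idx += 1
```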

Since I want to do transfer learning, how could I make the model smaller? I thought resnet18 was the smallest one already? I am a fastai beginner, so any hints on controlling the model’s size are appreciated.

And I need to query the model frequently, because it is used for driving the car.

There are some other pretrained models available. You can see a full list at: https://pytorch.org/docs/stable/torchvision/models.html

(Actually, fastai supports a few more, but they are not smaller, so I didn’t bother linking to them.)
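For example, swapping in squeezenet1_1 (which has far fewer parameters than resnet18) should work with the same cnn_learner call; untested sketch:

```python
from fastai.vision import *

# same call as before, just with a smaller backbone; fastai v1 ships
# split metadata for squeezenet, so transfer learning works the same way
learn = cnn_learner(data, models.squeezenet1_1, pretrained=True, metrics=accuracy)
```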

You can also reduce the size by using smaller images. Classification accuracy will suffer, but perhaps it will be a worthwhile trade.
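Assuming a folder-based dataset (the path and the size of 112 here are placeholders), the `size` argument of the data bunch controls the resize:

```python
from fastai.vision import *

# every image is resized to 112 x 112 before it reaches the model;
# path is a placeholder for your dataset directory
data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=112)
learn = cnn_learner(data, models.resnet18, pretrained=True, metrics=accuracy)
```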

Thanks.
You mean creating the CNN with the same cnn_learner call as above, but just feeding it smaller images, would reduce the final model’s size? I thought the model’s size was fixed, and the images would be scaled automatically to fit its input layer.

I haven’t tested it yet, but this link provides some interesting hints, especially on adapting models by putting extra layers in front. The image size used there is also much smaller:

This link, however, says the model is slow to train, which makes me assume it might be slow at inference as well:

The following gives some intuition for choosing the right image size with respect to recognition rate:

So in my case small images with low noise (maybe after applying a Gaussian filter) should work well, since I have pretty distinct lanes delimited by white paper.

Compression from the video stream could be an issue, and that compression noise should be removed as much as possible. Flickering or changing brightness may be an issue as well. So I will have to find a way to test how well the images are stabilized, and whether a Gaussian filter is sufficient to remove “grainy” images and compression artifacts.
The filtering is probably best applied both before training and before feeding frames into the trained model (i.e., at inference time).
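Something like this could be applied to both the training images and each incoming frame (the kernel size is a guess and would need tuning against the actual stream quality):

```python
import cv2

def denoise(frame, ksize=5):
    # Gaussian blur to suppress compression artifacts and sensor grain;
    # ksize must be odd, and larger values blur more aggressively
    return cv2.GaussianBlur(frame, (ksize, ksize), 0)
```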

Also, it seems VGG16 expects 224 × 224 RGB images as input, so no improvement there compared to resnet18.

A good discussion about many relevant details is here:

My understanding:
So CNNs will deal with any image size, as long as the kernels can be applied (i.e., the image must be at least as large as a kernel’s spatial extent and match its channel depth) and there are “summarizing” layers later on, such as global/adaptive pooling, that can deal with any number of inputs.
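A quick way to check this with plain torchvision (bypassing fastai entirely, just to probe the model itself):

```python
import torch
from torchvision import models

m = models.resnet18(pretrained=False).eval()
for size in (224, 112, 64):
    x = torch.randn(1, 3, size, size)
    with torch.no_grad():
        out = m(x)
    # stays (1, 1000) for every size: the adaptive pooling layer
    # absorbs the varying spatial resolution
    print(size, out.shape)
```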

I still have to check whether fastai does any implicit resizing of input images.

The accuracy might suffer if image details/features that the pretrained model tries to extract are missing. The question is whether this could also cause misidentification of features that do not exist in the input image at all, or whether the model will simply not detect the features for which detail is lacking (which would be fine, since I only want to detect details that are clearly visible even at low resolution, i.e., the lane boundaries).

So the input image size affects the cost of a forward pass even with an unchanged CNN model, because each kernel slides across the entire image, and the number of sliding “steps” depends on the image size.
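That should be easy to measure with a quick benchmark like this (the sizes and repeat count are arbitrary):

```python
import time
import torch
from torchvision import models

m = models.resnet18(pretrained=False).eval()
for size in (224, 112, 64):
    x = torch.randn(1, 3, size, size)
    with torch.no_grad():
        m(x)                                   # warm-up pass
        start = time.perf_counter()
        for _ in range(20):
            m(x)
    print(size, (time.perf_counter() - start) / 20, 's per forward pass')
```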
