Need Advice on Improving Real-Time Performance

Hello,
I am relatively new to machine learning and am using fastai to fine-tune a pretrained model to recognize and classify hand gestures. I used resnet18 for this one. (I also tried vgg16 for another one, but that caused a lot of lag in the video stream.)

I trained the model on this dataset I found on Kaggle (https://www.kaggle.com/grassknoted/asl-alphabet).
One thing I noticed was that the image borders get rotated when the images are augmented, and I suspect that contributed to inaccuracy.

But when I tested the model on other images, it reached over 99.4% accuracy.

Eventually I set up the code to view the predictions on a live video stream, and this is where I’m failing to get results: (There’s a small mistake where I have an empty string in "if handsy.predict(cop)[0]", but I’ve rerun the code after correcting it and the problem is still the same.)

The predictions that get printed are mostly wrong, with only a few correct ones. I was hoping to get some advice on what I can do to improve real-time accuracy.
Is this because the model was not trained to detect gestures on multiple backgrounds, or is it because I should use a different model for gesture recognition?
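
In case it helps frame the question, here is a minimal sketch of one thing I could try on the live-stream side: smoothing the per-frame labels with a majority vote over the last few frames, so a single bad frame does not flip the output. This is only an illustration and not my current code. `PredictionSmoother`, `smooth_labels`, `window`, and `min_votes` are hypothetical names, and I am assuming each frame's label is a plain string taken from the first element of what `handsy.predict(cop)` returns. It would not fix a model that is wrong on most frames (for example because of unfamiliar backgrounds); it only reduces flicker.

```python
from collections import Counter, deque


class PredictionSmoother:
    """Majority vote over the most recent per-frame labels (hypothetical helper)."""

    def __init__(self, window=15, min_votes=10):
        # Keep only the last `window` labels; older ones fall off automatically.
        self.history = deque(maxlen=window)
        self.min_votes = min_votes

    def update(self, label):
        """Record one frame's label and return the stable label, or None."""
        self.history.append(label)
        top_label, votes = Counter(self.history).most_common(1)[0]
        # Report a label only once it has enough support in the window.
        return top_label if votes >= self.min_votes else None


def smooth_labels(labels, window=15, min_votes=10):
    """Apply the smoother to a whole sequence of per-frame labels."""
    smoother = PredictionSmoother(window, min_votes)
    return [smoother.update(label) for label in labels]
```

In the video loop, the idea would be to call `update` once per frame with that frame's predicted label and print only when the result is not `None`.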

Thanks in advance!