In terms of models, I did a competition once with YOLO (darknet), but I did not train it with fastai. That being said, if you want to go that route, I have inference scripts in our repository
For training in v1, I have a repo and notebook here you can use:
Man you are everywhere! Thanks for the thoughtful reply and info.
How did you train the yolo model?
We’re looking at YOLO because it’s small and fast … and as this will be running on a Pi or Nano, I think that will be important. Currently we only have one object we’re attempting to detect … a yellow ball, of which there will be dozens on the game field.
Also, what yolo model would you recommend given our task and device constraints?
(Model is 4mb). I wound up setting it up on a p3 instance and training for what the recommended number of epochs was (I think 11,000), but for that simple of a problem a few thousand may be all that’s needed.
Do you have an example of what your training set looks like? We’re trying to understand what x and y refer to and how to structure the dataset for training.
@wgpubs (sorry this is taking me a moment, it’s been about a year or so since we did the competition; it’s all coming back). The exact version we trained with was:
The readme goes into a great deal of detail on building your custom datasets
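For context, here's a sketch of the standard darknet label format (this is the common convention, not anything specific to that repo's readme): one `.txt` file per image, one line per object, giving the class index followed by the box center and size, all normalized to [0, 1] by the image dimensions. A small converter from pixel coordinates:

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space bounding box to a darknet YOLO label line.

    Darknet expects: <class> <x_center> <y_center> <width> <height>,
    with all four coordinates normalized by the image width/height.
    """
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    box_w = (xmax - xmin) / img_w
    box_h = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"

# Example: a yellow ball (class 0) at pixels (100,100)-(200,200) in a 640x480 frame
line = to_yolo_line(0, 100, 100, 200, 200, 640, 480)
```

For a single-class problem like your yellow ball, every line would start with class `0`.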
Full YOLO in real time at a non-trivial frame rate is difficult on a Pi and even a Nano. How many fps are you expecting? I haven’t done it myself, but I’ve seen 20 fps with YOLO-tiny.
Creating your own high quality training sample is hard work but ultimately pays off.
You need to be efficient, or possibly consider e.g. Amazon Mechanical Turk (I haven’t tried the latter, but came close).
Have an initial fast process for review, including crop or discard. Consider running images through a pretrained model or other heuristics to weed out inappropriate ones.
I use labelImg ( https://github.com/tzutalin/labelImg ) and save in VOC XML format. (I wasn’t initially using YOLO, but I stuck with VOC XML after I swapped over.) Learn the hotkeys, use autosave, and get fast at it. Quality does matter, but pixel-perfect bounding boxes aren’t super important IME.
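If it helps: labelImg's VOC XML output can be read with nothing but the standard library. A minimal sketch (assuming the usual Pascal VOC layout; the sample annotation below is made up for illustration):

```python
import xml.etree.ElementTree as ET

def parse_voc(xml_string):
    """Parse a Pascal VOC annotation (as written by labelImg) into
    ((width, height), [(name, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_string)
    size = root.find("size")
    img_w = int(size.find("width").text)
    img_h = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        boxes.append((
            name,
            int(bb.find("xmin").text),
            int(bb.find("ymin").text),
            int(bb.find("xmax").text),
            int(bb.find("ymax").text),
        ))
    return (img_w, img_h), boxes

# Hypothetical single-object annotation for demonstration
SAMPLE_XML = """
<annotation>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ball</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>200</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>
"""
```

From there it's one small step to emit darknet-style `.txt` labels if you go the YOLO route.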
We’d love to get at least 60fps if not more. Do you think that would be possible if we capture images with a small resolution? Something like 640x480? We’re using a raspberry pi camera v2 module.
I haven’t done much real-time video testing, sorry, so I’m not sure of the best optimisations. However, I don’t think 60 fps is remotely feasible even at 640x480 on a Nano (which is, after all, quite a cheap device that costs about as much as a second-hand GTX 980). I’d guess 15 fps tops, perhaps more with YOLO-tiny.
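When you do get to benchmarking, it's worth measuring throughput directly on the target device rather than guessing. A minimal, camera-independent sketch of a rolling fps counter (the names here are hypothetical, not from any particular library):

```python
import time

class FpsCounter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.window = window
        self.timestamps = []

    def tick(self, now=None):
        """Record one processed frame; return the current fps estimate,
        or None until at least two frames have been seen.

        `now` can be injected for testing; by default the wall clock is used.
        """
        if now is None:
            now = time.perf_counter()
        self.timestamps.append(now)
        if len(self.timestamps) > self.window:
            self.timestamps.pop(0)
        if len(self.timestamps) < 2:
            return None
        elapsed = self.timestamps[-1] - self.timestamps[0]
        return (len(self.timestamps) - 1) / elapsed
```

In a capture loop you'd call `tick()` once per processed frame and log the estimate; that tells you quickly whether a given model/resolution combination is anywhere near your target.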
Is there a particular reason to use a Nano (e.g. size/power consumption)? Bang-for-buck wise, you might be better off with a second-hand gaming laptop or a mini-ITX build (I have an old mini-ITX Z97/4590/GTX 960/Corsair Vengeance setup that I hope to use for my project; despite its age, I think it would still blow a Nano out of the water just going by raw specs).
Yah this is going to be running on a FRC (First Robotics Competition) robot. There are both size and power constraints for everything the team is attempting to do. If you’re desperate for reading material, you can get an idea of what we’re working with here.
Haha cool, I don’t have time to read now but might take a look. I suggest making a list of the basic constraints (power, space, minimum tolerable fps for your application, desired accuracy), how much time you can devote to this, and what your coding ability is (which determines whether you can do some custom optimisations, etc.).