šŸ‘ Training a Network to Detect Hands

You can train your network with tiny-cnn/dlib directly if you like. I have trained a car vs non-car classifier with tiny-cnn before, and used dlib to participate in the dog vs cat competition; both of them are sufficient to describe your network architecture, even though they are not as mature as other deep learning frameworks like PyTorch, TensorFlow, or Keras. Unfortunately, compared with Python, the deep learning libraries of C++ are far from mature (they lack features).

Another solution is to import the model with OpenCV 3.3 as I mentioned before; since you are targeting Windows and macOS, binary size and performance should not be an issue.

I thought about it, but I really want to benefit from transfer learning, using a few layers of VGG or MobileNet in my network, which has worked really well so far! I do not have a huge amount of training data: experimenting with pretrained networks and finetuning the network in Keras/PyTorch seems more efficient to me.

If only I could fast-forward 6 months and ONNX was already supported by every framework :grin:

I did not know about this, thank you! :heart_eyes: That actually looks promising since I'm already including OpenCV 3.X in my app! OpenCV-DNN is what you are using in your app, right? How is performance?

Thank you for the offer! :blush: So far there isn't much of a "project" yet, I'm mostly doing the research to see if I can leverage the power of DNNs in games.

And then you mean Lua Torch, not PyTorch, correct? Perhaps I can export from PyTorch to a Caffe model and import that in OpenCV-DNN...

Correct, I really like the API and flexibility of PyTorch (http://pytorch.org/), but opencv3.3 does not support PyTorch (http://pytorch.org/), only pytorch (https://github.com/hughperkins/pytorch); you can find more details in #9501.

I do not like Caffe or TensorFlow, but I would prefer them if I wanted to deploy my model, because I can't find a better choice.

Right, opencv dnn is what I use. You can check the source code (ssd_detector.cpp, less than 90 lines of code); if there is anything you do not understand, I am willing to help.

I think performance is quite good: SSD MobileNet can run at 70ms per frame on my laptop (Y410P) without using the GPU (I link against an OpenCV build without CUDA support and disable OpenCL from the beginning, but I build with Eigen and TBB support). However, the dnn module does eat a lot of CPU power (80~90% CPU usage).

I hope you succeed. If you want to find someone to cooperate with and you think my skills may help you, please do not hesitate to send me a message; I am eager to gain more experience with computer vision projects.


I don't know what the desired target platform is, but to give another datapoint: MobileNet+SSD runs at 30 FPS on iPhone 7 using the GPU (latency is about 50ms in that case). If you want good performance on mobile, OpenCV (or even TensorFlow) is way slower than a native solution.

What framework is that? By ā€œnativeā€ do you mean with CoreML?

My target platform is probably an Intel NUC, or another portable PC that runs the tracking server and sends found objects to a different PC running a Unity game (over OSC).

That is using Metal. Core ML tends to be slower than Metal in my experience. Doing it in OpenGL would also work, if you had a fast implementation for the convolution layers.

For the Intel NUC you might want to look into Intelā€™s clDNN library https://github.com/01org/cldnn ā€“ although it might not work on the GPU inside the NUC. But at least OpenGL should work on it.


I think your comparison is unfair.

  1. You are comparing an iPhone 7 with my laptop
  2. You are comparing GPU vs pure CPU (opencv dnn supports GPU too)

By the way, MobileNetSSD is much more complicated than the network architecture @noio wants to build.

Performance of opencv dnn (3.3):

SqueezeNet only took 4.96ms per frame on the CPU; I guess this is more than enough for @noio's app.

opencv dnn already implements the networks with clDNN.


I didn't interpret it as a direct comparison, just "another datapoint". :slight_smile:

I also think (and hope) that I can find a CPU-based solution that will suit my needs. As you said, what I need is much less than a full SSD. As you can see in one of my first posts, I only used 8 conv layers with good results.

Additionally, ideally I'd like the executable to be somewhat portable, so including opencv-dnn or tiny-dnn is ideal in that case (instead of relying on a platform-specific clDNN / Metal). Obviously, the more you make use of a platform's characteristics, the faster a solution will be. Exploiting Intel graphics is faster than running on a CPU. And exploiting a CPU's BLAS/AVX will be faster than a naive solution.

In this case it makes sense; @machinethink mentioned that too. My bad, I misunderstood his intentions, sorry for the confusion.

You can compile your app as different binaries if needed. Since you are using Unity, will this be a 3D game?


Yeah, it wasn't my intention to compare your results to mine, since we're using totally different computers. Sorry if that wasn't clear :slight_smile:

I only wanted to point out that something like OpenCV on iOS won't have the same performance as Metal. On the desktop it's a totally different story.

Unfortunately, for mobile devices there isn't a cross-platform solution right now, except for writing your own kernels in OpenGL. (Unless there already is an OpenGL-based DNN library?)

Maybe you already did this, but I think you can augment your data by:

1 : Separating the humans from the background
2 : Putting the humans onto different backgrounds

Step 1 could be done without too much headache if you take the photo/video with a white background; otherwise you may need to use some algorithm to separate the human from the background (Tiramisu, use your hand-separation net to separate the human, use graph cut to manually cut out the human, etc.).
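Once you have a mask from step 1, step 2 is a simple alpha composite. A minimal sketch in plain NumPy, assuming you already have the person image, a per-pixel mask, and a background of the same size (all the names and the toy arrays below are illustrative):

```python
# Sketch of step 2: paste a masked-out person onto a new background.
# `mask` is 1.0 where the person is, 0.0 elsewhere.
import numpy as np

def composite(person, mask, background):
    """person, background: HxWx3 uint8 arrays; mask: HxW float in [0, 1]."""
    m = mask[..., np.newaxis]  # add a channel axis so it broadcasts over RGB
    out = m * person.astype(np.float32) + (1.0 - m) * background.astype(np.float32)
    return out.astype(np.uint8)

# toy example: a white 2x2 "person", fully masked, onto a black background
person = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.ones((2, 2), dtype=np.float32)
background = np.zeros((2, 2, 3), dtype=np.uint8)
print(composite(person, mask, background)[0, 0])  # -> [255 255 255]
```

With a soft (blurred) mask instead of a hard 0/1 one, the edges blend more naturally, which tends to matter for training data.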

So, first update in a long time!

Currently, I consider the training procedure (including data/labeling) solved. I have trained a simple network built on top of tiny-YOLO that gets the accuracy I need. I can come back to this later. I've exported the network from PyTorch to Caffe's format and (after some pains) managed to import it into tiny-dnn.

The problem right now is achieving real-time performance in an easy-to-deploy executable. I would like for it to run on different hardware (including my macbook) with minimal installation.

@tham I am impressed by the 70ms you achieve CPU-only, because when I naively load my network (8 conv blocks) into tiny-dnn with NNPACK, it takes up to 6000ms per frame. Do you use OpenCV's dnn-modern (which is a wrapper around tiny-dnn)? What else did you do to optimise?

EDIT: Hahahah. :joy: I simply forgot to set the right backend in tiny-dnn using layer->set_backend_type(backend_t::nnpack);. Went from 6000ms to 350ms, but I would like to optimise further.

I do not use dnn-modern, only opencv dnn module.

Nothing (I compile OpenCV with Intel TBB). I import the model and forward it as the example shows (you can study my source code). I believe the opencv dnn module does aggressive optimisation on the CPU side; after all, it is a project backed by Intel :slight_smile:.

If you want to save yourself some pain, you can use Torch or Caffe to train your 8-layer network; I described SqueezeNet 1.1 in Torch and imported it with opencv dnn without any pain (almost).

I do not recommend tensorflow if you plan to export a trained model to opencv dnn, because:

  1. opencv dnn does not have good support for tensorflow yet; there are many surprises
  2. tensorflow is an overcomplicated library; better to stay away from it unless you cannot find a better candidate or your boss asks you to

Although my experience with Lua is close to zero, I find Torch still much easier to learn and use compared with tensorflow; it is a well designed library, plus you can import trained models with opencv dnn with zero pain (almost). Right now opencv does not support Torch's nngraph; I guess they are busy fixing the issues related to tensorflow (as Jeremy said, tensorflow is too complicated).

Following are the things you need to take care when saving torch model

--clear the state first, else model size will be ultra big and trained batchnorm
--may not be used correctly
net:clearState()
--convert model from gpu to cpu, else save model cannot be loaded by opencv dnn
net = net:float()
torch.save('squeeze_homo', net)    

After that, just load it by opencv dnn.

dnn::Net net = dnn::readNetFromTorch(model);

Very simple, isn't it? If you need YOLO, maybe this post (#9705, port yolo v2 to opencv) can help you (unless I am mistaken).

ps : You can change your network architecture from a VGG-like architecture to a SqueezeNet- or MobileNet-like one.

Thanks for the info!

I'm having some trouble finding any documentation on OpenCV-DNN; for example, does it support depthwise separable convolutions (which MobileNet needs)?

EDIT: Found docs! It's all still brand new :slight_smile: https://github.com/opencv/opencv/wiki/Deep-Learning-in-OpenCV

EDIT2: Changing the version of OpenCV that's inside openFrameworks (to 3.3.0) is also terrible >.<. But the default included OpenCV 3.1 doesn't include the DNN module yet.

(P.S. I'm pretty happy with my PyTorch workflow, so I would prefer to stick with that. But OpenCV-DNN can import Caffe so that seems OK!)
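As an aside on why depthwise separable convolutions matter for MobileNet: a standard convolution mixes space and channels in one step, while the separable version splits it into a per-channel (depthwise) convolution plus a 1x1 (pointwise) one, which cuts the parameter count sharply. A quick back-of-the-envelope check (the layer sizes are just an example):

```python
# Parameter counts: standard conv vs depthwise separable conv.

def standard_conv_params(k, c_in, c_out):
    # one kxk filter per (input channel, output channel) pair
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    depthwise = k * k * c_in   # one kxk filter per input channel
    pointwise = c_in * c_out   # 1x1 conv to mix the channels
    return depthwise + pointwise

# e.g. a 3x3 layer going from 32 to 64 channels
k, c_in, c_out = 3, 32, 64
print(standard_conv_params(k, c_in, c_out))   # 18432
print(separable_conv_params(k, c_in, c_out))  # 2336, roughly 8x fewer
```

The same ratio applies to the multiply-accumulate count per output pixel, which is where the speedup on CPU comes from.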


Hi. I am trying to do something similar and I like the simplicity of your approach.

Can you share your code with me?

Wolfsonavi@gmail.com

@aviwolfson25
