I’ve seen a YouTube video (https://www.youtube.com/watch?v=ydh_AdWZflA) where a robot picks pieces from a bin automatically.
The robot is able to detect the pieces and knows how to grip them thanks to a deep learning algorithm.
We can see in the video that the robot takes a picture of the bin and then uses a CNN (tell me if I’m wrong) to make a semantic segmentation of the image (with two categories: grippable points and non-grippable points).
Then it tries to grip a part in the “grippable” region; this way it can learn and refine the segmentation.
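To make the idea concrete, here is a minimal sketch of what such a setup could look like: a small fully-convolutional network that outputs a per-pixel “graspability” score, trained only at the pixel where a grasp was attempted, using the success/failure of that grasp as the label. This is not the architecture from the video; `GraspNet`, the layer sizes, and the pixel coordinates are all illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical minimal fully-convolutional network (FCN):
# input = RGB image of the bin, output = 1-channel per-pixel grasp logit map.
class GraspNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # 1x1 conv -> one grasp score per pixel
        )

    def forward(self, x):
        return self.net(x)

model = GraspNet()
img = torch.randn(1, 3, 64, 64)       # stand-in for a camera image
logits = model(img)                   # shape (1, 1, 64, 64)
probs = torch.sigmoid(logits)         # per-pixel grasp probability

# Self-generated label: the robot attempts a grasp at pixel (r, c) and
# records whether it succeeded; only that pixel gets a training signal.
r, c = 10, 20
success = torch.tensor(1.0)           # pretend this grasp succeeded
loss = nn.functional.binary_cross_entropy(probs[0, 0, r, c], success)
loss.backward()                       # gradient step refines the score map
```

Because the labels come from the robot’s own grasp outcomes rather than human annotation, this training scheme is usually called self-supervised rather than fully unsupervised.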
My question is: what kind of network or architecture can I use to reproduce this behavior?
As we do not label the regions beforehand, I guess it’s a kind of unsupervised learning, and I’d like to know how it works!
Thank you very much