I am new to this… We have worked through several of the tutorials and are able to use a CNN to detect objects that we train it on. What I can’t figure out, though, is how to differentiate between two similar objects that differ only in color. It seems like the normal CNN models are colorblind. Is there a way to incorporate color detection into the model?
Good question. I think if you use transfer learning with enough color-labelled targets, then you should get color detection. However, I have just trained a model where 100% of the red items are class 0, yet the model seems to miss many of them.
My theory is that shape is more dominant, so results may improve with more training data that includes red items. Nevertheless, I am surprised the model is so insensitive to color, since the red items are so obvious to the human eye.
The normal CNN architectures "merge" the colors in their input conv layer.
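To make "merge" concrete: each filter in the first conv layer takes a weighted sum over R, G and B, but with its own weight per channel, so merging does not by itself make a filter colorblind. A minimal NumPy sketch, assuming 1x1 kernels and two hand-picked weight vectors (`gray_filter`, `red_filter` are hypothetical illustrations, not learned weights): equal weights only see brightness, while unequal weights respond differently to red and blue patches of the same intensity.

```python
import numpy as np

def conv1x1(image, weights):
    """A 1x1 convolution: a weighted sum over the channel axis.

    image has shape (H, W, 3) with channels ordered R, G, B; the result is (H, W).
    """
    return image @ weights

# Two hand-picked (hypothetical) 1x1 filters with one weight per color channel.
gray_filter = np.array([1 / 3, 1 / 3, 1 / 3])  # equal weights: only sees brightness
red_filter = np.array([1.0, -0.5, -0.5])       # positive on R, negative on G and B

# Two 4x4 patches with the same intensity, one pure red and one pure blue.
red_patch = np.zeros((4, 4, 3))
red_patch[..., 0] = 0.9
blue_patch = np.zeros((4, 4, 3))
blue_patch[..., 2] = 0.9

for name, patch in [("red", red_patch), ("blue", blue_patch)]:
    print(name,
          "gray filter:", conv1x1(patch, gray_filter).mean(),
          "red filter:", conv1x1(patch, red_filter).mean())
```

The gray filter gives the same response for both patches; the red filter gives a positive response for red and a negative one for blue. Whether a trained network ends up with color-selective filters depends on the data and labels.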
Maybe one option would be to load each color channel as a separate single-channel image, apply some convolutions to each, and merge them later in the network by concatenation or addition with the other channels?
Sounds good conceptually, but I have no idea how to do that.
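The idea of convolving each color channel separately and concatenating later could be sketched in PyTorch roughly like this. It is a hypothetical input stage (`PerChannelStem` and `out_per_channel` are made-up names), assuming the rest of the network accepts `3 * out_per_channel` input channels; it is not an existing fastai or torchvision module.

```python
import torch
import torch.nn as nn

class PerChannelStem(nn.Module):
    """Hypothetical input stage: convolve R, G and B separately, then concatenate.

    Each color channel gets its own small conv branch (in_channels=1), so this
    first stage cannot mix colors. The branch outputs are concatenated along the
    channel axis and handed to the rest of the network, which can combine them.
    """

    def __init__(self, out_per_channel: int = 16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(1, out_per_channel, kernel_size=3, padding=1, bias=False),
                nn.BatchNorm2d(out_per_channel),
                nn.ReLU(inplace=True),
            )
            for _ in range(3)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W); slicing with c:c + 1 keeps the channel dim -> (N, 1, H, W)
        feats = [branch(x[:, c:c + 1]) for c, branch in enumerate(self.branches)]
        return torch.cat(feats, dim=1)  # (N, 3 * out_per_channel, H, W)

stem = PerChannelStem(out_per_channel=16)
out = stem(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 48, 32, 32])
```

A more compact way to get the same "no mixing in the first layer" behavior is a grouped convolution, `nn.Conv2d(3, 48, kernel_size=3, padding=1, groups=3)`, which gives each input channel its own set of 16 filters. Note that a stem like this cannot reuse the pretrained RGB weights of a standard first layer directly.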
It seems like there should be a way to detect colors. All of the examples are about detecting objects (shapes), but distinguishing two items that differ only in color could be valuable (e.g., a basketball player on team A or team B, a red car or a yellow car, a black checkerboard piece or a red checkerboard piece).
I have been thinking about this topic, and the posted picture is incomplete/too simple because it only shows the case of
input_layers = 3 and
out_layers = 1, which is not the case in a standard model, e.g., ResNet34.
ResNet34 has the following input stage:
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
This results in
input_layers = 3 and
out_layers = 64. Therefore, you have 64 output layers (channels) produced by 64 filters, where each filter consists of 3 layers: one 7×7 kernel per input color channel. So there should be enough parameters for the network to learn color differences!?
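The numbers above can be checked by building the same layer directly in PyTorch. This sketch uses freshly initialized (not pretrained) weights; it prints the weight layout and verifies that each output channel is exactly the sum of three single-channel convolutions, one per color, which is what the "merge" amounts to.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same configuration as the ResNet34 input stage above (random, not pretrained, weights).
conv1 = nn.Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# Weight layout is (out_channels, in_channels, kernel_h, kernel_w):
# 64 filters, each holding a separate 7x7 kernel for R, G and B.
print(conv1.weight.shape)    # torch.Size([64, 3, 7, 7])
print(conv1.weight.numel())  # 9408 = 64 * 3 * 7 * 7

# Each output channel is the sum of three single-channel convolutions,
# one per color, which is where the colors get "merged".
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    full = conv1(x)
    per_channel_sum = sum(
        F.conv2d(x[:, c:c + 1], conv1.weight[:, c:c + 1], stride=2, padding=3)
        for c in range(3)
    )
print(torch.allclose(full, per_channel_sum, atol=1e-5))  # True
```

Because every filter has its own 7×7 kernel per color, the layer has the capacity to weight red, green and blue differently; whether it does so is up to training.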
(For a deeper look at the conv operation, see https://ai.stackexchange.com/questions/5769/in-a-cnn-does-each-new-filter-have-different-weights-for-each-input-channel-or and http://cs231n.github.io/convolutional-networks/#conv)
Do you have an example notebook where the network was not able to differentiate between two differently colored objects?
I was thinking of an example using blue and red cars, similar to the setup from this publication, but using a standard model such as ResNet34 instead of a special network with two CNN input stages.
I would be curious if this is really a problem which cannot be addressed by a standard architecture.
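A toy version of the red-versus-blue car test can be generated synthetically so that shape carries no information at all. This is a hypothetical NumPy sketch (`make_toy_cars` and the rectangle-plus-cabin "car" are made up for illustration; it is not the notebook from this thread): every image contains the identical silhouette at a random position, and only its color encodes the label. The resulting arrays could then be fed to a standard model such as ResNet34.

```python
import numpy as np

def make_toy_cars(n, size=64, seed=0):
    """Hypothetical toy dataset: one 'car' silhouette in red (label 0) or blue (label 1).

    Every image uses the same rectangle-plus-cabin shape at a random position,
    so the only cue that separates the two classes is color.
    Returns images of shape (n, size, size, 3) in [0, 1] and integer labels.
    """
    rng = np.random.default_rng(seed)
    images = np.zeros((n, size, size, 3), dtype=np.float32)
    labels = rng.integers(0, 2, size=n)
    colors = np.array([[0.9, 0.1, 0.1],   # red
                       [0.1, 0.1, 0.9]],  # blue
                      dtype=np.float32)
    for i in range(n):
        top = rng.integers(0, size - 24)
        left = rng.integers(0, size - 32)
        mask = np.zeros((size, size), dtype=bool)
        mask[top + 8:top + 24, left:left + 32] = True   # car body
        mask[top:top + 8, left + 8:left + 24] = True    # cabin
        images[i][mask] = colors[labels[i]]
        # Mild pixel noise so the images are not perfectly clean.
        images[i] += rng.normal(0, 0.05, images[i].shape).astype(np.float32)
    return np.clip(images, 0, 1), labels

x, y = make_toy_cars(8)
print(x.shape, y)
```

If a standard CNN cannot separate these two classes, that would point to a real color blind spot; if it can, the difficulty in real data more likely comes from imbalance, lighting, or too few examples.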
Here is a notebook that shows that color is detected…
Thanks for posting this nice notebook with a minimal test setup!
Given this result, capturing color differences in pictures should then only be a matter of model size or amount of training data?
I still wonder whether you can minimally adapt the standard architectures to capture color differences better (and still use the pretrained weights)…?
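One common way to adapt a pretrained stem while keeping its weights is to widen the first conv layer so it accepts extra input channels, copying the pretrained RGB kernels and zero-initializing the new ones. A sketch under assumptions: `expand_input_channels` is a hypothetical helper, the extra channels (here called `hsv_like`) stand for some additional color representation such as HSV that you would compute yourself, and a randomly initialized layer stands in for the pretrained one.

```python
import torch
import torch.nn as nn

def expand_input_channels(conv: nn.Conv2d, extra: int) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `extra` additional input channels.

    The original (e.g. pretrained RGB) weights are copied unchanged and the
    weights for the new channels start at zero, so the layer initially behaves
    exactly like the original one; fine-tuning can then learn to use the extras.
    """
    new = nn.Conv2d(
        conv.in_channels + extra, conv.out_channels,
        kernel_size=conv.kernel_size, stride=conv.stride,
        padding=conv.padding, bias=conv.bias is not None,
    )
    with torch.no_grad():
        new.weight.zero_()
        new.weight[:, :conv.in_channels] = conv.weight
        if conv.bias is not None:
            new.bias.copy_(conv.bias)
    return new

# Stand-in for a pretrained ResNet34 stem (random weights here).
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
six_ch_conv = expand_input_channels(rgb_conv, extra=3)

rgb = torch.rand(1, 3, 64, 64)
hsv_like = torch.rand(1, 3, 64, 64)  # hypothetical extra color representation
out_old = rgb_conv(rgb)
out_new = six_ch_conv(torch.cat([rgb, hsv_like], dim=1))
print(torch.allclose(out_old, out_new, atol=1e-5))  # True
```

With a torchvision ResNet34 the stem is `model.conv1`, so the swap would be `model.conv1 = expand_input_channels(model.conv1, 3)`, and the data pipeline would then have to supply 6-channel tensors. Whether the extra channels actually help is an empirical question.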
Thanks for sharing this. So it seems as though all models are colorblind.
My next step will be figuring out how to incorporate this into an object detection model that first identifies the object and its location. In that case, do you think we would look for a red car in the picture, or would we look for a car in the picture and then do something different to determine whether the identified car is red? Does that make sense?
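The second option (detect "car" first, then decide the color of each detection) could be sketched as a small post-processing step on each detected box. This is a hypothetical helper, not part of any detection library: it assumes boxes in `(x0, y0, x1, y1)` pixel format and an RGB image in [0, 1], and uses a deliberately crude rule (mean RGB of the crop, converted to hue with Python's `colorsys`, matched to the nearest of some made-up reference hues).

```python
import colorsys
import numpy as np

# Hypothetical reference hues (degrees on the color wheel) for the colors we care about.
REFERENCE_HUES = {"red": 0.0, "yellow": 60.0, "blue": 240.0}

def crop_color(image, box):
    """Name the dominant color inside one detected box.

    image: (H, W, 3) float RGB array in [0, 1]; box: (x0, y0, x1, y1) in pixels.
    Averages the crop's RGB, converts it to hue, and returns the reference
    color whose hue is closest (measured around the color wheel).
    """
    x0, y0, x1, y1 = box
    r, g, b = image[y0:y1, x0:x1].reshape(-1, 3).mean(axis=0)
    hue = colorsys.rgb_to_hsv(r, g, b)[0] * 360.0

    def hue_distance(ref):
        d = abs(hue - ref) % 360.0
        return min(d, 360.0 - d)

    return min(REFERENCE_HUES, key=lambda name: hue_distance(REFERENCE_HUES[name]))

# Usage with a synthetic image: left half red, right half yellow.
image = np.zeros((20, 40, 3))
image[:, :20] = [0.9, 0.1, 0.1]
image[:, 20:] = [0.9, 0.9, 0.1]
print(crop_color(image, (0, 0, 20, 20)))   # red
print(crop_color(image, (20, 0, 40, 20)))  # yellow
```

The mean over the whole box also includes background pixels, and hue is unreliable for low-saturation colors (black, white, silver), so in practice a small classifier trained on the crops is the more robust version of this second stage. The first option, treating "red car" and "yellow car" as separate detector classes, needs no second stage at all.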
The notebook shows that the model is not colorblind!
I would start by learning color and shape at the same time, especially if they are linked in some way. For example, if some objects are generally red, then the color gives some clue to the object. Also, it seems inefficient to have to run two separate models.
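Learning color and shape at the same time in one model could look like a shared backbone with two output heads, one for the object type and one for the color, trained with the sum of two losses. A minimal PyTorch sketch; `ShapeAndColorNet` and its tiny backbone are hypothetical stand-ins (in practice the backbone would be something like a pretrained ResNet body).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShapeAndColorNet(nn.Module):
    """Hypothetical single model: one shared backbone, two classification heads.

    Object type and color are predicted from the same features, so only one
    forward pass is needed, while each head is still supervised separately.
    """

    def __init__(self, n_objects: int, n_colors: int):
        super().__init__()
        self.backbone = nn.Sequential(  # tiny stand-in for e.g. a ResNet body
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),  # (N, 64)
        )
        self.object_head = nn.Linear(64, n_objects)
        self.color_head = nn.Linear(64, n_colors)

    def forward(self, x):
        feats = self.backbone(x)
        return self.object_head(feats), self.color_head(feats)

model = ShapeAndColorNet(n_objects=2, n_colors=3)
obj_logits, color_logits = model(torch.randn(4, 3, 64, 64))

# One loss per head, summed, so both tasks shape the shared features.
obj_target = torch.tensor([0, 1, 0, 1])
color_target = torch.tensor([0, 0, 2, 1])
loss = F.cross_entropy(obj_logits, obj_target) + F.cross_entropy(color_logits, color_target)
loss.backward()
```

Separate heads suit the case where color and object are independent (any car can be any color). When they are correlated, an alternative is a single head over joint labels such as "red_car" or "blue_car".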
The color and the object are not necessarily linked. If I want to distinguish red cars from yellow cars, both are cars (so the shapes may be the same), but the color is different. Does this help explain the challenge?
But if you are trying to detect red Ferraris versus silver Mercedes versus blue BMWs, then there is a correlation.
Good point, and very true. If I am trying to detect people wearing red shirts and distinguish them from people wearing yellow shirts, then in both cases I am trying to identify people, but I then need to do something different if the person is wearing a yellow shirt. Thanks for your insights.