RGB vs HSV colorspace

So I’ve been wondering for a while why ImageNet networks like VGG use RGB as their colourspace.

I thought that for image recognition tasks colour spaces like HSV are better suited because these encodings separate the lightness from the colour value so that pictures taken in different lighting conditions still have the same color but different lightness. [1]
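The lightness separation is easy to check: uniformly scaling a pixel’s RGB values (roughly what dimmer lighting does) leaves H and S unchanged and only shifts V. A minimal sketch using Python’s standard-library colorsys module:

```python
import colorsys

# The same red surface under bright light and under half the illumination
# (RGB values in [0, 1]).
bright = (0.9, 0.1, 0.1)
dark = (0.45, 0.05, 0.05)

h1, s1, v1 = colorsys.rgb_to_hsv(*bright)
h2, s2, v2 = colorsys.rgb_to_hsv(*dark)

# Hue and saturation survive the lighting change; only V (value) differs.
print(h1 == h2)                    # True
print(round(v1, 2), round(v2, 2))  # 0.9 0.45
```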

Is this something that is just learnt by the network while training? Or have I just not come across networks that use HSV, or are there reasons not to use HSV to begin with?

[1] https://en.wikipedia.org/wiki/HSL_and_HSV



I had the same question, but after doing some research it dawned on me that HSV<->RGB conversion can be approximated by the kinds of linear transformations and simple nonlinearities that happen in neural networks. If HSV were useful, I’d imagine we’d see nodes in the network that are sensitive to H, S, or V.

I suspect that YCbCr might be a better alternate color space to feed the network. I’m pondering doing that experiment myself for medical images; for example, digital pathology images are huge but don’t have a lot of color information (I implemented a compression algorithm in a former life that got most of its gains by subsampling the chroma channels). It’s not unlikely that a network that accepts full-res images for Y and 1/4 resolution for CbCr might give pretty good results.
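The idea above can be sketched as follows, assuming float RGB input in [0, 1] and ITU-R BT.601 conversion weights (the function name is made up for illustration, not an existing library call):

```python
import numpy as np

def rgb_to_ycbcr_subsampled(img):
    """Convert an HxWx3 float RGB image (values in [0, 1]) to a full-res
    Y plane plus 2x2-subsampled Cb/Cr planes (ITU-R BT.601 weights)."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    # 4:2:0-style subsampling: average each 2x2 block of the chroma planes.
    h, w = y.shape
    cb = cb.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr = cr.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb, cr

img = np.random.rand(8, 8, 3)
y, cb, cr = rgb_to_ycbcr_subsampled(img)
print(y.shape, cb.shape)  # (8, 8) (4, 4)
```

The network would then take the Y plane at full resolution and the chroma planes through a separate, smaller input branch.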


That sounds incredibly interesting. At a glance I can see why reduced-resolution YCbCr might work better, maybe even in conjunction with an attention model. It should be quite a bit of work to train new convolutional layers, though…
:slight_smile: Do you mean spatial subsampling, or reducing the precision of the CbCr channels (i.e. making them 16-bit or something)?

Still, the original question remains: even if the conversion is easily done in the network, it is still more easily (and accurately) done as preprocessing, and that would make the input more lighting-independent to begin with.

Experimental results have shown RGB works the best of all the standard color models.


Hmmm… @jeremy, I would have expected the answer “it depends…”

A lot of deep learning has been done on photos of natural scenes, so I am not surprised to learn that RGB is doing best. However, I’d imagine that there are several application areas where other color spaces might work too.

And of course, there are applications that do not need color - medical CT/MRI images do not have color, so one color channel should suffice.

As an aside: a long, long time ago I developed a contrast enhancement technique that was published, but only used after I published source code in a Graphics Gems book. To my surprise, that method (CLAHE) is now frequently cited and used for a myriad of applications, including deep learning (as a preprocessing/normalization step). Sometimes it helps, sometimes it hurts. “It depends” :slight_smile:


Excellent point - I was referring to work done on imagenet.

@jeremy Do you happen to have a citation for the work you are referring to? I’d like to better understand the testing/evaluation that was done to come to this conclusion.


I am looking for this kind of research (ImageNet trained in different color spaces).

Could you point me to papers with this? Or give me an idea where I can search for this?

I have been searching Google Scholar using “imagenet” and “YCbCr” and/or “color spaces”, but I can’t find these experimental results.

Turns out that combining multiple (7) different color spaces as preprocessing input, and feeding them through a network, achieves much higher accuracy and requires fewer parameters.

So yes, a CNN can learn to convert images into different useful color spaces, but that doesn’t prove it actually does so. Helping it by providing them as input and letting it go from there does help, with some classes benefiting from different color spaces, so no single space is best for all.

“ColorNet: Investigating the importance of color spaces for image classification”
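A minimal sketch of that multi-color-space input: stack several representations of the same image into one tensor and let the network pick what it needs. Only three of the seven spaces are shown, the helper name is made up, and this is my reading of the idea rather than the paper’s code:

```python
import colorsys
import numpy as np

def multi_colorspace_input(img):
    """Stack RGB, HSV, and YCbCr representations of an HxWx3 float image
    (values in [0, 1]) into one 9-channel input."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    # HSV via colorsys, applied per pixel (fine for a sketch; vectorize
    # for real use).
    hsv = np.array([[colorsys.rgb_to_hsv(*px) for px in row] for row in img])
    # YCbCr with BT.601 weights.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    ycbcr = np.stack([y, 0.564 * (img[..., 2] - y), 0.713 * (r - y)], axis=-1)
    return np.concatenate([img, hsv, ycbcr], axis=-1)

img = np.random.rand(4, 4, 3)
x = multi_colorspace_input(img)
print(x.shape)  # (4, 4, 9)
```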


Wow, it’s Mr CLAHE! It is one of the methods I have in my toolkit and have tried it as preprocessing in two Kaggle challenges (unsuccessfully, but beside the point). Thank you.

On non-photographic (non-ImageNet’y) tasks, it is often worth trying various colorspace and channel mixes to see what works best, particularly if you have 4 or more channels of data (e.g. NIR) and need to work out how to combine those into 3.
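One common way to fold extra channels into 3 is a per-pixel linear mix, i.e. a 1x1 convolution. A minimal NumPy sketch with hypothetical starting weights that pass RGB through and blend the NIR channel in:

```python
import numpy as np

rng = np.random.default_rng(0)
img4 = rng.random((32, 32, 4))  # e.g. RGB + NIR, values in [0, 1]

# A 1x1 convolution is just a per-pixel linear map: (4,) -> (3,).
# These weights are illustrative starting values; in a network they
# would be learned.
w = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.3, 0.3, 0.3],  # blend NIR equally into all three outputs
])
img3 = img4 @ w
print(img3.shape)  # (32, 32, 3)
```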

Even on photographic images, other colorspaces sometimes have an edge; e.g. some say YCbCr is a better starting point for shadow detection. Color space changes are not the first thing I’d try, but they are an arrow in the quiver.
