How to generalize color in images

I’m currently competing in this competition:
The 1st FR FRONTIER: Color detection in fashion images

The sponsor of the competition is Fast Retailing, and their most famous brand is UNIQLO.

Their clothes are very simple, so each garment is usually tagged with just one representative color.

The objective of this competition is to classify the color of the garment in each image into one of 24 colors. This seems straightforward, since image data is itself a representation of RGB values, but it’s not as easy as I imagined at first glance. Here are some facts and points I’ve noticed so far.

  • The background of all images in the training and test data sets is perfect white, i.e. RGB (255, 255, 255).
  • In the training data set (around 12,000 images), most images contain just one garment, captured full-figure from a straight-on front angle in the center of the image, but some include several sets of the same clothes.
  • The test data set (around 10,000 images) contains patterns that do not appear in the training data set, e.g. very close-up shots of a specific part of a garment, mannequins wearing clothes, etc.
  • The different image patterns in the training and test data sets make it difficult to use fine-tuning, because the model easily overfits to “shape features” in the training data set.
  • The evaluation metric is the mean accuracy of each class, so neglecting minor colors is a disaster.
  • The labels in the training data set are very unbalanced.

So I think the key is to generalize colors well while de-emphasizing edges. This seems very challenging to me, especially from a CNN perspective. Do you have any ideas for tackling this problem with deep learning? My current approach is rather old-fashioned: a histogram of the color distribution plus texture features from a GLCM.
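Here is roughly what I mean, as a minimal sketch (assuming OpenCV and a recent scikit-image; the bin count, GLCM distances/angles, and the pure-white background mask are all illustrative choices, not my exact settings):

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def color_and_texture_features(path, bins=32):
    """Color histogram over non-background pixels + GLCM texture stats."""
    img = cv2.imread(path)                          # BGR, uint8
    # Mask out the pure-white (255, 255, 255) background
    bg = cv2.inRange(img, (255, 255, 255), (255, 255, 255))
    fg = cv2.bitwise_not(bg)

    # Per-channel color histogram, normalized to a distribution
    hist = np.concatenate(
        [cv2.calcHist([img], [c], fg, [bins], [0, 256]).ravel()
         for c in range(3)])
    hist /= hist.sum() + 1e-8

    # GLCM texture features on the grayscale image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    glcm = graycomatrix(gray, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    texture = np.hstack([graycoprops(glcm, p).ravel()
                         for p in ("contrast", "homogeneity", "energy")])
    return np.concatenate([hist, texture])
```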

I think a CNN pretrained on ImageNet would work well with fine-tuning, since it should already pick up on clothing.

Curious what the result would be for something like this, though (a rough Keras sketch; the input size, frozen base, and classifier head are my assumptions):
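```python
from tensorflow import keras
from tensorflow.keras import layers

# Freeze the ImageNet-pretrained convolutional base and train
# only a new 24-class color head on top.
base = keras.applications.VGG16(weights="imagenet", include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(24, activation="softmax"),  # 24 color classes
])
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```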


Actually, I’ve already tried fine-tuning VGG16, but it overfitted terribly to the training data set.
Detecting clothing as a label among different types of objects and identifying the color of clothes are totally different tasks, I think.

This competition is really interesting. The color labels are object-dependent. For example, a ‘navy’ t-shirt may have a similar color distribution to ‘blue’ socks; a ‘gray’ hat can look similar in color to a ‘black’ shirt.

It is challenging to find a good way to train our model so that it learns the nuanced differences in color across different clothing (texture).

Off the top of my head, you might consider downsampling the images to a smaller size and doing some preprocessing to blur the edges/fill the whitespace. You could also concatenate the output of the histogram approach as an auxiliary input to the CNN towards the end of the network.
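Something like this, as a hedged Keras sketch (the image size, histogram dimension, and layer sizes are all placeholder assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

HIST_DIM = 96  # e.g. 32 bins x 3 channels; a placeholder choice

image_in = keras.Input(shape=(64, 64, 3), name="image")
hist_in = keras.Input(shape=(HIST_DIM,), name="color_hist")

# Small convolutional trunk over the downsampled image
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

# Concatenate the precomputed histogram near the end of the network
merged = layers.Concatenate()([x, hist_in])
merged = layers.Dense(128, activation="relu")(merged)
out = layers.Dense(24, activation="softmax")(merged)  # 24 color classes

model = keras.Model(inputs=[image_in, hist_in], outputs=out)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```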

You might want to fill the white pixels with a reflection of the nearest non-white pixels. I’m sure there are some cool methods in cv2 or other imaging libraries.
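For instance, cv2.inpaint extrapolates from neighboring pixels rather than reflecting them, but it’s one concrete option. A rough sketch (the near-white threshold, radius, and file path are assumptions):

```python
import cv2

def fill_white_background(img, thresh=250):
    # Threshold below 255 to also catch near-white JPEG artifacts
    mask = cv2.inRange(img, (thresh, thresh, thresh), (255, 255, 255))
    # Fill the masked background from neighboring garment pixels
    return cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

img = cv2.imread("train/0001.jpg")  # hypothetical path
filled = fill_white_background(img)
```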

Here are some other preprocessing ideas from a team trying to tackle this same problem (center cropping, masking skin and background, clustering pixels) https://blog.algolia.com/how-we-handled-color-identification/

I’m also taking part in this competition.
At first, I applied a CNN normally, but the result was 5% accuracy.

I preprocessed the images using cv2.calcHist, turning the 2D × 3-channel image data into 1D × 3-channel data, and then applied Keras’s 1D CNN layers (Conv1D, MaxPooling1D). This 1D CNN approach got better accuracy than a plain fully connected network, but I don’t know why.
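Roughly like this (a sketch; the bin count and layer sizes here are illustrative, not my exact settings):

```python
import cv2
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

BINS = 64  # illustrative choice

def image_to_hist_signal(img):
    """Collapse a 2D x 3-channel image into a (BINS, 3) histogram 'signal'."""
    hists = [cv2.calcHist([img], [c], None, [BINS], [0, 256]).ravel()
             for c in range(3)]
    h = np.stack(hists, axis=-1)                 # shape (BINS, 3)
    return h / (h.sum(axis=0, keepdims=True) + 1e-8)

model = keras.Sequential([
    layers.Conv1D(32, 5, activation="relu", input_shape=(BINS, 3)),
    layers.MaxPooling1D(2),
    layers.Conv1D(64, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(24, activation="softmax"),      # 24 color classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```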

Hi! @teru, I tried 2 things to improve my model after reading the ‘facts and points’ in your post above:

  1. Add GLCM features as an additional input to my model.

  2. Use the class_weight argument when training my model, so that wrong predictions on minor colors result in a higher loss than those on major colors. This way the model can make better predictions on colors like khaki and dark orange (see the sketch after this list).
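For the class weights, something along these lines (a sketch using scikit-learn’s helper; y_train, X_images, and X_glcm are placeholder names for the labels and the two model inputs):

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# y_train is assumed to be an integer array of color labels 0..23
classes = np.arange(24)
weights = compute_class_weight("balanced", classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))

# Misclassifying a rare color now costs more than a common one
model.fit([X_images, X_glcm], y_train,
          epochs=20, batch_size=64,
          class_weight=class_weight)
```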

The result was surprisingly good. Although validation accuracy did not improve at all, my leaderboard score took a huge jump.

By the way, I also tried fine-tuning ResNet50, InceptionV3, and Xception with dense & conv2d layers; only ResNet50 gave a decent result.

(I found that the contest rules say we are not allowed to share data/code with other competitors, so I had to edit this post a bit.)

@brendan, thank you very much for the reference.
About the white space, I removed all of it when computing the color distribution. Blurring the edges might be a good idea, though the output of a CNN is still a strongly “positional” feature. Based on my experiments, I feel I should disregard the spatial relationships of pixels entirely.

@tsu-nera, great to hear from someone who is doing the same competition!
Keep in touch, tell us about your insights when you get an Aha! moment.

@SALu, glad that my comments helped your solution somehow.
It’s kinda disappointing that we cannot discuss more details because of the rule.
Looking forward to your next leap.