Using VGG with greyscale images

I am on lesson 2 and trying to apply fine-tuning of the VGG model to other datasets. I have tried it with the State Farm driver competition and it works relatively well (0.92 val acc).
But when I try it with a dataset that has greyscale images (https://www.kaggle.com/c/datasciencebowl) I only reach 0.34 acc. Is there any preprocessing needed for a greyscale dataset?

Yes, a color network isn’t necessarily optimal for a BW image. What accuracy did the leaders in the competition get? Have you tried fine-tuning more layers?

PS: This is a great article for this comp: http://benanne.github.io/2015/03/17/plankton.html
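
For example, here’s a rough sketch of fine-tuning more layers, assuming Keras’ built-in VGG16 rather than the course’s Vgg16 wrapper, with the last conv block unfrozen and a new head for the 121 plankton classes:

from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# ImageNet-pretrained VGG16 without its classifier head
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze everything except the last conv block (block5_*)
for layer in base.layers[:-4]:
    layer.trainable = False

# New classifier head; 121 classes in the plankton competition
x = Flatten()(base.output)
out = Dense(121, activation='softmax')(x)

model = Model(base.input, out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])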

Hi, I am trying to apply the network to BW images also. Have you found a way to preprocess them? 🙂

In one of the videos you retrained VGG on ImageNet for the sake of batch normalization. Could I in theory convert all of ImageNet’s photos to grayscale, and then create a grayscale version of VGG16BN?

I have managed to make some progress. Look at the MNIST example in lesson 3 (and the accompanying notebook in the git repository) to see how to retrain a (simpler) network from scratch on greyscale images. My problem was that the loss increased a lot; I think it was a matter of choosing the correct learning rate (I’m going through lesson 4 now, where this is explained).
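
The gist is something like this (a minimal Keras sketch, not the course’s exact model); since you define the input shape yourself, a single greyscale channel works directly:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Small convnet trained from scratch on 28x28 single-channel images
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])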

Hi, I am also trying to work through the data science bowl challenge. How are you dealing with the fact that each folder of images is a single data point rather than a single image?

I think you cannot directly use grey images. The input layer is something like (3, 224, 224), where the 3 represents the R, G and B channels of the image; for grey images it would be something like (1, 224, 224). Maybe converting the grey image to RGB helps, not sure though. If you have OpenCV installed you may try:

import cv2

# greyscale (H, W) -> RGB (H, W, 3)
color_img = cv2.cvtColor(gray_img, cv2.COLOR_GRAY2RGB)

Has anyone gone ahead and tried training VGG (or another network, but now with a 1-channel input layer) on a grayscale version of ImageNet? I’m curious whether this might then transfer to looking at X-ray or CT images. I tried searching online, but I didn’t see any obvious leads (this old forum topic is at the top of the list).

I haven’t seen anyone try this. The color channels seem important for differentiating between a lot of the images in ImageNet. I tried a variation of this: I trained a network on MNIST (60k images in grayscale) and then tried transferring the pre-trained results over to a different task. In my experience it was not worth the effort, though perhaps that was because I had sufficient data.

I think what probably makes more sense is converting your grayscale images to RGB and then using the pretrained ImageNet weights as normal. The farther your images look from ImageNet, though, the less value the pretrained weights seem to have. An alternative approach would be to pick a parallel task with a lot of data that looks more similar to your data. For example, an old Kaggle competition on X-ray or CT images probably has pretrained weights and networks lying around on GitHub and in its forums.
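
For the convert-to-RGB route, the idea is roughly this (a minimal sketch assuming Keras with a TensorFlow backend; the arrays are placeholders):

import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input

# Placeholder batch of greyscale images, shape (batch, height, width)
gray_batch = np.random.rand(8, 224, 224).astype('float32') * 255

# Repeat the single channel three times -> (8, 224, 224, 3)
rgb_batch = np.repeat(gray_batch[..., np.newaxis], 3, axis=-1)

# Use the ImageNet-pretrained convolutional layers as a feature extractor
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
features = base.predict(preprocess_input(rgb_batch))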

VGG16 by default has its own input shape, and anything you want to feed it has to match that shape. Below you can find a code snippet that converts grayscale images to colour, (28, 28) -> (28, 28, 3), which can then be fed to VGG16 for transfer learning.
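
Something along these lines (a minimal sketch with NumPy; the array names are placeholders):

import numpy as np

# Placeholder batch of greyscale MNIST-sized images, shape (n, 28, 28)
x_gray = np.random.rand(100, 28, 28).astype('float32')

# Stack the single channel three times -> (100, 28, 28, 3)
x_rgb = np.stack([x_gray, x_gray, x_gray], axis=-1)

# Note: Keras' VGG16 needs inputs of at least 32x32 (224x224 with the
# original classifier head), so the images also need upscaling before
# they can actually be fed to the network.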

I did some pre-processing before feeding it to VGG; please check the link to the implementation.

Hello, can you please help me and tell me how you used VGG16 with a dataset that has grayscale images? As far as I know, VGG16 accepts only 3-channel images.

import numpy as np

x3d = np.repeat(np.expand_dims(x2d, axis=3), 3, axis=3)  # (n, h, w) -> (n, h, w, 3)

I just repeat the channel 3 times. I know it’s not the optimal way, but it works for me.

Please let me know your approaches. Thanks!