Grayscale and different image resolution

Is it possible to use cnn_learner with a ResNet model on grayscale images, and at a resolution other than 224?
If yes, what do I need to change in the function calls?

Sort of.

from fastai.vision import *  # fastai v1

tfms = get_transforms()  # Default fastai data augmentation options
size = 28  # Results in 28x28 square images; passing (224, 448) results in 224x448 images
bs = 64  # Batch size
# convert_mode is passed on internally to the function that opens each image;
# 'L' results in one color channel
data = (ImageList.from_folder(path_to_your_images, convert_mode='L')
        .split_by_rand_pct()
        .label_from_folder()
        .transform(tfms, size=size)
        .databunch(bs=bs).normalize())
# Creates a pretrained model with the appropriate outputs for your data
learn = cnn_learner(data, models.resnet34)

This would get you most of the way, but if you ran this code, you’d find out that there’s a problem trying to send a greyscale image into most pretrained models: they expect 3 color channels because they were trained on ImageNet, which is a dataset of color images. In order to have the model accept your 1 color channel images, you’d have to reconstruct the architecture such that it accepts 1 channel as input. Jeremy teaches us how to get started with that in fastai in Lesson 7. Building a state-of-the-art ResNet is a little more involved than that, but it’s a start.
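
One common way to get started on that (a sketch only, not necessarily what Lesson 7 does, and not a fastai API): swap the first convolution for a 1-channel one and initialise it with the pretrained RGB filters summed over the input-channel axis. Because convolution is linear, the new layer gives the same output on a gray image as the old layer gave on that image copied into all three channels. The helper name conv_to_one_channel is made up for illustration, and the layer below is randomly initialised rather than taken from a real pretrained ResNet.

```python
import torch
import torch.nn as nn

def conv_to_one_channel(conv: nn.Conv2d) -> nn.Conv2d:
    """Return a 1-input-channel copy of `conv` whose filters are the sum of
    the original filters over the input-channel axis. (Hypothetical helper.)"""
    new_conv = nn.Conv2d(
        in_channels=1,
        out_channels=conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # weight shape: (out_channels, in_channels, kH, kW) -> sum over in_channels
        new_conv.weight.copy_(conv.weight.sum(dim=1, keepdim=True))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv

# Same shape as the first layer of a torchvision ResNet: 3 -> 64, 7x7, stride 2
rgb_conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
gray_conv = conv_to_one_channel(rgb_conv)

gray = torch.rand(2, 1, 28, 28)                # batch of 2 one-channel images
out_gray = gray_conv(gray)                     # 1-channel path
out_rgb = rgb_conv(gray.repeat(1, 3, 1, 1))    # same images copied to 3 channels
print(out_gray.shape)                          # torch.Size([2, 64, 14, 14])
```

With a torchvision-style ResNet you would assign the result back to the model’s first conv attribute (conv1 in torchvision). Where that layer sits inside the model built by cnn_learner depends on the fastai version, so inspect learn.model before relying on this.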

If your goal is simply to use 1-channel grayscale images, which you happen to have in some resolution, as input to fastai, the only thing you really need to do is specify size=<desired size for learning, e.g. 224> in the transform, and fastai will take care of the rest. It will (a) take your images and replicate the one gray channel to all 3 RGB channels by default, and (b) resize the images according to your specifications.

(The open_image function in fastai.vision converts to RGB by default, with convert_mode='RGB', because, as @amqdn states, all pretrained architectures in fastai expect 3-channel RGB images.)
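
For what it’s worth, copying the gray channel into R, G and B is what PIL itself does when an 'L' image is converted to 'RGB'. A small check with a synthetic image, independent of fastai (the 224 target is just the example size from above, and the resize here uses PIL only for illustration; fastai applies its own resize transform):

```python
from PIL import Image

# Synthetic 1-channel grayscale image, 40x30 pixels, every pixel = 200
gray = Image.new('L', (40, 30), color=200)

rgb = gray.convert('RGB')          # what convert_mode='RGB' asks PIL to do
r, g, b = rgb.split()              # three single-channel images

resized = rgb.resize((224, 224))   # PIL takes (width, height)

print(gray.mode, rgb.mode)         # L RGB
print(rgb.getpixel((0, 0)))        # (200, 200, 200)
print(resized.size)                # (224, 224)
```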

Thank you for the answers. I just thought that for some visual recognition tasks, working only with grayscale images might bring better results. But it seems that today’s complex deep NN models like ResNet are capable of learning important details from color pictures as well, even when color brings no apparent additional information for the task, as in an experiment I did on recognizing tennis players.

That was my intuition as well, but for whatever reason, the models tend to find something useful across all three channels. I’d expect that giving the model less math to crunch would be better, but other researchers in the recent Kaggle Whale Identification Competition found that that wasn’t the case. Maybe it’s an area for study.
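
Some back-of-the-envelope arithmetic on why the savings are smaller than they sound: going from 3 input channels to 1 only shrinks the very first convolution, and every later layer stays the same size. Assuming the standard ResNet stem (64 filters of 7x7, stride 2, padding 3, no bias) and a 224x224 input:

```python
def first_conv_params(in_channels, out_channels=64, k=7):
    # One k x k filter per (input channel, output channel) pair, no bias
    return out_channels * in_channels * k * k

def first_conv_macs(in_channels, size=224, out_channels=64, k=7, stride=2, pad=3):
    # Output side length of a strided convolution
    out_size = (size + 2 * pad - k) // stride + 1
    # One multiply-accumulate per weight per output position
    return out_size * out_size * first_conv_params(in_channels, out_channels, k)

for c in (3, 1):
    print(c, first_conv_params(c), first_conv_macs(c))
# 3 9408 118013952
# 1 3136 39337984
```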

Hey guys, sorry for reopening this post in July.

I’ve built a CNN with fastai to classify some objects. It seems to work quite well on all pictures and I am very satisfied with it.
The problem is that if I pass it a black and white picture, the model always gives me a wrong result. Is it possible to build a model in fastai that correctly classifies both colored and b&w pictures?

I’d adjust your training dataset for that. The issue lies in the channels. Our color images have 3, RGB, which leads to a 3x224x224 (if they are 224), and black and white are two, so 2x224x224. One option could be to either try having both, or to pick just black and white as they have the least channels.
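
A sketch of the “having both” option: randomly turn a fraction of the training images gray while keeping 3 channels, so the model sees colored and b&w versions of the same classes. The function name maybe_grayscale and the probability p are made up for illustration, and how you hook it into your data pipeline depends on your fastai version.

```python
import random
from PIL import Image

def maybe_grayscale(img, p=0.5, rng=random):
    """With probability p, drop the color information but keep 3 channels."""
    if rng.random() < p:
        # 'L' collapses RGB to one luminance channel; 'RGB' copies it back to 3
        return img.convert('L').convert('RGB')
    return img.convert('RGB')

color = Image.new('RGB', (8, 8), color=(255, 0, 0))   # solid red test image
always = maybe_grayscale(color, p=1.0)                # always gray
never = maybe_grayscale(color, p=0.0)                 # never gray

print(always.mode, never.mode)       # RGB RGB
print(never.getpixel((0, 0)))        # (255, 0, 0)
```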

Thank you. I want to try training on black and white pictures. Is there an easy way to turn my training images to grayscale when I create my databunch?

Black and white images are usually 1-channel, not 2. To turn them into 3-channel, you can just duplicate the channel 3 times.
To train on grayscale images, load them as 1-channel and use a function that turns a 1-channel grayscale tensor into a 3-channel one, like:

def to_3_channels(x):
    # dim=-3 is the channel axis for both a single (1, H, W) image and an (N, 1, H, W) batch
    return torch.cat((x, x, x), dim=-3)

You can then pass the arguments convert_mode='L' (which loads the image as 1-channel grayscale) and after_open=to_3_channels to the function you use to create the databunch.
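
One thing to double-check before relying on this: as far as I can tell from the fastai v1 source, after_open is called on the PIL image right after it is opened, before it becomes a tensor, in which case torch.cat would fail on it. A defensive sketch that handles both cases (the name to_3_channels_any is made up, and the PIL branch rests on my assumption about what the hook receives):

```python
import torch
from PIL import Image

def to_3_channels_any(x):
    """Return a 3-channel version of x, whether x is a PIL image or a tensor."""
    if isinstance(x, Image.Image):
        # PIL copies the single gray channel into R, G and B
        return x.convert('RGB')
    # Tensor case: dim=-3 is the channel axis for (1, H, W) and (N, 1, H, W)
    return torch.cat((x, x, x), dim=-3)

pil_gray = Image.new('L', (5, 4), color=128)   # PIL size is (width, height)
single = torch.rand(1, 4, 5)                   # one image: (channels, height, width)
batch = torch.rand(2, 1, 4, 5)                 # batch: (N, channels, height, width)

print(to_3_channels_any(pil_gray).mode)    # RGB
print(to_3_channels_any(single).shape)     # torch.Size([3, 4, 5])
print(to_3_channels_any(batch).shape)      # torch.Size([2, 3, 4, 5])
```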
