Lesson 1 - Non-beginner discussion

ImageBlock points to this code in fastai2/vision/core.py:

    class PILBase(Image.Image, metaclass=BypassNewMeta):
        _bypass_type=Image.Image
        _show_args = {'cmap':'viridis'}
        _open_args = {'mode': 'RGB'}
        @classmethod
        def create(cls, fn:(Path,str,Tensor,ndarray,bytes), **kwargs)->None:
            "Open an `Image` from path `fn`"
            if isinstance(fn,TensorImage): fn = fn.permute(1,2,0).type(torch.uint8)
            if isinstance(fn,Tensor): fn = fn.numpy()
            if isinstance(fn,ndarray): return cls(Image.fromarray(fn))
            if isinstance(fn,bytes): fn = io.BytesIO(fn)
            return cls(load_image(fn, **merge(cls._open_args, kwargs)))

The create function handles certain types like TensorImage and ndarray. In the example, you pointed out that the output of get_x is passed to ImageBlock and is already an image. If you do not use PILImage.create, then just the file path to the image is passed to the ImageBlock, and the last line of the create function gets executed.
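For example, a quick sketch of two of those branches (the array contents are illustrative):

    from fastai.vision.all import *
    import numpy as np

    # An ndarray hits the Image.fromarray branch of create; a Path/str would
    # instead fall through to load_image on the last line.
    arr = np.zeros((28, 28, 3), dtype=np.uint8)
    im = PILImage.create(arr)
    print(im)   # PILImage mode=RGB size=28x28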

Thanks, I understand it now. :slight_smile:

@arora_aman also FYI, you should tell ImageBlock that you’re doing one-channel images, like so: ImageBlock(cls=PILImageBW); otherwise, if you want to do it in get_x, make sure you use PILImageBW there :wink:
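A minimal sketch of that, assuming the standard fastai full-MNIST folder layout (‘training’/‘testing’):

    from fastai.vision.all import *

    # PILImageBW opens images in one-channel mode ('L') instead of RGB
    mnist = DataBlock(
        blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
        get_items=get_image_files,
        splitter=GrandparentSplitter(train_name='training', valid_name='testing'),
        get_y=parent_label)
    dls = mnist.dataloaders(untar_data(URLs.MNIST))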

An MNIST image is 1-channel, but it has two dimensions, height and width: 28x28x1. An RGB image has 3 channels: 28x28x3. :slightly_smiling_face:

Thanks @muellerzr and @barnacl. Not passing PILImageBW works too, because it converts my images to 3 channels (by copying one channel across the others). I find this beneficial because then I don’t need to update the model for a single channel.

FYI, this is an example of dl.train_ds[0]:

    (PILImage mode=RGB size=28x28, TensorCategory(1))

@arora_aman are there any downsides to doing so? What does that generated third channel contain now?

It’s just a different approach, I think. I don’t know much about this either, tbh, but when I participated in BengaliAI, Kagglers got top scores both with single-channel and with multi-channel images of black-and-white handwritten graphemes.

I don’t know when to use one over the other.

I find using 3 channels intuitive, as ImageNet models are trained on images with 3 channels. In some of my experiments I found 3 channels to work better even for BW images.

I can confirm this from my experience too.

For Bengali, as Aman and others mentioned, after extensive experiments I did not find any difference between training on 1 channel or 3 channels. The only difference was that when you use 1 channel and don’t copy the weights from the first conv layer, training takes a bit longer because you have to relearn those weights. I made a post here explaining some of this: https://www.kaggle.com/c/bengaliai-cv19/discussion/130311#745589

But I think one could do a proper investigation and benchmarks…

I find the 3-channel approach better too, if only because it needs no changes to the model and transfer learning just works. Whether 1 channel or 3 channels, theoretically they both carry the same information.

Just to add to this! The thing I have mixed feelings about is normalization. Suppose you modify the first conv: should you normalize using ImageNet stats converted to 1 channel, or use your data-specific normalization?
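A hedged sketch of the two options (the numbers are the standard ImageNet stats):

    from fastai.vision.all import *

    # Option 1: ImageNet stats averaged down to a single channel
    mean_bw = sum([0.485, 0.456, 0.406]) / 3   # ~0.449
    std_bw  = sum([0.229, 0.224, 0.225]) / 3   # ~0.226
    norm = Normalize.from_stats(mean_bw, std_bw)

    # Option 2: data-specific stats - adding Normalize() with no stats lets
    # fastai compute mean/std from one batch of your own data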

To add to my previous comment: the 1-channel image is just stacked 3 times to make the 3-channel image. Other than saving memory, I see no other use case for using 1-channel images for transfer learning.

We can use expand, which is memory-efficient:
https://stackoverflow.com/questions/44593141/stacking-copies-of-an-array-a-torch-tensor-efficiently
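A quick sketch of the difference, in plain PyTorch:

    import torch

    # expand() returns a 3-channel *view* of a 1-channel tensor without copying;
    # repeat() would allocate new memory instead.
    x  = torch.rand(1, 28, 28)                       # (C=1, H, W)
    x3 = x.expand(3, -1, -1)                         # (3, H, W), same storage
    print(x3.shape, x3.data_ptr() == x.data_ptr())   # torch.Size([3, 28, 28]) True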

Agree… In some competitions where 1-channel input was given, people found different ways to encode extra information into the other channels. For example, in the Hemorrhage Detection competition one could encode different windows, like brain or bone, and in the Google doodle drawing competition people encoded different strokes into channels =)

I don’t think replacing a pretrained model’s first conv layer is a good technique. The second layer’s weights were trained to recognise patterns by combining the lower-level patterns the first layer detects; if you throw away the first layer’s weights and replace them with random ones (the 1-channel version), you lose much of what a pretrained model gives you.

Agreed: if you just replace the first conv with random weights, it will take longer to train. But substituting the first conv’s weights with the average over its 3 input channels results in much more stable training.
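A sketch of that substitution, using a torchvision ResNet purely for illustration:

    import torch.nn as nn
    from torchvision.models import resnet18

    # Adapt a pretrained 3-channel first conv to 1-channel input by averaging
    # the pretrained weights over the input-channel dimension.
    model = resnet18(pretrained=True)
    w = model.conv1.weight.data                          # shape (64, 3, 7, 7)
    conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    conv1.weight.data = w.mean(dim=1, keepdim=True)      # shape (64, 1, 7, 7)
    model.conv1 = conv1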

Thanks all for your response! :slight_smile:

I think it is mainly for visualisation in show_batch; it shouldn’t affect training in any way. That is a good point about transfer learning (when applicable).

My labels are one-hot encoded - what’s the best way to set this up with the DataBlock API?
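One possible sketch, assuming a DataFrame df with a filename column and one 0/1 column per class (the names below are hypothetical): MultiCategoryBlock(encoded=True) accepts labels that are already one-hot encoded.

    from fastai.vision.all import *

    # Hypothetical: df has an 'fname' column plus one 0/1 column per class
    classes = ['cat', 'dog', 'bird']
    dblock = DataBlock(
        blocks=(ImageBlock, MultiCategoryBlock(encoded=True, vocab=classes)),
        get_x=ColReader('fname', pref='images/'),
        get_y=ColReader(classes),
        splitter=RandomSplitter())
    dls = dblock.dataloaders(df)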