Lesson 3: Couldn't understand data.resize() for multi label classification for planet data set?


(Mahesh Bhosale) #1

When we were doing multilabel classification on the planet dataset, we decreased the size of the data from 256 to 64. After augmenting the data, we ran:

data = data.resize(int(sz*1.3), 'tmp')

Jeremy says it’s for a computational advantage that results in a speed-up, but I don’t really understand how exactly we would speed up. Do we ignore all the images with size greater than or equal to sz*1.3 and speed up that way? That would leave lots of images we never look at and would limit learning.


(Mahesh Bhosale) #2

Can anybody please help me with this? I am running out of ideas now.


(Michael Slater) #3

data.resize() is also mentioned in Lesson 2, and I haven’t been able to understand what it’s doing either.


(Alessa Bandrabur) #4

It resizes the images and copies the resized versions into the ‘tmp’ folder. I don’t see why he multiplies the size by 1.3.
Maybe the speed-up comes from saving the images in the tmp folder.

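# Resize every file in fnames to targ, in parallel, unless it was already done.
def resize_imgs(fnames, targ, path, new_path):
    # Skip all the work if the first resized file already exists.
    if not os.path.exists(os.path.join(path,new_path,str(targ),fnames[0])):
        with ThreadPoolExecutor(8) as e:
            # Note: 'tmp' is hardcoded here rather than using new_path.
            ims = e.map(lambda x: resize_img(x, targ, path, 'tmp'), fnames)
            # Drain the iterator so every resize runs, with a progress bar.
            for x in tqdm(ims, total=len(fnames), leave=False): pass
    return os.path.join(path,new_path,str(targ))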

(Alex Lee) #5

Reading notes by other classmates plus my own understanding, here is what I think.

When we specify what transforms to apply, we pass a size sz to tfms_from_model:

tfms = tfms_from_model(f_model, sz, aug_tfms=transforms_top_down,
                       max_zoom=1.05)

After the assignment, tfms is passed to the data loader via ImageClassifierData.from_csv(...). One of the things the data loader does is resize the images on the fly. In other words, the images go into the data loader at full size, and it ONLY resizes each image when it hands it to the model via a Python generator.

Calling data.resize(int(sz * 1.3), 'tmp') lets us resize the images in advance. Here, data.resize() takes a maximum size (e.g. sz=64, so the maximum size is int(sz * 1.3) = 83), and each image with a dimension bigger than that is resized. Since images can be rectangular, data.resize() makes the smallest dimension sz * 1.3 and center-crops the image.

Importantly, it creates a resized copy of the entire dataset in a temporary folder, tmp.

This technique can save you a lot of time because, according to Jeremy, if you have large images, say 1000 by 1000, reading the images and resizing them to 64 by 64 during training takes more time than training the convolutional net itself! By calling data.resize() in advance, we resize the images once and can then run SGD over them many times efficiently.
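To make the "resize once, reuse every epoch" idea concrete, here is a minimal sketch in plain Python. It is not fastai code: expensive_resize and cached_resize are hypothetical stand-ins, and the "images" are just byte strings. The only things borrowed from the thread are the int(sz * 1.3) target and the idea of a cache folder keyed by target size.

```python
import os
import tempfile

def expensive_resize(src_bytes, targ):
    """Hypothetical stand-in for decoding and shrinking a large image."""
    expensive_resize.calls += 1
    return src_bytes[:targ]  # pretend the result is much smaller

expensive_resize.calls = 0

def cached_resize(name, src_bytes, targ, cache_root):
    """Resize once into cache_root/<targ>/<name>, then reuse the saved copy."""
    dest = os.path.join(cache_root, str(targ), name)
    if not os.path.exists(dest):
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as f:
            f.write(expensive_resize(src_bytes, targ))
    with open(dest, "rb") as f:
        return f.read()

sz = 64
targ = int(sz * 1.3)  # 83
images = {f"img_{i}.jpg": bytes(1000) for i in range(5)}

with tempfile.TemporaryDirectory() as tmp:
    for epoch in range(3):
        for name, raw in images.items():
            small = cached_resize(name, raw, targ, tmp)
    n_calls = expensive_resize.calls

print(n_calls)  # 5 resizes in total, not 5 images x 3 epochs = 15
```

With the cache, the costly step runs once per image no matter how many epochs follow, which is the saving described above.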


(Mahesh Bhosale) #6

I was talking about data.resize(), but this is resize_imgs. Does this function get called eventually?


(Mahesh Bhosale) #7

Thanks. I understand this better now. Does it resize to sz or sz*1.3?


#8

These files are saved in the ./tmp/ folder. You can look there and see what they look like and what the new dimensions are.


(Alex Lee) #9

AFAIK, if the dimensions of the image are both smaller than sz * 1.3, the image size remains the same. Otherwise, if one of the dimensions is larger than sz * 1.3, the smallest dimension will become sz * 1.3.

For example, if we have a 100 by 200 image and sz=64, using the 1.3 threshold, we want the maximum image dimension to be int(sz * 1.3) = 83. Since the smallest dimension of the image is 100, we resize that dimension from 100 to 83. Next, we center-crop the resized image (83 by 166 if the aspect ratio is kept), so we’ll end up with an 83 by 83 image.
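The arithmetic in this example can be checked with a small sketch. It is only an illustration of the rule as described in this post, not the library’s code: pre_resize_dims and center_crop_box are hypothetical helpers, and the sketch assumes the aspect ratio is preserved and that images whose shorter side is already at or below the target are left alone.

```python
def pre_resize_dims(width, height, sz, factor=1.3):
    """Return ((new_w, new_h), target) after shrinking the shorter side to target."""
    target = int(sz * factor)
    short = min(width, height)
    if short <= target:
        # Assumption: small images are not upscaled.
        return (width, height), target
    new_w = round(width * target / short)
    new_h = round(height * target / short)
    return (new_w, new_h), target

def center_crop_box(width, height, side):
    """Return the (left, top, right, bottom) box of a centered side x side crop."""
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

dims, target = pre_resize_dims(100, 200, sz=64)
box = center_crop_box(*dims, target)
print(target, dims, box)  # 83 (83, 166) (0, 41, 83, 124)
```

Under those assumptions, the 100 by 200 image becomes 83 by 166 before the crop, and the crop keeps the central 83 by 83 region.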


(Alessa Bandrabur) #10

Thanks for the answer.
Why is it multiplied by 1.3? Why not simply sz = 64?


(Alessa Bandrabur) #11

data.resize eventually calls resize_imgs.


(Alex Lee) #12

I’m not sure but I kind of agree with @giusvit that “As for the coefficient, I think it’s try and test to find a balance between quality and speed.” where I think “coefficient” = 1.3.


(Jinchi Li) #13

I believe you need a slightly bigger image so that data augmentation can work and generate variation in the actual output images.


(James_Ying) #14

@alessa

The 1.3 came from RandomRotate(10), randomflip(1.05), max_zoom(1.05).

Assume the image size is 1×1.
When we rotate and then crop a 1×1 image, we need to crop from a bigger image.
How big does it need to be?
[Image: Snipaste_2018-07-03_09-56-21]

Then we have to add 5% for both flip and zoom,
so the final size will be √2 × cos(35°) × 1.05 × 1.05 ≈ 1.277
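
The numbers in this estimate can be reproduced with a few lines of Python. This is just a check of the arithmetic in the post, assuming a 1×1 image rotated by 10 degrees: its axis-aligned bounding box has side cos(10°) + sin(10°), which equals √2 × cos(35°). The variable names are illustrative only.

```python
import math

rotation_deg = 10

# Side of the bounding box of a unit square rotated by rotation_deg.
rotate_factor = math.sqrt(2) * math.cos(math.radians(45 - rotation_deg))

# The same quantity written as cos(theta) + sin(theta).
same_factor = math.cos(math.radians(rotation_deg)) + math.sin(math.radians(rotation_deg))

# Apply the two 5% margins mentioned in the post.
total = rotate_factor * 1.05 * 1.05

print(round(rotate_factor, 3), round(total, 3))  # 1.158 1.277
```

The result, about 1.277, is then rounded up to the 1.3 used in data.resize(int(sz * 1.3), 'tmp').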