ConvLearner has same train time with pretrained = True or False

I think I may be misunderstanding what the pretrained parameter does and what freezing does in general.

So it was my understanding that freezing the model means that only the final layer is trained (the rest of the model being frozen). The weird thing I’m seeing is that my model takes 15 minutes to train whether I have pretrained=True or pretrained=False, so I am wondering if I am missing something. Shouldn’t the model be significantly faster if only the final layer is being trained versus all of the layers?

Here is the code I’m using:

PATH = Path("kaggleData/competitions/imaterialist-challenge-furniture-2018/")

ds_tfms = ([crop_pad(size=(112,112))], [crop_pad(size=(112,112))])

data = ImageDataBunch.from_folder(PATH, train="train", valid="valid", test="test", ds_tfms=get_transforms(), size=112, bs=64)

arch = models.resnet34

learn = ConvLearner(data, arch, pretrained=False)
#learn = ConvLearner(data, arch, pretrained=True)

Am I using something wrong here or have I just completely misunderstood pretrained/freezing?

Hi Kevin,
you are defining ds_tfms but you aren’t passing it in as a parameter?
Why is that?

I was initially using ds_tfms, but swapped that out for get_transforms(). I am still experimenting with transforms and when get_transforms makes sense versus not.

According to the snippet here, we can pass in xtra_tfms as well (like crop and the other transforms that are predefined for us)?
Am I correct in my understanding?

I believe you are correct. It passes them using listify, which just turns whatever you pass into a list if it isn’t one already (I believe). So if you pass the transforms you want to add as a list, it should add them.
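
For example, something like the following should work (a minimal sketch; the choice of cutout as the extra transform is just an illustration):

from fastai.vision import *

# xtra_tfms can be a single transform or a list; listify normalizes it into a list
extra = [cutout(n_holes=(1, 4), length=(10, 20), p=0.5)]
tfms = get_transforms(xtra_tfms=extra)  # returns (train_tfms, valid_tfms)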

Also, shouldn’t we just add it once? I don’t know why, but you passed the tfms twice.
Is that for train and test?

Yeah, train and test, because there are some transforms that you want applied to both, like resizing, but something like blurring you only want to happen on the training set, not the validation/test set.
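
You can see the split if you look at what get_transforms() returns (a quick sketch):

from fastai.vision import *

train_tfms, valid_tfms = get_transforms()  # a tuple: (training transforms, validation transforms)
print(len(train_tfms), len(valid_tfms))    # the training list holds the augmentations;
                                           # the validation list only has the basic crop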

Hi Kevin,
I have no definitive answers, but some guesses, as I have struggled with this myself.

  • First, it is important to see that pretraining and unfreezing are not the same. You can freeze an untrained model (probably not very helpful) and unfreeze a pretrained model. But, helpfully, the code handles this for you: if we specify pretrained=True the body gets frozen automatically, whereas if pretrained is False the model stays unfrozen. (I state this explicitly because I looked through the fastai code, thinking there might be a bug here (model still frozen even if untrained…), but that is not the case.) Also, whether you train an unfrozen pretrained model (weights are not random) or an unfrozen randomly initialized model should make no difference in speed: same number of parameters to train, same number of calculations necessary, just different starting values for the weights.
  • I checked the following things without using any transformations (which is important because, as someone else stated, the actual processing of the images can be a major bottleneck that overshadows the actual GPU work).
  • Closely monitor your GPU with nvidia-smi (dmon or other permanent monitoring): depending on how powerful your card is, it might be that in the pretrained/frozen state your card was largely idling (low utilization), whereas unfrozen the utilization goes up a lot.
  • As long as there seems to be enough “reserve” parallel computing power, there is no large difference in performance when unfreezing.
  • The huge difference is memory utilization. The unfrozen model takes a lot more memory. But if it fits and there is enough compute power, this hardly seems to make a difference.
  • Differences in speed are therefore largely generated by actually utilizing your card fully. For a frozen model, memory utilization is much lower, so you can set the batch size much higher. Larger batches mean that the card’s parallel processing gets used better and therefore the model runs faster.
  • So generally, if you want to speed things up (while the card is not already at 99% utilization), increasing the batch size seems the most “speed enhancing” thing to do.

If someone with more insight into this can correct me or has better answers, that would be very appreciated, as I basically have the same question. :wink:
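
One easy way to convince yourself that freezing only affects which weights get updated (not the cost of the forward pass) is to count trainable parameters. A minimal sketch, assuming fastai v1 and an ImageDataBunch called data:

learn = ConvLearner(data, models.resnet34, pretrained=True)  # body gets frozen automatically

def trainable_params(model):
    # freezing only flips requires_grad; the forward pass still runs through every layer
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

print(trainable_params(learn.model))  # (roughly) only the head's parameters
learn.unfreeze()
print(trainable_params(learn.model))  # all parameters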

Pretraining sets whether the model uses pretrained weights, instead of random weights. So that doesn’t necessarily change the speed per epoch - just the number of epochs it takes.

Pretraining also (IIRC) automatically freezes the convolutional part of the model. But that won’t change the speed very much, since it still has to do a full forward pass.

BTW, don’t forget to use the ‘advanced’ topic to discuss stuff not yet mentioned in the course. I’ll move this now.

Aah, and now we know how “precompute=True” used to speed things up: the entire forward pass of the conv net could be avoided when frozen?!

You should check this discussion: Precompute vs. Freezing. I am sure it will answer your query.

Yes. Though it had its own flaws (not doing data augmentation) and confused beginners, so we removed it.

Thanks to everyone for the comments. I think the main confusion I was having was precompute vs pretrained. So, given that freezing all of the weights at the beginning doesn’t speed things up: if I have a model I’m trying to build that is somewhat similar to ImageNet, does it make sense to just unfreeze it initially? The other question this brings up for me is: is there a quick way to take a percentage of your training images as a sample, or do you have to create a folder structure for the sample? I was thinking about making a small snippet of code that takes the train folder and only returns 1/n of the images into train_ds, if that seems like a good thing to have and it doesn’t already exist.

Also, thanks for the advanced reminder. I hadn’t put it there because this spawned from my notebook 1 homework, but I understand how the determination should be made now: basically, if it isn’t something we have gone over in class, it is advanced.

Ideally you should freeze the weights and train the last layers, and then unfreeze and train the whole model, because initially the weights of the head are randomly assigned and the error will be high. Unfreezing from the start should work too, though. This post covers all the cases separately: https://forums.fast.ai/t/lesson1-trying-to-understand-fit-one-cylcle-unfreeze-and-max-lr/27963
Not sure about the second question.
Hope this helps
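
To make that concrete, the usual freeze-then-unfreeze workflow in code looks roughly like this (a sketch, assuming the data and arch defined earlier in the thread):

learn = ConvLearner(data, models.resnet34, pretrained=True)  # body is frozen by default
learn.fit_one_cycle(4)                                       # train only the new head first
learn.unfreeze()                                             # then unfreeze the whole model
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))             # fine-tune with discriminative learning rates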

Yes there is. See the download_images.ipynb notebook for a full example.

I’m looking at this notebook: https://github.com/fastai/course-v3/blob/master/nbs/dl1/download_images.ipynb

Are you talking about this command:


data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2, ds_tfms=get_transforms(), size=224, num_workers=4)

I used this and changed valid_pct to 0.99 so I would only have a training set of 1% of the dataset, but it then takes a long time to run the validation part, so it isn’t quite the same as just reducing the overall number of images being pulled into the DataBunch.
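
For reference, the kind of snippet I had in mind looks roughly like this (a rough sketch with hypothetical folder names, plain Python rather than a fastai API): copy 1/n of the images per class into a separate sample folder and point the DataBunch at that.

import random, shutil
from pathlib import Path

def make_sample(src, dst, n=10, seed=42):
    # copy roughly 1/n of the training images per class, keeping the subfolder structure
    random.seed(seed)
    for class_dir in Path(src).iterdir():
        if not class_dir.is_dir():
            continue
        files = list(class_dir.glob('*'))
        if not files:
            continue
        out_dir = Path(dst) / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in random.sample(files, max(1, len(files) // n)):
            shutil.copyfile(f, out_dir / f.name)

make_sample('train', 'train_sample', n=100)  # hypothetical paths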

I think you might be correct on this. Is there a way with v1 to resize the images ahead of time? I know I was able to do it in the old library, and I’m wondering if that is what is slowing everything down.

I wrote a piece of code just now. It makes use of the fastai augmentation API. You can use it to generate a new dataset that has the transformations you desire.

from fastai import *
from fastai.vision import *
from fastai.imports import *

import matplotlib.image as IM 

# Example:
#
# TR = gettransfs()
#
# transfy(TR, '/data/apath', variants=5)

def gettransfs(do_flip=False, max_rotate=90, **kwargs):
    # build the standard fastai transform tuple, forwarding the arguments
    tfms = get_transforms(do_flip=do_flip, max_rotate=max_rotate, **kwargs)
    return tfms

def transfy(tfms, rootpath, variants=1, newdir='AUGM', tree_structure=True, include_source=True, **kwargs): 
    PATH=rootpath
    if not os.path.exists(PATH):
        print('Path does not exist.')
        return 0
    else: print('Working in '+PATH)
    os.chdir(PATH)
    workdir = PATH

    if not os.path.exists(f'{workdir}/../'+newdir):
        os.mkdir(f'{workdir}/../'+newdir) 
    content = os.listdir(workdir)
    subfolders = [item for item in content if os.path.isdir(item)] 
    print('Classes (subfolders) found:', len(subfolders))
    if tree_structure:
        for folder in subfolders:
            if not os.path.exists(f'{workdir}/../'+newdir+'/'+folder): os.mkdir(f'{workdir}/../'+newdir+'/'+folder)
    
    for folder in subfolders:
        os.chdir(f'{workdir}/'+folder)
        currentd=os.getcwd().replace('\\','/')
        filelist=os.listdir(currentd)
        print('Visiting '+currentd)
        #verify_images(currentd) 
        #commented since it's dangerous      
        if tree_structure:
            dest=os.path.abspath((f'{workdir}/../'+newdir+'/'+folder)).replace('\\','/')
        else:
            dest=os.path.abspath((f'{workdir}/../'+newdir)).replace('\\','/')
                
        for file_idx in filelist:
            current_img = open_image(currentd+'/'+file_idx)
            if include_source:
                shutil.copyfile(currentd+'/'+file_idx, dest+'/'+file_idx)
            for i in range(variants):
                currenttsfimg=apply_tfms(tfms[0], current_img, size=299, **kwargs)
                currenttsfimgnp=image2np(currenttsfimg.data)
                IM.imsave(os.path.splitext(dest+'/'+file_idx)[0]+'_'+str(i+2)+os.path.splitext(dest+'/'+file_idx)[1],
                currenttsfimgnp)
    print('Finished')

You have to give these two functions the proper kwargs based on your needs, but for testing purposes try what’s suggested in the comment. I’ll put it in a specific thread; maybe some other people could find it useful.


Coming to the original question:

Pretrained is not so important when it comes to speed. The only difference is that when False, the weights are randomly initialized (in fact, I think a la Xavier), whereas when True, they are initialized with their ImageNet-trained values. In both cases, training must run through the whole network.
With a frozen model, however, the weights need not be updated apart from the custom head, but the frozen part still needs to compute the activations (unnecessarily, indeed, if you do no augmentation).
Back when precompute was available, the frozen part was actually dormant: since the weights were always the same, as were the data points, the nodes would always spit out the same activations. Only the head was active (both weight updating and computation of activations).
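
To make the precompute idea concrete, here is a rough sketch of what it amounted to (a hypothetical helper, not the fastai API): run the frozen body once over the dataset, cache the activations, and train only the head on those cached features, so later epochs skip the convolutional forward pass entirely.

import torch

def precompute_features(body, dataloader, device='cuda'):
    # frozen body: evaluate once, no gradients needed
    body.eval().to(device)
    feats, labels = [], []
    with torch.no_grad():
        for x, y in dataloader:
            feats.append(body(x.to(device)).cpu())
            labels.append(y)
    return torch.cat(feats), torch.cat(labels)

# the head then trains directly on (feats, labels); every later epoch avoids the
# expensive convolutional forward pass, at the cost of no data augmentation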