Precompute vs. Freezing

I have read some threads in this forum about the meaning of precompute vs. freezing, but am still confused.

My understanding is that setting precompute=True:

  • Means that the convolutional layers of your network are frozen. Because they are frozen, the output of the last convolutional layer will never change when given the same image twice. Therefore, to save computation, we run all of the images through the convolutional layers once and save the output of the last convolutional layer for each data point. That way, the next time you use this network you can feed that saved output directly to the dense layers and avoid passing the data through the convolutional layers again (a rough sketch of this idea is shown just below).
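
A toy sketch of that idea in plain PyTorch (not the fastai internals; the tiny model and the random images are just stand-ins):

import torch
import torch.nn as nn

# Frozen "body" (stand-in for the pretrained convolutional layers) and a trainable "head".
body = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1), nn.Flatten())
head = nn.Linear(8, 2)

images = torch.randn(16, 3, 32, 32)    # stand-in for the training images
with torch.no_grad():
    cached = body(images)              # the "precomputed activations", computed once and saved

# Every later pass can reuse `cached` and only run (and update) the head.
logits = head(cached)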

However, I have the following questions:

  1. I see places in the lesson 1 notebook where there is data augmentation and precompute=True. How can we have precompute=True and use data augmentation at the same time? For example, there is this block of code:
arch = resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)
  2. What happens when precompute=True and freeze=False? This scenario doesn’t make sense to me: if precompute=True, the input into the network is the output of the last conv layer, but freeze=False implies that you want to allow the whole network to change its weights end to end. What am I missing here?

  3. If precompute=False and freeze=True, this would be really silly, right? If you are going to freeze all your layers (except the new final ones), then you might as well precompute the output of the last convolutional layer. The only reason I can think of to have this combination of parameters is that you want to use data augmentation while training the last layer, despite freezing all the other layers (assuming that you have to set precompute=False in order to use data augmentation).

Thanks for your help.

7 Likes

I assume this is possible because later you can set learn.precompute=False and train a model with augmentation.

@sermakarevich But before that, @jeremy trains with precompute=True and data augmentation. Why would that even work?

Yep. When precompute=True, the data augmentation doesn’t really do anything; the cached, non-augmented activations are used either way.

After that, when we set precompute=False, it will train the last layer, this time with the augmented images.

And then, when we run learn.unfreeze(), it unfreezes all the layers and fine-tunes the whole network end to end.
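
Putting those three steps together, a rough sketch of the whole workflow with the fastai 0.7-era API (assuming PATH, sz and arch are set up as in the lesson notebook; the learning rates and epoch counts here are just placeholders):

from fastai.conv_learner import *      # fastai 0.7-era imports, as in the lesson notebooks

PATH, sz, arch = 'data/dogscats/', 224, resnet34
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)

learn.fit(1e-2, 3)                      # 1) train the new head on cached activations (augmentation ignored)
learn.precompute = False
learn.fit(1e-2, 3, cycle_len=1)         # 2) still only the head, but augmented images now take effect
learn.unfreeze()
learn.fit(np.array([1e-4, 1e-3, 1e-2]), 3, cycle_len=1)   # 3) fine-tune all layer groups with differential learning rates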

5 Likes

Has anybody observed this issue when calling learn.precompute = False and learn.fit after first training the model with learn.precompute = True?

TypeError: __call__() takes 2 positional arguments but 3 were given


_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/process.py", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/process.py", line 153, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/ec2-user/fastai/courses/dl1/fastai/dataloader.py", line 73, in get_batch
    def get_batch(self, indices): return self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ec2-user/fastai/courses/dl1/fastai/dataloader.py", line 73, in <listcomp>
    def get_batch(self, indices): return self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ec2-user/fastai/courses/dl1/fastai/dataset.py", line 99, in __getitem__
    return self.get(self.transform, x, y)
  File "/home/ec2-user/fastai/courses/dl1/fastai/dataset.py", line 104, in get
    return (x,y) if tfm is None else tfm(x,y)
  File "/home/ec2-user/fastai/courses/dl1/fastai/transforms.py", line 489, in __call__
    def __call__(self, im, y): return compose(im, y, self.tfms)
  File "/home/ec2-user/fastai/courses/dl1/fastai/transforms.py", line 470, in compose
    im, y = fn(im, y)
TypeError: __call__() takes 2 positional arguments but 3 were given
"""

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 lrf=learn.lr_find()
      2 learn.sched.plot()

~/fastai/courses/dl1/fastai/learner.py in lr_find(self, start_lr, end_lr, wds)
     97         layer_opt = self.get_layer_opt(start_lr, wds)
     98         self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr)
---> 99         self.fit_gen(self.model, self.data, layer_opt, 1)
    100         self.load('tmp')
    101

~/fastai/courses/dl1/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, **kwargs)
     81         n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
     82         fit(model, data, n_epoch, layer_opt.opt, self.crit,
---> 83             metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
     84
     85     def get_layer_groups(self): return self.models.get_layer_groups()

~/fastai/courses/dl1/fastai/model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
     86     stepper.reset(True)
     87     t = tqdm(iter(data.trn_dl), leave=False)
---> 88     for (*x,y) in t:
     89         batch_num += 1
     90         loss = stepper.step(V(x),V(y))

~/.pyenv/versions/jupyter/lib/python3.6/site-packages/tqdm/_tqdm.py in __iter__(self)
    951                 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
    952
--> 953         for obj in iterable:
    954             yield obj
    955             # Update and possibly print the progressbar.

~/fastai/courses/dl1/fastai/dataset.py in __next__(self)
    226         if self.i>=len(self.dl): raise StopIteration
    227         self.i+=1
--> 228         return next(self.it)
    229
    230     @property

~/fastai/courses/dl1/fastai/dataloader.py in __iter__(self)
     75     def __iter__(self):
     76         with ProcessPoolExecutor(max_workers=self.num_workers) as e:
---> 77             for batch in e.map(self.get_batch, iter(self.batch_sampler)):
     78                 yield get_tensor(batch, self.pin_memory)
     79

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/_base.py in result_iterator()
    554                     for future in fs:
    555                         if timeout is None:
--> 556                             yield future.result()
    557                         else:
    558                             yield future.result(end_time - time.time())

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
    403                 raise CancelledError()
    404             elif self._state == FINISHED:
--> 405                 return self.__get_result()
    406             else:
    407                 raise TimeoutError()

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
    355     def __get_result(self):
    356         if self._exception:
--> 357             raise self._exception
    358         else:
    359             return self._result

TypeError: __call__() takes 2 positional arguments but 3 were given

I answered #3 first, as I was also finding it difficult to grasp until today. Having precompute=True allows the activations to be computed beforehand. This has nothing to do with freeze and unfreeze; those statements are about which part of your network you want to be updated.

I assume we can think of the precompute argument as being about the forward pass, whereas freeze is about the backward pass.

3. You can have both precompute=False and learn.freeze() and it’d be totally cool.

Having precompute=False means that the activations (the results of your forward passes through the convolutional layers) are not computed ahead of time from your original training data; instead they are computed on the fly as you do the forward passes.

Also, if you have data augmentation enabled in this framework, each augmented sample of an observation is transformed on the fly (without saving it to memory) and then used to calculate the activations. That is the precompute=False part, so your activations may change from epoch to epoch.

I would agree with the point that having precompute=False with no data augmentation is unnecessary and slow. I think it is, and it’s more efficient to set precompute=True if there is no data augmentation.

Unfreezing is a whole different thing. Let’s say you’ve made your forward passes as described above, on augmented data. What happens next with freeze=True is that you back-propagate and update only the weights of the layers that were newly added to the model (these layers are in the FC, fully connected, head, and there are 2 of them if I am not mistaken). For more info on how the models are sliced and changed, you can refer to conv_learner.py. A toy illustration of what freezing means at the parameter level follows below.
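
In plain PyTorch terms (a toy sketch, not the actual fastai slicing code in conv_learner.py), freezing just means the earlier parameters stop receiving gradient updates:

import torch.nn as nn

# Toy "body" plus "head"; only the head should learn while the body is frozen.
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

# freeze: the body keeps its current weights; backprop only updates the head.
for p in model[:-1].parameters():
    p.requires_grad = False

# unfreeze: make every parameter trainable again.
for p in model.parameters():
    p.requires_grad = True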

Hope this helps, I would love to clarify more if needed.

Best

15 Likes

Nope. Able to run fine.

1 Like

How can we have precompute=True and use data augmentation at the same time?

In the video recording of the lecture at 49:41, Jeremy mentions “Since we’ve already pre-computed the activations for our input images, that means that data augmentation doesn’t work”

I think one fundamental difference between a pre-trained network and a pre-computed outputs, is that:

  • In a pre-trained network, it is the layer weights that have been pre-calculated, and this has nothing to do with your own input data, i.e. the training/validation/test images of dogs and cats.
  • Pre-computed activations come from passing your input data (dog and cat images) through the network and caching the results.

It seems like pre-computing is a second-level caching mechanism to avoid repeated feed-forward passes over the input data, since they would produce the same results each time. The danger is that if you make some tweaks (like data augmentation) and forget to re-run the pre-compute phase, your changes won’t be reflected.

12 Likes

Exactly right. So if you have precompute=True, it doesn’t matter whether you freeze or unfreeze - it’s using precomputed activations either way. Unfreezing only makes a difference with precompute off.

16 Likes

Would it be reasonable for a net to turn precompute off automatically when it’s unfrozen?

1 Like

Yes, very reasonable!

Jeremy clarifies it more in lecture 3:
precompute=True caches some of the intermediate results that we do not need to recalculate every time. It uses the cached, non-augmented activations; that’s why data augmentation doesn’t work while precompute is on. Precomputing speeds up our work.

I gave it a shot and overrode the unfreeze function in the ConvLearner class so that it sets precompute to False when unfreezing layers.

I did not think the freeze function should set it back to True, since that seemed like a caller’s preference, but I can certainly do so if people feel otherwise. Please let me know if the docstring is not accurate!
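
For illustration, a hypothetical sketch of what such an override could look like (this is not the actual change that was submitted; the subclass name is made up):

class AutoPrecomputeConvLearner(ConvLearner):
    def unfreeze(self):
        """Unfreeze all layer groups and stop using precomputed activations,
        since updates to the earlier layers would otherwise be silently ignored."""
        super().unfreeze()
        self.precompute = False   # fall back to computing activations on the fly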

3 Likes

Excellent!


Can you explain more about the forward pass and backward pass? Thanks.

As a note, the tense confused me a bit. precomputed=True would have been a bit easier to read, reflecting that the activations have already been computed, whereas precompute=True seems to suggest that the activations should be precomputed first, whether they have already been computed or not.

4 Likes

Yes I can see what you’re saying there…

Once I first understood it, I had the exact same thoughts. It seems like a tiny detail, but there’s a lot of confusion in the forum around precompute=True. Most of the confusion seems to be that it looks like the API is being asked to precompute, while it’s actually being asked to use precomputed activations for images it has seen before. Data augmentation’s relationship with precomputed activations is much clearer with this understanding.

This is also why I prefer variables/parameters to be descriptive rather than short, especially when it’s meant to be a library for wider usage. (for example with sz => image_size, bs => batch_size, precompute => use_precomputed_activations, ps => dropout_probabilities, aug_tfms => augmentation_transforms etc.)

But well, that’s just me (and solely my opinions). And, maybe it’s not a pythonic thing to do.

8 Likes

The pythonic way makes sense… but it’s a great opportunity for us to write tutorials/blog posts for others who may get stuck on the wording. I get it now too.