Precompute vs. Freezing

@sermakarevich But before that, @jeremy trains with precompute=True and data augmentation. Why would that even work?

Yep. With precompute = True, data augmentation has no effect: training uses the cached activations of the original, unaugmented images.

After that, when we set precompute = False, it still trains only the last layer(s), but now on augmented images, because the activations are recomputed for every batch.

And then, when we run learn.unfreeze(), it unfreezes all the layers and trains all of them, starting from the current (pretrained) weights rather than from scratch.
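The three stages above can be summarised as a small lookup. This is a hypothetical sketch (`TrainingStage` and `training_plan` are illustrative names, not part of fastai) of what each combination of `precompute` and freezing means, as described in this thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingStage:
    precompute: bool  # use cached activations from the pretrained backbone?
    frozen: bool      # are the pretrained backbone layers frozen?


def training_plan(stage: TrainingStage) -> dict:
    """Hypothetical summary of what a fit call does for a given stage."""
    if stage.precompute:
        # Cached activations of the *original* images feed the head directly,
        # so augmentation is bypassed and the backbone never runs:
        # freeze/unfreeze makes no difference here.
        return {"augmentation_used": False, "trains": "head only"}
    if stage.frozen:
        # Backbone runs forward on each (augmented) batch; only the head is updated.
        return {"augmentation_used": True, "trains": "head only"}
    # Unfrozen: every layer is fine-tuned, starting from the pretrained weights.
    return {"augmentation_used": True, "trains": "all layers"}


stages = [
    ("1. precompute=True", TrainingStage(precompute=True, frozen=True)),
    ("2. precompute=False", TrainingStage(precompute=False, frozen=True)),
    ("3. learn.unfreeze()", TrainingStage(precompute=False, frozen=False)),
]
for name, stage in stages:
    print(name, training_plan(stage))
```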


Has anybody seen this issue when setting learn.precompute = False after first training the model with learn.precompute = True?

TypeError: __call__() takes 2 positional arguments but 3 were given

_RemoteTraceback Traceback (most recent call last)
Traceback (most recent call last):
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/", line 153, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/ec2-user/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/", line 153, in
    return [fn(*args) for args in chunk]
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 73, in get_batch
    def get_batch(self, indices): return self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 73, in
    def get_batch(self, indices): return self.collate_fn([self.dataset[i] for i in indices])
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 99, in __getitem__
    return self.get(self.transform, x, y)
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 104, in get
    return (x,y) if tfm is None else tfm(x,y)
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 489, in __call__
    def __call__(self, im, y): return compose(im, y, self.tfms)
  File "/home/ec2-user/fastai/courses/dl1/fastai/", line 470, in compose
    im, y =fn(im, y)
TypeError: __call__() takes 2 positional arguments but 3 were given

The above exception was the direct cause of the following exception:

TypeError Traceback (most recent call last)
in ()
----> 1 lrf=learn.lr_find()
2 learn.sched.plot()

~/fastai/courses/dl1/fastai/ in lr_find(self, start_lr, end_lr, wds)
97 layer_opt = self.get_layer_opt(start_lr, wds)
98 self.sched = LR_Finder(layer_opt, len(, end_lr)
---> 99 self.fit_gen(self.model,, layer_opt, 1)
100 self.load('tmp')

~/fastai/courses/dl1/fastai/ in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, **kwargs)
81 n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
82 fit(model, data, n_epoch, layer_opt.opt, self.crit,
---> 83 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
85 def get_layer_groups(self): return self.models.get_layer_groups()

~/fastai/courses/dl1/fastai/ in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
86 stepper.reset(True)
87 t = tqdm(iter(data.trn_dl), leave=False)
---> 88 for (*x,y) in t:
89 batch_num += 1
90 loss = stepper.step(V(x),V(y))

~/.pyenv/versions/jupyter/lib/python3.6/site-packages/tqdm/ in __iter__(self)
951 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
--> 953 for obj in iterable:
954 yield obj
955 # Update and possibly print the progressbar.

~/fastai/courses/dl1/fastai/ in __next__(self)
226 if self.i>=len(self.dl): raise StopIteration
227 self.i+=1
--> 228 return next(
230 @property

~/fastai/courses/dl1/fastai/ in __iter__(self)
75 def __iter__(self):
76 with ProcessPoolExecutor(max_workers=self.num_workers) as e:
---> 77 for batch in, iter(self.batch_sampler)):
78 yield get_tensor(batch, self.pin_memory)

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/ in result_iterator()
554 for future in fs:
555 if timeout is None:
--> 556 yield future.result()
557 else:
558 yield future.result(end_time - time.time())

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/ in result(self, timeout)
403 raise CancelledError()
404 elif self._state == FINISHED:
--> 405 return self.__get_result()
406 else:
407 raise TimeoutError()

~/.pyenv/versions/3.6.0/lib/python3.6/concurrent/futures/ in __get_result(self)
355 def __get_result(self):
356 if self._exception:
--> 357 raise self._exception
358 else:
359 return self._result

TypeError: __call__() takes 2 positional arguments but 3 were given
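The innermost frames show the mechanism: `compose` calls every transform as `fn(im, y)`, so a transform whose `__call__` only accepts the image receives one argument too many (Python counts `self` too, hence "takes 2 positional arguments but 3 were given"). A minimal reproduction follows; `ImageOnlyTransform` is a hypothetical stand-in, since the traceback does not say which transform in the list had the one-argument signature.

```python
def compose(im, y, fns):
    """Apply each transform in turn, mirroring the `im, y = fn(im, y)` loop in the traceback."""
    for fn in fns:
        im, y = fn(im, y)
    return im, y


class ImageAndLabelTransform:
    # Matches the convention compose() expects: takes (im, y), returns (im, y).
    def __call__(self, im, y):
        return [p * 2 for p in im], y


class ImageOnlyTransform:
    # Hypothetical transform with an image-only signature.
    def __call__(self, im):
        return [p + 1 for p in im]


im, y = compose([1, 2, 3], "cat", [ImageAndLabelTransform()])
print(im, y)  # [2, 4, 6] cat

try:
    compose([1, 2, 3], "cat", [ImageOnlyTransform()])
except TypeError as e:
    print(e)  # ...__call__() takes 2 positional arguments but 3 were given
```

So the thing to look for is a transform in the list whose `__call__` does not take `(im, y)`.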

2. I answered #3 first, as I was also finding it difficult to grasp until today. Setting precompute=True allows the activations to be computed beforehand. This has nothing to do with freeze and unfreeze; those control which parts of your network get updated.

I think we can treat the precompute argument as being about the forward pass, whereas freeze is about the backward pass.

3. You can have both precompute=False and learn.freeze() and it’d be totally cool.

Setting precompute=False means the activations (the results of your forward passes) are not computed once, up front, from your original training data; instead they are computed during each forward pass.

Also, if data augmentation is enabled in this framework, each augmented sample is transformed on the fly (without being saved to memory) and then used to calculate the activations. That is the precompute=False part, so your activations can change from epoch to epoch.
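A toy sketch of the forward-pass side, with no real network: `backbone` is a hypothetical stand-in for the frozen convolutional layers and `random_flip` for an augmentation transform. With precompute=False, each pass augments a fresh copy of the image before running the backbone, so the activations for the same image vary between epochs.

```python
import random


def backbone(image):
    # Hypothetical stand-in for the frozen conv layers: a fixed, order-sensitive "feature".
    return sum(i * p for i, p in enumerate(image))


def random_flip(image, rng):
    # Hypothetical augmentation: flip the image horizontally half of the time.
    return image[::-1] if rng.random() < 0.5 else list(image)


rng = random.Random(0)
image = [0.1, 0.5, 0.9]
# precompute=False: augment on the fly, then run the forward pass on the augmented copy.
activations_per_epoch = [backbone(random_flip(image, rng)) for _ in range(5)]
print(activations_per_epoch)
```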

I would agree if you were saying that precompute=False without data augmentation is unnecessary and slow. I think it is, and it’s more efficient to use precompute=True when there is no data augmentation.

Unfreezing, on the other hand, is a whole different thing. Let’s say you’ve done your forward passes on augmented data, as described above. With the model frozen, what happens next is that you back-propagate and update only the weights of the layers that were newly added to the model (these are the fully connected (FC) layers, and there are 2 of them if I am not mistaken). For more info on how models are sliced and changed you can refer to
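The freeze side can be sketched the same way. Below is a toy two-weight model (a hypothetical stand-in, not fastai code): a pretrained "backbone" weight `w_body` and a newly added "head" weight `w_head`, trained with one hand-written SGD step on a squared error. The forward pass runs through both weights either way; freezing only decides which weights the backward pass is allowed to update.

```python
def sgd_step(params, x, target, lr=0.1, frozen=True):
    """One SGD step on y_hat = w_head * (w_body * x) with loss (y_hat - target) ** 2."""
    w_body, w_head = params["w_body"], params["w_head"]
    # Forward pass
    feature = w_body * x
    y_hat = w_head * feature
    # Backward pass (chain rule by hand)
    dloss_dyhat = 2 * (y_hat - target)
    grad_head = dloss_dyhat * feature
    grad_body = dloss_dyhat * w_head * x
    new = dict(params)  # leave the caller's params untouched
    new["w_head"] = w_head - lr * grad_head
    if not frozen:
        # Only an unfrozen model updates the pretrained backbone weight.
        new["w_body"] = w_body - lr * grad_body
    return new


params = {"w_body": 1.5, "w_head": 0.2}
print(sgd_step(params, x=1.0, target=1.0, frozen=True))
print(sgd_step(params, x=1.0, target=1.0, frozen=False))
```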

Hope this helps; I’d be happy to clarify more if needed.



Nope, it runs fine for me.


How can we have precompute=True and use data augmentation at the same time?

In the video recording of the lecture at 49:41, Jeremy mentions “Since we’ve already pre-computed the activations for our input images, that means that data augmentation doesn’t work”

I think one fundamental difference between a pre-trained network and pre-computed outputs is that:

  • In a pre-trained network, the layer weights have been pre-calculated, but this has nothing to do with your input data (the training/validation/test images of dogs and cats).
  • Pre-computed activations come from passing your input data (the dog and cat images) through the network and caching the results.

It seems like pre-computing is a second-level caching mechanism to avoid repeated feedforward passes on the input data, since those passes would produce the same results each time. The danger is that if you make some tweaks (like data augmentation) and forget to re-run the pre-compute phase, your changes won’t be reflected.
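A toy sketch of why cached activations ignore later tweaks such as augmentation. It assumes the cache is filled once from the unaugmented images and then looked up by image index; `PrecomputedCache`, `backbone` and `flip` are hypothetical names, not fastai's implementation. Because the lookup is by index, the augmented pixels are never seen on the precompute path, so an augmented copy A1 of image A simply gets A’s cached activations.

```python
class PrecomputedCache:
    """Hypothetical activation cache: filled once, then looked up by index."""

    def __init__(self, backbone, images):
        self.backbone_calls = 0
        self._backbone = backbone
        # Forward pass over the *original* images, done once up front.
        self._acts = [self._run(img) for img in images]

    def _run(self, img):
        self.backbone_calls += 1
        return self._backbone(img)

    def activations(self, idx, images, transform, precompute):
        if precompute:
            # Cached result for this index; `transform` is never applied.
            return self._acts[idx]
        # precompute=False: augment, then run the forward pass again.
        return self._run(transform(images[idx]))


def backbone(img):
    return sum(i * p for i, p in enumerate(img))  # hypothetical feature extractor


def flip(img):
    return img[::-1]  # hypothetical augmentation


images = [[0.1, 0.5, 0.9], [0.3, 0.3, 0.0]]
cache = PrecomputedCache(backbone, images)
print(cache.activations(0, images, flip, precompute=True))   # same as unaugmented image 0
print(cache.activations(0, images, flip, precompute=False))  # computed from the flipped image
print(cache.backbone_calls)  # 3: two while building the cache, one on-the-fly pass
```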


Exactly right. So if you have precompute=True, it doesn’t matter whether you freeze or unfreeze - it’s using precomputed activations either way. Unfreezing only makes a difference with precompute off.


Would it be reasonable for a net to turn precompute off automatically when it’s unfrozen?


Yes, very reasonable!

Jeremy clarifies it further in lecture 3:
‘precompute = True’ caches some of the intermediate steps, which we do not need to recalculate every time. It uses cached, non-augmented activations. That’s why data augmentation doesn’t work with precompute. Using precompute speeds up our work.

I gave it a shot and overrode the unfreeze function in the ConvLearner class so that it sets precompute to False when unfreezing layers.

I did not think the freeze function should set it back to True, since that seemed like the caller’s preference, but I can certainly do so if people feel otherwise. Please let me know if the docstring is not accurate!
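A minimal sketch of the idea, assuming a learner that exposes `unfreeze()`, `freeze()` and a `precompute` attribute as discussed in this thread. `BaseLearner` and `AutoPrecomputeLearner` are hypothetical stand-ins so the example runs on its own; they are not fastai’s `ConvLearner`, and the change actually posted may differ.

```python
class BaseLearner:
    """Hypothetical stand-in for a learner with freeze/unfreeze and precompute."""

    def __init__(self, n_layer_groups=3):
        self.precompute = True
        self.n_layer_groups = n_layer_groups
        self.trainable_from = n_layer_groups - 1  # only the last group (the head) trains

    def freeze(self):
        self.trainable_from = self.n_layer_groups - 1

    def unfreeze(self):
        self.trainable_from = 0


class AutoPrecomputeLearner(BaseLearner):
    def unfreeze(self):
        """Unfreeze all layer groups and stop using precomputed activations.

        Cached activations bypass the backbone, so unfreezing only has an
        effect once precompute is off. freeze() is deliberately left alone:
        turning precompute back on is the caller's choice.
        """
        self.precompute = False
        super().unfreeze()


learn = AutoPrecomputeLearner()
learn.unfreeze()
print(learn.precompute, learn.trainable_from)  # False 0
learn.freeze()
print(learn.precompute, learn.trainable_from)  # False 2
```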




Can you explain the forward pass and backward pass a bit more? Thanks.

As a note, the tense confused me a bit. precomputed=True would’ve made it easier to see that the activations have already been computed, whereas precompute = True seems to suggest precomputing the activations first, whether or not they’ve already been computed.


Yes, I can see what you’re saying there…

Once I first understood it, I had exactly the same thoughts. It seems like a tiny detail, but there’s a lot of confusion on the forum around precompute=True. Most of it seems to come from the API appearing to be asked to precompute, when it’s really being asked to use precomputed activations for images it has seen before. Data augmentation’s relationship with precomputed activations is much clearer with this understanding.

This is also why I prefer variables/parameters to be descriptive rather than short, especially when it’s meant to be a library for wider usage. (for example with sz => image_size, bs => batch_size, precompute => use_precomputed_activations, ps => dropout_probabilities, aug_tfms => augmentation_transforms etc.)

But that’s just me (and solely my opinion). And maybe it’s not a pythonic thing to do.


It makes sense the pythonic way… but it’s a great opportunity for us to write tutorials/blog posts for others who may get stuck on the wording. I get it now too.

Hi! Could you help explain why data augmentation does not work with precompute=True? For example, image A has been passed through the network and its activations have been cached. Now, with augmented images A1 and A2, shouldn’t the network automatically know they are different from A and calculate the activation values again?



In [7]: learn.precompute=False
In [8]:, 1, cycle_len=1)

I think In [7] is not necessary here? It should come after In [8], e.g.:

In [7]: # learn.precompute=False
In [8]:, 1, cycle_len=1)
In [9]: learn.precompute=False

UPDATE: After reading lesson 1 again: In [7] learn.precompute=False is necessary to train the last layer using the augmented images (with learn.precompute=True, the augmented images are ignored).