Lesson 7 in-class chat ✅

To reclaim GPU memory, or after a “CUDA out of memory” exception, is the following code equivalent?


Correct, and in both cases only if no other variables refer to learn.
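A minimal sketch of why the "no other vars" caveat matters. It uses a stand-in class rather than a real Learner (`FakeLearner` is hypothetical), and a weak reference to check whether the object was actually freed. On a real GPU run you would typically also call `torch.cuda.empty_cache()` afterwards so PyTorch releases its cached blocks; that part is not shown here.

```python
import gc
import weakref

class FakeLearner:
    """Hypothetical stand-in for a Learner that holds large buffers."""
    def __init__(self):
        self.model = [0.0] * 1000

# Case 1: `learn` is the only reference, so dropping it frees the object.
learn = FakeLearner()
probe = weakref.ref(learn)   # a weak reference does not keep the object alive
learn = None
gc.collect()
freed_when_sole_ref = probe() is None

# Case 2: another variable still refers to the same object.
learn = FakeLearner()
other = learn                # a second strong reference
probe = weakref.ref(learn)
del learn
gc.collect()
freed_with_other_ref = probe() is None

print(freed_when_sole_ref, freed_with_other_ref)
```

In the second case the object survives until `other` is dropped as well, which is why neither `del learn` nor `learn=None` helps on its own if the learner is also stored elsewhere.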

learn.destroy is almost ready, you can try:

def destroy(self):
    "Free the Learner internals, leaving just an empty shell that consumes no memory"
    attrs = [k for k in self.__dict__.keys() if not k.startswith("__")]
    for a in attrs: delattr(self, a)

but it’s not @sgugger approved yet.

You just call:


no need for del, None, gc.collect()

I’ve updated my summary with a fifth point about learn=None; gc.collect().

About learn.destroy: is it equivalent to learn=None; gc.collect()?

Pretty much. It leaves a hollow learn shell, which takes close to zero memory, so it doesn’t matter even if you don’t reassign to it later. destroy() pretty much resets its attributes to {} but keeps its methods, which I guess could be deleted too. Most likely that’s what should be done; otherwise it’d be misleading to still have its methods intact but no internal data to work with. So it will most likely be slightly different in the final version.
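To make the "hollow shell" concrete, here is a small sketch on a stand-in class. `FakeLearner` and its `fit` method are hypothetical; only `destroy` is copied from the post above.

```python
class FakeLearner:
    """Hypothetical stand-in; only `destroy` comes from the post above."""
    def __init__(self):
        self.model = [0.0] * 1000
        self.data = list(range(1000))

    def fit(self):
        # Needs the internal data that destroy() removes.
        return len(self.model)

    def destroy(self):
        "Free the Learner internals, leaving just an empty shell that consumes no memory"
        attrs = [k for k in self.__dict__.keys() if not k.startswith("__")]
        for a in attrs: delattr(self, a)

learn = FakeLearner()
learn.destroy()

shell_state = dict(learn.__dict__)      # instance attributes are gone
still_has_fit = hasattr(learn, "fit")   # methods live on the class, so they survive

try:
    learn.fit()
    fit_error = None
except AttributeError as e:
    fit_error = type(e).__name__

print(shell_state, still_has_fit, fit_error)
```

The method is still callable but fails as soon as it touches the deleted attributes, which is the misleading behaviour described above.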

I agree, this needs to be fixed.

The in-class example of “Human Numbers” is a very helpful example for how to encode and operate on language-like vocabularies.
However, I am having difficulty envisioning how to apply fastai’s RNN structure to a continuous variable. For instance, imagine wanting to predict a store’s sales from historical performance (à la the Rossmann problem from Lesson 6) but using an RNN instead of a tabular model.

Does anyone have a worked example (ideally on Kaggle) that I could walk through to get a better understanding of RNN on a continuous variable?


In the “superres-gan” example, I am unable to train the discriminator/generator pair using learn.fit(40,lr). The pretraining of both networks worked fine, but when I try to fit the pretrained model I get a nonspecific error. I’ve copied the full stack-trace below.

AttributeError                            Traceback (most recent call last)
<ipython-input-27-d44c81445766> in <module>
----> 1 learn.fit(40,lr)

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
    188         if defaults.extra_callbacks is not None: callbacks += defaults.extra_callbacks
    189         fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
--> 190             callbacks=self.callbacks+callbacks)
    192     def create_opt(self, lr:Floats, wd:Floats=0.)->None:

/opt/anaconda3/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
     90             cb_handler.on_epoch_begin()
     91             for xb,yb in progress_bar(data.train_dl, parent=pbar):
---> 92                 xb, yb = cb_handler.on_batch_begin(xb, yb)
     93                 loss = loss_batch(model, xb, yb, loss_func, opt, cb_handler)
     94                 if cb_handler.on_batch_end(loss): break

/opt/anaconda3/lib/python3.7/site-packages/fastai/callback.py in on_batch_begin(self, xb, yb, train)
    253         self.state_dict['train'],self.state_dict['stop_epoch'] = train,False
    254         self.state_dict['skip_step'],self.state_dict['skip_zero'] = False,False
--> 255         self('batch_begin', mets = not self.state_dict['train'])
    256         return self.state_dict['last_input'], self.state_dict['last_target']

/opt/anaconda3/lib/python3.7/site-packages/fastai/callback.py in __call__(self, cb_name, call_mets, **kwargs)
    224         if call_mets:
    225             for met in self.metrics: self._call_and_update(met, cb_name, **kwargs)
--> 226         for cb in self.callbacks: self._call_and_update(cb, cb_name, **kwargs)
    228     def set_dl(self, dl:DataLoader):

/opt/anaconda3/lib/python3.7/site-packages/fastai/callback.py in _call_and_update(self, cb, cb_name, **kwargs)
    215         "Call `cb_name` on `cb` and update the inner state."
    216         new = ifnone(getattr(cb, f'on_{cb_name}')(**self.state_dict, **kwargs), dict())
--> 217         for k,v in new.items():
    218             if k not in self.state_dict:
    219                 raise Exception(f"{k} isn't a valid key in the state of the callbacks.")

AttributeError: 'tuple' object has no attribute 'items'

This seems to be an issue with training GANs in general: I encounter the same error when attempting to fit the learner in the “wgan” example.
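Reading the last frame of the trace: the handler expects each callback to return either None or a dict of state updates, and one callback's on_batch_begin returned a tuple instead. The sketch below reproduces that mechanism with simplified stand-ins. `DictCallback`, `TupleCallback` and `call_and_update` are hypothetical and are not fastai's code; only the `ifnone(...)` / `.items()` pattern is taken from the trace.

```python
def ifnone(a, b):
    "Return `a` unless it is None, in which case return `b`."
    return b if a is None else a

class DictCallback:
    # Returns a dict of state updates, which is what the handler expects.
    def on_batch_begin(self, **state):
        return {"last_input": state["last_input"] * 2}

class TupleCallback:
    # Returns a tuple, which the handler cannot iterate with .items().
    def on_batch_begin(self, **state):
        return state["last_input"], state["last_target"]

def call_and_update(cb, state):
    # Simplified version of the pattern visible in the traceback.
    new = ifnone(cb.on_batch_begin(**state), dict())
    for k, v in new.items():
        if k not in state:
            raise Exception(f"{k} isn't a valid key in the state of the callbacks.")
        state[k] = v
    return state

state = {"last_input": 1, "last_target": 0}
ok_state = call_and_update(DictCallback(), dict(state))

try:
    call_and_update(TupleCallback(), dict(state))
    message = None
except AttributeError as e:
    message = str(e)

print(ok_state, message)
```

One plausible cause (an assumption, not confirmed in this thread) is a callback written for a different library version than the one installed, so checking that the notebook and the installed fastai match would be a reasonable first step.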

When attempting to train the “2a” model in the “superres” example, I’m getting a CUDA out of memory error. I tried halving the batch size, from 32 down to 16, but am still getting the error.

Strangely, the error only shows up on the second epoch of training (it gets through the first epoch just fine, and prints the results), which makes me think that it is caching data between loops that it shouldn’t be. The error message says 1.49 GiB cached which seems like a lot…

Has anyone else encountered a similar issue and figured out how to fix it?

Have you tried decreasing your image size? GANs in general eat up a ton of GPU memory, and while I didn’t have issues running the notebooks, I did have issues using GANs in a personal project that I’m currently working on (based on the superres notebook). Playing with both batch size and image resolution helped me find that “sweet spot” for training where I don’t get OOM errors.
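A rough back-of-the-envelope sketch of why image size is such an effective knob: the memory for a single activation tensor scales linearly with batch size but with the square of the image side. The shapes below are made-up illustration values, not measurements from the superres notebook, and real usage also includes weights, gradients and optimizer state.

```python
def activation_bytes(batch_size, channels, height, width, bytes_per_value=4):
    """Bytes for one float32 activation tensor of shape (batch, channels, H, W)."""
    return batch_size * channels * height * width * bytes_per_value

# Hypothetical layer output: 64 channels at 256x256, batch size 32.
base = activation_bytes(32, 64, 256, 256)
half_batch = activation_bytes(16, 64, 256, 256)   # halve the batch size
half_side = activation_bytes(32, 64, 128, 128)    # halve the image side

print(base / half_batch, base / half_side)
```

Halving the batch size halves this figure, while halving the image side cuts it to a quarter.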

What is the correct order for BatchNorm and ReLU layers?

In “lesson7-resnet-mnist.ipynb”, “Basic CNN with batchnorm” uses the order Conv2d - BatchNorm - ReLU.
But the fast.ai function ‘conv_layer’ uses the order Conv2d - ReLU - BatchNorm.

Please help.

I’m trying to run the GAN example with a different dataset. However I get this output:

What does this mean?

Hello! I am a little confused about kernels, filters and convolutions. I understand that one black-and-white digit image in the MNIST data set has size 1x28x28, where the 1 is the channel, because the image is grayscale. RGB images have 3 channels (red, green, blue).
But I cannot understand the model at minute 13.34 of lesson 7, where we say that we have 8 channels and that we pick 8 just because we want it to be 8! What do we mean by this? Are these 8 filters? Do they help us to predict the final outcome? And how are they combined with the 1 channel for the grayscale color? Is it the same process that we do with kernels? I would appreciate any help, thank you!

Hi Chistina

Kernels and filters in the context of convolutional nets are exactly the same thing (they are synonyms).

These are 8 filters?

If you think about filters, channels, etc. without the whole context, you can become severely confused. What we are defining with a line of code like:

nn.Conv2d(in_channels = 1, out_channels = 8, kernel_size = 3, stride = 1, padding=1)

is a convolutional layer, which includes data (like the channels) and operations (like instructions for how the kernel must operate on the input channels to produce the output channels for the next layer).
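As a quick check on what that line implies, the sketch below works out the output size and parameter count by hand for a 28x28 MNIST image. The helper `conv2d_out_size` is a hypothetical name; the formula is the standard one for a convolution without dilation, and the bias term assumes the `nn.Conv2d` default of `bias=True`.

```python
def conv2d_out_size(size, kernel_size, stride=1, padding=0):
    """Output height or width of a convolution (no dilation)."""
    return (size + 2 * padding - kernel_size) // stride + 1

in_channels, out_channels = 1, 8
kernel_size, stride, padding = 3, 1, 1
h = w = 28  # MNIST image size

out_h = conv2d_out_size(h, kernel_size, stride, padding)
out_w = conv2d_out_size(w, kernel_size, stride, padding)

# One 3x3 kernel per (output channel, input channel) pair, plus one bias per output channel.
weight_shape = (out_channels, in_channels, kernel_size, kernel_size)
n_weights = out_channels * in_channels * kernel_size * kernel_size
n_params = n_weights + out_channels

print((out_channels, out_h, out_w), weight_shape, n_params)
```

So the layer turns a 1x28x28 input into an 8x28x28 output using 8 small kernels, and the number 8 is simply a design choice for how many kernels to learn.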

Do they help us to predict the final outcome?

Hell yeah! Filters and channels are the heroes of convolutional networks; without them we can’t predict anything. Indeed, a convolutional layer is nothing more than a stack of N filtered images (N channels) and the set of instructions to produce them (the kernel plus its stride, padding, etc.).

And how is it combined with this 1 channel that is about the grayscale color? Is it the same process that we do with kernels?

Short answer:
one channel in (the gray scale image), 8 channels out (feature maps).

Each of the 8 channels is a different filtered version of the image. The kernel (or filter) scans the image from left to right, performing a mathematical operation called convolution, to produce a new version of the original image: a new channel. So a channel is also an image.

Remember, an image (channel) can be represented as an array of numbers, where each number is the pixel intensity of that channel. The training process adjusts the numbers in the kernels (the weights), and with them the values in each of the channels, to find the ones that best help us decrease the loss function.
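The "one channel in, 8 channels out" idea can be sketched in plain Python, assuming stride 1 and zero padding of 1 as in the `nn.Conv2d` line above. The function `conv2d_single` and the random kernels are hypothetical illustration values; a real layer would also add a bias and learn the kernel values during training.

```python
import random

def conv2d_single(image, kernel, padding=1):
    """Slide one kernel over one 2-D image (stride 1, zero padding)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    # Zero-pad the image on all sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Multiply the kernel with the patch under it and sum the result.
            out[r][c] = sum(kernel[i][j] * padded[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out

random.seed(0)
image = [[random.random() for _ in range(28)] for _ in range(28)]   # 1 x 28 x 28
kernels = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(8)]                                       # 8 kernels, 3x3 each

feature_maps = [conv2d_single(image, kern) for kern in kernels]     # 8 x 28 x 28
print(len(feature_maps), len(feature_maps[0]), len(feature_maps[0][0]))
```

Each kernel produces its own 28x28 feature map from the same grayscale input, which is why the output has 8 channels.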

You may want to check some additional resources on the internet to build a more solid understanding of how convolutional networks work. Believe me, having a full and clear understanding of what you are doing will pay good dividends in the long run, and everything will suddenly become even more interesting and fun when you start building new and cool things on your own.

good luck


Thank you very much for your time and your response! I think all my confusion was around the terms channel, filter (kernel) and feature map. I also made a clear summary from other papers and different sources on the internet, and now I feel much better :smiley: Your answer was also very clear. Thank you :blush: Thanks also for the advice, and good luck to you too!

Hi, I attempted the superres-gan code with my own input data.
The aim was to see if I could achieve deblurring with the same method.
However the output I obtained from fitting the GAN is odd - some of the loss is not displayed (or I suspect not computed). More specifically, while I should be obtaining train_loss, valid_loss, gen_loss and disc_loss, I’m missing 2 columns of data, and it appears the time is displayed under the gen_loss column.

Could there be any reason why?

For reference, I am using the GOPRO_Large dataset and the images are split by those that are blurred and those that are sharp.

Why don’t GANs just end up creating adversarial noise? Like, something that looks like it is a correct image to the discriminator, but to us just looks like random pixels? It seems like the sample space for random noise that fools a discriminator is probably larger than something that looks like the proper human interpretation, so shouldn’t it be easier for that to occur?

Hi there,

Do you know how the weights and some ‘magic constants’ used in the FeatureLoss for the super resolution are determined?

feat_loss = FeatureLoss(vgg_m, blocks[2:5], [5,15,2])

In particular, why exactly those values for the [5,15,2] weights applied to the layers?

The other one is the ‘5e3’ in the FeatureLoss definition and the squared ‘w’ ( w**2 ), which is applied to the ‘gram_matrix’ contribution to the loss:

self.feat_losses += [base_loss(gram_matrix(f_in), gram_matrix(f_out))*w**2 * 5e3
                     for f_in, f_out, w in zip(in_feat, out_feat, self.wgts)]

I think those weights help fine-tune the ‘style’ loss in this case, don’t they?

I would greatly appreciate it if someone could shed some light on those value choices. Do they come from a paper, or were they determined empirically using a grid search?
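This does not answer where the values come from, but a small sketch shows what the `w**2 * 5e3` factor does to the gram-matrix terms. It is plain Python on tiny made-up feature maps; the L1 `base_loss` and the normalisation of the gram matrix by `c*h*w` are assumptions here, not something confirmed in this thread.

```python
def gram_matrix(features):
    """Gram matrix of a feature map given as c rows of h*w flattened values."""
    c, hw = len(features), len(features[0])
    # Assumed normalisation: divide by the total number of values (c*h*w).
    return [[sum(a * b for a, b in zip(features[i], features[j])) / (c * hw)
             for j in range(c)] for i in range(c)]

def l1_loss(a, b):
    """Mean absolute difference between two equally shaped matrices (assumed base_loss)."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum(abs(x - y) for x, y in zip(flat_a, flat_b)) / len(flat_a)

# Made-up feature maps: 2 channels, each a flattened 2x2 image.
f_in = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
f_out = [[1.0, 1.0, 0.0, 1.0], [0.0, 1.0, 2.0, 0.0]]

wgts = [5, 15, 2]
base = l1_loss(gram_matrix(f_in), gram_matrix(f_out))
gram_terms = [base * w**2 * 5e3 for w in wgts]   # same scaling as the snippet above

print(base, gram_terms)
```

Because the weight is squared, the relative emphasis across the three layers becomes 25 : 225 : 4 for the gram terms, so the middle block dominates the ‘style’ part of the loss far more than the raw [5,15,2] suggests.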


Does anyone know what fix_dl means in the DataBunch? Is it a combination of train_dl and valid_dl?
There is no info about it in the DataBunch docs.

Which folder are the superresed images saved to? It’s not under the nb1 folder.