Developer chat


When you use load_learner it automatically puts the model on defaults.device. So it was saved on the CPU, but when you load it, it’s put on the GPU if you have one.

(Pierre Guillou) #726

Thanks for the answer, Sylvain. Would it be possible to add an argument to load_learner() that puts the model on the CPU? I only want to use my learner object to get predictions, so I do not need the GPU after loading.


Not really no, it uses the usual Learner init which doesn’t take that argument. That’s why there is a defaults object you can change.

(Pierre Guillou) #728

Ok but it means it is necessary to write 2 lines of code instead of one to load a learner in order to get predictions on CPU (when the notebook is on cuda).

defaults.device = torch.device('cpu')
learn = load_learner(path)

Another point, in the docs and code about load_learner(), it is written:

Load a Learner object saved with export_state in path/fn with empty data, optionally add test and load on cpu.

That is true, but it is a bit confusing given that defaults.device = torch.device('cpu') is needed in practice to get the model on the CPU (when the notebook is on cuda).
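One way to keep the device override from leaking into the rest of the notebook is a small context manager that sets an attribute and restores it afterwards. This is only a minimal sketch: `temporary_attr` is a hypothetical helper, not part of fastai, and `fake_defaults` is a stand-in for the real `defaults` object so the snippet runs without fastai or a GPU.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore the old value."""
    old = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore even if the body of the with-block raised.
        setattr(obj, name, old)


# Stand-in for fastai's `defaults` object (assumption: only `.device` matters here).
fake_defaults = SimpleNamespace(device="cuda")

with temporary_attr(fake_defaults, "device", "cpu"):
    device_inside = fake_defaults.device  # what a loader would see in here

device_after = fake_defaults.device  # back to the original value
```

With fastai itself this would presumably become `with temporary_attr(defaults, "device", torch.device("cpu")): learn = load_learner(path)`, though I have not tested it against the library.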


If you have a GPU, you gain nothing by putting everything on the CPU, so I don’t get why it’s a problem. We’re only talking about doing inference on the CPU because you’d do this on a machine/server that doesn’t have access to a GPU (in which case the default device will be the CPU).

(Pierre Guillou) #730

Thanks. It is clear.

(Etay) #731

Hi. I have been installing the dev version from github using pip install git+

For the latest version, it fails because the cu/cpp files in fastai/text/models/ were not installed.

I am guessing this has something to do with the file (it does not explicitly include *.cpp), but I have no idea how to fix it (so no PR).


(Stas Bekman) #732

That is a very vague report, @emaror. What fails?

It appears those were experimental and used only by those with pip install -e .

Now added fastai/text/models/forget_mult*{cpp,cu} to as you suggested, so everybody will have access to those.

If you still need help, please remember we don’t have an 8-ball (yet), so you need to be explicit in your help requests.

(Etay) #733

Right, I get an exception when importing QRNNLayer:

from fastai.text.models.qrnn import QRNNLayer

Traceback (most recent call last):
  File "", line 1, in
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/fastai/text/models/", line 11, in
    forget_mult_cuda = load(name='forget_mult_cuda', sources=[fastai_path/f for f in files])
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/", line 645, in load
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/", line 793, in _jit_compile
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/", line 44, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/torch/utils/", line 16, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/emaror/.conda/envs/fastai/lib/python3.7/site-packages/fastai/text/models/forget_mult_cuda.cpp'

Copying the cpp (and cu) files to the installation directory fixed this issue.
Hopefully, the change to fixes this.
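For anyone who wants to check an install before hitting the import error, here is a minimal sketch that reports which expected files are absent from a directory. `missing_files` is a hypothetical helper, and the temporary directory with `module.py` only stands in for the installed package directory so the snippet runs anywhere. The only real file name used is the one from the traceback above.

```python
import tempfile
from pathlib import Path


def missing_files(installed_dir, expected_names):
    """Return the names from expected_names that are not regular files in installed_dir."""
    installed_dir = Path(installed_dir)
    return [name for name in expected_names if not (installed_dir / name).is_file()]


# Demo on a throwaway directory that contains a Python file but no C++ source.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "module.py").write_text("# placeholder\n")
    report = missing_files(tmp, ["module.py", "forget_mult_cuda.cpp"])
```

On a real install you would pass the package's models directory instead of `tmp`. An empty list would mean the sources were shipped.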


(Stas Bekman) #734

that’s much better, thank you! yes, please try again after installing from git.


Added the transformer language model (decoder only) and the transformer XL in fastai.text.models. Next week I’ll work on some nice functions to call them as quickly as the AWD-LSTM (this might slightly break the existing behavior of get_language_model and text_classifier_learner, to make them look more like create_cnn).
I’ll also add the pretrained model from OpenAI (transformer architecture) once I’ve figured out how to reconcile their way of tokenizing with ours.

(Kaspar Lund) #736

@sgugger @stas Today I was able to triple the batch size when training an RNN-LSTM. Do you know what has happened in the dev version over the last couple of days that might make this possible?

(Stas Bekman) #737

Triple - that’s awesome!
There were a bunch of QRNN fixes that you can see in the commits, but you’d have to ask @sgugger for specifics.


I made a big cleanup of the kwargs in the library to try to make the accepted arguments a little more obvious, and added some checks where needed.
Also added: beam search for predictions in a language model.
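For readers unfamiliar with the idea, here is a generic beam search sketch over a toy next-token distribution. It is not the fastai implementation: `beam_search` and `toy_model` are hypothetical names, and the probabilities are made up so that greedy decoding (beam width 1) and a wider beam disagree.

```python
import math


def beam_search(step_fn, start, n_steps, beam_width):
    """Keep the beam_width highest log-probability sequences at every step.

    step_fn(seq) must return a dict mapping each next token to its probability.
    Returns a list of (log_prob, sequence) pairs, best first.
    """
    beams = [(0.0, list(start))]
    for _ in range(n_steps):
        candidates = []
        for score, seq in beams:
            for token, prob in step_fn(seq).items():
                candidates.append((score + math.log(prob), seq + [token]))
        # Sort by score only and keep the top beam_width candidates.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(seq):
    """Made-up next-token probabilities, conditioned on the last token only."""
    table = {
        "<s>": {"a": 0.6, "b": 0.4},
        "a": {"x": 0.55, "y": 0.45},
        "b": {"x": 0.9, "y": 0.1},
    }
    return table[seq[-1]]


greedy = beam_search(toy_model, ["<s>"], n_steps=2, beam_width=1)
wide = beam_search(toy_model, ["<s>"], n_steps=2, beam_width=2)
```

Here the greedy path commits to "a" first and ends at probability 0.33, while the width-2 beam keeps "b" alive and finds the 0.36 sequence.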

(Kaspar Lund) #739

This is the training and validation loss of an RNN. Notice that most of the divergence happens when RnnTrainer resets the model in on_epoch_begin. I wonder if we could reduce this divergence a bit by ensuring continuity of the batches across epochs and disabling self.learn.model.reset() in on_epoch_begin.

I’ll run a test tomorrow :)

(Stas Bekman) #740

I started parametrizing the bs, size, and other hyperparameters that impact GPU RAM needs in the lesson notebooks. For example, lesson7-superres.ipynb, which until now you couldn’t run unless you had some 9-10GB of free GPU RAM, can now be run because this cell was added:

from fastai.utils.mem import *
free = gpu_mem_get_free_no_cache()
# the max size of the test image depends on the available GPU RAM 
if free > 8000: size=(1280, 1600) # >  8GB RAM
else:           size=( 820, 1024) # <= 8GB RAM
print(f"using size={size}, have {free}MB of GPU RAM free")

# and then use that dynamically set `size` in the following cells!

Note that the code doesn’t check the total size of the card’s RAM but the RAM that is actually available: you may have a card with a huge amount of RAM, but if half of it is taken, it’d be pointless to continue. So when you make such tweaks, experiment to see how much free memory is actually needed to, say, do fit or predict. E.g. in this notebook it needed more than 7GB just to do a single prediction on a 1280x1600 image(!).

The conditionals will probably need to be refined to support 3 different settings (i.e. min RAM required), but 2 is a good enough start.
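A three-tier version could be sketched as a small lookup instead of chained if/else. This is only an illustration: `pick_size` is a hypothetical helper, the 8000MB threshold and the two larger sizes come from the cell above, and the 4000MB threshold and the (410, 512) fallback are made-up values that would need to be measured per notebook.

```python
def pick_size(free_mb, tiers, fallback):
    """Return the size of the largest tier whose threshold free_mb exceeds.

    tiers is a list of (min_free_mb, size) pairs; fallback is used when
    free_mb does not exceed any threshold.
    """
    for min_free_mb, size in sorted(tiers, reverse=True):
        if free_mb > min_free_mb:
            return size
    return fallback


TIERS = [
    (8000, (1280, 1600)),  # > 8GB free, as in the notebook cell
    (4000, (820, 1024)),   # hypothetical middle threshold
]
FALLBACK = (410, 512)      # hypothetical size for small cards

# In a notebook, free_mb would come from gpu_mem_get_free_no_cache().
size = pick_size(6500, TIERS, FALLBACK)
```

Adding a fourth setting is then a one-line change to `TIERS` rather than another branch.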

I will be doing it if and when I run a lesson and get OOM on my 8GB card, so please help with other lessons, especially if you have a smaller card and have to manually adjust things all the time.

p.s. gpu_mem_get_free_no_cache requires git master or 1.0.43 when it gets released.

(Florian Mutel) #741

I’m working on image classification of small objects (a few pixels), such as in satellite imagery, so I wanted to use a pretrained model without a (7,7) kernel size. I thought about tvm.vgg16_bn, but the _default_meta is not suited to it and we cannot pass a body_fn argument to the create_cnn() function.

Should I make a PR to add a vgg meta (split at layer 22, like fastai2018?) and pass body_fn as a create_cnn arg?

By the way, is there a more recent/suitable arch without (7,7) feature extraction available?

(Bobak Farzin) #742

Thanks for implementing this. I am trying to use get_transformer_lm to build a Wiki103 model from scratch. It does not appear to be training well with fit_one_cycle, which is kind of surprising to me. I am wondering if there is something about the self-attention that might not be right, or if the training params are critical and one_cycle won’t work well. Any ideas/insight? I can move on to trying TransformerXL, but I figured I would get some kind of reasonable perplexity and accuracy with the self-attention model right out of the original paper.

(Andrew Nguyen) #743

Hi, Stas –

I think this is great, but I’m wondering if only checking actual available RAM is the best approach. I say this because, as I’ve been going through the lessons, I found that sometimes my GPU RAM got tied up even though nothing important was happening in the notebook – somehow the Python process got stuck (and not necessarily because CUDA OOM exceptions occurred beforehand).

I found that I could use nvidia-smi, find the process ID that was taking up RAM, and then kill {process_id} at the terminal to free up the resources. Of course, this would reset my kernel, but I was trying to do that anyway in the notebook and it wasn’t working.

Just a thought. What do you think?

EDIT: I should say that as far as the application you’re describing, I think your solution is necessary for those who can’t even run cells that require more GPU RAM than they have. I’m more speaking to the circumstance where one might have the GPU RAM available if they killed processes that weren’t actually doing anything useful.
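The nvidia-smi lookup can also be scripted. Below is a minimal sketch, assuming the `--query-compute-apps=pid,used_memory` query and the `csv,noheader,nounits` format are available in your driver's nvidia-smi (worth checking with `nvidia-smi --help-query-compute-apps`). `parse_compute_apps` and `gpu_processes` are hypothetical helper names, and the sample text is made up so the parser can be tried without a GPU.

```python
import subprocess

# Assumed query: one "pid, used_memory_in_MiB" line per compute process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(text):
    """Parse 'pid, MiB' lines into (pid, mib) tuples, largest memory user first.

    Lines that do not hold two integers (e.g. a '[N/A]' memory field) are skipped.
    """
    procs = []
    for line in text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            procs.append((int(parts[0]), int(parts[1])))
    return sorted(procs, key=lambda p: p[1], reverse=True)


def gpu_processes():
    """Run nvidia-smi and return the parsed process list (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_compute_apps(out)


# Made-up sample output, so this part runs on any machine.
sample = "1234, 5120\n5678, 300\n9999, [N/A]\n"
procs = parse_compute_apps(sample)
```

The PIDs it returns are the ones you would then inspect and, if they really are stuck, kill from the terminal as described above.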


You can definitely suggest a PR to add smart _default_meta