Developer chat

  1. PyTorch caches GPU memory, so nvidia-smi isn’t showing you the real picture. My code empties the cache so you get an accurate reflection of the memory that is actually available (to a degree, due to fragmentation, so realistically it’s perhaps 80-90% of what you see). Note the “no_cache” in the function name (see the sketch right after this list):
gpu_mem_get_free_no_cache()
  2. if your memory is really tied up and can’t be flushed, then you won’t be able to run the cell that requires X amount of free memory anyway.
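
A quick usage sketch for that helper (in fastai v1 it lives in fastai.utils.mem, if I recall correctly; treat the import path as an assumption):

from fastai.utils.mem import gpu_mem_get_free_no_cache  # assumed location of the helper

free_mb = gpu_mem_get_free_no_cache()  # empties the pytorch cache first, then reports free MBs
print(f"~{free_mb}MB free (minus fragmentation overhead)")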

The only caveat there is a potentially unreleased learner, which requires gc.collect() as well due to circular references. We should probably add that to the function too.

Ideally, you should be able to free the memory and continue with your code w/o needing to restart. We are not 100% there yet, but are moving in that direction.

If you notice any situations where memory gets leaked and can’t be reclaimed please report those as it’ll help this effort.

You will need to run:

import gc, torch
gc.collect()
torch.cuda.empty_cache()

to be able to see the true picture via nvidia-smi, since it is not aware of pytorch’s cached memory. But the easiest way is to use https://github.com/stas00/ipyexperiments, which will automatically profile the memory usage for you so you can quickly see where the leaks are, if any. You won’t need to watch nvidia-smi any longer.
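
For reference, a minimal sketch of how ipyexperiments is used in a notebook, based on its README at the time (the class name and teardown style are assumptions; check the repo for the current API):

from ipyexperiments import IPyExperimentsPytorch  # assumed class name from the README

exp = IPyExperimentsPytorch()   # start an experiment; following cells get CPU/GPU memory profiled
# ... run your training cells here ...
del exp                         # finish the experiment and reclaim the objects it tracked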


I see, I see.

Oh, interesting. Sounds like I should be playing with gc.collect() to see if that clears up my memory.

I think that’s great. I was also experimenting with batch sizes to take full advantage of available GPU RAM, but ideally, we would have the Learner or a peripheral do that dynamically, which I think you may already be working on in ipyexperiments. :+1:

Will do!


BREAKING CHANGE

language_model_learner and text_classifier_learner have both been changed to look like create_cnn. Why? There are now more language model architectures in fastai, and hopefully soon, more pretrained models. So you should now use:

learn = language_model_learner(data_lm, AWD_LSTM)

for the old behavior of language_model_learner and

learn = text_classifier_learner(data_clas, AWD_LSTM)

for the old behavior of text_classifier_learner (see the test example).

The good thing is you can type Transformer or TransformerXL instead of AWD_LSTM and it just works :slight_smile: Well, almost: you have to add pretrained=False because there are no pretrained models for those yet. You can still pass drop_mult in both cases, and if you want to modify the defaults, you can pass a config dictionary with all the values of the kwargs (the default configs are in the dictionaries awd_lstm_lm_config, awd_lstm_clas_config, transformer_lm_config, transformer_clas_config, transformer_XL_config). There is an example of how to change their values in the tests.
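
To make that concrete, here is a rough sketch of passing a modified config (the key and values tweaked here are just illustrative assumptions; check the config dictionaries above for the real defaults):

config = awd_lstm_lm_config.copy()   # start from one of the default config dicts mentioned above
config['emb_sz'] = 200               # hypothetical tweak; any of the kwargs can be overridden this way
learn = language_model_learner(data_lm, AWD_LSTM, config=config, drop_mult=0.5, pretrained=False)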

The bad thing is that you won’t be able to directly load your old models for classification (language models are fine). I had to add another layer to make it work across architectures. Normally, this function should allow you to load an old model into a new learner:

from collections import OrderedDict
import torch
from fastai.text import *   # for Learner, PathOrStr

def load_old_to_new(learn:Learner, path_to_model:PathOrStr):
    wgts = torch.load(path_to_model)
    if 'model' in wgts: wgts = wgts['model']
    # split the old flat state dict into encoder ('0.') and head ('1.') weights
    wgts0 = OrderedDict({k[2:]:v for k,v in wgts.items() if k.startswith('0.')})
    wgts1 = OrderedDict({k[2:]:v for k,v in wgts.items() if k.startswith('1.')})
    learn.model[0].module.load_state_dict(wgts0)
    learn.model[1].load_state_dict(wgts1)

What is the layer you had to add?

I wonder how breaking it would be to take the flatten/dropout layers out of the classifier heads and into their own layer. This would make it easy to turn text models into Siamese networks and vice versa…

Running two RNNs on the same GPU now :slight_smile:
Certainly not possible before your improvements to GPU memory usage - thx


The added layer is a module that takes any type of encoder and feeds it a sentence. Previously this was a module that subclassed the main AWD_LSTM module, so I had to change that to make it support transformers.
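
A minimal sketch of that idea (this is not the actual fastai class, just an illustration of a wrapper that owns any encoder and forwards the sentence to it; the name is hypothetical):

import torch.nn as nn

class SentenceEncoderWrapper(nn.Module):   # hypothetical name
    "Wrap any encoder and simply feed it the input sentence."
    def __init__(self, encoder:nn.Module):
        super().__init__()
        self.module = encoder              # exposed as .module, as in learn.model[0].module above
    def forward(self, sentence):
        return self.module(sentence)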

tabular.data and models.tabular both have an export function: data.export() to save the databunch and learn.export() to save the model. Great, but the default file name is the same in both cases: export.pkl

To avoid one export() deleting the file of the other export(), couldn’t we rename the default export files? For example:

data.export(fname:str='data_export.pkl')
learn.export(fname:str='learn_export.pkl')

Gotcha. Thanks

How come we reset the RNN model in on_epoch_begin while training, instead of restoring the hidden states captured in on_epoch_end?

Thanks for the example in tests so that we can run with our own params! I was worried that was lost till I re-read the post.

For Transformer and TransformerXL, after a lot of vetting against the paper authors’ source code, I figured out that our RNNLearner defaults of alpha=2. and beta=1. don’t make sense in the Transformer context (they are regularization terms for the AWD-LSTM). So, when setting up to train from scratch, I had to set alpha=0., beta=0. to make any progress.
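
For example, something along these lines (assuming alpha and beta are accepted as learner kwargs and passed through to RNNLearner; treat the exact signature as an assumption):

learn = language_model_learner(data_lm, Transformer, pretrained=False,
                               alpha=0., beta=0.)   # disable the AWD-LSTM AR/TAR regularization
learn.fit_one_cycle(1, 1e-3)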

If it helps others: this is the code base I was comparing to (written by one of the authors)

Yes, especially the alpha. The beta can be set to 1, and I found it seems to help a little bit, but transformers don’t like Activation Regularization at all!


Because we shuffle all the texts, there is a discontinuity in the texts at that point.

That’s what I thought. I just made a routine to ensure continuity across batches, so could I make a PR?

After that, we could have a look at saving the state before running the validation dataset and restoring it afterwards.

That was fast :smiley:


Hi. In lesson6-rossmann.ipynb, learn.summary() does not display well in my jupyter notebook (see screenshot below). It looks like the newline character \n is not taken into account (I’m using Windows 10, but I don’t think that’s the problem).

That question has been asked already: you have to type print(learn.summary()) now. The function no longer prints for you, it returns the raw text (as a best practice we try to avoid functions that print things).
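
In other words (just restating the above):

summary_str = learn.summary()   # now returns the summary as a string instead of printing it
print(summary_str)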


Do you think overlaying a heat map on plot_top_losses is a good feature worth a PR?
Something like this:

import math
import matplotlib.pyplot as plt
from fastai.callbacks.hooks import *

def plot_top_losses_heatmap(k, interp, learner, largest=True, figsize=(15,11)):
    "Plot the `k` top losses from `interp` with a Grad-CAM style heatmap overlaid."
    tl_val,tl_idx = interp.top_losses(k, largest)
    classes = interp.data.classes
    rows = math.ceil(math.sqrt(k))
    fig,axes = plt.subplots(rows, rows, figsize=figsize)
    fig.suptitle('prediction/actual/loss/probability', weight='bold', size=14)
    m = learner.model.eval()
    for i,idx in enumerate(tl_idx):
        im,cl = interp.data.valid_ds[idx]
        cl = int(cl)
        xb,_ = interp.data.one_item(im)
        xb = xb.cuda()
        # hook the activations (and gradients) of the model body for this image
        with hook_output(m[0]) as hook_a:
            with hook_output(m[0], grad=True) as hook_g:
                preds = m(xb)
                preds[0,cl].backward()
        acts = hook_a.stored[0].cpu()
        avg_acts = acts.mean(0)
        sz = im.shape[-1]
        im.show(ax=axes.flat[i], title=
            f'{classes[interp.pred_class[idx]]}/{classes[cl]} / {interp.losses[idx]:.2f} / {interp.probs[idx][cl]:.2f}')
        axes.flat[i].imshow(avg_acts, alpha=0.6, extent=(0,sz,sz,0), interpolation='bilinear', cmap='magma')

ItemLists’ constructor takes test:ItemList=None as an optional param. In most example usages it’s invoked from some kind of train/valid splitting, hence test is indeed None. Is this by design? I was trying to invoke the constructor with a non-None test, then create LabelLists by invoking label_from_func, which is handled by ItemLists.__getattr__. That code path, however, assumes test is None, which caused an error during self.process().

Am I missing something here?

Side note: the use of __getattr__, while providing succinctness, is not great for readability. Python’s dynamic nature/facilities can sometimes lead to delicate tradeoffs.

Thanks Sylvain (the post was from @stas), but why wasn’t the chosen solution just learn.summary() instead of print(learn.summary())?