I have used the language model learner, and there is a parameter named drop_mult whose default value is 1. I wanted to know whether this is actually dropout or something different. To my knowledge, a dropout probability is always less than 1, so any clarity would be greatly appreciated. Thanks.
As far as I understand, the different archs like LSTM have multiple dropout probabilities for different things. Once they are set, this drop_mult
property scales all of them. So you can change all the dropout probabilities simultaneously while keeping their relative sizes.
E.g. the defaults for the LSTM are
hidden_p:float=0.2, input_p:float=0.6, embed_p:float=0.1, weight_p:float=0.5
using drop_mult=1.5
will basically set those to
hidden_p:float=0.3, input_p:float=0.9, embed_p:float=0.15, weight_p:float=0.75
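That scaling is easy to reproduce in plain Python. The dict below just mirrors the default values listed above (it is a toy stand-in, not the actual fastai config object):

```python
# Default dropout probabilities for the LSTM language model (copied from above).
defaults = {'hidden_p': 0.2, 'input_p': 0.6, 'embed_p': 0.1, 'weight_p': 0.5}

drop_mult = 1.5

# Every probability is multiplied by the same factor, so their ratios are preserved.
scaled = {k: v * drop_mult for k, v in defaults.items()}

# Roughly {'hidden_p': 0.3, 'input_p': 0.9, 'embed_p': 0.15, 'weight_p': 0.75}
# (up to float rounding).
print(scaled)
```

Note that with drop_mult > 1 some values could exceed 1; whether that makes sense depends on how the model uses them, so in practice you would usually pick a multiplier that keeps all probabilities in [0, 1].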
Ok, I get it now. Is there any documentation I could refer to? The fastai docs do not explain this in much detail.
Here: https://docs.fast.ai/text.learner.html#language_model_learner
It only says: "drop_mult is applied to all the dropouts weights of the config".
At this point I can only say: don't hesitate to also take a look at the code. That's what I did when I saw your question. I have no idea about NLP and have never used this API, but it took me less than a minute to figure out…
When you click the [source]
link next to the documentation, you get linked directly to this part of the source code:
Source code (click this)
def get_language_model(arch:Callable, vocab_sz:int, config:dict=None, drop_mult:float=1.):
    "Create a language model from `arch` and its `config`, maybe `pretrained`."
    meta = _model_meta[arch]
    config = ifnone(config, meta['config_lm'].copy())
    # SEE HERE
    for k in config.keys():
        if k.endswith('_p'): config[k] *= drop_mult  # HERE
    tie_weights,output_p,out_bias = map(config.pop, ['tie_weights', 'output_p', 'out_bias'])
    init = config.pop('init') if 'init' in config else None
    encoder = arch(vocab_sz, **config)
    enc = encoder.encoder if tie_weights else None
    decoder = LinearDecoder(vocab_sz, config[meta['hid_name']], output_p, tie_encoder=enc, bias=out_bias)
    model = SequentialRNN(encoder, decoder)
    return model if init is None else model.apply(init)
def language_model_learner(data:DataBunch, arch, config:dict=None, drop_mult:float=1., pretrained:bool=True,
                           pretrained_fnames:OptStrTuple=None, **learn_kwargs) -> 'LanguageLearner':
    "Create a `Learner` with a language model from `data` and `arch`."
    # SEE HERE
    model = get_language_model(arch, len(data.vocab.itos), config=config, drop_mult=drop_mult)  # HERE
    meta = _model_meta[arch]
    learn = LanguageLearner(data, model, split_func=meta['split_lm'], **learn_kwargs)
    if pretrained:
        if 'url' not in meta:
            warn("There are no pretrained weights for that architecture yet!")
            return learn
        model_path = untar_data(meta['url'], data=False)
        fnames = [list(model_path.glob(f'*.{ext}'))[0] for ext in ['pth', 'pkl']]
        learn.load_pretrained(*fnames)
        learn.freeze()
    if pretrained_fnames is not None:
        fnames = [learn.path/learn.model_dir/f'{fn}.{ext}' for fn,ext in zip(pretrained_fnames, ['pth', 'pkl'])]
        learn.load_pretrained(*fnames)
        learn.freeze()
    return learn
Just by doing Ctrl+F
in the browser on the GitHub page and searching for drop_mult,
you can see it's passed to the above function, where it is used in only one line.
Don’t assume the fastai code is too complicated to look inside. Usually it’s pretty simple.
Oh, I will make it a point to explore the underlying code. Thanks a lot.
For those with the same doubt, here’s the piece of code that uses drop_mult:
if k.endswith('_p'): config[k] *= drop_mult
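As a minimal, self-contained sketch of what that line does (the config dict here is a made-up toy, not the real meta['config_lm']): only keys ending in '_p' are treated as dropout probabilities and scaled, while everything else in the config is left untouched.

```python
# Toy stand-in for a model config: the '_p' keys are dropout probabilities,
# the other keys are architecture hyperparameters.
config = {'input_p': 0.6, 'weight_p': 0.5, 'n_hid': 1152, 'tie_weights': True}

drop_mult = 0.5

# The same loop as in get_language_model: scale only the '_p' entries.
for k in config.keys():
    if k.endswith('_p'): config[k] *= drop_mult

# The dropouts are halved; n_hid and tie_weights are untouched.
print(config)
```

So passing drop_mult=0.5 halves every dropout probability at once, which is why the docs describe it as being "applied to all the dropouts weights of the config".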