Part 2 Lesson 10 wiki

(Sudarshan) #522

Does anyone have a link to Yann LeCun’s paper that Jeremy mentions in the lesson for setting a standard for NLP datasets?

(Sudarshan) #523

Could someone please help me understand this bit of code:

def get_texts(df, n_lbls=1):
    labels = df.iloc[:,range(n_lbls)].values.astype(np.int64)
    texts = f'\n{BOS} {FLD} 1 ' + df[n_lbls].astype(str)
    for i in range(n_lbls+1, len(df.columns)): texts += f' {FLD} {i-n_lbls} ' + df[i].astype(str)
    texts = list(texts.apply(fixup).values)

    tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
    return tok, list(labels)

In particular, I’m trying to understand the concept of fields.

  1. On all calls of this function, as far as I can tell, `n_lbls` is always one. Consequently, the for loop `for i in range(n_lbls+1, len(df.columns)): texts += f' {FLD} {i-n_lbls} ' + df[i].astype(str)` never gets executed, as both `n_lbls+1` and `len(df.columns)` are 2. Am I understanding that correctly?
  2. Are there any examples where there would be multiple fields? Jeremy mentions that documents have structure, such as title, abstract, etc., which would constitute different fields. But how are they detected in this piece of code? I don’t see how there could be `xfld <value>` where value is not equal to 1 (which is set at the beginning of the stream).
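A toy sketch of how those field markers come out, assuming a hypothetical three-column frame (one label column, two text columns) and the notebook's `BOS = 'xbos'`, `FLD = 'xfld'` tokens. Note that the loop tags column `i` with `i - n_lbls`, so the second text column also ends up tagged `xfld 1`; marker values above 1 only appear with three or more text columns:

```python
import pandas as pd

BOS, FLD = 'xbos', 'xfld'  # tokens used in the notebook

# Hypothetical 3-column frame: one label column plus two text columns
df = pd.DataFrame({0: [0], 1: ['A great film'], 2: ['I loved every minute.']})
n_lbls = 1

# Same construction as get_texts: first text column is hardcoded as field 1
texts = f'\n{BOS} {FLD} 1 ' + df[n_lbls].astype(str)
for i in range(n_lbls + 1, len(df.columns)):
    texts += f' {FLD} {i - n_lbls} ' + df[i].astype(str)

print(texts[0])
# '\nxbos xfld 1 A great film xfld 1 I loved every minute.'
```

With only two columns (label + one text column), the loop body never runs at all, matching the observation in question 1 above.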


(Amitabha) #524

When I run the script with these arguments:

dir_path data/en_data; cuda_id 0; cl 12; bs 64; backwards False; lr 0.001; sampled True; pretrain_id

I get this traceback:
Traceback (most recent call last):
File "", line 53, in
if __name__ == '__main__': fire.Fire(train_lm)
File "/home/yhl/anaconda3/envs/fastai/lib/python3.6/site-packages/fire/", line 127, in Fire
component_trace = _Fire(component, args, context, name)
File "/home/yhl/anaconda3/envs/fastai/lib/python3.6/site-packages/fire/", line 366, in _Fire
component, remaining_args)
File "/home/yhl/anaconda3/envs/fastai/lib/python3.6/site-packages/fire/", line 542, in _CallCallable
result = fn(*varargs, **kwargs)
File "", line 42, in train_lm
learner,crit = get_learner(drops, 15000, sampled, md, em_sz, nh, nl, opt_fn, tprs)
File "/home/yhl/fastai/courses/dl2/imdb_scripts/", line 85, in get_learner
m = to_gpu(get_language_model(md.n_tok, em_sz, nhid, nl, md.pad_idx, decode_train=False, dropouts=drops))
File "/home/yhl/fastai/courses/dl2/imdb_scripts/", line 46, in get_language_model
rnn_enc = RNN_Encoder(n_tok, em_sz, n_hid=nhid, n_layers=nlayers, pad_token=pad_token, dropouti=dropouts[0], wdrop=dropouts[2], dropoute=dropouts[3], dropouth=dropouts[4])
TypeError: __init__() got an unexpected keyword argument 'n_hid'
Please lend me a hand.
Thank you!

(Igor Kasianenko) #525

I found that this works on Google Colab to install spaCy:
`!pip install spacy && python -m spacy download en`
Thanks to Emil for pointing to

(Gerardo Garcia) #526

I’m trying to expand this to predict a single element, but every time I try it I receive an error message.

trn_lm looks like this

trn_lm[1] looks like

`preds_one = learn.predict_array(np.array(trn_lm[1]))`

When I run `predict_array(np.array(trn_lm[1]))` I get this error:
`ValueError: not enough values to unpack (expected 2, got 1)`

This is for the imdb example on Lesson 10.
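One common cause of unpack errors like this is passing a single, unbatched sequence where the model expects a batch. A minimal sketch of adding a leading batch axis before calling predict (here `seq` is a hypothetical stand-in for `trn_lm[1]`):

```python
import numpy as np

# Hypothetical token-id sequence standing in for trn_lm[1]
seq = [40, 41, 42, 43]

arr = np.array(seq)      # shape (4,)   -- a single, unbatched sequence
batched = arr[None]      # shape (1, 4) -- add a leading batch axis

print(arr.shape, batched.shape)
```

If `predict_array` expects batched input, `learn.predict_array(np.array(trn_lm[1])[None])` may be what you want; this is an assumption about the cause, not a confirmed fix.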

(Zhen Zhang) #527

I am running the notebook, but the fine-tuning process is extremely slow.
I am sure the GPU is visible to PyTorch, but I don’t know how to force fastai.text to use the GPU.
I suspect the learner is only using the CPU.
Does anyone have a similar issue?
Here is the progress output: `0%| | 5/5029 [00:53<14:52:55, 10.66s/it, loss=5.54]`

(Cayman) #528

Has anyone managed to export a confusion matrix for the predictions made by ULMFiT?
Would very much appreciate some help :wink:
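Not ULMFiT-specific, but once you have predicted classes (e.g. `np.argmax` over the classifier's outputs) and the true labels, building a confusion matrix is straightforward. A small sketch with hypothetical labels, no library dependency beyond numpy:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count (true, predicted) pairs into an n_classes x n_classes grid:
    rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Hypothetical true labels and argmax'd classifier predictions
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(confusion_matrix(y_true, y_pred, 2))
# [[1 1]
#  [0 2]]
```

If scikit-learn is available, `sklearn.metrics.confusion_matrix(y_true, y_pred)` does the same thing.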


@zzzz you can run `nvidia-smi` to check GPU usage

(Aaron Junod) #531

I had this happen yesterday in the middle of training: hours in, hours left.

I opened a terminal and ran `watch nvidia-smi`. Once the terminal showed 0%, I ran the next line in Jupyter to save the model files, but the line above it never registered as done in Jupyter.

#532

`, 4, wds=1e-6, cycle_len=1, cycle_mult=2)` runs 15 epochs. What does `cycle_mult` mean? I understand what `use_clr=()` means.
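`cycle_mult` multiplies the length of each successive restart cycle. A quick sketch of the arithmetic, assuming a call with 4 cycles, `cycle_len=1`, `cycle_mult=2`:

```python
def total_epochs(n_cycle, cycle_len, cycle_mult):
    # Each cycle is cycle_mult times longer than the previous one
    return sum(cycle_len * cycle_mult ** i for i in range(n_cycle))

print(total_epochs(4, 1, 2))  # 1 + 2 + 4 + 8 = 15
```

So 4 cycles of lengths 1, 2, 4, and 8 epochs give the 15 epochs observed.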

(aamir khan) #533

Can anyone please explain: are we throwing away all the vocabulary learnt with the wikitext103 LM that is not in the IMDB vocab list (itos), given that wikitext originally has a 238,462-token vocabulary? If so, what’s the advantage of doing that? Isn’t it better to have 238,462 + 60,000 tokens instead of just 60,000?
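For reference, the notebook keeps only the 60k IMDB vocab and remaps the pretrained wikitext103 embedding matrix onto it: rows for tokens present in both vocabularies are copied over, and every token the pretrained model has never seen gets the mean embedding row. A toy sketch of that remapping (all names and sizes here are small stand-ins for `enc_wgts`, `stoi2`, and `itos` from the notebook):

```python
import numpy as np

# Toy stand-ins: 5-token pretrained vocab, 4-token new vocab, em_sz = 3
enc_wgts = np.arange(15, dtype=np.float32).reshape(5, 3)  # pretrained embedding
stoi2 = {'the': 0, 'movie': 3}                            # pretrained token -> row
itos = ['the', 'movie', 'xxfakemovie', 'xbos']            # new (smaller) vocab

row_m = enc_wgts.mean(0)  # mean row, used to initialize unseen tokens
new_w = np.zeros((len(itos), enc_wgts.shape[1]), dtype=np.float32)
for i, w in enumerate(itos):
    r = stoi2.get(w, -1)
    new_w[i] = enc_wgts[r] if r >= 0 else row_m

print(new_w)
```

The pretrained knowledge thus isn't discarded wholesale; rows for shared tokens survive, and only tokens that never occur in the IMDB corpus are dropped.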


I am confused: it seems to do nothing here. Why should we run lr_find if we don’t change the learning rate after that step?

(dsa cryax) #535

Has anyone tried using a pretrained LM for a POS-tagging task?

(Lior Weintraub) #536

When fine tuning the LM it is said that - “We first tune the last embedding layer so that the missing tokens initialized with mean weights get tuned properly. So we freeze everything except the last layer.”
In the code this is done with the following line of code:

According to my understanding, `learner.freeze_to(-1)` means unfreezing only the topmost layer, which is not the embedding layer. The embedding layer is the bottommost (first) layer, so I would expect to see `learner.freeze_to(0)`.

I’d appreciate it if you could clarify this.

(Faisal Ilaiwi) #538

One more question about this: why don’t the dropouts shown in the summary (the ones that say LockedDropout) have any params? Are they used? I am not sure why they are showing up.

(Cesare) #539

I’m trying to run the training on Colab, but it looks like it’s going to take a while:

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…  
  0%|          | 17/3742 [12:30<46:04:38, 44.53s/it, loss=5.53]

Is it just me with Colab, or is this actually a very long training run?
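The progress bar above already implies a very long run; the arithmetic, using the numbers it reports:

```python
# Numbers taken from the progress bar: 3742 iterations at ~44.53 s/it
iters, secs_per_iter = 3742, 44.53
eta_hours = iters * secs_per_iter / 3600
print(round(eta_hours, 1))  # ~46.3 hours for one epoch
```

Roughly 46 hours per epoch is far slower than typical GPU speeds for this notebook, so it may be worth checking whether the Colab runtime actually has a GPU attached (Runtime > Change runtime type) and whether PyTorch sees it.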


In the Lesson 10 imdb.ipynb, how can I get the wikitext language model to predict the next word for a sequence of tokens I feed to it? The reason I ask is that I’m interested in seeing which words it predicts. I’d like to do this before the language model is fine-tuned on the IMDB data.

Can someone show me the code I’d have to write to feed a sequence like “Hello how are” to the language model and see its top 10 predictions for the next token?
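Not the full fastai plumbing (you still need to numericalize the tokens and run the model to get a logits vector over the vocab for the last position), but the top-k step itself is just a sort. A sketch with a toy vocabulary and toy scores standing in for `itos` and the model's output:

```python
import numpy as np

vocab = ['the', 'world', 'you', 'doing', 'today']  # toy stand-in for itos
logits = np.array([0.1, 0.4, 2.0, 1.5, 0.7])       # toy stand-in for LM output

k = 3  # use k=10 for the top-10 predictions
top = np.argsort(logits)[::-1][:k]                 # indices, highest score first
print([vocab[i] for i in top])                     # ['you', 'doing', 'today']
```

With the real model you would replace `logits` with the scores for the final position of the output and `vocab` with `itos`, so `[itos[i] for i in top]` gives the predicted words.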

(Cahya) #542

Hi, I have a script to predict a sentence using ULMFiT:

You just need to change the path and model name accordingly.

I also tried using an experimental beam search for the prediction, in case anyone is interested.


Please help…
I tried to run the imdb notebook on the latest fastai version, but when I run `, 1, wds=wd, use_clr=(32,2), cycle_len=1)` I get an error. In the MOOC version, which uses earlier releases of fastai and PyTorch, it runs fine. There is a mismatch between weight shapes. I tried to debug it and find out what happens to the weights, but so far no luck: `self._flat_weights` contains a list of weights with shapes [4600, 1150] or [4600], but none of shape [5290000, 1]. Maybe somewhere it gets flattened. I don’t know what really happens, so please help me.

RuntimeError Traceback (most recent call last)
----> 1 learner.lr_find(start_lr=lrs/10, end_lr=lrs*10, linear=True)

~/Desktop/fastai-master/courses/dl2/fastai/ in lr_find(self, start_lr, end_lr, wds, linear, **kwargs)
343 layer_opt = self.get_layer_opt(start_lr, wds)
344 self.sched = LR_Finder(layer_opt, len(, end_lr, linear=linear)
--> 345 self.fit_gen(self.model,, layer_opt, 1, **kwargs)
346 self.load('tmp')

~/Desktop/fastai-master/courses/dl2/fastai/ in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
247 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
248 swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 249 swa_eval_freq=swa_eval_freq, **kwargs)
251 def get_layer_groups(self): return self.models.get_layer_groups()

~/Desktop/fastai-master/courses/dl2/fastai/ in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, visualize, **kwargs)
139 batch_num += 1
140 for cb in callbacks: cb.on_batch_begin()
--> 141 loss = model_stepper.step(V(x),V(y), epoch)
142 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
143 debias_loss = avg_loss / (1 - avg_mom

~/Desktop/fastai-master/courses/dl2/fastai/ in step(self, xs, y, epoch)
48 def step(self, xs, y, epoch):
49 xtra = []
--> 50 output = self.m(*xs)
51 if isinstance(output,tuple): output,*xtra = output
52 if self.fp16: self.m.zero_grad()

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
--> 92 input = module(input)
93 return input

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/ in forward(self, input)
104 with warnings.catch_warnings():
105 warnings.simplefilter("ignore")
--> 106 raw_output, new_h = rnn(raw_output, self.hidden[l])
107 new_hidden.append(new_h)
108 raw_outputs.append(raw_output)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/ in forward(self, *args)
122 """
123 self._setweights()
--> 124 return self.module.forward(*args)
126 class EmbeddingDropout(nn.Module):

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in forward(self, input, hx)
177 if batch_sizes is None:
178 result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 179 self.dropout,, self.bidirectional, self.batch_first)
180 else:
181 result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,

RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600


Thank you very much, Cahya. I really appreciate it!