Part 2 Lesson 10 wiki

(Aaron Junod) #531

I had this happen yesterday in the middle of training. Hours in, hours left.

I opened a terminal and ran watch nvidia-smi. Once it showed 0% GPU utilization, I ran the next line in Jupyter to save the model files, but the line above it never registered as done in Jupyter.

#532, 4, wds=1e-6, cycle_len=1, cycle_mult=2) runs 15 epochs. What does cycle_mult mean? I understand what use_clr=() means.
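
For what it's worth, the 15 epochs can be reproduced with a little arithmetic. This is a minimal sketch, assuming the SGDR-style behaviour of fastai 0.7 in which each restart cycle is cycle_mult times as long as the one before it. The helper name epochs_per_cycle is made up for illustration and is not a fastai function.

```python
def epochs_per_cycle(n_cycle, cycle_len=1, cycle_mult=1):
    """Length in epochs of each restart cycle (hypothetical helper, not part of fastai)."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycle)]


# 4 cycles, the first one 1 epoch long, each following cycle twice as long.
lengths = epochs_per_cycle(n_cycle=4, cycle_len=1, cycle_mult=2)
total_epochs = sum(lengths)
print(lengths, total_epochs)  # [1, 2, 4, 8] 15
```

So with cycle_mult=2 the learning rate is restarted after epochs 1, 3, 7 and 15, rather than after every epoch.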

(aamir khan) #533

Can anyone please explain: are we throwing away all the vocab learnt by the wikitext103 LM that is not in the IMDB vocab list (itos), given that wikitext originally has a vocab of 238462? If yes, what is the advantage of doing so? Isn't it better to have a vocab of 238462+60000 instead of just 60000?
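
To make the question concrete, here is a toy sketch of the remapping step as I understand it from the notebook: rows of the pretrained embedding matrix are copied for tokens that also occur in the new vocabulary, and tokens the pretrained model has never seen start from the mean embedding. All names and numbers below (wiki_itos, wiki_emb, imdb_itos) are made-up stand-ins, and plain lists are used instead of numpy arrays to keep it self-contained.

```python
import collections

# Toy "pretrained" vocabulary with 2-dimensional embeddings (made-up values).
wiki_itos = ["the", "movie", "of", "parliament"]
wiki_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]

# Target vocabulary built from the new corpus; "spoiler" is unknown to the pretrained model.
imdb_itos = ["the", "movie", "spoiler"]

# Map each pretrained token to its row index; unknown tokens map to -1.
wiki_stoi = collections.defaultdict(lambda: -1, {w: i for i, w in enumerate(wiki_itos)})

# Mean embedding over the pretrained vocabulary, used to initialise unseen tokens.
dim = len(wiki_emb[0])
mean_row = [sum(row[d] for row in wiki_emb) / len(wiki_emb) for d in range(dim)]

# Build the new embedding matrix row by row, in the order of the new vocabulary.
new_emb = []
for w in imdb_itos:
    r = wiki_stoi[w]
    new_emb.append(list(wiki_emb[r]) if r >= 0 else list(mean_row))
```

In this toy case the rows for "of" and "parliament" are simply not carried over, because those tokens do not appear in the new vocabulary. Keeping them would make the embedding and output layers larger without the new corpus ever using those tokens.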


I am confused: it seems to do nothing here. Why should we run lr_find if we don't change the learning rate after this step?

(dsa cryax) #535

Has anyone tried using a pretrained LM for a POS tagging task?

(Lior Weintraub) #536

When fine tuning the LM it is said that - “We first tune the last embedding layer so that the missing tokens initialized with mean weights get tuned properly. So we freeze everything except the last layer.”
In the code this is done with learner.freeze_to(-1).

As I understand it, learner.freeze_to(-1) unfreezes only the topmost layer, which is not the embedding layer. The embedding layer is the bottommost (first) layer, so I would expect to see learner.freeze_to(0).

I'd appreciate it if you could clarify this.
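
A small sketch of the indexing may help here. It assumes freeze_to(n) in fastai 0.7 freezes every layer group and then unfreezes the groups from index n onward (Python slice semantics). The group names are hypothetical. My understanding, not confirmed in this thread, is that the last group of the language model holds the decoder, whose weight matrix is shared with the embedding, which would explain why training the "last" group also tunes the embeddings.

```python
def trainable_after_freeze_to(layer_groups, n):
    """Return {group: is_trainable} after freezing all groups, then unfreezing layer_groups[n:]."""
    flags = {g: False for g in layer_groups}
    for g in layer_groups[n:]:
        flags[g] = True
    return flags


# Hypothetical group names, ordered from the input side to the output side.
# Assumption: "last_group" stands for the decoder with weights tied to the embedding.
groups = ["rnn_0", "rnn_1", "rnn_2", "last_group"]

last_only = trainable_after_freeze_to(groups, -1)   # only the last group trains
everything = trainable_after_freeze_to(groups, 0)   # groups[0:] is every group
```

Under that assumption, freeze_to(0) would not single out the first layer; it would leave the whole model trainable.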

(Faisal Ilaiwi) #538

One more question about this: the dropouts that show up in the summary (the ones listed as LockedDropout), why don't they have any params? Are they actually used? I am not sure why they are showing up.
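
On the params question: a dropout layer only samples a random mask and multiplies by it, so there is nothing for it to learn, which is why a summary can list it with zero parameters even though it is active during training. Below is a rough pure-Python sketch of the "locked" variant, assuming it works the way I understand it: one mask is sampled per forward pass and reused for every time step. The function name and the list-based data layout are for illustration only and are not the fastai implementation.

```python
import random


def locked_dropout(seq, p, rng):
    """Apply one shared dropout mask to every time step of seq (a list of equal-length feature lists)."""
    n_features = len(seq[0])
    # The mask is sampled once per call and rescaled by 1/(1-p); it is not a learned weight.
    mask = [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n_features)]
    return [[x * m for x, m in zip(step, mask)] for step in seq]


rng = random.Random(0)
# Three time steps with four features each (made-up values).
seq = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]
out = locked_dropout(seq, p=0.5, rng=rng)

# The same feature positions are zeroed at every time step.
zero_pattern = [[v == 0.0 for v in step] for step in out]
```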

(Cesare) #539

I'm trying to run the training on Colab, but it looks like it's going to take a while:

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…  
  0%|          | 17/3742 [12:30<46:04:38, 44.53s/it, loss=5.53]

Is it just me with Colab, or is this actually a very long training run?


In the lesson 10 imdb.ipynb notebook, how can I get the wikitext language model to predict the next word for a sequence of tokens I feed to it? The reason I ask is that I'm interested in seeing which words it predicts. I'd like to do this before the language model is fine-tuned on the IMDB data.

Can someone show me the code I’d have to write to feed a sequence like “Hello how are” to the language model and to see what its top 10 predictions for the next token are?

(Cahya) #542

Hi, I have a script to predict a sentence using ulmfit:

You just need to change the path to the model name accordingly.

I also tried an experimental beam search for the prediction, if anyone is interested.
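
Whatever script produces the model's scores, the last step of getting the "top 10 predictions" can be sketched roughly as follows. It assumes you already have one raw score (logit) per vocabulary entry for the final position of the input sequence, plus the itos list that maps indices back to tokens. The vocabulary and scores below are made up, and top_k_next_tokens is a hypothetical helper, not part of fastai.

```python
import math


def top_k_next_tokens(logits, itos, k=10):
    """Return the k most probable (token, probability) pairs from raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(itos[i], probs[i]) for i in ranked[:k]]


# Made-up vocabulary and scores standing in for the model's output at the last position.
itos = ["you", "we", "they", "the", "is"]
logits = [2.0, 1.0, 0.5, -1.0, -2.0]

top = top_k_next_tokens(logits, itos, k=3)
for token, prob in top:
    print(f"{token}\t{prob:.3f}")
```

Greedy decoding keeps only the first entry of this list at each step, while a beam search keeps several candidate sequences alive and extends each of them.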


Please help…
I tried to run the imdb notebook on the latest fastai version, but when I want to run, 1, wds=wd, use_clr=(32,2), cycle_len=1) I get an error. The MOOC version, which uses earlier versions of fastai and PyTorch, runs fine. There is a mismatch between weight shapes. I tried to debug it to find out what happens to the weights, but so far no luck. self._flat_weights contains a list of weights with shapes [4600,1150] or [4600], but nothing with shape [5290000, 1]. Maybe it gets flattened somewhere. I don't know what is really happening, so please help me.

RuntimeError Traceback (most recent call last)
----> 1 learner.lr_find(start_lr=lrs/10, end_lr=lrs*10, linear=True)

~/Desktop/fastai-master/courses/dl2/fastai/ in lr_find(self, start_lr, end_lr, wds, linear, **kwargs)
343 layer_opt = self.get_layer_opt(start_lr, wds)
344 self.sched = LR_Finder(layer_opt, len(, end_lr, linear=linear)
--> 345 self.fit_gen(self.model,, layer_opt, 1, **kwargs)
346 self.load('tmp')

~/Desktop/fastai-master/courses/dl2/fastai/ in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
247 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
248 swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 249 swa_eval_freq=swa_eval_freq, **kwargs)
251 def get_layer_groups(self): return self.models.get_layer_groups()

~/Desktop/fastai-master/courses/dl2/fastai/ in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, visualize, kwargs)
139 batch_num += 1
140 for cb in callbacks: cb.on_batch_begin()
--> 141 loss = model_stepper.step(V(x),V(y), epoch)
142 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
143 debias_loss = avg_loss / (1 - avg_mom

~/Desktop/fastai-master/courses/dl2/fastai/ in step(self, xs, y, epoch)
48 def step(self, xs, y, epoch):
49 xtra = []
---> 50 output = self.m(*xs)
51 if isinstance(output,tuple): output,*xtra = output
52 if self.fp16: self.m.zero_grad()

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/ in forward(self, input)
104 with warnings.catch_warnings():
105 warnings.simplefilter("ignore")
--> 106 raw_output, new_h = rnn(raw_output, self.hidden[l])
107 new_hidden.append(new_h)
108 raw_outputs.append(raw_output)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/ in forward(self, *args)
122 """
123 self._setweights()
--> 124 return self.module.forward(*args)
126 class EmbeddingDropout(nn.Module):

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/ in forward(self, input, hx)
177 if batch_sizes is None:
178 result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 179 self.dropout,, self.bidirectional, self.batch_first)
180 else:
181 result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,

RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600
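
One observation that may help narrow this down: the two numbers in the error message line up exactly with the two shapes quoted in the post. The check below uses only those quoted numbers. The interpretation that follows it is an inference, not something confirmed in this thread.

```python
# Shapes quoted in the post for the entries of self._flat_weights.
weight_shape = (4600, 1150)
bias_shape = (4600,)

# Number of elements in each kind of tensor.
weight_numel = weight_shape[0] * weight_shape[1]
bias_numel = bias_shape[0]

# Numbers taken from the RuntimeError message.
requested_numel = 5290000  # from "shape '[5290000, 1]'"
input_numel = 4600         # from "input of size 4600"

print(weight_numel == requested_numel, bias_numel == input_numel)  # True True
```

So it looks as if a reshape sized for a full [4600, 1150] weight matrix is being applied to a 4600-element bias vector, which would point to the order or contents of the flattened weight list differing between library versions.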

Error while running lesson 4 notebook

Thank you very much cahya - I really appreciate it!

(MYUNG JE) #545

Can someone please tell me why training does not continue?

As you can see from the photos, it does not get past 2%…

thank you for the amazing lecture by the way… !!

(Dharam Gajera) #546

Can someone please confirm which fastai and torch versions to use in order to follow this tutorial and run the code?
Alternatively, it would be helpful to have updated code in line with the latest fastai releases, i.e. 1.0.12, 1.0.11 or so.


The jupyter notebooks for the Deep Learning courses 1 and 2 only work with fastai version 0.7. Follow the installation instructions here:

(pradla) #548

I'm observing the same thing with my training… My classifier is overfitting in exactly the same manner as yours, converging to around 94.7% accuracy in epoch 3/4 and then overfitting down to a training loss of 0.06 by the 14th.

The only thing I changed from Jeremy's solution was to use a batch size of 24 instead of 48.

(Shuvendu Bikash) #549

I'm getting an error with Path():


NameError: name 'Path' is not defined

(Cahya) #550

I think you need to import pathlib to be able to use Path
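
A minimal example of that import is below. The directory name is hypothetical and only there to show the / operator on Path objects.

```python
from pathlib import Path

# Hypothetical data directory, for illustration only.
PATH = Path("data/aclImdb")
train_dir = PATH / "train"  # Path objects are joined with the / operator

print(train_dir.as_posix())  # data/aclImdb/train
```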


Getting the following error in get_all:

NameError Traceback (most recent call last)
in ()
2 import spacy
3 nlp = spacy.load('en')
----> 4 tok_trn, trn_labels = get_all(df_trn, 1)
5 tok_val, val_labels = get_all(df_val, 1)

in get_all(df, n_lbls)
3 for i, r in enumerate(df):
4 print(i)
----> 5 tok_, labels_ = get_texts(r, n_lbls)
6 tok += tok_;
7 labels += labels_

in get_texts(df, n_lbls)
5 texts = list(texts.apply(fixup).values)
----> 7 tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
8 return tok, list(labels)

NameError: name 'Tokenizer' is not defined

(Andrea de Luca) #552

I’m encountering the same error, with the same numbers (that is 5290000 and 4600) while attempting to train the language model with a very different dataset.

I think he was already running the notebook with 0.7…