Part 2 Lesson 10 wiki

tranqy · August 29, 2018, 11:48am

I had this happen yesterday in the middle of training. Hours in, hours left.

I opened a terminal and ran watch nvidia-smi. Once the terminal showed 0%, I ran the next line in Jupyter to save the model files, even though the line above it never registered as done in Jupyter.

dazhutizi · September 23, 2018, 4:03pm

learner.fit(3e-3, 4, wds=1e-6, cycle_len=1, cycle_mult=2) runs 15 epochs. What does cycle_mult mean? I understand what use_clr=() means.
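
A short sketch of the arithmetic behind the 15 epochs, assuming the fastai 0.7 convention that the second positional argument is the number of cycles, cycle_len is the length of the first cycle in epochs, and cycle_mult multiplies the cycle length after each cycle. The helper name total_epochs is made up for illustration and is not part of fastai.

```python
def total_epochs(n_cycle, cycle_len=1, cycle_mult=1):
    """Return the per-cycle lengths and their sum, in epochs."""
    # Each cycle is cycle_mult times longer than the one before it.
    lengths = [cycle_len * cycle_mult ** i for i in range(n_cycle)]
    return lengths, sum(lengths)


lengths, total = total_epochs(n_cycle=4, cycle_len=1, cycle_mult=2)
print(lengths, total)  # [1, 2, 4, 8] 15
```

Under that reading, 4 cycles of 1, 2, 4 and 8 epochs add up to the 15 epochs observed.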

ali_k · September 24, 2018, 10:12am

Can anyone please explain: are we throwing away all the vocab learnt with the wikitext103 LM that is not in the IMDB vocab list (itos), given that wikitext originally has a vocab of 238462? If yes, what’s the advantage of doing so? Isn’t it better to have a 238462+60000 vocab instead of just 60000?
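
A minimal sketch of the kind of remapping being asked about, assuming the embedding matrix is rebuilt with one row per IMDB token: tokens shared with the pretrained vocab keep their pretrained row, tokens missing from it start from the mean row, and pretrained rows for tokens absent from the new vocab are simply not copied. The function name remap_embeddings and the toy data are hypothetical, and plain lists stand in for the real weight arrays.

```python
def remap_embeddings(old_weights, old_itos, new_itos):
    """Build one embedding row per token in new_itos."""
    old_stoi = {tok: i for i, tok in enumerate(old_itos)}
    dim = len(old_weights[0])
    # Mean of all pretrained rows, used for tokens the old vocab lacks.
    mean_row = [sum(row[d] for row in old_weights) / len(old_weights)
                for d in range(dim)]
    new_weights = []
    for tok in new_itos:
        idx = old_stoi.get(tok, -1)
        new_weights.append(list(old_weights[idx]) if idx >= 0 else list(mean_row))
    return new_weights


# Toy example: "wikipedia" is dropped, "spoiler" gets the mean row.
old_itos = ["the", "film", "wikipedia"]
old_weights = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
new_itos = ["the", "film", "spoiler"]
print(remap_embeddings(old_weights, old_itos, new_itos))
```

The result has exactly as many rows as the new vocab, so the size of the pretrained vocab no longer affects the size of the fine-tuned model.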

dazhutizi · September 24, 2018, 1:33pm

I am confused: it seems to do nothing here. Why should we run lr_find if we don’t change the learning rate after that step?

cryax · September 25, 2018, 10:30am

Has anyone tried using a pretrained LM for a POS tagging task?

liorw · September 26, 2018, 2:57pm

When fine-tuning the LM, it is said that “We first tune the last embedding layer so that the missing tokens initialized with mean weights get tuned properly. So we freeze everything except the last layer.”
In the code this is done with the following line:
learner.freeze_to(-1)

According to my understanding, learner.freeze_to(-1) means unfreezing the topmost layer, which is not the embeddings layer. The embeddings layer is the bottommost, or first, layer, so I would expect to see learner.freeze_to(0).

I’d appreciate it if you could clarify this.
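
A rough sketch of how freeze_to appears to behave in fastai 0.7, assuming it freezes every layer group and then unfreezes groups[n:] using ordinary Python slicing. The LayerGroup class and the group names are hypothetical stand-ins, not fastai objects.

```python
class LayerGroup:
    """Hypothetical stand-in for a layer group; it only tracks a flag."""

    def __init__(self, name):
        self.name = name
        self.trainable = True


def freeze_to(groups, n):
    # Freeze everything, then unfreeze groups[n:].
    # With n = -1 the slice contains only the last group.
    for g in groups:
        g.trainable = False
    for g in groups[n:]:
        g.trainable = True


groups = [LayerGroup(name) for name in ["rnn_0", "rnn_1", "rnn_2", "decoder"]]
freeze_to(groups, -1)
print([(g.name, g.trainable) for g in groups])
```

Under this reading, freeze_to(0) would unfreeze every group rather than only the first one. If the decoder in the last group shares its weight matrix with the input embedding (weight tying, which I believe is the default for this language model), then training only the last group still updates the embedding weights.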

fkilaiwi · October 15, 2018, 7:12am

One more question about this: the dropouts that show up in the summary (the ones where it says LockedDropout), why don’t they have any params? Are they used? I am not sure why they are showing up.

Cesare.montresor · October 16, 2018, 9:18pm

I’m trying to run the training on Colab, but learner.fit looks like it’s going to take a while:

HBox(children=(IntProgress(value=0, description='Epoch', max=1, style=ProgressStyle(description_width='initial…  
  0%|          | 17/3742 [12:30<46:04:38, 44.53s/it, loss=5.53]

Is it just me with Colab, or is this actually a very long training run?
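
A quick check of what the progress bar implies, using only the numbers shown above (3742 iterations, 17 done, 44.53 s per iteration). The helper name eta is made up for illustration.

```python
def eta(total_iters, done_iters, secs_per_iter):
    """Format the remaining time as h:mm:ss."""
    remaining = int((total_iters - done_iters) * secs_per_iter)
    hours, rest = divmod(remaining, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


print(eta(3742, 17, 44.53))
```

At that rate a single epoch takes roughly 46 hours, which agrees with the estimate in the progress bar.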

chrisv · October 19, 2018, 5:12am

In the lesson 10 imdb.ipynb, how can I get the wikitext language model to predict the next word for a sequence of tokens I feed to it? The reason I ask is that I’m interested in seeing which words it predicts. I’d like to do this before the language model is fine-tuned on the IMDB data.

Can someone show me the code I’d have to write to feed a sequence like “Hello how are” to the language model and to see what its top 10 predictions for the next token are?

cahya · October 19, 2018, 7:37am

Hi, I have a script to predict a sentence using ulmfit:

github.com

cahya-wirawan/language-modeling/blob/master/indonesia/ulmfit_test.py

from fastai.text import *

import numpy as np
from utils import beamsearch

BOS = 'xbos'  # beginning-of-sentence tag
FLD = 'xfld'  # data field tag

LANG = 'id'
LM_PATH = Path(f'lmdata/{LANG}/')
LM_PATH_MODEL = LM_PATH/'models/wiki_id_lm.h5'
LM_PATH_ITOS = LM_PATH/'wiki_id_itos.pkl'

# Load the index-to-word mapping so that indexes can be converted back to words if need be.
itos = pickle.load(open(LM_PATH_ITOS, 'rb'))

# Create a word-to-index dictionary for our vocabulary; unknown words map to index 0.
stoi = collections.defaultdict(lambda:0, {v:k for k,v in enumerate(itos)})

# checking vocabulary size

(This file has been truncated.)

You just need to change the path to the model name accordingly.

I also tried an experimental beam search for the prediction, if anyone is interested.
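
For the "top 10 predictions" part of the question, here is a minimal sketch of the last step only. It assumes you have already run the model on the numericalized tokens and pulled out the scores for the final position as a plain list, one score per vocab entry. The function name top_k_tokens, the toy vocabulary and the scores are all made up for illustration.

```python
import math


def top_k_tokens(scores, itos, k=10):
    """Return the k most likely next tokens with softmax probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(itos[i], probs[i]) for i in ranked]


# Toy vocabulary and made-up scores, for illustration only.
itos = ["_unk_", "you", "things", "is", "the"]
scores = [0.1, 3.2, 1.5, 0.3, -1.0]
for token, p in top_k_tokens(scores, itos, k=3):
    print(f"{token}\t{p:.3f}")
```

With a real model the scores list would have one entry per token in itos, and k=10 gives the ten candidates asked for.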

aaodds71 · October 22, 2018, 2:20pm

Please help…
I tried to run the imdb notebook on the latest fastai version, but when I run learner.fit(lrs/2, 1, wds=wd, use_clr=(32,2), cycle_len=1) I get an error. The MOOC version, which uses earlier versions of fastai and PyTorch, runs fine. There is a mismatch between weight shapes. I tried to debug it to find out what happens to the weights, but no luck so far. self._flat_weights contains a list of weights with shapes [4600, 1150] or [4600], but nothing of shape [5290000, 1]. Maybe it gets flattened somewhere. I don’t know what is really happening, so please help.

RuntimeError Traceback (most recent call last)
in
----> 1 learner.lr_find(start_lr=lrs/10, end_lr=lrs*10, linear=True)

~/Desktop/fastai-master/courses/dl2/fastai/learner.py in lr_find(self, start_lr, end_lr, wds, linear, **kwargs)
343 layer_opt = self.get_layer_opt(start_lr, wds)
344 self.sched = LR_Finder(layer_opt, len(self.data.trn_dl), end_lr, linear=linear)
--> 345 self.fit_gen(self.model, self.data, layer_opt, 1, **kwargs)
346 self.load('tmp')
347

~/Desktop/fastai-master/courses/dl2/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, best_save_name, use_clr, use_clr_beta, metrics, callbacks, use_wd_sched, norm_wds, wds_sched_mult, use_swa, swa_start, swa_eval_freq, **kwargs)
247 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, fp16=self.fp16,
248 swa_model=self.swa_model if use_swa else None, swa_start=swa_start,
--> 249 swa_eval_freq=swa_eval_freq, **kwargs)
250
251 def get_layer_groups(self): return self.models.get_layer_groups()

~/Desktop/fastai-master/courses/dl2/fastai/model.py in fit(model, data, n_epochs, opt, crit, metrics, callbacks, stepper, swa_model, swa_start, swa_eval_freq, visualize, **kwargs)
139 batch_num += 1
140 for cb in callbacks: cb.on_batch_begin()
--> 141 loss = model_stepper.step(V(x),V(y), epoch)
142 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)
143 debias_loss = avg_loss / (1 - avg_mom**batch_num)

~/Desktop/fastai-master/courses/dl2/fastai/model.py in step(self, xs, y, epoch)
48 def step(self, xs, y, epoch):
49 xtra = []
---> 50 output = self.m(*xs)
51 if isinstance(output,tuple): output,*xtra = output
52 if self.fp16: self.m.zero_grad()

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/container.py in forward(self, input)
90 def forward(self, input):
91 for module in self._modules.values():
---> 92 input = module(input)
93 return input
94

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/lm_rnn.py in forward(self, input)
104 with warnings.catch_warnings():
105 warnings.simplefilter("ignore")
--> 106 raw_output, new_h = rnn(raw_output, self.hidden[l])
107 new_hidden.append(new_h)
108 raw_outputs.append(raw_output)

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
475 result = self._slow_forward(*input, **kwargs)
476 else:
--> 477 result = self.forward(*input, **kwargs)
478 for hook in self._forward_hooks.values():
479 hook_result = hook(self, input, result)

~/Desktop/fastai-master/courses/dl2/fastai/rnn_reg.py in forward(self, *args)
122 """
123 self._setweights()
--> 124 return self.module.forward(*args)
125
126 class EmbeddingDropout(nn.Module):

~/.conda/envs/myroot36/lib/python3.6/site-packages/torch/nn/modules/rnn.py in forward(self, input, hx)
177 if batch_sizes is None:
178 result = _impl(input, hx, self._flat_weights, self.bias, self.num_layers,
--> 179 self.dropout, self.training, self.bidirectional, self.batch_first)
180 else:
181 result = _impl(input, batch_sizes, hx, self._flat_weights, self.bias,

RuntimeError: shape '[5290000, 1]' is invalid for input of size 4600
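
One observation on the numbers in the error, offered as arithmetic only and not as a diagnosis: assuming a hidden size of 1150 (which the [4600, 1150] shape suggests) and the four stacked gate matrices of an LSTM, both 4600 and 5290000 fall out directly.

```python
hidden = 1150          # assumed hidden size, taken from the [4600, 1150] shape
gates = 4              # an LSTM stacks four gate matrices in one weight
rows = gates * hidden

print(rows)            # 4600
print(rows * hidden)   # 5290000
```

So 5290000 is the element count of a [4600, 1150] weight matrix, while 4600 is the size of a bias vector. The message reads as if a tensor of bias size is being reshaped to the flattened size of a weight matrix, but where that happens is not visible from the traceback alone.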

chrisv · October 22, 2018, 4:16pm

Thank you very much cahya - I really appreciate it!

mlee10 · October 24, 2018, 10:52am

Can someone please tell me why training does not continue?

As you can see from the photos, it does not get past 2%…

Thank you for the amazing lecture, by the way!

dharamdk · October 24, 2018, 11:55am

Can someone please confirm which fastai and torch versions to use in order to follow this tutorial and run the code?
Otherwise, it would be better if there were updated code in line with the latest fastai releases, i.e. 1.0.12, 1.0.11 or so.

chrisv · October 29, 2018, 5:26pm

The jupyter notebooks for the Deep Learning courses 1 and 2 only work with fastai version 0.7. Follow the installation instructions here:

pradla · October 30, 2018, 8:50am

I’m observing the same thing with my training… My classifier is overfitting in exactly the same manner as yours: converging to around 94.7% accuracy in epoch 3/4 and then overfitting up to a training loss of 0.06 by the 14th.

The only thing I changed from Jeremy’s solution was to use a batch size of 24 instead of 48.

ShuvenduBikash · October 31, 2018, 10:47pm

I’m getting an error with Path():

DATA_PATH=Path('data/')
DATA_PATH.mkdir(exist_ok=True)

NameError: name 'Path' is not defined

cahya · November 1, 2018, 9:30am

I think you need to import pathlib to be able to use Path
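
A minimal sketch of that fix, using only the standard library:

```python
from pathlib import Path

# Same two lines as in the question, now with Path imported.
DATA_PATH = Path('data/')
DATA_PATH.mkdir(exist_ok=True)
```

In the course notebooks Path normally arrives through the star import at the top (from fastai.text import *), so a NameError here may also mean that the import cell was not run or failed.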

pasumarthi · November 24, 2018, 11:59am

I’m getting the following error from get_all:

NameError Traceback (most recent call last)
in ()
2 import spacy
3 nlp = spacy.load('en')
----> 4 tok_trn, trn_labels = get_all(df_trn, 1)
5 tok_val, val_labels = get_all(df_val, 1)

in get_all(df, n_lbls)
3 for i, r in enumerate(df):
4 print(i)
----> 5 tok_, labels_ = get_texts(r, n_lbls)
6 tok += tok_;
7 labels += labels_

in get_texts(df, n_lbls)
5 texts = list(texts.apply(fixup).values)
6
----> 7 tok = Tokenizer().proc_all_mp(partition_by_cores(texts))
8 return tok, list(labels)

NameError: name 'Tokenizer' is not defined

balnazzar · December 3, 2018, 2:05pm

I’m encountering the same error, with the same numbers (that is, 5290000 and 4600), while attempting to train the language model on a very different dataset.

I think he was already running the notebook with 0.7…