Wiki: Lesson 6

Hi @jeremy, I am aware that there was an update for nlp (update ID#ade043e). I did git pull and conda env update. However, running from fastai.nlp import *, which worked previously, now gives the following error message. I tried reinstalling spacy but the error persisted. Any idea how to resolve this problem?

<ipython-input-6-6070169c89a3> in <module>()
     11 from fastai.rnn_reg import *
     12 from fastai.rnn_train import *
---> 13 from fastai.nlp import *
     14 from fastai.text import *
     15 from fastai.lm_rnn import *

~/fastai/courses/dl1/fastai/ in <module>()
      5 from .dataset import *
      6 from .learner import *
----> 7 from .text import *
      8 from .lm_rnn import *

~/fastai/courses/dl1/fastai/ in <module>()
      9 def sub_br(x): return re_br.sub("\n", x)
---> 11 my_tok = spacy.load('en')
     12 my_tok.tokenizer.add_special_case('<eos>', [{ORTH: '<eos>'}])
     13 my_tok.tokenizer.add_special_case('<bos>', [{ORTH: '<bos>'}])

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/spacy/ in load(name, **overrides)
     17             "to load. For example:\nnlp = spacy.load('{}')".format(depr_path),
     18             'error')
---> 19     return util.load_model(name, **overrides)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/spacy/ in load_model(name, **overrides)
    110     if isinstance(name, basestring_):  # in data dir / shortcut
    111         if name in set([ for d in data_path.iterdir()]):
--> 112             return load_model_from_link(name, **overrides)
    113         if is_package(name):  # installed as package
    114             return load_model_from_package(name, **overrides)

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/spacy/ in load_model_from_link(name, **overrides)
    124     path = get_data_path() / name / ''
    125     try:
--> 126         cls = import_file(name, path)
    127     except AttributeError:
    128         raise IOError(

~/src/anaconda3/envs/fastai/lib/python3.6/site-packages/spacy/ in import_file(name, loc)
    117         spec = importlib.util.spec_from_file_location(name, str(loc))
    118         module = importlib.util.module_from_spec(spec)
--> 119         spec.loader.exec_module(module)
    120         return module

~/src/anaconda3/envs/fastai/lib/python3.6/importlib/ in exec_module(self, module)

~/src/anaconda3/envs/fastai/lib/python3.6/importlib/ in get_code(self, fullname)

~/src/anaconda3/envs/fastai/lib/python3.6/importlib/ in get_data(self, path)

FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/src/anaconda3/envs/fastai/lib/python3.6/site-packages/spacy/data/en/'

@moody here’s how to install a spacy model:


Passing along in case the use of * python operation confuses anyone else …

it = iter(md.trn_dl)
*xs,yt = next(it) #*xs packs the arguments into xs
t = m(*V(xs)) #*V(xs) unpacks V(xs) into m functional arguments for c1, c2, c3.

see this post.
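For anyone who wants to try the two uses of * in isolation, here is a minimal sketch in plain Python. The function m and the list batch are made-up stand-ins for the model and the minibatch, not fastai objects.

```python
# Hypothetical stand-in for a model that takes three separate inputs.
def m(c1, c2, c3):
    return f"{c1}-{c2}-{c3}"

# Hypothetical stand-in for one minibatch: three inputs followed by a label.
batch = ["a", "b", "c", "label"]

# On the left of an assignment, *xs collects everything except the last item.
*xs, yt = batch
print(xs)
print(yt)

# In a call, *xs spreads the list back out into separate positional arguments.
print(m(*xs))
```

So the first * packs and the second * unpacks; the name after the star is just an ordinary list.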


Hello, I’m running through the lesson 6 rnn notebook and am noticing that my Multi-output model is getting significantly different results than the same code in the video.

You’ll notice below that after the first fit call my val_loss ranges from 2.4 to 2.0, and after the last fit it only goes down to 1.99 (which is worse than even the initial rnn implementation). That compares to a val_loss of 0.95 to 0.6 in the video.

Could there be something in my setup that is causing the difference in performance? It seems like a large enough gap that there must be something funny going on.


m = CharSeqRnn(vocab_size, n_fac).cuda()
opt = optim.Adam(m.parameters(), 1e-3)

it = iter(md.trn_dl)
*xst,yt = next(it)

def nll_loss_seq(inp, targ):
    sl,bs,nh = inp.size()
    targ = targ.transpose(0,1).contiguous().view(-1)
    return F.nll_loss(inp.view(-1,nh), targ)

fit(m, md, 4, opt, nll_loss_seq)
epoch      trn_loss   val_loss   
    0      2.600039   2.412278  
    1      2.295203   2.205402  
    2      2.143977   2.093502  
    3      2.046338   2.015161  


set_lrs(opt, 1e-4)

fit(m, md, 1, opt, nll_loss_seq)
epoch      trn_loss   val_loss   
    0      1.996949   1.999792  


I was just listening to this again and I was curious about the autoencoder described around 38m that won the insurance competition? I looked for a recent competition and I’m guessing it’s the Allstate claims severity based on the description and timing but I didn’t see any reference to it in the first place winner’s article:

I’m guessing I’ve got the wrong competition @jeremy? I’d love to see the link to the solution described in the lecture.

If anyone gets KeyError: 'ffmpeg' running the SGD notebook (in the animation cell) you probably need to install ffmpeg. In Ubuntu (Paperspace machine) I did:

sudo apt-get update
sudo apt install ffmpeg

Then restarted kernel and it worked. :slight_smile:
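If you want to check from inside the notebook whether the install was picked up, something like the following might help. It is only a sketch: it assumes the animation code finds ffmpeg by looking for an executable named ffmpeg on the PATH, and ffmpeg_available is a made-up helper name.

```python
import shutil

def ffmpeg_available():
    """Return True if an executable called 'ffmpeg' can be found on the PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found")
else:
    print("ffmpeg not found; install it and restart the kernel")
```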


@even Porto Seguro Winning Solution -- Representation learning


I’m facing the same issue. If anyone could pinpoint what is causing the differences in output loss values it would be of great value.


Did anyone try the test function on the CharSeqRnn model? In Jeremy’s class, we saw the get_next test function used on the CharLoopConcat model, where there was only one output. However, in the multi-output case, say CharSeqRnn, there are 8 outputs. I adapted the get_next function to retrieve the indices of all 8 outputs by passing them through torch.max, and generated the subsequent characters. However, I see gibberish. I’m not sure whether I messed up get_next or whether it is due to insufficient training.

Did anyone try this? If yes, could you point out what I am doing wrong? The code I’m using is the following.

def get_next(inp):
    arr = T([char_indices[x] for x in inp])
    p = m(*V(arr))
    # Now, we have 8 outputs. Hence, the max is taken along the first axis to get 8 outputs (8 chars)
    i = np.argmax(to_np(torch.max(p, 1))[0], 1)
    return i

I followed this for Windows environment and then restarted kernel. :slight_smile:


Do you mind sharing the losses as well? Jeremy mentioned that the loss dropped from 1.30 to 1.25 then it started making sense. Try to train the model for another hour or two.

Hi @binga,

Your adapted get_next function looks like it’s correctly retrieving and displaying the highest probability 8-char output sequence.

I think it’s actually correct that the first few characters look like gibberish because each n-th output character is trained on the first (n-1) characters in the input sequence.

Said differently, for an input sequence:

[40, 42, 29, 30, 25, 27, 29, 1]

the training label is off-set by one character:

[42, 29, 30, 25, 27, 29, 1, 1]

and the output probabilities are learned from whatever sequence of characters has already run through the RNN on that particular forward pass.

So for the 1st output, it’s trying to learn the label 42 from the first and only char input of 40

For the 2nd output, it’s trying to learn 29 from 40, 42

and so on… until the 8th output, which learns 1 from 40, 42, 29, 30, 25, 27, 29, 1

The earlier characters in the output are gibberish because they don’t have as much of the training sequence to learn from as the later characters.
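To make the "n-th output only sees the first n characters" point concrete, here is a small sketch in plain Python. The index values are the ones from the example above; the variable names are made up for illustration.

```python
# Nine consecutive character indices (values taken from the example above).
idx = [40, 42, 29, 30, 25, 27, 29, 1, 1]
sl = 8  # sequence length used by the multi-output model

x = idx[:sl]        # input sequence
y = idx[1:sl + 1]   # labels: the same sequence shifted by one character

# The n-th output is trained to predict y[n] having seen only x[0..n].
pairs = [(x[:n + 1], y[n]) for n in range(sl)]
for seen, target in pairs:
    print(f"sees {seen} -> learns {target}")
```

The first output has a single character of context and the last one has all eight, which is why only the later positions look sensible.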

If you look at just the last characters of each get_next() sequence in your CharSeqRnn test outputs, you’ll note that they match up with the get_next() test results earlier in the notebook from single-output models:

EDIT: sorry about the earlier delete & restore. I thought I understood this correctly but wanted to revisit the lectures to make sure…so now I’m slightly more sure? :slight_smile: Here’s the new stuff I learned:

I think this happens for this particular model because the hidden state is reset to zero at the start of each forward pass:

def forward(self, *cs):
    h = V(torch.zeros(1, bs, n_hidden))

In the lesson 6 video, someone asks a question about this very problem at the 2:06 mark and Jeremy mentions a solution is introduced in lesson 7.

EDIT 2: here’s the section in lesson 7 video where Jeremy explains this problem of throwing away the accrued hidden activations (h) between each 8-char minibatch segment and introduces how Back Propagation Through Time (BPTT) solves this and its related wrinkles:
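For reference, the idea of keeping the hidden state between segments instead of resetting it could look roughly like this. This is only my sketch, not the course code: it assumes PyTorch 0.4 or later, StatefulCharRnn is a made-up class name, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class StatefulCharRnn(nn.Module):
    """Hypothetical sketch: an RNN that carries its hidden state across forward passes."""
    def __init__(self, vocab_size, n_fac, n_hidden, bs):
        super().__init__()
        self.n_hidden = n_hidden
        self.e = nn.Embedding(vocab_size, n_fac)
        self.rnn = nn.RNN(n_fac, n_hidden)
        self.l_out = nn.Linear(n_hidden, vocab_size)
        # Created once here, not at the start of every forward pass.
        self.h = torch.zeros(1, bs, n_hidden)

    def forward(self, cs):
        # cs has shape (sequence length, batch size)
        bs = cs.size(1)
        if self.h.size(1) != bs:
            # The batch size changed (e.g. last batch), so start from zeros again.
            self.h = torch.zeros(1, bs, self.n_hidden)
        outp, h = self.rnn(self.e(cs), self.h)
        # Keep the values but cut the gradient history, so backprop
        # only goes through the current segment.
        self.h = h.detach()
        return torch.log_softmax(self.l_out(outp), dim=-1)

m = StatefulCharRnn(vocab_size=10, n_fac=4, n_hidden=8, bs=2)
x = torch.randint(0, 10, (5, 2))  # random character indices, just for a shape check
out = m(x)
print(out.shape)
```

Note that self.h is a plain attribute here, so it would not follow the model if you moved it to the GPU; that would need extra handling.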


I have a question about the 3 char model. I notice that the sequence generator does not generate all possible sequences. Here are the first two sequences:

40, 42, 29 -> 30
30, 25, 27 -> 29

There were two other possible sequences between the above sequences that are not used:

42, 29, 30 -> 25
29, 30, 25 -> 27

Generally I generate sequences with something like an optimized version of the following:

x = []
y = []
for i in range(len(idx)-cs + 1):

This ensures that every possible sequence is included. Is there some reason why half of the sequences were omitted? Or was this an accidental oversight?

EDIT: I may have asked the question a bit too early. It looks like the next section when generating 8 characters uses a method similar to my suggestion.
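To show the difference between the two schemes discussed in this post, here is a small sketch in plain Python. make_windows is a made-up helper, not the notebook's code, and idx only contains the first few indices mentioned in this thread.

```python
def make_windows(idx, cs, stride):
    """Return (inputs, labels): each input is cs consecutive items, each label is the item that follows."""
    xs, ys = [], []
    for i in range(0, len(idx) - cs, stride):
        xs.append(idx[i:i + cs])
        ys.append(idx[i + cs])
    return xs, ys

idx = [40, 42, 29, 30, 25, 27, 29, 1, 1]  # first few character indices from this thread
cs = 3

# Stepping by cs gives non-overlapping windows (the two sequences listed first).
skip_x, skip_y = make_windows(idx, cs, stride=cs)
print(list(zip(skip_x, skip_y)))

# Stepping by 1 gives every possible window, including the two that were skipped.
all_x, all_y = make_windows(idx, cs, stride=1)
print(list(zip(all_x, all_y)))
```

The loop stops at len(idx) - cs so that there is always a following character left to use as the label.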

I have a question regarding “unsupervised learning defined as fake tasks of supervised learning”, which is closely related to embedding each categorical feature individually.
When you train an auto-encoder you have a very clear task, which is reconstructing the original data, so it is supposed to be a good example of unsupervised learning without fake tasks. Augmentation is used to generalize better, e.g. in denoising auto-encoders. Also, the restriction of having smaller intermediate layers can be lifted by using sparse or variational auto-encoders.
One advantage I see in auto-encoders is that we can use the middle layer (the code) as an embedding of the whole datum. While it is obviously not efficient during inference, because you need to feed-forward the first half of the network, it is a way to embed all of the numerical and categorical features in a single vector. This result would not be possible with standard entity embeddings without going through one embedding layer for each feature before mixing them together, would it?

What advice do you have for using auto-encoders for embedding, compared with the matrix techniques taught in the course?

Thank you very much.

It seems that there is much less discussion for Lesson 6 than for other lessons? I am stuck on the last part and would really love some help…

I would love to ask which files are in the training path and the validation path. I didn’t see any code in the notebook that produces these files. I did see in the lecture video that there is a trn.txt in Jeremy’s directory. I have searched for quite a while and would love to know what this file is.

FILES = dict(train=TRN_PATH, validation=VAL_PATH, test=VAL_PATH)

Since earlier we created a validation set with get_cv_idx, I wonder how we construct the validation set for the state model part with
LanguageModelData.from_text_files(PATH, TEXT, **FILES, bs=bs, bptt=bptt, min_freq=3)


You probably got this before the post but for anyone else who finds his or her way here, just export the first 80% of nietzsche.txt as trn.txt in TRN_PATH and the last 20% as val.txt in VAL_PATH
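A rough sketch of that split in plain Python, in case it helps. split_text_file is a made-up helper and the paths in the demo are throwaway ones; in practice you would pass your own nietzsche.txt and the trn.txt / val.txt locations under TRN_PATH and VAL_PATH.

```python
import os
import tempfile

def split_text_file(src, trn_file, val_file, trn_frac=0.8):
    """Write the first trn_frac of src to trn_file and the rest to val_file."""
    with open(src, encoding="utf-8") as f:
        text = f.read()
    cut = int(len(text) * trn_frac)
    for path, chunk in ((trn_file, text[:cut]), (val_file, text[cut:])):
        folder = os.path.dirname(path)
        if folder:
            os.makedirs(folder, exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(chunk)
    return cut

# Demo on a throwaway file; replace these paths with your real ones.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "nietzsche.txt")
    with open(src, "w", encoding="utf-8") as f:
        f.write("0123456789")
    cut = split_text_file(src,
                          os.path.join(tmp, "trn", "trn.txt"),
                          os.path.join(tmp, "val", "val.txt"))
    print(cut)
```

This cuts at a character position, so the boundary may fall in the middle of a word or sentence.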


Hey everyone, I have a problem with the ‘CharLoopConcatModel’ model from this lesson (I’ve been reimplementing it myself). Namely, its performance is a lot worse than the performance of the ‘addition’ model.

And I don’t really understand why, especially since if I literally copy the model code from the original notebook into my notebook and run it, it shows the same performance problems. This leads me to believe that the problem may lie not in the model but in the way I organize/preprocess the data beforehand. At the same time, the ‘addition’ model and the model that uses the nn.RNN layer perform OK (though not as well as in the original notebook).

I’d really appreciate help with understanding where the problem lies.

You can find the code of my notebook (with example run data) here:

How should I access pytorch Variables in 0.4.0?
For example, before 0.4.0:

some_variable = m.ib(V(topMovieIdx))

I mean, it still works under 0.4.0, but I was wondering whether there is a new way to approach this?

EDIT: I think V needs to be replaced by T?

Just a heads up, I downgraded pytorch, because I got cuda errors (RuntimeError: CUDNN_STATUS_EXECUTION_FAILED) while implementing “RNN with pytorch”.

Hi, @sayko

As of pytorch 0.4.0, Variable is deprecated: Variable was merged into Tensor.
You can enable backprop (automatic differentiation) by setting requires_grad=True when you create the tensor, e.g. torch.tensor(data, requires_grad=True).
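A minimal sketch of what that looks like without Variable, assuming PyTorch 0.4 or later. The values are arbitrary and only there to show where the gradient ends up.

```python
import torch

# A plain tensor that tracks gradients; no Variable wrapper needed.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Any computation on x is recorded for autograd.
y = (x ** 2).sum()
y.backward()

# The gradient of sum(x^2) with respect to x is 2 * x.
print(x.grad)
```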
