Part 2 Lesson 10 wiki


(Kaitlin Duck Sherwood) #448
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))

Why is trn_samp a SortishSampler while val_samp is a SortSampler?


(Sukjae Cho) #449

We want an augmentation-like effect during training, but want consistent evaluation during validation.


(Kaitlin Duck Sherwood) #450

I thought the samplers just controlled which order the data showed up. How is it an augmentation?


(Sukjae Cho) #451

Sorry, augmentation was not the correct term. It is more like shuffling, so that each epoch sees a different sequence of training samples - while still getting the performance benefit of using roughly sorted sample sequences.


(Kaitlin Duck Sherwood) #452

Hmm. The DataLoader class has a shuffle parameter in addition to the sampler parameter, so while these samplers might happen to shuffle, I doubt that is the primary motivation.

I thought it was a performance thing.

But even if it does shuffle the validation dataset, so what? I can imagine that you don’t want to shuffle the test set (though I can’t think of an example why at the moment), but what’s wrong with shuffling the validation set?


(Sukjae Cho) #453

“Sorting” is a performance thing. But if we just sorted during training, it would generate exactly the same batches every epoch. So we need ‘Sortish’ in training to also get the benefit of shuffling; plain ‘shuffling’ alone would not do any sorting.
It is okay to shuffle the validation set, but there is no point in doing it, since it will yield the same result either way. Sorting, on the other hand, gives a performance boost.


(Kaitlin Duck Sherwood) #454

So to go back to my original question: if it is okay to shuffle in the validation set, and it gives a performance boost, why not do it on the validation set as well as the training set?


(Mike Kunz ) #455

In my professional life, I have started with a small sample of hand-labeled observations, built a scoring engine and then used a technique known as pseudo labeling to increase the size of the data set. This takes a lot of time, especially if you have like 200 categories.


(Sukjae Cho) #456

SortSampler generates the minimum amount of padding.
SortishSampler adds randomness to the sorting, so it generates a bit more padding.
It will be clearer if you check the code for SortishSampler.
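To make the idea concrete, here is a rough, simplified sketch of what a “sortish” ordering does (the real fastai SortishSampler differs in detail; the function name, chunk size, and lengths here are made up for illustration): shuffle everything first, then sort by length only within local “megabatches”, so batches contain similar-length sequences but the overall order changes every epoch.

```python
import random

def sortish_order(lengths, bs, chunks=50, seed=None):
    """Simplified sketch of a 'sortish' ordering: shuffle all indices,
    then sort by length only within megabatches of chunks*bs items."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)                         # randomness: new order each epoch
    sz = chunks * bs
    megabatches = [idxs[i:i + sz] for i in range(0, len(idxs), sz)]
    # local sorting: longest sequences first within each megabatch
    return [i for mb in megabatches
            for i in sorted(mb, key=lambda j: lengths[j], reverse=True)]

lengths = list(range(1, 101))                 # pretend document lengths
order = sortish_order(lengths, bs=2, chunks=5, seed=42)
```

A plain SortSampler would be the degenerate case where the whole dataset is one megabatch and there is no shuffle, which is why it produces identical batches every epoch.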


(Kaitlin Duck Sherwood) #457

Aaaand the more padding is bad how?

I have looked at the source, but I don’t understand why you wouldn’t want to use SortishSampler for the validation set.


(Even Oldridge) #458

Padding is essentially wasted computation. We have to do it because the net requires tensors of the same shape, but the closer they are in size the less computation is wasted.
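As a toy illustration of how much computation padding wastes (the lengths here are randomly generated, not real data): if every batch must be padded up to its longest item, batches of similar-length sequences need far fewer pad tokens than batches drawn in arbitrary order.

```python
import random

def padding_waste(lengths, bs):
    """Total pad tokens needed when each batch is padded to its longest item."""
    waste = 0
    for i in range(0, len(lengths), bs):
        batch = lengths[i:i + bs]
        waste += sum(max(batch) - n for n in batch)
    return waste

random.seed(0)
doc_lens = [random.randint(10, 500) for _ in range(1024)]  # made-up lengths
random_order_waste = padding_waste(doc_lens, 64)
sorted_order_waste = padding_waste(sorted(doc_lens), 64)   # far less padding
```

Every pad token still flows through the network, so the gap between those two numbers is pure wasted compute.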


(Jeremy Howard) #459

That would be awkward, since we often use the validation set predictions in our code, and it’s nice to know what rows they connect to.

Note that shuffle is ignored if sampler is set. The purpose of SortishSampler is to shuffle in a way that doesn’t waste computation - check out last week’s lesson for the details.


(Arvind Nagaraj) #460

Hello,

What is a clean way to manually evaluate the language models and get predictions for learner.model[0] ?

Given a series of tokens, the need is to get the output of the AWD LSTM (i.e., the embedding layer plus the 3 LSTMs in the rnns module) in the form of a single 400-dim tensor prediction that encodes the tokenized input phrase.

I’m able to get the output from learner.model[0].encoder but not the whole enc.

I’m missing something basic.


(Jeremy Howard) #461

Can you explain more? What are you trying to do here, and how are you planning to do it?


(Arvind Nagaraj) #462

Sure, Jeremy.

I first loaded our encoder weights using load_encoder('lm2')

I am trying to manually step through (forward pass) the layers of the LM using a single example (bs=1). This is similar to how we did manual predictions in lessons 4 and 6.

We don’t have a numericalize method now. So I manually tokenized my sentence into a rank1 tensor.
I am able to make a variable using V and pass it to the embedding layer of the LM like so:
m[0].encoder(V(T(tok_inp)))
This gives me 400 dim embeddings for each of my input tokens.
How do I step through further into the model and get the output of the rnns, that will be a compact representation of all these 400 dim tokens?

I am trying to get a phrase embedding at the end of this (without passing it to the decoder, which will turn it into vocab sized predictions).


(Jeremy Howard) #463

Just pop a break-point in the forward method of the module you’re interested in stepping through. :slight_smile:


(Arvind Nagaraj) #464

Sure. But if I have to do it without a break-point, I have to get the children of the model up to that point and make a new module, right?


(Jeremy Howard) #465

You can use a forward hook - that’s a bit easier.


(Arvind Nagaraj) #466

Yes! Trying now!


(Arvind Nagaraj) #467

Registered a forward hook and printed the output. Worked like a charm. Removing the hook was a bit scary…but now I see the beauty of PyTorch!
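For anyone following along later, a minimal sketch of the forward-hook approach on a toy model (the vocab size, module layout, and variable names here are made up; only the 400-dim size mirrors the encoder discussed above - the real learner.model[0] is fastai’s AWD-LSTM, so you would register the hook on its last RNN layer instead):

```python
import torch
import torch.nn as nn

# Toy stand-in for learner.model[0]: an embedding followed by an LSTM.
encoder = nn.Sequential(nn.Embedding(100, 400),
                        nn.LSTM(400, 400, batch_first=True))

captured = {}
def save_output(module, inputs, output):
    captured['out'] = output               # for an LSTM: (output, (h, c))

handle = encoder[1].register_forward_hook(save_output)

tokens = torch.tensor([[1, 5, 9, 2]])      # fake tokenized phrase, bs=1
encoder(tokens)                            # forward pass triggers the hook

phrase_vec = captured['out'][0][:, -1, :]  # last timestep: one 400-dim vector
handle.remove()                            # clean up the hook when done
print(phrase_vec.shape)                    # torch.Size([1, 400])
```

Calling handle.remove() is all the cleanup needed; the hook stops firing on subsequent forward passes.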