Part 2 Lesson 10 wiki


(Kaitlin Duck Sherwood) #448
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))

Why is trn_samp a SortishSampler while val_samp is a SortSampler?


(Sukjae Cho) #449

We want an augmentation-like effect during training, but want consistent evaluation during validation.


(Kaitlin Duck Sherwood) #450

I thought the samplers just controlled which order the data showed up. How is it an augmentation?


(Sukjae Cho) #451

Sorry, augmentation was not the correct term. It is more like shuffling, so that each epoch sees a different sequence of training samples - while still getting the performance benefit of using roughly sorted sample sequences.


(Kaitlin Duck Sherwood) #452

Hmm. The DataLoader class has a shuffle parameter in addition to the sampler parameter, so while these samplers might happen to shuffle, I doubt that is the primary motivation.

I thought it was a performance thing.

But even if it does shuffle the validation dataset, so what? I can imagine that you don’t want to shuffle the test set (though I can’t think of an example why at the moment), but what’s wrong with shuffling the validation set?


(Sukjae Cho) #453

“Sorting” is a performance thing. But if we just sorted during training, it would generate exactly the same batches every epoch. So we need ‘Sortish’ in training to also get the benefit of shuffling; plain ‘shuffling’ alone would not do any sorting.
It is okay to shuffle the validation set, but there is no point in doing it, since it will yield the same result either way. Sorting, on the other hand, gives a performance boost.


(Kaitlin Duck Sherwood) #454

So to go back to my original question: if it is okay to shuffle in the validation set, and it gives a performance boost, why not do it on the validation set as well as the training set?


(Mike Kunz ) #455

In my professional life, I have started with a small sample of hand-labeled observations, built a scoring engine and then used a technique known as pseudo labeling to increase the size of the data set. This takes a lot of time, especially if you have like 200 categories.


(Sukjae Cho) #456

SortSampler generates the minimum amount of padding.
SortishSampler adds randomness to the sorting, so it generates a bit more padding.
It will be clearer if you check the code for SortishSampler.
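To make the idea concrete, here is a rough, simplified sketch of what a “sortish” ordering does (the real fastai SortishSampler differs in detail; the function name, chunk size, and lengths here are made up for illustration): shuffle everything first, then sort by length only within local “megabatches”, so batches contain similar-length sequences but the overall order changes every epoch.

```python
import random

def sortish_order(lengths, bs, chunks=50, seed=None):
    """Simplified sketch of a 'sortish' ordering: shuffle all indices,
    then sort by length only within megabatches of chunks*bs items."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)                         # randomness: new order each epoch
    sz = chunks * bs
    megabatches = [idxs[i:i + sz] for i in range(0, len(idxs), sz)]
    # local sorting: longest sequences first within each megabatch
    return [i for mb in megabatches
            for i in sorted(mb, key=lambda j: lengths[j], reverse=True)]

lengths = list(range(1, 101))                 # pretend document lengths
order = sortish_order(lengths, bs=2, chunks=5, seed=42)
```

A plain SortSampler would be the degenerate case where the whole dataset is one megabatch and there is no shuffle, which is why it produces identical batches every epoch.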


(Kaitlin Duck Sherwood) #457

Aaaand the more padding is bad how?

I have looked at the source, but I don’t understand why you wouldn’t want to use SortishSampler for the validation set.


(Even Oldridge) #458

Padding is essentially wasted computation. We have to do it because the net requires tensors of the same shape, but the closer they are in size the less computation is wasted.
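As a toy illustration of how much computation padding wastes (the lengths here are randomly generated, not real data): if every batch must be padded up to its longest item, batches of similar-length sequences need far fewer pad tokens than batches drawn in arbitrary order.

```python
import random

def padding_waste(lengths, bs):
    """Total pad tokens needed when each batch is padded to its longest item."""
    waste = 0
    for i in range(0, len(lengths), bs):
        batch = lengths[i:i + bs]
        waste += sum(max(batch) - n for n in batch)
    return waste

random.seed(0)
doc_lens = [random.randint(10, 500) for _ in range(1024)]  # made-up lengths
random_order_waste = padding_waste(doc_lens, 64)
sorted_order_waste = padding_waste(sorted(doc_lens), 64)   # far less padding
```

Every pad token still flows through the network, so the gap between those two numbers is pure wasted compute.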


(Jeremy Howard) #459

That would be awkward, since we often use the validation set predictions in our code, and it’s nice to know what rows they connect to.

Note that shuffle is ignored if sampler is set. The purpose of SortishSampler is to shuffle in a way that doesn’t waste computation - check out last week’s lesson for the details.


(Arvind Nagaraj) #460

Hello,

What is a clean way to manually evaluate the language models and get predictions for learner.model[0] ?

Given a series of tokens, the need is to get the output of the AWD LSTM (i.e., the embedding layer plus the 3 LSTMs in the rnns module) in the form of a single 400-dim tensor prediction that encodes the tokenized input phrase.

I’m able to get the output from learner.model[0].encoder but not the whole enc.

I’m missing something basic.


(Jeremy Howard) #461

Can you explain more? What are you trying to do here, and how are you planning to do it?


(Arvind Nagaraj) #462

Sure, Jeremy.

I first loaded our encoder weights using load_encoder('lm2')

I am trying to manually step through (forward pass) the layers of the LM using a single example (bs=1). This is similar to how we did manual predictions in lessons 4 and 6.

We don’t have a numericalize method now. So I manually tokenized my sentence into a rank1 tensor.
I am able to make a variable using V and pass it to the embedding layer of the LM like so:
m[0].encoder(V(T(tok_inp)))
This gives me 400 dim embeddings for each of my input tokens.
How do I step through further into the model and get the output of the rnns, that will be a compact representation of all these 400 dim tokens?

I am trying to get a phrase embedding at the end of this (without passing it to the decoder, which will turn it into vocab sized predictions).


(Jeremy Howard) #463

Just pop a break-point in the forward method of the module you’re interested in stepping through. :slight_smile:


(Arvind Nagaraj) #464

Sure. But if I have to do it without a break-point, I have to get the children of the model up to that point and make a new module, right?


(Jeremy Howard) #465

You can use a forward hook - that’s a bit easier.


(Arvind Nagaraj) #466

Yes! Trying now!


(Arvind Nagaraj) #467

Registered a forward hook and printed the output. Worked like a charm. Removing the hook was a bit scary…but now I see the beauty of PyTorch!
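For anyone following along later, a minimal sketch of the forward-hook approach on a toy model (the vocab size, module layout, and variable names here are made up; only the 400-dim size mirrors the encoder discussed above - the real learner.model[0] is fastai’s AWD-LSTM, so you would register the hook on its last RNN layer instead):

```python
import torch
import torch.nn as nn

# Toy stand-in for learner.model[0]: an embedding followed by an LSTM.
encoder = nn.Sequential(nn.Embedding(100, 400),
                        nn.LSTM(400, 400, batch_first=True))

captured = {}
def save_output(module, inputs, output):
    captured['out'] = output               # for an LSTM: (output, (h, c))

handle = encoder[1].register_forward_hook(save_output)

tokens = torch.tensor([[1, 5, 9, 2]])      # fake tokenized phrase, bs=1
encoder(tokens)                            # forward pass triggers the hook

phrase_vec = captured['out'][0][:, -1, :]  # last timestep: one 400-dim vector
handle.remove()                            # clean up the hook when done
print(phrase_vec.shape)                    # torch.Size([1, 400])
```

Calling handle.remove() is all the cleanup needed; the hook stops firing on subsequent forward passes.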