Part 2 Lesson 10 wiki

# SortishSampler: sorts roughly by length, with enough randomness that batches differ each epoch
trn_samp = SortishSampler(trn_clas, key=lambda x: len(trn_clas[x]), bs=bs//2)
# SortSampler: sorts exactly by length, so validation order is deterministic with minimal padding
val_samp = SortSampler(val_clas, key=lambda x: len(val_clas[x]))

Why is trn_samp a SortishSampler while val_samp is a SortSampler?

We want an augmentation-like effect during training, but consistent evaluation during validation.

I thought the samplers just controlled which order the data showed up. How is it an augmentation?

Sorry, augmentation was not the correct term. It is more like shuffling: each epoch sees a different sequence of training samples, while still getting the performance benefit of roughly sorted sequences.


Hmm. The DataLoader class has a shuffle parameter in addition to the sampler parameter, so while these samplers might happen to shuffle, I doubt that is the primary motivation.

I thought it was a performance thing.

But even if it does shuffle the validation dataset, so what? I can imagine that you don’t want to shuffle the test set (though I can’t think of an example why at the moment), but what’s wrong with shuffling the validation set?

“Sorting” is the performance part. But if we just sorted in training, we would generate exactly the same batches every epoch, so we need “Sortish” in training to also get the benefit of shuffling. Plain shuffling alone would not sort at all.
It is okay to shuffle the validation set, but there is no point, since it yields the same result either way; sorting, on the other hand, gives a performance boost.


So to go back to my original question: if it is okay to shuffle in the validation set, and it gives a performance boost, why not do it on the validation set as well as the training set?

In my professional life, I have started with a small sample of hand-labeled observations, built a scoring engine, and then used a technique known as pseudo-labeling to increase the size of the data set. This takes a lot of time, especially if you have something like 200 categories.

SortSampler generates the minimum padding.
SortishSampler adds randomness to the sorting, so it generates a bit more padding.
It will be clearer if you check the code for SortishSampler; the sketch below gives the rough idea.
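
Not the exact fastai code, just a sketch of the idea, assuming lengths is a list of sequence lengths: shuffle all the indices first, split them into big chunks, then sort each chunk by length. Batches drawn in this order are nearly uniform in length, yet their composition changes every epoch.

import numpy as np

def sortish_order(lengths, bs, chunks=50):
    # Shuffle, then sort by length only within big chunks of bs*chunks
    # indices, so batches stay length-homogeneous but differ per epoch.
    idxs = np.random.permutation(len(lengths))
    sz = bs * chunks
    megabatches = [idxs[i:i + sz] for i in range(0, len(idxs), sz)]
    return np.concatenate([
        sorted(mb, key=lambda i: lengths[i], reverse=True) for mb in megabatches
    ])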


Aaaand the more padding is bad how?

I have looked at the source, but I don’t understand why you wouldn’t want to use SortishSampler for the validation set too.

Padding is essentially wasted computation. We have to do it because the net requires tensors of the same shape, but the closer the sequences in a batch are in length, the less computation is wasted.
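
A toy example with made-up lengths to show the cost:

lens = [10, 12, 11, 95]          # lengths in one randomly drawn batch
padded = len(lens) * max(lens)   # the padded tensor holds 4 * 95 = 380 cells
useful = sum(lens)               # only 128 of those cells are real tokens
# roughly two thirds of this batch's compute goes to padding; a sorted
# sampler would have grouped the three ~10-token sequences together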


That would be awkward, since we often use the validation set predictions in our code, and it’s nice to know what rows they connect to.

Note that shuffle is ignored if sampler is set. The purpose of SortishSampler is to shuffle in a way that doesn’t waste computation - check out last week’s lesson for the details.
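
For reference, a sketch of how the samplers get wired in, using the plain PyTorch DataLoader and assuming trn_ds and val_ds dataset objects exist (in current PyTorch, passing shuffle=True alongside a sampler raises an error rather than being silently ignored):

from torch.utils.data import DataLoader

trn_dl = DataLoader(trn_ds, batch_size=bs//2, sampler=trn_samp)
val_dl = DataLoader(val_ds, batch_size=bs, sampler=val_samp)
# no shuffle argument: the sampler fully determines the iteration order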


Hello,

What is a clean way to manually evaluate the language model and get predictions from learner.model[0]?

Given a series of tokens, the need is to get the output of the AWD-LSTM (i.e., the embedding layer plus the three LSTMs in the rnns module) in the form of a single 400-dim tensor that encodes the tokenized input phrase.

I’m able to get the output from learner.model[0].encoder, but not from the whole encoder.

I’m missing something basic.


Can you explain more? What are you trying to do here, and how are you planning to do it?

Sure, Jeremy.

I first loaded our encoder weights using load_encoder('lm2').

I am trying to manually step through (forward pass) the layers of the LM using a single example (bs=1). This is similar to how we did manual predictions in lessons 4 and 6.

We don’t have a numericalize method now, so I manually tokenized and numericalized my sentence into a rank-1 tensor.
I am able to make a variable using V and pass it to the embedding layer of the LM like so:
m[0].encoder(V(T(tok_inp)))
This gives me 400-dim embeddings for each of my input tokens.
How do I step further into the model and get the output of the rnns, which will be a compact representation of all these 400-dim token embeddings?

I am trying to get a phrase embedding at the end of this (without passing it to the decoder, which would turn it into vocab-sized predictions).
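
In other words, something like this rough, untested sketch is what I’m after (assuming m[0] is the RNN_Encoder, that it expects input of shape (seq_len, batch_size), and that it returns (raw_outputs, outputs) as lists of per-layer activations):

m[0].reset()                        # clear leftover hidden state between phrases
inp = V(T(tok_inp)).unsqueeze(1)    # rank-1 token ids -> shape (seq_len, 1)
raw_outputs, outputs = m[0](inp)    # embedding + all 3 LSTMs in one call
phrase_emb = outputs[-1][-1]        # last layer, last timestep: a 400-dim tensor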

Just pop a breakpoint in the forward method of the module you’re interested in stepping through. 🙂

Sure. But if I want to do it without a breakpoint, I’d have to take the children of the model up to that point and make a new module, right?

You can use a forward hook - that’s a bit easier.

Yes! Trying now!

Registered a forward hook and printed the output. Worked like a charm. Removing the hook was a bit scary… but now I see the beauty of PyTorch!
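
For anyone else trying this, here is roughly what I did (register_forward_hook and handle.remove() are standard PyTorch; the acts dict is just my own scratch storage):

acts = {}
def grab_output(module, input, output): acts['enc'] = output
h = m[0].register_forward_hook(grab_output)  # fires on every forward pass
m[0](V(T(tok_inp)).unsqueeze(1))             # run the encoder once
h.remove()                                   # detach the hook when done
print(acts['enc'])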
