My objective is to cluster similar documents. I want to use the hidden state of the encoder of a finetuned ULMFiT model as input for a clustering algorithm. I’m trying it out with the IMDB notebook.
To obtain the hidden state, I think I managed to get a callback working with snippets from the forum, but I'm struggling to understand the output of the callback.
`learn.model[0]`:

```
AWD_LSTM(
  (encoder): Embedding(60000, 400, padding_idx=1)
  (encoder_dp): EmbeddingDropout(
    (emb): Embedding(60000, 400, padding_idx=1)
  )
  (rnns): ModuleList(
    (0): WeightDropout(
      (module): LSTM(400, 1152, batch_first=True)
    )
    (1): WeightDropout(
      (module): LSTM(1152, 1152, batch_first=True)
    )
    (2): WeightDropout(
      (module): LSTM(1152, 400, batch_first=True)
    )
  )
  (input_dp): RNNDropout()
  (hidden_dps): ModuleList(
    (0): RNNDropout()
    (1): RNNDropout()
    (2): RNNDropout()
  )
)
```
The callback:
```python
class StoreHook(HookCallback):
    def on_train_begin(self, **kwargs):
        super().on_train_begin(**kwargs)
        self.acts = []

    def hook(self, m, i, o):
        return o

    def on_train_end(self, train, **kwargs):
        # change into on_batch_end once I understand the output of on_train_end
        self.acts += self.hooks.stored
```
The code:

```python
path = untar_data(URLs.IMDB, force_download=False)
path.ls()
data_lm = load_data(path, 'tmp_lm2', bs=48)
learn = language_model_learner(data_lm, pretrained=True, drop_mult=0.5, arch=AWD_LSTM)
cb = [StoreHook(learn, modules=flatten_model(learn.model[0]))]
learn.callbacks += cb
learn.model.eval()   # note: eval() only disables dropout; fit() still updates the weights
learn.model.reset()  # not sure if this is necessary
learn.fit(1)
```
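One caveat with the approach above: `model.eval()` only changes the behaviour of dropout and batch-norm layers; calling `fit()` afterwards still updates the weights. To extract activations without training at all, the forward pass can be wrapped in `torch.no_grad()`. A minimal PyTorch sketch (the single `nn.LSTM` here is only a stand-in for the encoder, not fastai's actual model):

```python
import torch
import torch.nn as nn

model = nn.LSTM(400, 400, batch_first=True)  # stand-in for the encoder
model.eval()                                 # disables dropout, does NOT freeze weights

with torch.no_grad():                        # no gradients -> no weight updates possible
    out, _ = model(torch.randn(48, 70, 400))
print(out.shape)                             # torch.Size([48, 70, 400])
```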
Understanding the output:

```python
print(f'{len(learn.callbacks[1].acts)} items in the output\n')
for i, act in enumerate(learn.callbacks[1].acts):
    print(f'item {i}')
    if act is not None:
        print(act.shape)
```
Yields:
```
12 items in the output

item 0
item 1
item 2
item 3
item 4
item 5
item 6
item 7
item 8
torch.Size([48, 70, 400])
item 9
torch.Size([48, 70, 1152])
item 10
torch.Size([48, 70, 1152])
item 11
```
The output of one batch is a list of 12 items, most of them `None`.

- Why 12 items? It doesn't seem to correspond to bs=48. Edit: I figured this out, more or less: it has nothing to do with the batch size, but corresponds to the modules of `learn.model[0]` passed via `flatten_model`. The batch size shows up as the first dimension of the stored tensors.
- Why are most of the items `None`?
- The items that are tensors have different shapes; which layer is which? The decoder takes 400 features as input, so is `[48, 70, 400]` the one I'm looking for? Edit: I tried to hook the separate modules individually to figure this out, but AWD_LSTM does not support indexing when I try to hook e.g. `learn.model[0][8]`.
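One way to map activations to layers without relying on fastai's indexing is to attach plain PyTorch forward hooks to the modules of interest. Below is a toy stack mirroring the 400 → 1152 → 1152 → 400 LSTM sizes from the model repr above (the sizes and names are assumptions based on that repr, not fastai's actual objects):

```python
import torch
import torch.nn as nn

# Toy stack mirroring the AWD_LSTM sizes from the model repr above
rnns = nn.ModuleList([
    nn.LSTM(400, 1152, batch_first=True),
    nn.LSTM(1152, 1152, batch_first=True),
    nn.LSTM(1152, 400, batch_first=True),
])

stored = {}

def make_hook(name):
    # nn.LSTM returns a tuple (output, (h_n, c_n)); keep only the output tensor
    def hook(module, inp, out):
        stored[name] = out[0].detach()
    return hook

for i, rnn in enumerate(rnns):
    rnn.register_forward_hook(make_hook(f'rnn{i}'))

x = torch.randn(48, 70, 400)  # (batch, seq_len, features)
with torch.no_grad():
    for rnn in rnns:
        x, _ = rnn(x)

for name, act in stored.items():
    print(name, tuple(act.shape))
# rnn2 has shape (48, 70, 400): the layer whose output feeds the 400-input decoder
```

So under these assumptions, the `[48, 70, 400]` tensor is the output of the last LSTM, and the two `[48, 70, 1152]` tensors come from the first two.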
Ideally I want to piece the output back together into one encoding per document. I'm currently at a level of understanding matching part 1 of the course, but my pet project requires me to get this working. The videos have helped me understand callbacks, but I could surely use your help applying them to AWD_LSTM.
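For the per-document encodings, one common approach is to pool the `[bs, seq_len, features]` tensor over the time dimension to get one fixed-size vector per document. A minimal numpy sketch, with shapes assumed from the hooked output above (real code would also mask padding tokens before pooling):

```python
import numpy as np

# Assumed shapes from the hooked output: 48 documents, 70 timesteps, 400 features
acts = np.random.randn(48, 70, 400)

# Mean-pool over the sequence dimension to get one vector per document;
# max pooling or taking the final timestep are common alternatives.
doc_vectors = acts.mean(axis=1)
print(doc_vectors.shape)  # (48, 400)
```

The resulting `(48, 400)` matrix can then be fed to any clustering algorithm, e.g. k-means.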