Sentence embedding from the language learner


I’ve finetuned a language model on a technical corpus, and I’d like to test sentence similarity directly after finetuning, but I’m not sure how I can access the sentence embedding from ULMFiT. I tried to read the source code for the `predict` function, but the only accessible output there is the final softmax layer, returned by `pred_batch` from inside `predict`.

I tried to understand what `pred_batch` is really doing, or at least where the encoder activations (from which the final probabilities are calculated) are exposed, but I couldn’t follow the code.

Can you let me know how I can retrieve the last hidden layer’s activations? (Or do you suggest some other kind of aggregation of the activations?)
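On the aggregation question: ULMFiT’s own classifier head uses “concat pooling”, concatenating the last time step’s hidden state with a max-pool and a mean-pool over all time steps. A minimal NumPy sketch of that aggregation (the array names and shapes here are my own illustration, not fastai code):

```python
import numpy as np

def concat_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Concat-pool a sequence of encoder activations into one vector.

    hidden_states: shape (seq_len, hidden_dim) -- top-layer activations
    for one sentence, one row per token.
    Returns a vector of shape (3 * hidden_dim,):
    [last time step, element-wise max over time, element-wise mean over time].
    """
    last = hidden_states[-1]           # final time step
    mx = hidden_states.max(axis=0)     # max over tokens, per feature
    mean = hidden_states.mean(axis=0)  # mean over tokens, per feature
    return np.concatenate([last, mx, mean])

# Example: 5 tokens, hidden size 4 -> sentence embedding of size 12
h = np.arange(20, dtype=np.float64).reshape(5, 4)
emb = concat_pool(h)
```

In practice you would apply this to the top-layer activations you pull out of the encoder, then compare the resulting vectors between sentences.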

Here’s the code for `pred_batch`, and for the `predict` function, which returns the next word (or the next `n_words`):

def predict(self, text:str, n_words:int=1, no_unk:bool=True, temperature:float=1., min_p:float=None, sep:str=' ',
            decoder=decode_spec_tokens):
    "Return the `n_words` that come after `text`."
    ds = self.data.single_dl.dataset
    self.model.reset()
    xb,yb = self.data.one_item(text)
    new_idx = []
    for _ in range(n_words): #progress_bar(range(n_words), leave=False):
        res = self.pred_batch(batch=(xb,yb))[0][-1]
        #if len(new_idx) == 0: self.model[0].select_hidden([0])
        if no_unk: res[self.data.vocab.stoi[UNK]] = 0.
        if min_p is not None:
            if (res >= min_p).float().sum() == 0:
                warn(f"There is no item with probability >= {min_p}, try a lower value.")
            else: res[res < min_p] = 0.
        if temperature != 1.: res.pow_(1 / temperature)
        idx = torch.multinomial(res, 1).item()
        new_idx.append(idx)
        xb = xb.new_tensor([idx])[None]
    return text + sep + sep.join(decoder(self.data.vocab.textify(new_idx, sep=None)))
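As I read the v1 code, `pred_batch` returns the post-softmax output with shape (batch, seq_len, vocab), so the `[0][-1]` indexing in `predict` picks the next-word probability distribution at the last time step of the first (and only) batch item. A NumPy illustration of that indexing (shapes are illustrative, not taken from fastai):

```python
import numpy as np

# Mock softmax output: batch of 1, sequence of 3 tokens, vocab of 5,
# with a uniform distribution at every position.
probs = np.full((1, 3, 5), 0.2)

# Distribution over the vocabulary for the last position of item 0 --
# this is what predict() samples the next token from.
next_word_dist = probs[0][-1]
```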

def pred_batch(self, ds_type:DatasetType=DatasetType.Valid, batch:Tuple=None, reconstruct:bool=False, with_dropout:bool=False) -> List[Tensor]:
    "Return output of the model on one batch from `ds_type` dataset."
    if batch is not None: xb,yb = batch
    else: xb,yb = self.data.one_batch(ds_type, detach=False, denorm=False)
    cb_handler = CallbackHandler(self.callbacks)
    xb,yb = cb_handler.on_batch_begin(xb,yb, train=False)
    with torch.no_grad():
        if not with_dropout: preds = loss_batch(self.model.eval(), xb, yb, cb_handler=cb_handler)
        else: preds = loss_batch(self.model.eval().apply(self.apply_dropout), xb, yb, cb_handler=cb_handler)
        res = _loss_func2activ(self.loss_func)(preds[0])
    if not reconstruct: return res
    res = res.detach().cpu()
    ds = self.dl(ds_type).dataset
    norm = getattr(self.data, 'norm', False)
    if norm and norm.keywords.get('do_y', False):
        res = self.data.denorm(res, do_x=True)
    return [ds.reconstruct(o) for o in res]
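Once you do have one vector per sentence (however the activations are extracted and aggregated), testing sentence similarity is usually just cosine similarity between the two vectors. A small self-contained example (function name and vectors are illustrative):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u = np.array([1.0, 0.0, 1.0])
v = np.array([2.0, 0.0, 2.0])  # same direction as u
w = np.array([0.0, 1.0, 0.0])  # orthogonal to u
```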

I found this topic, also talking about the same problem, but without any answers.

Thanks in advance.
