Fine-Tuning ULMFiT


Is it possible to fine-tune ULMFiT on my own dataset and then extract embedding vectors from it?


Yes, that’s what it’s designed for. One of the alumni, @muellerzr, has done this in one of his posts; you can check his blog. And since the embedding layer is the first layer of the architecture, I think you can extract the weights from it.
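
As a rough illustration of that last point, here is a minimal PyTorch sketch of reading the weight matrix out of an embedding layer. `TinyLM` is a hypothetical toy model, not fastai's `AWD_LSTM`. The attribute name `encoder` mirrors what I believe fastai v1 calls its input embedding, but check that against your installed version. Note that these are static, per-token vectors, not contextual ones.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Hypothetical toy language model: embedding -> LSTM."""
    def __init__(self, vocab_sz=10, emb_sz=4, n_hid=8):
        super().__init__()
        # The embedding is the first layer, as in the post above
        self.encoder = nn.Embedding(vocab_sz, emb_sz, padding_idx=1)
        self.rnn = nn.LSTM(emb_sz, n_hid, batch_first=True)

    def forward(self, ids):
        out, _ = self.rnn(self.encoder(ids))
        return out

model = TinyLM()

# One row per vocabulary item; detach so the copy is independent of training
emb_matrix = model.encoder.weight.detach().clone()

# Look up the static vector for a single token id
token_id = 3
vector = emb_matrix[token_id]
```

With a real fine-tuned model you would load its state dict first and map tokens to ids with the vocabulary used during fine-tuning.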


The link given above explains how to fine-tune on your own dataset. To extract embeddings, you could tweak the API in the following ways.

Solution 1:

import torch
import fastai
from fastai.text import AWD_LSTM, load_data, MultiBatchEncoder, RNNLearner

lm_data_path = '/home/ubuntu/efs/corpus/fine_tuning_corpus/ft_corpus_250K.csv'
output_path = '/home/ubuntu/efs/fp16_ulmfit/vsz_100k_dp_0.75/'

# Encoder initialization
drop_mult = 1

# AWD-LSTM config
config = dict(emb_sz=400, n_hid=1152, n_layers=3, pad_token=1, qrnn=False, bidir=False, output_p=0.4,
              hidden_p=0.3, input_p=0.4, embed_p=0.05, weight_p=0.5)
for k in config.keys():
    if k.endswith('_p'): config[k] *= drop_mult
ps = [config.pop('output_p')]

# Get the AWD-LSTM encoder
# (bptt, max_len, vocab_sz and dl are not defined in this snippet; they come from your own setup)
encoder = MultiBatchEncoder(bptt, max_len, AWD_LSTM(vocab_sz, **config), pad_idx=1)
learn = RNNLearner(dl, encoder)

# Load the fine-tuned AWD-LSTM
learn.model = learn.model.module
learn.model.load_state_dict(torch.load(f'{output_path}models/fwd_enc.pth', map_location='cuda'))

# Get the embeddings
text = "APPLE is the world 's greenest tech company"
batch =

Solution 2:

# Use a hook on the classifier model built from the encoder

from fastai.callbacks.hooks import hook_output
layer = learn.model[:2][1].layers[0]  # the layer from which you want to extract the output
input_ds =
with hook_output(layer) as hook_forward:
    preds = learn.model(input_ds[0])
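
For readers without a trained learner at hand, the hook idea in Solution 2 can be sketched in plain PyTorch. This is not the fastai code above: the toy `nn.Sequential` model and the `save_output` helper are made up for illustration, and `register_forward_hook` is the underlying PyTorch mechanism that, as far as I know, fastai's `hook_output` wraps.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the classifier
model = nn.Sequential(nn.Linear(6, 5), nn.ReLU(), nn.Linear(5, 2))
layer = model[0]  # the layer whose output we want to capture

captured = {}

def save_output(module, inputs, output):
    # Called by PyTorch right after `layer` runs its forward pass
    captured["out"] = output.detach()

handle = layer.register_forward_hook(save_output)

x = torch.randn(3, 6)
with torch.no_grad():
    preds = model(x)

# Remove the hook so later forward passes are unaffected
handle.remove()
```

After the forward pass, `captured["out"]` holds the intermediate activations of the chosen layer, one row per input in the batch.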



In this example post, how is the CSV file formatted? Is each article in one cell of the CSV?

Yes, each article is one row.
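
A small sketch of that layout using Python's standard `csv` module. The column name `text` and the sample articles are assumptions for illustration; the example post may use a different header. The point is that commas and line breaks inside an article are quoted, so each article still occupies exactly one row.

```python
import csv
import io

# Made-up sample articles; one contains a comma, one a line break
articles = [
    "First article, with a comma.",
    "Second article\nspanning two lines.",
]

# Write one article per row under a single (assumed) "text" column
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text"])
for article in articles:
    writer.writerow([article])

# Read it back: each row is one full article
buffer.seek(0)
rows = list(csv.DictReader(buffer))
```

To write to disk instead of an in-memory buffer, open the file with `newline=""`, as the `csv` module documentation recommends.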


Hey, thanks. I wanted to know how you would go about calculating sentence embeddings. Would it help to take the context vector at the end of the last layer, or to take the sum of the embeddings from the embedding layer?
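
The two options in this question can be sketched in plain PyTorch to make the difference concrete. This is only an illustration on a randomly initialised toy embedding and LSTM, not the fine-tuned AWD-LSTM; the sizes and token ids are made up. For comparison, the sketch also shows concat pooling (last state, max-pool and mean-pool over time), which is what the ULMFiT classifier head uses.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, far smaller than a real AWD-LSTM
vocab_sz, emb_sz, n_hid = 20, 4, 6
embedding = nn.Embedding(vocab_sz, emb_sz)
lstm = nn.LSTM(emb_sz, n_hid, batch_first=True)

ids = torch.tensor([[2, 5, 7, 9, 3]])  # one sentence of 5 made-up token ids

with torch.no_grad():
    emb = embedding(ids)        # (1, 5, emb_sz)
    outputs, _ = lstm(emb)      # (1, 5, n_hid): last-layer output at each step

# Option A: output of the last layer at the final time step
last_state = outputs[:, -1]     # (1, n_hid)

# Option B: sum of the static embedding vectors (ignores word order)
emb_sum = emb.sum(dim=1)        # (1, emb_sz)

# Concat pooling: last state + max over time + mean over time
concat_pooled = torch.cat(
    [last_state, outputs.max(dim=1).values, outputs.mean(dim=1)], dim=1
)                               # (1, 3 * n_hid)
```

Option A and concat pooling depend on context and word order, while Option B does not. With padded batches, the padding positions would need to be masked out before pooling.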