Tabular Transfer Learning and/or retraining with fastai

I’ve been wondering the same thing, so I did some research, but nothing I came across seems to show a clear way to do transfer learning for tabular data.
Did you manage to find any further resources on the topic?

Hi Sylvain, I’ve made a lot of progress on tabular transfer learning. However, there are significant differences between text, vision and tabular models in terms of layers. I would like to know whether I need to transfer more than the embeddings in the module list…

In fastai text, the function load_pretrained() transfers several elements from the old state_dict() to the new state_dict():

  • 0.encoder.weight
  • 1.decoder.bias
  • 1.decoder.weight

We get those, for example, through:
dec_bias, enc_wgts = wgts.get('1.decoder.bias', None), wgts['0.encoder.weight']

On the Adult dataset tabular example, here are the layers I get from state_dict(). We can see that they do not match the encoder/decoder weights and bias layout used in text:

embeds.0.weight
embeds.1.weight
embeds.2.weight
embeds.3.weight
embeds.4.weight
embeds.5.weight
embeds.6.weight
embeds.7.weight
embeds.8.weight
bn_cont.weight
bn_cont.bias
bn_cont.running_mean
bn_cont.running_var
bn_cont.num_batches_tracked
layers.0.weight
layers.0.bias
layers.2.weight
layers.2.bias
layers.2.running_mean
layers.2.running_var
layers.2.num_batches_tracked
layers.3.weight
layers.3.bias
layers.5.weight
layers.5.bias
layers.5.running_mean
layers.5.running_var
layers.5.num_batches_tracked
layers.6.weight
layers.6.bias

TabularModel(
(embeds): ModuleList(
(0): Embedding(10, 6)
(1): Embedding(17, 8)
(2): Embedding(17, 8)
(3): Embedding(8, 5)
(4): Embedding(16, 8)
(5): Embedding(7, 5)
(6): Embedding(6, 7)
(7): Embedding(3, 3)
(8): Embedding(43, 10)
)
(emb_drop): Dropout(p=0.0)
(bn_cont): BatchNorm1d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(layers): Sequential(
(0): Linear(in_features=65, out_features=200, bias=True)
(1): ReLU(inplace)
(2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): Linear(in_features=200, out_features=100, bias=True)
(4): ReLU(inplace)
(5): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): Linear(in_features=100, out_features=2, bias=True)
)
)

And the layer biases do not match any structure in the text model at all; for example:
'layers.6.bias': tensor([ 0.1803, -0.2174])

So again, my question would be: which other layers do I need to transfer?

I’ll share my code as soon as I’ve fully rewritten the functions!
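In the meantime, here is the rough direction I’m considering for the transfer itself. This is just a sketch; the function and names are mine, not fastai’s, and it assumes the categorical columns line up by position with unchanged vocab sizes:

# Sketch only: copy the embeds.*.weight entries from a saved tabular
# state_dict into a new TabularModel, keeping only those whose shapes
# (i.e. category vocab and embedding sizes) match. Not a fastai API.
def transfer_matching_embeds(old_wgts, new_model):
    new_wgts = new_model.state_dict()
    for name, tensor in old_wgts.items():
        if name.startswith('embeds.') and name in new_wgts \
                and new_wgts[name].shape == tensor.shape:
            new_wgts[name] = tensor.clone()
    new_model.load_state_dict(new_wgts)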


In a language model, the decoder is tied with the encoder (the embeddings used for encoding are used to decode after the softmax). There is nothing like this in a regular tabular model from the fastai library; you would have to write the equivalent yourself.
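For concreteness, that tying looks roughly like this in plain PyTorch (an illustration, not the fastai source; sizes are arbitrary):

# Illustration of weight tying: the decoder's weight is literally the same
# Parameter object as the embedding's, so '1.decoder.weight' and
# '0.encoder.weight' are one and the same tensor.
import torch.nn as nn

vocab_sz, emb_sz = 10000, 400
encoder = nn.Embedding(vocab_sz, emb_sz)
decoder = nn.Linear(emb_sz, vocab_sz, bias=True)
decoder.weight = encoder.weight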


I’m facing a similar problem with transfer learning of embeddings. I’ve taken the approach of copying the tensor values from an embedding to a CSV file and reloading them into a new embedding, which may have some different categories. I’m still having a problem freezing and unfreezing them, but otherwise it seems to work. Here’s what I have so far (I would appreciate ANY critique on the approach or the code itself):

 import csv
 import numpy as np
 import torch

 def write_encoding_dict(filename, df, cat, input_embeds):
     # Write one CSV row per category value: "class,<embedding values...>"
     embeds = input_embeds.cpu()
     source_vocab = df[cat].astype('category').cat.categories.values
     with open(filename, 'w') as csvFile:
         writer = csv.writer(csvFile, lineterminator='\n')
         for i in range(len(source_vocab)):
             # detach so the lookup isn't tracked by autograd
             myvals = embeds(torch.tensor(i)).detach().tolist()
             writer.writerow([source_vocab[i], *myvals])

In my model, I want to save the first embedding variable, and I do it like this:

write_encoding_dict('embedding0.csv', panda_dataframe, category_var0, learn.model.embeds[0])

Then the file contains rows of “class,embeddings value list” like this:

ACE,-0.00013918841432314366, 3.610396379372105e-05, -7.69308189774165e-06, -2.2517966499435715e-05, -2.284333822899498e-05

Then to read them back in and load the embedding values into a different model:

 from collections import OrderedDict

 def get_encoding_dict(filename):
     # Read the CSV back into an OrderedDict: {class: [embedding values]}
     with open(filename, 'r') as csvFile:
         reader = csv.reader(csvFile)
         lines = list(reader)
         d = OrderedDict()
         for i in range(len(lines)):
             d[lines[i][0]] = [float(lines[i][j]) for j in range(1, len(lines[i]))]
         return d
 
 def load_embed_weights(df, cat, embeds, file):
     encodings = get_encoding_dict(file)
     target_vocab = df[cat].astype('category').cat.categories.values
     weights_matrix = embeds.weight
     #weights_matrix.requires_grad = False
     emb_dim=weights_matrix.shape[1]
     words_found = 0
     for i, word in enumerate(target_vocab):
         try: 
             enc = encodings[word]
             for j in range(emb_dim):
                 weights_matrix[i][j] = enc[j]
             words_found += 1
         except KeyError:
             for j in range(emb_dim):
                 weights_matrix[i][j] = np.random.normal(scale=0.6)
     print(weights_matrix.shape[0], words_found)

So - seems to work. The problem I’m having is when I try to freeze the weights in the new model, like this:

weights_matrix.requires_grad = False

I get an error that I can’t freeze a non-leaf node. So when I try to freeze the underlying tensor data directly, like this:

weights_matrix.data.requires_grad = False

I get a different error that the optimizer can’t optimize a non-leaf variable.

I feel like I’ve made real progress, but this last hurdle is killing me…

Ok - I’ve figured out that this works if I don’t reset the embedding values:

model.embeds[0].weight.requires_grad = False

So the problem is how I’m doing the reset. Apparently it’s creating a dependency from my initialization value to the embedding value I’m trying to replace. Hmm…

Now, if I use this wrapper to copy the weight values in:

with torch.no_grad():

I don’t get any errors. However, setting requires_grad to False isn’t having any effect. It removes the gradient from the Tensors, but the values keep adapting.
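In full, the copy looks something like this (pretrained_vectors is a placeholder for the matrix I load from the CSV):

# Copying values inside no_grad keeps the assignment out of the autograd
# graph, so the weight stays a leaf tensor. pretrained_vectors stands in
# for the values loaded from the CSV file.
import torch

with torch.no_grad():
    learn.model.embeds[0].weight.copy_(torch.tensor(pretrained_vectors))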

Ok, this is strange. If I turn off the gradient after loading, it doesn’t change during learning even if I turn it back on later.

learn.model.embeds[0].weight.requires_grad = False

However, if I turn it on after loading, it keeps changing even if I turn it off later. I’m stumped.

Is there some reason that setting requires_grad only works once?

Hi Jumonji, I will share my take on it soon; however, I have not gotten around to freezing layers yet.

Well, (duh!) fastai always recreates the optimizer after freezing layers, which (I suspect) reloads just the unfrozen parameters to be optimized. So just turning off the gradient will not automatically remove those parameters from being optimized; one must also recreate the optimizer. So I tried that and (drum roll…) it worked!

You can’t just use the built-in freeze function, for two reasons. First, I only want to freeze the embeddings, not all inputs in the first layer. Secondly, the tabular model is all wrapped within a SequenceEx wrapper, so it’s all one big layer grouping anyway; you can only freeze all or none with the built-in function.

So, to freeze and unfreeze a specific embedding you must use the correct index based on the category order, like so:

learn.model.embeds[index].weight.requires_grad = False (or True)
learn.create_opt(defaults.lr)

Voilà! It works!

Now I just need to understand why the weight matrix for the embedding has an extra row in it (one more than the number of classes in that category). Any ideas?


Hi Jumonji, I am curious which other layers you have transferred, other than embeds[index].weight?

To answer your question, the extra row in each embedding is #na#, which serves as a placeholder default when you try to predict a new value that is not present in your embedding dictionary.

You can see those with learn.data.train_ds.x.classes
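For example, on the Adult data you can check it like this (illustrative; 'workclass' is just one of its categorical columns, and the indexing assumes the embeds follow the same order as your cat_names list):

# Quick check: '#na#' sits at index 0 of every class list, and the
# embedding has exactly one row per class (hence the "extra" row).
cat = 'workclass'                               # one of the Adult cat_names
classes = learn.data.train_ds.x.classes         # dict: column name -> classes
print(classes[cat][0])                          # -> '#na#'
print(len(classes[cat]),
      learn.model.embeds[cat_names.index(cat)].weight.shape[0])   # equal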

I will share my code soon; I have asked somebody to review it beforehand.

Thanks, Jeremy. It looks like the #na# is prepended to the classes at index zero, so that’s what I’m doing now.

I have only been transferring the embeddings themselves. I’m working in the airline domain and I’m trying to come up with a generic airport encoding, starting by using destination volume analogously to word order in NLP.

Any comments on my code thus far? Here’s the latest version:

import csv
import numpy as np
import torch
from collections import OrderedDict

def write_encoding_dict(filename, df, cat, embeds):
    source_vals = ['#na#', *df[cat].astype('category').cat.categories.values]
    weight_matrix = embeds.weight
    with open(filename, 'w') as csvFile:
        writer = csv.writer(csvFile, lineterminator='\n')
        for i in range(len(source_vals)):
            writer.writerow([source_vals[i],*weight_matrix[i].tolist()])
        
def get_encoding_dict(filename):
    with open(filename, 'r') as csvFile:
        reader = csv.reader(csvFile)
        lines = list(reader)
        d = OrderedDict()
        for i in range(len(lines)):
            d[lines[i][0]] = [float(lines[i][j]) for j in range(1,len(lines[i]))]
        return d

def load_embed_weights(filename, df, cat, embeds):
    encodings = get_encoding_dict(filename)
    target_vals = ['#na#', *df[cat].astype('category').cat.categories.values]
    weights_matrix = embeds.weight
    emb_dim=weights_matrix.shape[1]
    vals_found = 0
    with torch.no_grad():
        for i, value in enumerate(target_vals):
            try: 
                enc = encodings[value]
                for j in range(emb_dim):
                    weights_matrix[i][j] = enc[j]
                vals_found += 1
            except KeyError:
                for j in range(emb_dim):
                    weights_matrix[i][j] = np.random.normal(scale=0.6)

When you say you’re adding new values each day, are you adding more training data (rows), or are you changing the structure of the model, i.e. adding more columns?

Hi, both. Models could get new data with new categorical values that were never observed in the past (for example, a new car model), or it could also be transferring the weights to a new kind of problem while reusing the same rows.

I think you have to ditch your embeddings if you want to avoid retraining. If you one-hot encode your categorical variables instead, you should be able to add new connections to the network while preserving the existing weights, and then train the model using the validation data from the original model (inference only) to re-calibrate it, such that the validation loss between the original model and the new model is minimised. That should retain the knowledge acquired by the original model’s training while expanding it into a new model that can support new inputs. I am using a similar approach right now, except that I have the inverse problem: I am shrinking a GAN by 50% so I can perform real-time inference, by removing entire resblocks and re-calibrating. (A rough sketch of the calibration loop follows the steps below.)

So steps are:

  1. Copy original model (O) to (N)
  2. Add new connections to (N)
  3. Get validation data from (O)'s previous training loop
  4. Run validation data through (O) and get outputs
  5. Train (N) on validation data and calculate MSE loss between (O) and (N) outputs

(N) should learn how to imitate (O)
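A rough sketch of steps 4–5 in plain PyTorch; the batch unpacking, epochs, learning rate and optimizer choice are placeholders to adapt to your own dataloader, not a fixed recipe:

# Train the new model (N) to reproduce the original model (O)'s outputs
# on (O)'s validation data.
import torch
import torch.nn.functional as F

def calibrate(model_o, model_n, valid_dl, epochs=5, lr=1e-3):
    model_o.eval()                                 # (O) is inference-only
    opt = torch.optim.Adam(model_n.parameters(), lr=lr)
    for _ in range(epochs):
        for x_cat, x_cont, _y in valid_dl:         # targets unused: we imitate (O)
            with torch.no_grad():
                target = model_o(x_cat, x_cont)
            loss = F.mse_loss(model_n(x_cat, x_cont), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model_n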


Hi @sgugger, I am happy to share with the community a basic demo of tabular transfer learning with fast.ai; thanks for pointing me in the direction of the fastai text code. I am still unsure how to handle the bias layers. I would really appreciate any help on how to modify the model architecture or layers (requires_grad?) to improve transfer accuracy. Could you suggest any paths for improvement?

@Jumonji: sharing with you my version of the code; as you can see, I work directly from a pickled dictionary instead of a CSV, and only take care of the embedding weights, not other layers yet.

You can see the model automatically starts at ~0.30 loss instead of ~0.70, and everything runs and trains smoothly. I am ready to work on other problems, but I would first appreciate some feedback from anybody here!

CODE:
https://colab.research.google.com/drive/1yvA6pFPbmtwUUw1VDtPixoqWPTgkEfpM


Thanks @Jeremyeast - you’re solving a slightly different problem than I am. I want to use lists of category classes with their embedding vectors that possibly haven’t been created in fastai models - i.e., similar to GloVe vectors for NLP word embeddings, which may be created and shared from many different sources. The CSV file format was just a start to see if I could make it work; I don’t want to depend on having a pickled model to start with. GloVe actually uses space-delimited records.

I do think I’ll try your technique of getting the class list from the learner instead of building it from pandas, just to verify my results if nothing else. Cheers!

I have done a lot of work on representing tabular entities in a 2D space. I used t-SNE and matplotlib, and had some success grouping entities with DBSCAN (which does not require passing the number of clusters). GloVe is probably superior, but it’s hard to see tangible applications with this… I would love to see how you apply this to the airline industry.

You could easily take my code and extend it to add a category with a mean or empty vector value until you receive a vector for which you have the data. I would recommend looking at my code to pass a uniform number of columns for the category names you will want to transfer (n classes / 2, max 50).
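The sizing rule I mean, roughly (the rounding up is my own choice):

# Half the number of classes (rounded up so tiny categories still get a
# column), capped at 50 embedding columns.
def emb_size(n_classes):
    return min(50, (n_classes + 1) // 2)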

My “final” version with no pandas dependencies. Pretty minimal, if I do say so myself:

import csv
import numpy as np
import torch
from collections import OrderedDict
from fastai.basic_train import Learner

defaultlr = 1e-3

def write_encoding_dict(filename, learner, cat_names, cat):
    # Save the embedding for categorical column `cat` as CSV rows of
    # "class,<embedding values...>", using the class order fastai built.
    classes = learner.data.label_list.train.x.classes[cat]
    weight_matrix = learner.model.embeds[cat_names.index(cat)].weight
    with open(filename, 'w') as csvFile:
        writer = csv.writer(csvFile, lineterminator='\n')
        for i in range(len(classes)):
            writer.writerow([classes[i],*weight_matrix[i].tolist()])
        
def get_encoding_dict(filename):
    with open(filename, 'r') as csvFile:
        reader = csv.reader(csvFile)
        lines = list(reader)
        d = OrderedDict()
        for i in range(len(lines)):
            d[lines[i][0]] = [float(lines[i][j]) for j in range(1,len(lines[i]))]
        return d

def load_embed_weights(filename, learner, cat_names, cat):
    # Copy saved vectors into the new learner's embedding for column `cat`,
    # matching on class name; classes with no saved vector get a random init.
    encodings = get_encoding_dict(filename)
    classes = learner.data.label_list.train.x.classes[cat]
    weight_matrix = learner.model.embeds[cat_names.index(cat)].weight
    emb_dim=weight_matrix.shape[1]
    with torch.no_grad():
        for i, value in enumerate(classes):
            try: 
                enc = encodings[value]
                for j in range(emb_dim):
                    weight_matrix[i][j] = enc[j]
            except KeyError:
                for j in range(emb_dim):
                    weight_matrix[i][j] = np.random.normal(scale=0.6)
                    
def freeze_embedding(learner:Learner, index=0):
    # Recreating the optimizer is what actually excludes the frozen parameter.
    learner.model.embeds[index].weight.requires_grad = False
    learner.create_opt(defaultlr)
                    
def unfreeze_embedding(learner:Learner,index=0):
    learner.model.embeds[index].weight.requires_grad = True
    learner.create_opt(defaultlr)
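For anyone trying these out, here’s a hypothetical call sequence (the 'Origin' column, the CSV name and the two learners are made up for illustration; cat_names is whatever list you passed when building your data):

# Hypothetical usage of the functions above.
cat_names = ['Origin', 'Dest', 'Carrier']

# save the trained embedding for the Origin column from the source learner
write_encoding_dict('origin_embedding.csv', learn_old, cat_names, 'Origin')

# copy it into a freshly created learner and freeze it for the first epochs
load_embed_weights('origin_embedding.csv', learn_new, cat_names, 'Origin')
freeze_embedding(learn_new, index=cat_names.index('Origin'))
learn_new.fit_one_cycle(3)
unfreeze_embedding(learn_new, index=cat_names.index('Origin'))
learn_new.fit_one_cycle(3)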

I was actually trying to do exactly the same thing. Thanks for being so thoughtful and sharing your code. :+1:

Hi Guys,

Thanks for the code snippets. I am working on a similar problem, handling COVID lockdown retail consumer data - training on pre-lockdown data and then fine-tuning on post-lockdown data. The problem I have is that not only was there a huge gap in the data during the lockdown, but the customers’ behaviour appears to have changed post-lockdown. I don’t want to throw away all the pre-lockdown data, so I thought transfer learning might be appropriate here. Do you have any update on the above code for fastai2? Or any advance on methods that you could share? I’m pretty new to fastai, and have been using XGBoost mostly since becoming a DS, so learning what I can from you guys :slight_smile:

Thanks for sharing the code. What is expected for the cat parameter? I am not sure what to pass.