A walk with fastai2 - Tabular - Study Group and Online Lectures Megathread

Z - you cover the standard fastai tabular as well as xgboost and random forests. Do you ever cover exporting trained embeddings to the tree methods? You could always export the embeddings and recreate the full data layer outside fastai, but it seems like there should be a way to hijack the fastai feed into a prepped array for xgboost training.

@ralph No I don’t, though it’s been done by someone on the forums. Also, you can now represent xgboost and RFs as NNs, so there’s potential there too :wink: https://t.co/VyJvexZe2e?amp=1
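
For a rough idea of what that export looks like, here is a minimal sketch (not the forum poster's code; learn is assumed to be a trained tabular Learner, to the TabularPandas behind it, and embed_features a made-up helper name). It indexes the trained embedding weights to build a plain array xgboost can consume:

import numpy as np

def embed_features(learn, to):
    "Swap each encoded categorical column for its learned embedding vectors."
    xs = [to.conts.values.astype(np.float32)]      # continuous columns as-is
    for i, emb in enumerate(learn.model.embeds):   # one Embedding per cat column
        w = emb.weight.detach().cpu().numpy()      # (n_categories, emb_dim)
        xs.append(w[to.cats.values[:, i].astype(int)])
    return np.concatenate(xs, axis=1)              # prepped array for xgboost

# X_train = embed_features(learn, to.train)
# xgboost.XGBRegressor().fit(X_train, to.train.ys.values.ravel())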

I’m not sure, as I’ve never tried it before, but I’d assume that would be fine. It’s the same concept as text data in a way, where we train an LM on all the data and then fine-tune it.

What is an LM?

I am going to try to explain a little better; I don’t think my previous explanation was very good.

The project I did was comparing several architectures for segmentation. I separated my data into training and validation sets (this data is representative of the real data).

For each architecture I selected hyperparameters (for example, weight decay). For selecting hyperparameters I used a RandomSplitter applied just to the training data.

After selecting hyperparameters, I trained a model on the full training set and passed the validation folder as the validation loader. Here I trained maximizing Dice.

After all architectures were trained with their best hyperparameters, I chose the one with the highest Dice score.
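
A minimal sketch of that split, assuming train_items holds the items from the training folder only:

from fastai2.data.transforms import RandomSplitter

# Hold out 20% of the *training* data for hyperparameter selection
splitter = RandomSplitter(valid_pct=0.2, seed=42)
train_idx, valid_idx = splitter(train_items)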

Yes, I did understand what you meant :slight_smile: LM = Language Model. It follows (somewhat) the same concept in some ways, so I think you’d be fine (in terms of having something representative of your dataset).

Thank you very much for the course, your patience in the forums, and your help! It is nice to have you here!

I tried

procs = [Categorify, Normalize]
to_test = TabularPandas(df_test, procs=procs, cat_names=features)
dl_test = to_test.dataloaders(bs=512)
learn.get_preds(dl=dl_test)

But it didn’t work:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/fastai2/learner.py in one_batch(self, i, b)
    160             if len(self.yb) == 0: return
--> 161             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    162             if not self.training: return

~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/fastai2/layers.py in __call__(self, inp, targ, **kwargs)
    293         if self.flatten: inp = inp.view(-1,inp.shape[-1]) if self.is_2d else inp.view(-1)
--> 294         return self.func.__call__(inp, targ.view(-1) if self.flatten else targ, **kwargs)
    295 

~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():

~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/torch/nn/modules/loss.py in forward(self, input, target)
    931         return F.cross_entropy(input, target, weight=self.weight,
--> 932                                ignore_index=self.ignore_index, reduction=self.reduction)
    933 

~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
   2316         reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2317     return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
   2318 

~/anaconda3/envs/proyecto5/lib/python3.7/site-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2112         raise ValueError('Expected input batch_size ({}) to match target batch_size ({}).'
-> 2113                          .format(input.size(0), target.size(0)))
   2114     if dim == 2:

ValueError: Expected input batch_size (512) to match target batch_size (0).

During handling of the above exception, another exception occurred:

You can pass to_test directly to get_preds; a TabularPandas is a DataLoader itself.

@WaterKnight see the 02_Regression notebook, specifically the "Inference on a test set" section.
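
The pattern there looks roughly like this: the test DataLoader is built from the trained dls, so the training-time procs are reapplied to df_test.

# Build the test DataLoader from the *trained* dls so Categorify/Normalize
# are reapplied with the training-set statistics
dl_test = learn.dls.test_dl(df_test)
preds, _ = learn.get_preds(dl=dl_test)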

What’s the reason behind restricting learn.predict to just one DataFrame row? I tend to prefer predict over get_preds because it decodes the predictions.

That’s simply what .predict is meant to do :slight_smile: You can still decode a batch of data instead (afterwards), but predict has always predicted one item at a time.

It’s also super inefficient to do so. Let’s remove fastai and show this in pure PyTorch. If we were to run a single prediction on each of 100 items, here’s what it would look like:

%%timeit
# test_dl is assumed to have a batch size of 1, so each of the 100 rows
# is predicted on its own
learn.model.eval()
for batch in test_dl:
    with torch.no_grad():
        out = learn.model(*batch[:-1])  # batch[:-1] drops the targets

Its average time is ~595ms. What about batches? With a batch size of 32, it’s only 23.8ms!
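
For reference, the batched measurement is the same loop with test_dl rebuilt at bs=32:

%%timeit
# Identical loop; only the DataLoader's batch size changed (1 -> 32)
learn.model.eval()
for batch in test_dl:
    with torch.no_grad():
        out = learn.model(*batch[:-1])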

So then how do we decode? We change it to:

# get_preds can hand back the (encoded) inputs and the decoded predictions
inp, preds, _, dec_preds = learn.get_preds(dl=test_dl, with_decoded=True, with_input=True)
b = (*tuplify(inp), *tuplify(dec_preds))  # rebuild a batch: inputs + decoded preds
dec_pandas = learn.dls.decode(b)          # undo Categorify/Normalize etc.

This will then decode it for us :slight_smile:

(and if you want the raw, messy version, it looks like this):


outs = []
cats, conts = [], []
learn.model.eval()
for batch in test_dl:
    with torch.no_grad():
        cats  += batch[0]   # collect the encoded categorical inputs
        conts += batch[1]   # collect the normalized continuous inputs
        # decode the raw model outputs (e.g. argmax for classification)
        outs.append(learn.loss_func.decodes(learn.model(*batch[:-1])))

cats = torch.stack(cats)
conts = torch.stack(conts)
outs = torch.cat(outs, dim=0)
b = (*tuplify((cats, conts)), *tuplify(outs))
dec_pandas = learn.dls.decode(b)

(Why use the raw, messy version? It saves about half the time: 31.9ms vs 71.5ms.)

Though it will soon be less messy, stay tuned.

Thank you, I am going to look at it!

Wow, I did not know about that tuplify trick. Thank you so much @muellerzr!!!

I am trying to combine tabular and image data following this [example](https://discuss.pytorch.org/t/concatenate-layer-output-with-additional-input-data/20462/2).

My main problem is that in the example they just pass the raw tensor as the additional data. I would like, instead, to pass the embeddings of the tabular learner before they go into the fully connected layers, then concatenate, then apply the fully connected layers. Any ideas on how to adapt this code for fastai?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.cnn = models.inception_v3(pretrained=False, aux_logits=False)
        # replace the classifier with a 20-feature projection
        self.cnn.fc = nn.Linear(self.cnn.fc.in_features, 20)

        self.fc1 = nn.Linear(20 + 10, 60)  # 20 image features + 10 tabular features
        self.fc2 = nn.Linear(60, 5)

    def forward(self, image, data):
        x1 = self.cnn(image)
        x2 = data  # the raw tabular tensor is passed straight through

        x = torch.cat((x1, x2), dim=1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = MyModel()

batch_size = 2
image = torch.randn(batch_size, 3, 299, 299)
data = torch.randn(batch_size, 10)

output = model(image, data)

Sure thing! One thing this will presume is that you know of the MixedDL here.

So now let’s go through our steps.

  1. We’ll build the tabular DataLoaders and vision DataLoaders we wish to combine in our MixedDL.
  2. When we get to the tabular portion, we will want to calculate the embedding matrix sizes. We do this with get_emb_sz(to) (with the to object being dl.train on the tabular DataLoaders).
  3. We’ll make a tabular embedding-only model, as this is all we want. The code looks like so:
class TabularEmbeddingModel(Module):
    "Basic model for tabular data, embeddings only."
    def __init__(self, emb_szs, embed_p=0.):
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni, nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.n_emb = len(emb_szs)  # needed by forward below

    def forward(self, x_cat, x_cont=None):
        if self.n_emb != 0:
            x = [e(x_cat[:, i]) for i, e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        return x

All this model does is take our input and return the embeddings. The input must be tabular cat + cont if we’re following that example; if there are no continuous variables, an empty tensor is passed in.

So now we can build our model by passing in the emb_szs:
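
A quick sketch of that call, with tab_dl standing in for the tabular DataLoaders from step 1:

emb_szs = get_emb_sz(tab_dl.train)        # embedding matrix sizes from the train set
tab_body = TabularEmbeddingModel(emb_szs) # the tabular "body" from step 3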

  4. Now we need our vision model. Both of these models can be thought of as “bodies”, and we’ll make one head for them both. For our vision model, we’ll call create_body(resnet50); this is the body of our model.
  5. Now we get to the meat and potatoes. We have two bodies at this point, and we need to make them into one cohesive model. The first thing we want to do is concatenate their outputs before passing them to some head. But how do we calculate this size? We take each body and call num_features_model(model). For instance, a resnet50 will have 2048 features. We’ll pretend our other model has an output of 224. As a result, post-concatenation we can presume the size would be 2048+224.
  6. Now we can call create_head(2048+224, num_classes) to create our head. Finally, we need to define a model. This model should accept both of our bodies as input, build a head, and then take care of everything in its forward function:
class MultiModalModel(Module):
    def __init__(self, tab_body, vis_body, c):
        self.tab, self.vis = tab_body, vis_body
        nf = num_features_model(self.tab) + num_features_model(self.vis)
        # create_head expects the feature count *after* its concat pooling
        # doubles it, hence nf*2
        self.head = create_head(nf*2, c)

    def forward(self, *x):
        cat, cont, vis = x                      # one batch from the MixedDL
        tab_out = self.tab(cat, cont)           # embedded categoricals
        vis_out = self.vis(vis)                 # conv features
        y = torch.cat((tab_out, vis_out), dim=1)
        y = self.head(y)
        return y

And now we have a model that can train based on our inputs!
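
Wiring those steps together might look like this sketch (mixed_dl, tab_body, and c, the number of classes, are assumed from the steps above):

vis_body = create_body(resnet50, pretrained=True)  # step 4: the vision "body"
model = MultiModalModel(tab_body, vis_body, c)
learn = Learner(mixed_dl, model, loss_func=CrossEntropyLossFlat())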

Now of course, if you wanted to use transfer learning and differential learning rates on that resnet, your splitter should split based on the layer names (self.vis vs. everything else), as sketched below.
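
A sketch of such a splitter (multimodal_splitter is a made-up name; params is fastai's helper for collecting a module's parameters):

def multimodal_splitter(m):
    "Pretrained vision body in one group; everything trained from scratch in the other"
    return [params(m.vis), params(m.tab) + params(m.head)]

# learn = Learner(mixed_dl, model, splitter=multimodal_splitter, ...)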

This help? :slight_smile:

Hi all. Just finished the Bayesian optimisation lecture. It seems almost too good to be true in terms of hyperparameter tuning, almost like a free lunch.

Q: Is there any reason why I wouldn’t want to use it to find my optimal hyperparameters?

For tabular, sure, it’s quick. But with other applications it can take hours to days to finish and find the optimum. This is why we have the LR finder, etc.

Ah, gotcha @muellerzr, thank you. I had read that it could suffer from long runtimes but wasn’t sure what types of problems it would struggle with.

I’ve been playing around with the Kaggle house prices tabular dataset and have gotten up to the test DataLoader, but I’m having issues:

dl = learn.dls.test_dl(test)

The error:
AssertionError: nan values in BsmtFinSF1 but not in setup training set

After reading around the forum, I understand that my test data has missing values in that column but the training data doesn’t. Is there a way to process my test data to accommodate this difference between the two datasets?

I was thinking of adding a row of blanks to the training set so that, when creating the TabularPandas, the preprocessing would account for missing values.

Any thoughts/solutions?

NOTE: I haven’t included the other code as it’s almost identical to the tabular examples from the lectures.

No, that won’t work well in my experience. If the model uses a value and it’s not there, it can’t make sense of the data, so you cannot use that particular input value. If it’s a feature-engineered column, you need to derive this feature in your test data as well.
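
For a raw (non-engineered) numeric column, one possible workaround is to impute the stray test NaNs from training statistics before building the test DataLoader. A minimal sketch, with df assumed to be the training frame from the question:

# Fill the test-only NaNs so test_dl's procs never see a value the
# training setup didn't handle
test['BsmtFinSF1'] = test['BsmtFinSF1'].fillna(df['BsmtFinSF1'].median())
dl = learn.dls.test_dl(test)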