Sure thing! As for what this will presume: you know of the `MixedDL` here.
So now let’s go through our steps.
- We’ll build the Tabular DLs and vision DLs we wish to make for our `MixedDL`.
- When we get to the Tabular portion, we will want to calculate the embedding matrix sizes. We do this with `get_emb_sz(to)` (with the `to` object being `dl.train` on the Tabular DL).
- We’ll make a Tabular Embedding-only model, as this is all we want. The code looks like so:
class TabularEmbeddingModel(Module):
    "Basic model for tabular data."
    def __init__(self, emb_szs, embed_p=0.):
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.n_emb = len(emb_szs)

    def forward(self, x_cat, x_cont=None):
        if self.n_emb != 0:
            # look up each categorical column's embedding, concatenate, then apply dropout
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        return x
All this model does is take our input (which must be tabular cat+cont if we’re following that example; if there are no continuous variables, an empty tensor is passed in) and return the concatenated embeddings.
So now we can build our model by passing in the `emb_szs`.
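As a rough sketch of those two steps (with `tab_dl` being a hypothetical name for the Tabular DLs you built earlier):

emb_szs = get_emb_sz(tab_dl.train)          # tab_dl is a hypothetical name for your Tabular DataLoaders
tab_model = TabularEmbeddingModel(emb_szs)  # embedding-only body for the tabular inputs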
- Now we need our vision model. Both of these models can be thought of as “bodies”, and we’ll make one head for them all. So for our vision model, we’ll call `create_body(resnet50)`, and this is the body of our model (a short sketch of this step follows the model definition below).
- Now we get to the meat and potatoes. We have two bodies at this point, and we need to turn them into one cohesive model. The first thing we want to do is concatenate their outputs before passing them to some head. But how do we calculate that size? We’ll take both our models and call `num_features_model(model)`. For instance, a resnet50 will have 2048. We’ll pretend our other model has an output of 224. As a result, post-concatenation we can presume the size would be 2048+224.
- Now we can call `create_head(2048+224, num_classes)` to create our head. Finally, we need to define a model. This model should accept both of our bodies as input, calculate a head, and then take care of everything in the forward function:
class MultiModalModel(Module):
    "Combine a tabular body and a vision body under a single head."
    def __init__(self, tab_body, vis_body, c):
        self.tab, self.vis = tab_body, vis_body
        nf = num_features_model(self.tab) + num_features_model(self.vis)
        self.head = create_head(nf*2, c)

    def forward(self, *x):
        cat, cont, vis = x
        tab_out = self.tab(cat, cont)
        vis_out = self.vis(vis)
        # concatenate both bodies' outputs along the feature dimension, then classify
        y = torch.cat((tab_out, vis_out), dim=1)
        y = self.head(y)
        return y
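For reference, the vision side from the steps above could be sketched like so (following the `create_body(resnet50)` call from the post; the variable name is just for illustration):

from fastai.vision.all import *

vis_body = create_body(resnet50)   # pretrained resnet50 with its classifier head cut off
num_features_model(vis_body)       # -> 2048, the feature count we reasoned about above

From there, both bodies plus your class count go into `MultiModalModel` as shown above.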
And now we have a model that can train based on our inputs!
Now of course, if you wanted to use transfer learning and differential learning rates on that resnet, your splitter should split based on the layer names (`self.vis` vs everything else).
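A splitter for that could be sketched like so, assuming the `MultiModalModel` above (`params` is fastai's helper that collects a module's trainable parameters):

def mixed_splitter(m):
    "Two parameter groups: the pretrained vision body, and everything new."
    # group 1: self.vis (the pretrained resnet body, which gets the lower learning rates)
    # group 2: the tabular embeddings plus the freshly created head
    return [params(m.vis), params(m.tab) + params(m.head)]

You would then pass `splitter=mixed_splitter` when creating your `Learner`, so `freeze()` and the differential learning rates operate on those two groups.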
This help? 
