Conv1d on tabular data

feribg · March 14, 2019, 8:55pm

I’m trying to add a conv1d with a tabular data bunch but i get stuck on getting the dimensions right.
The attached model errors out with

Expected 3-dimensional input for 3-dimensional weight [128, 248, 3], but got 2-dimensional input of size [128, 248] instead

I’m not sure why conv1d adds a third dimension of size kernel_size

mnpinto · March 14, 2019, 9:45pm

Notice that Conv1d expects an input of dims [batch size, number of channels, length] the kernel size of 3 won’t change the output dimension if you add padding of 3//2. If you just have 1 channel you can do an input.view(128, 1, 248) but the output will be (128, 128, 248) as you have selected 128 output channels. Are your inputs time-series or something similar?

You can find more details on pytorch documentation: https://pytorch.org/docs/stable/nn.html

feribg · March 14, 2019, 9:56pm

Not really my inputs are tabular features but i want to try a CNN over them. So basically i have 248 columns of which 199 are cont and 49 is an embedding. I then want to use a conv1d as the first layer of the network. I copied the TabularModel and added the following in the beginning:

layers = []
        layers.append(conv_layer(ni=1, nf=248, ks=3,  bias=None, stride=1, padding=None, is_1d=True))
        for i,(n_in,n_out,dp,act) in enumerate(zip(sizes[:-1],sizes[1:],[0.]+ps,actns)):

I haven’t added dilation yet either but i want to try a couple of conv1d layers with dilation and then the typical couple of dense layers after that. It’s also extra confusing because of the fact that model.summary doesn’t work if you have your dimensions wrong (which totally defeats it’s purpose) and torchsummary doesn’t seem to work with a fastai tabular model (that takes both cont and cat features as parameters for forward).

I tried adding the view code you mentioned as well as unsqueeze(1) and they both result in the same error at the BN layer, which is rather weird since i don’t see 2 layers after eachother with those dimms (attached)

running_mean should contain 248 elements not 200

Would be great if you can point me in the right direction to at least debug properly which dimensions are off and how to see that within fastai.

mnpinto · March 14, 2019, 10:11pm

You can use python debugger

from IPython.core.debugger import set_trace

Then you write set_trace() in the forward of the model (or any other place you need to debug) and you can go line by line and print the sizes.

But if you are using a 1d convolution kernel over your inputs you are computing linear combinations of nearby features. Unless your features have some particular order that is important (like in a time-series) I don’t see how that can help. The regular linear layer combines all the features together.

feribg · March 14, 2019, 10:43pm

Thanks for the suggestion I’ll try the debugger. It’s rather suprising to me too, why would that be the case but several people reported that switching from dense to cnn got them a boost from ~86 to ~89/90 ROC in this competition: https://www.kaggle.com/c/santander-customer-transaction-prediction/discussion
So i just wanted to try and replicate that within fastai. I tried a lot of dropout after the first FC layer as a substitution for that but it didn’t help the overfitting too much. Maybe I should try dropout on the inputs and see if that actually helps more.

Even · March 15, 2019, 4:37am

Hey Feras, can you link to the actual discussion? That sounds interesting, but I poked around and didn’t see anything similar.

Highly recommend set_trace debugging. It’s really useful for understanding layer sizes and where things are going wrong and you gain a much deeper understanding of the network that way.

feribg · March 15, 2019, 4:57am

Sure, I think this mentions it, but I did see it in a few more places mentioned as well. I’m not sure if the idea is to act more as a regulariztion (my thought, by sampling a lesser amount of the features), or the idea is more so to pick up some spacial interaction between features (but then my guess is that would be picked up by the dense layers anyways and it would just push the rest of the irrelevant weights closer to 0). It’s a very weird competition, where the mean you get with GBM is about 0.9 and nothing seemingly works.