Issues creating a CNN model using a tabular DataBunch

We have a numerical dataset.
Sample:

Then we tried to create a tabular databunch from it.

Code:

data = (TabularList.from_df(df, cont_names=cont_names)
        .split_subsets(train_size=0.8, valid_size=0.2, seed=42)
        .label_from_df(cols=dep_var, label_cls=FloatList)
        .databunch())

We got the databunch as expected.

When we look at the x and y of one batch of data:


As you can see, our x contains an extra tensor of zeros.

As a result, when we try to use this databunch in a neural network, we get this error:

Error: forward() takes 2 positional arguments but 3 were given

Could you explain the reason and how to correct our databunch?

Aren’t you trying to show a batch via the dataframe? If so, you want .show_batch(). Otherwise, when we create a databunch for our Learner, everything is in tensors, hence what we are seeing. Am I understanding your question correctly?

Hi @muellerzr,
Thanks for the reply.
I updated the question; please take a look. I hope I have explained my issue more clearly this time.

TabularList is made to work with a tabular model that always expects two inputs (one categorical, one continuous), which is why you have this tensor of zeros (corresponding to the categorical input). Just adjust your model to forget that first tensor, or use a custom ItemList (it’s very likely FloatList as inputs would work).
This tabular thing will be more polished in v2, but for now those are the workarounds I can think of.
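
The first workaround could look roughly like the sketch below, written in plain PyTorch. Because x arrives as a pair (categorical tensor, continuous tensor) and the two are passed to the model as separate positional arguments, a forward(self, x) signature fails with the error quoted above. The class name ConvOnContinuous, the layer sizes, and the shape of the simulated zeros tensor are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class ConvOnContinuous(nn.Module):
    """Hypothetical model: ignores the categorical input, applies Conv1d to the continuous one."""
    def __init__(self, n_out=1):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.head = nn.Linear(8, n_out)

    def forward(self, x_cat, x_cont):
        # x_cat is the placeholder tensor of zeros; it is deliberately unused.
        x = x_cont.float().unsqueeze(1)   # (bs, n_cont) -> (bs, 1, n_cont)
        x = torch.relu(self.conv(x))      # (bs, 8, n_cont)
        x = self.pool(x).squeeze(-1)      # (bs, 8)
        return self.head(x).squeeze(-1)   # (bs,) when n_out == 1

# Simulated batch standing in for one batch of the databunch (shapes assumed)
x_cat = torch.zeros(4, 1, dtype=torch.long)
x_cont = torch.randn(4, 10)
model = ConvOnContinuous()
out = model(x_cat, x_cont)
```

The unsqueeze(1) call adds the channel dimension that Conv1d expects, so the continuous features are treated as a single-channel sequence.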

Thanks @sgugger.

Even if I ignore the tensor of zeros, how do I pass the data through Conv1d, which needs a 3D input?
The databunch is giving us tuples.

I tried to create a databunch using the following code:

from torch.utils.data import Dataset
from fastai.basic_data import DataBunch

class ArrayDataset(Dataset):
    "Sample numpy array dataset"
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.c = 2 # binary label

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# Features are every column except the last; the label is the last column
x = df.iloc[:, :-1]
Y = df.iloc[:, -1]

# Split into training (first 80%) and validation (remaining 20%)
training_size = int(0.8 * x.shape[0])
training_datas = x.iloc[:training_size, :]
training_labels = Y.iloc[:training_size]
validation_datas = x.iloc[training_size:, :]
validation_labels = Y.iloc[training_size:]

# Convert the pandas objects to numpy arrays
train = training_datas.to_numpy()
Y = training_labels.to_numpy()
test = validation_datas.to_numpy()
test_Y = validation_labels.to_numpy()

train_ds, valid_ds = ArrayDataset(train, Y), ArrayDataset(test, test_Y)
data = DataBunch.create(train_ds, valid_ds, bs=60, num_workers=1)

The generated databunch is 2D (as expected), but Conv1d needs a 3D input.
Even after reshaping, the issue remains, with this error:

TypeError: conv1d(): argument 'input' (position 1) must be Tensor, not tuple

How can we convert that databunch so that it can be passed to Conv1d?
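
One possible approach, sketched in plain PyTorch, is to have the dataset itself return float32 tensors with a channel dimension, so that default batching yields inputs of shape (bs, 1, n_features), which Conv1d accepts. ChannelArrayDataset is a hypothetical name, the random arrays stand in for the real DataFrame, and the layer sizes are arbitrary. This addresses only the shape and dtype of the input, not necessarily the cause of the tuple error above.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class ChannelArrayDataset(Dataset):
    """Hypothetical dataset: stores features as (n, 1, n_features) float32 tensors."""
    def __init__(self, x, y):
        # Add a channel dimension once, up front: (n, n_features) -> (n, 1, n_features)
        self.x = torch.as_tensor(x, dtype=torch.float32).unsqueeze(1)
        self.y = torch.as_tensor(y, dtype=torch.float32)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# Stand-in data: 100 rows, 12 numeric features (assumed sizes)
features = np.random.rand(100, 12)
targets = np.random.rand(100)

train_dl = DataLoader(ChannelArrayDataset(features, targets), batch_size=60)
xb, yb = next(iter(train_dl))                      # xb: (60, 1, 12), yb: (60,)

conv = torch.nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3, padding=1)
conv_out = conv(xb)                                # (60, 4, 12)
```

Converting to float32 matters as well, since to_numpy() on a DataFrame typically produces float64 arrays while Conv1d weights default to float32.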

Did you find a solution to this problem?

Hi @remapears,

The issue with TabularList is that it always expects two inputs (categorical and continuous).
As suggested by sgugger above, we used a custom ItemList.