Issues in creating cnn model using tabular databunch

vikassb · July 5, 2019, 4:07am

We have a numerical dataset.
Sample:

We then tried to create a tabular databunch from it.

Code:

data = TabularList.from_df(df,cont_names=cont_names).split_subsets(train_size=0.8, valid_size=0.2, seed=42).label_from_df(cols=dep_var, label_cls=FloatList).databunch()

We got the databunch as expected.

When we look at the x, y of one batch of data:

As you can see, our x contains an extra tensor of zeros.

As a result, when we try to use this databunch in a NN, we get an error:

Error: forward() takes 2 positional arguments but 3 were given

Kindly explain the reason and how to correct our databunch.

muellerzr · July 5, 2019, 4:14am

Aren’t you trying to show a batch via the dataframe? If so you want .show_batch(). Else when we create a databunch for our Learner, everything is in tensors, hence what we are seeing. Am I understanding your question correctly?

vikassb · July 5, 2019, 5:32am

Hi @muellerzr,
Thanks for the reply.
I updated the question. Please take a look; I hope I have explained my issue better this time.

sgugger · July 5, 2019, 12:22pm

TabularList is made to work with a tabular model that always expects two inputs (one categorical, one continuous), which is why you have this tensor of zeros (corresponding to the categorical input). Just adjust your model to forget that first tensor, or use a custom ItemList (it’s very likely FloatList as inputs would work).
This tabular thing will be more polished in v2, but for now those are the workarounds I can think of.
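For the first workaround, here is a minimal sketch of a model that accepts both inputs and drops the categorical one. It assumes, as the error message above suggests, that the two tensors of a batch reach forward() as separate positional arguments. The class name ContOnlyConvNet, the layer sizes, and the random stand-in batch are illustrative assumptions, not part of fastai.

```python
import torch
import torch.nn as nn

class ContOnlyConvNet(nn.Module):
    """Hypothetical 1D-conv model for batches that carry (x_cat, x_cont)."""
    def __init__(self, n_cont, n_out=1):
        super().__init__()
        # One input channel; the continuous columns form the "length" axis.
        self.conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        self.head = nn.Linear(8 * n_cont, n_out)

    def forward(self, x_cat, x_cont):
        # x_cat is the placeholder tensor of zeros; it is deliberately unused.
        x = x_cont.float().unsqueeze(1)   # (bs, n_cont) -> (bs, 1, n_cont)
        x = torch.relu(self.conv(x))      # (bs, 8, n_cont)
        return self.head(x.flatten(1))    # (bs, n_out)

# Stand-in for one batch (assumed shapes, not real data).
bs, n_cont = 4, 10
x_cat = torch.zeros(bs, 1, dtype=torch.long)
x_cont = torch.randn(bs, n_cont)

model = ContOnlyConvNet(n_cont)
out = model(x_cat, x_cont)
print(out.shape)
```

The unsqueeze(1) call is what turns the 2D continuous tensor into the 3D (batch, channels, length) input that Conv1d expects.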

vikassb · July 8, 2019, 5:01am

Thanks @sgugger.

Even if I ignore the tensor of zeros, how do I pass the data through Conv1d, since it needs 3D input? The databunch is giving us tuples.

I tried to create a databunch using the following code:

from torch.utils.data import Dataset
from fastai.basic_data import DataBunch  # fastai v1

class ArrayDataset(Dataset):
    "Sample numpy array dataset"
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.c = 2 # binary label

    def __len__(self):
        return len(self.x)

    def __getitem__(self, i):
        return self.x[i], self.y[i]

# features are every column except the last; the label is the last column
x = df.iloc[:, :-1]
Y = df.iloc[:, -1]

# split training / validation: first 80% of rows for training, rest for validation
training_size = int(0.8 * x.shape[0])
training_datas = x.iloc[:training_size, :]
training_labels = Y.iloc[:training_size]
validation_datas = x.iloc[training_size:, :]
validation_labels = Y.iloc[training_size:]

# convert the pandas objects to numpy arrays
train = training_datas.to_numpy()
Y = training_labels.to_numpy()
test = validation_datas.to_numpy()
test_Y = validation_labels.to_numpy()

train_ds, valid_ds = ArrayDataset(train, Y), ArrayDataset(test, test_Y)
data = DataBunch.create(train_ds, valid_ds, bs=60, num_workers=1)

The generated databunch is 2D (as expected), but Conv1d needs 3D input.
Even after reshaping, the issue remains, with the error:

TypeError: conv1d(): argument 'input' (position 1) must be Tensor, not tuple

How can we convert that databunch so that it can be passed to Conv1d?
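One way to handle the 2D-versus-3D mismatch is to add the channel dimension inside the model rather than in the databunch. The sketch below assumes each batch arrives as a single 2D tensor of shape (batch, n_features), as the ArrayDataset above would produce. AddChannelDim, n_features, and the layer sizes are hypothetical names and values chosen for illustration. The cast to float32 is included because numpy arrays default to float64, which the default Conv1d weights do not accept.

```python
import numpy as np
import torch
import torch.nn as nn

class AddChannelDim(nn.Module):
    """Hypothetical helper: (bs, n_features) -> (bs, 1, n_features) as float32."""
    def forward(self, x):
        return x.float().unsqueeze(1)

n_features = 10  # assumed number of feature columns

model = nn.Sequential(
    AddChannelDim(),
    nn.Conv1d(1, 8, kernel_size=3, padding=1),  # keeps length = n_features
    nn.ReLU(),
    nn.Flatten(),                               # (bs, 8 * n_features)
    nn.Linear(8 * n_features, 2),               # 2 outputs for a binary label
)

# Stand-in for one batch of the numpy-backed dataset (float64, 2D).
batch = torch.from_numpy(np.random.rand(60, n_features))
logits = model(batch)
print(logits.shape)
```

If the tuple error persists, it may be worth checking whether the model is still being fed a batch from the TabularList databunch, since that one yields two tensors per input rather than one.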

remapears · January 15, 2020, 5:00pm

Did you find a solution to this problem?

vikassb · January 21, 2020, 11:21am

Hi @remapears,

The issue with TabularList is that it always expects two inputs (categorical, continuous).
As suggested by sgugger above, we used a custom ItemList.