I want to train on tabular data with DL. I have already tried a few Kaggle competitions such as Rossmann, Titanic, and Santander.
I am looking for recommendations for other competitions that use tabular data (and, if you have them, fast.ai kernels for those competitions as a reference).
Hey Offir - how did you go on the Santander data set?
There are a heap of tabular data sets around; have a look at the data sets available on Kaggle. I don’t think there are many competitions, though.
This is the link for Santander: https://www.kaggle.com/c/santander-customer-transaction-prediction
If you can recommend more datasets, that would be great.
Hey, I’m working on a similar problem, maybe you all could help me figure this out: I’m trying to use a tabular learner on a wide dataset, and all the values are continuous. I keep getting a “divide by zero” error, and I think it’s due to the way that I’m creating my databunch, but not sure what exactly I’m doing wrong.
```
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from fastai.tabular import *

path = Path('../input')
df = pd.read_csv('../input/train.csv')
test = pd.read_csv('../input/test.csv')

procs = [Normalize, FillMissing]
dep_var = 'target'
# valid_idx = range(len(df)-50, len(df))
columns = df.columns
ids_test = test['id']
test.drop(columns='id', inplace=True)
```
```
test_list = TabularList.from_df(test, procs=procs)
data = (TabularList.from_df(df, procs=procs)
learn = tabular_learner(data, layers=[200,100], metrics=AUROC())
```
This gives the error:
```
/opt/conda/lib/python3.6/site-packages/torch/nn/init.py in kaiming_uniform_(tensor, a, mode, nonlinearity)
    288     fan = _calculate_correct_fan(tensor, mode)
    289     gain = calculate_gain(nonlinearity, a)
--> 290     std = gain / math.sqrt(fan)
    291     bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    292     with torch.no_grad():

ZeroDivisionError: float division by zero
```
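A note for anyone reading the traceback: the failing line divides by `math.sqrt(fan)`, so `fan` has to be 0 at that point. That suggests some layer was created with zero input features, although the traceback alone does not say which one. Below is a minimal plain-Python sketch of the same arithmetic. The function name `kaiming_uniform_bound` and the default `gain` are hypothetical placeholders for illustration, not PyTorch's API.

```python
import math


def kaiming_uniform_bound(fan: int, gain: float = math.sqrt(2.0)) -> float:
    """Mirror the two arithmetic lines (290-291) shown in the traceback.

    `gain` is an illustrative placeholder; PyTorch derives it from the
    nonlinearity via calculate_gain().
    """
    std = gain / math.sqrt(fan)  # raises ZeroDivisionError when fan == 0
    return math.sqrt(3.0) * std


# A layer with 200 input features gets a finite bound.
print(kaiming_uniform_bound(200))

# A layer with zero input features reproduces the error type in the traceback.
try:
    kaiming_uniform_bound(0)
except ZeroDivisionError as err:
    print("ZeroDivisionError:", err)
```

If this reading is right, the thing to check is how many input columns the model actually receives, rather than the initialisation code itself.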
Hey @Stephen_F, the error might be because you are not telling the model which columns contain continuous values and which ones contain categorical values. You might want to add a line like
`cont_names = ['age', 'gender', ...]` or, since all your columns are continuous, you can write
`cont_names = list(df.columns)`. Then pass that as an argument (`cont_names=cont_names`) when you create the data for your learner.
@dipam7 I’ll be darned, I think that worked! I think when I tried it before I accidentally left the target variable in the list of columns and that threw me off the trail. Thanks!
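To avoid the pitfall mentioned above (leaving the target in the list of continuous columns), one option is to build `cont_names` by filtering the column names. This is only a sketch: the `var_*` column names and the helper `continuous_columns` are hypothetical, while `id` and `target` are taken from the code earlier in the thread.

```python
# Hypothetical column names standing in for list(df.columns).
columns = ['id', 'target', 'var_0', 'var_1', 'var_2']
dep_var = 'target'


def continuous_columns(columns, dep_var, drop=('id',)):
    """Return every column except the target and any identifier columns."""
    excluded = {dep_var, *drop}
    return [c for c in columns if c not in excluded]


cont_names = continuous_columns(columns, dep_var)
print(cont_names)

# Assuming fastai v1, this list would then be passed when the data is built, e.g.
#   TabularList.from_df(df, cont_names=cont_names, procs=procs)
# (not executed here, since it needs fastai and the competition data).
```

Filtering by name keeps the list in sync with the DataFrame if columns are added later, and it makes the exclusion of the dependent variable explicit.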