Hi everyone, and thanks for all the work you put into this.
I am trying to do regression with the TabularList model. I have it working in the sense that it trains on data, validates, and makes predictions; however, the predictions were very strange. To debug this, I reduced the model to a single output label (called velocity) and a single input feature (called velocity_2) that is identical to the output. The model still gives wrong predictions, even though the relationship is perfectly linear.
Let's examine!
First a look at the input data.
print('Train shape:', train_df.shape)
print('Train head:\n', train_df.head())
print('Train tail:\n', train_df.tail())
print('Train correlations:\n', train_df.corr())
gives
Train shape: (29925, 2)
Train head:
velocity velocity_2
0 0.574803 0.574803
1 0.574803 0.574803
2 0.574803 0.574803
3 0.574803 0.574803
4 0.574803 0.574803
Train tail:
velocity velocity_2
4341 0.590551 0.590551
4342 0.354331 0.354331
4343 0.511811 0.511811
4344 0.527559 0.527559
4345 0.307087 0.307087
Train correlations:
velocity velocity_2
velocity 1.0 1.0
velocity_2 1.0 1.0
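For anyone who wants to reproduce the shape of my data, here is a minimal sketch. The values are made up; the only property that matters is that the feature column is an exact copy of the target:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my real training data: one target column
# ('velocity') and one feature column ('velocity_2') that is an exact
# copy of it.
rng = np.random.default_rng(0)
velocity = rng.uniform(0, 1, size=1000)
train_df = pd.DataFrame({'velocity': velocity, 'velocity_2': velocity})

# Both off-diagonal correlations come out as exactly 1.0, matching the
# output above.
print(train_df.corr())
```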
As expected, the two columns are perfectly correlated. Then I create and train the model:
# Split data into train and validate sets.
valid_idx = range(len(midi_df) - len(validate_df), len(midi_df))
continuous_names = ['velocity_2']
dep_var = 'velocity'
y_range = range(-1, 1)
procs = [Categorify, Normalize]
data = (TabularList.from_df(midi_df, path=data_folder, cat_names=[], cont_names=continuous_names, procs=procs)
.split_by_idx(valid_idx)
.label_from_df(cols=dep_var, label_cls=FloatList)
.databunch())
learn = tabular_learner(data, layers=[200, 100], emb_szs={}, y_range=y_range, metrics=exp_rmspe)
learn.fit_one_cycle(1, 1e-2)
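One detail I noticed while writing this up: y_range here is a Python range object, not a tuple. As far as I understand fastai v1's TabularModel, when y_range is set it squashes the final output through a sigmoid, roughly (hi - lo) * sigmoid(x) + lo with lo = y_range[0] and hi = y_range[1] (that formula is my assumption about the internals, not something I have verified). A range object indexes differently from a tuple, so a quick sketch of what such scaling would do with range(-1, 1):

```python
import torch

y_range = range(-1, 1)

# Indexing a range object: range(-1, 1) contains [-1, 0], so
# y_range[1] is 0, not 1 as a tuple (-1, 1) would give.
lo, hi = y_range[0], y_range[1]
print(lo, hi)  # -1 0

# Sketch of the sigmoid scaling I believe fastai v1 applies when
# y_range is set (the formula itself is my assumption):
x = torch.linspace(-10.0, 10.0, 5)
scaled = (hi - lo) * torch.sigmoid(x) + lo
print(scaled)  # every value lies strictly between -1 and 0
```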
giving me
epoch train_loss valid_loss exp_rmspe
1 0.053190 0.032207 0.123913
I can also make predictions:
predictions, targets = learn.get_preds(DatasetType.Valid)
print('Predictions: ', predictions)
print('Expected: ', targets)
giving me
Predictions: tensor([[-5.1767e-01],
[-5.1767e-01],
[-4.1593e-01],
...,
[-2.0785e-03],
[-4.3088e-04],
[-3.8314e-04]])
Expected: tensor([-0.4961, -0.4961, -0.4016, ..., 0.1181, 0.2598, 0.2756])
But surely, with 30k samples of a perfectly linear relationship, the model should reach a near-zero loss? I can only assume I am doing something wrong, but I cannot figure out what. That's where you come in! Any and all help would be much appreciated.
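To back up the near-zero-loss intuition: on this kind of data, even a plain least-squares fit recovers the identity mapping essentially exactly. A sketch with synthetic values (not my real dataset):

```python
import numpy as np

# Synthetic stand-in: feature identical to target, as in my data.
rng = np.random.default_rng(0)
y = rng.uniform(0, 1, size=30_000)
X = np.column_stack([y, np.ones_like(y)])  # feature column + intercept

# Ordinary least squares: should find slope ~1, intercept ~0,
# and an essentially zero mean squared error.
coef, _residuals, _rank, _sv = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
mse = np.mean((X @ coef - y) ** 2)
print(slope, intercept, mse)
```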
I’m using fastai v1.0.37.
$ pip3 list | grep fastai
fastai 1.0.37
Thanks!