I have data where every data point has a different bias (offset). For example:
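```python
import torch

def make_data(l=100, bias=100):
    # a noisy linear ramp, shifted by one random offset per data point
    return (torch.randn(l) * .1 + torch.arange(l) + torch.randn(1) * bias)[None, :]

for i in [100, 200, 300]:
    x = make_data(l=11, bias=i)
    x, y = x[:, :9], x[:, -1]
    print(x, y)
```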
```
tensor([[62.3799, 63.4259, 64.3857, 65.5097, 66.4785, 67.4576, 68.6136, 69.4785,
         70.4420]]) tensor([72.4677])
tensor([[-94.4864, -93.5647, -92.4024, -91.5976, -90.7541, -89.6858, -88.5592,
         -87.2961, -86.6319]]) tensor([-84.5226])
tensor([[540.6299, 541.7110, 542.7780, 543.7479, 544.9114, 545.7786, 546.7360,
         547.7943, 548.6898]]) tensor([550.7460])
```
This is a rather simple example, but I hope it gets the idea across.
Normalizing over the whole dataset/batch does not really help. For example:
```python
xb, yb = [], []
for i in [100, 200, 300]:
    x = make_data(l=11, bias=i)
    x, y = x[:, :9], x[:, -1]
    xb.append(x)
    yb.append(y)
xb, yb = torch.cat(xb), torch.cat(yb)
# normalize with statistics computed over the whole batch
(xb - xb.mean()) / xb.std(), (yb - xb.mean()) / xb.std()
```
```
(tensor([[ 0.3367,  0.3413,  0.3461,  0.3512,  0.3544,  0.3603,  0.3642,  0.3699,
           0.3743],
         [ 0.9650,  0.9696,  0.9749,  0.9784,  0.9840,  0.9884,  0.9937,  0.9993,
           1.0025],
         [-1.3572, -1.3527, -1.3492, -1.3442, -1.3401, -1.3345, -1.3304, -1.3262,
          -1.3197]]), tensor([ 0.3828,  1.0130, -1.3115]))
```
What does help is doing this per item: scale each input with its own statistics, run the model, and then scale the prediction back with those same statistics. This really helps the model train better, and there is no leakage, since no y data is used to compute the scaling.
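A minimal sketch of that per-item round trip in plain PyTorch, assuming inputs of shape `(batch, seq_len)`. `normalize_per_item`, `denormalize` and the `nn.Linear(9, 1)` stand-in model are hypothetical names for illustration, not part of fastai or PyTorch:

```python
import torch
from torch import nn

def normalize_per_item(x, eps=1e-8):
    # statistics come from each item's own inputs (last dim),
    # so no target values are used to compute the scaling
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True) + eps
    return (x - mean) / std, mean, std

def denormalize(pred, mean, std):
    # map a prediction made in normalized space back to the item's own scale
    return pred * std + mean

torch.manual_seed(0)
model = nn.Linear(9, 1)  # hypothetical stand-in for the real model

# three items that share the same ramp but have very different offsets
x = torch.arange(9.).repeat(3, 1) + torch.tensor([[100.], [-50.], [500.]])
xn, mean, std = normalize_per_item(x)     # all three rows become identical
pred = denormalize(model(xn), mean, std)  # shape (3, 1), in the original units
```

Because every row is shifted into the same range, the model only has to learn the shape of the ramp; the per-item offset is added back by `denormalize`.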
I used to do this with a `TfmdDL`, in `create_item`. For example:
```python
xb, yb = [], []
for i in [100, 200, 300]:
    x = make_data(l=11, bias=i)
    x, y = x[:, :9], x[:, -1]
    # normalize each item with its own input statistics (y is scaled with x's stats)
    xb.append((x - x.mean()) / x.std())
    yb.append((y - x.mean()) / x.std())
xb, yb = torch.cat(xb), torch.cat(yb)
(xb - xb.mean()) / xb.std(), (yb - xb.mean()) / xb.std()
```
```
(tensor([[-1.4768, -1.1911, -0.7628, -0.3886,  0.0444,  0.3560,  0.7571,  1.1187,
           1.5430],
         [-1.5020, -1.2109, -0.7504, -0.3387,  0.0050,  0.4018,  0.7681,  1.1345,
           1.4926],
         [-1.4600, -1.1334, -0.7901, -0.3764, -0.0726,  0.3756,  0.7257,  1.1631,
           1.5681]]), tensor([2.2095, 2.3414, 2.2393]))
```
However, this does not work well for predictions (in production or on Kaggle) because it is not part of the encode/decode process. It also means the data is not shown correctly (only in its normalized form). So I do not really like that option.
However, I'm having trouble implementing any other solution. I first wanted to do this with an `ItemTransform` during `after_batch`, but it assumes that the last encoded item is the one that needs to be decoded. Another option is to do the scaling inside the model, but this gives other weird bugs.
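One possible way around the "last encoded" problem, sketched under my own assumptions: keep the per-item statistics next to the batch instead of storing them on the transform, so that decoding receives exactly the statistics that were used to encode that batch. `PerItemScale`, `encode` and `decode` are hypothetical names, not fastai's `ItemTransform` API, and how to thread the statistics through a fastai `DataLoader`/`Learner` is not shown here:

```python
import torch

class PerItemScale:
    """Hypothetical stateless per-item scaler: the statistics travel with the
    batch instead of living on the transform, so decoding never depends on
    which batch happened to be encoded last."""

    def __init__(self, eps=1e-8):
        self.eps = eps

    def encode(self, x, y=None):
        # per-item statistics from the inputs only (no target leakage)
        mean = x.mean(dim=-1, keepdim=True)
        std = x.std(dim=-1, keepdim=True) + self.eps
        xn = (x - mean) / std
        yn = None if y is None else (y - mean.squeeze(-1)) / std.squeeze(-1)
        return xn, yn, (mean, std)

    def decode(self, t, stats):
        # handles inputs of shape (batch, seq) and targets/predictions of shape (batch,)
        mean, std = stats
        if t.dim() == 1:
            mean, std = mean.squeeze(-1), std.squeeze(-1)
        return t * std + mean

tfm = PerItemScale()
xa, ya = torch.arange(9.)[None] + 100., torch.tensor([110.])
xb, yb = torch.arange(9.)[None] - 50., torch.tensor([-40.])
xa_n, ya_n, stats_a = tfm.encode(xa, ya)
xb_n, yb_n, stats_b = tfm.encode(xb, yb)  # encoding a second batch...
ya_back = tfm.decode(ya_n, stats_a)       # ...does not break decoding the first one
```

Because `decode` depends only on its arguments, the same statistics can be reused both for showing the data in its original units and for turning predictions back into the original scale.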
My question: what is the best way to handle this? Are there any other options I'm missing?
I'm aware that this is a rather confusing problem, so any suggestions to make my question clearer are very welcome.
More details are in the gist below: