Where/how to normalize per item

takotab · February 18, 2020, 8:43am

I have data where every datapoint has a different bias (offset). For example:

import torch

def make_data(l=100, bias=100):
    # linear ramp + small noise + one random offset per item; shape (1, l)
    return (torch.randn(l)*.1 + torch.arange(l) + torch.randn(1)*bias)[None,:]

for i in [100, 200, 300]:
    x = make_data(l=11, bias = i)
    x, y= x[:,:9], x[:,-1]
    print(x,y)

tensor([[62.3799, 63.4259, 64.3857, 65.5097, 66.4785, 67.4576, 68.6136, 69.4785,
         70.4420]]) tensor([72.4677])
tensor([[-94.4864, -93.5647, -92.4024, -91.5976, -90.7541, -89.6858, -88.5592,
         -87.2961, -86.6319]]) tensor([-84.5226])
tensor([[540.6299, 541.7110, 542.7780, 543.7479, 544.9114, 545.7786, 546.7360,
         547.7943, 548.6898]]) tensor([550.7460])

This is a rather simple example, but I hope you understand the idea.
It does not really help to normalize over the dataset/batch. For example:

xb, yb = [], []
for i in [100, 200, 300]:
    x = make_data(l=11, bias = i)
    x, y= x[:,:9], x[:,-1]
    xb.append(x)
    yb.append(y)
xb, yb = torch.cat(xb), torch.cat(yb)
(xb-xb.mean())/xb.std(), (yb-xb.mean())/xb.std()

(tensor([[ 0.3367,  0.3413,  0.3461,  0.3512,  0.3544,  0.3603,  0.3642,  0.3699,
           0.3743],
         [ 0.9650,  0.9696,  0.9749,  0.9784,  0.9840,  0.9884,  0.9937,  0.9993,
           1.0025],
         [-1.3572, -1.3527, -1.3492, -1.3442, -1.3401, -1.3345, -1.3304, -1.3262,
          -1.3197]]), tensor([ 0.3828,  1.0130, -1.3115]))

It does help to do this per item: scale the input, run the model, and scale the prediction back to the original range. This really helps the model train better, and there is no leakage since no y data is used to scale the prediction.
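The per-item round trip described above can be sketched in plain PyTorch as follows. This is only an illustration: the helper names `scale_item` and `unscale` and the `eps` parameter are assumptions made for this example, not fastai functions.

```python
import torch

def scale_item(x, eps=1e-8):
    # x: (batch, length). Statistics come from each row's inputs only,
    # so no target value is used.
    mean = x.mean(dim=1, keepdim=True)
    std = x.std(dim=1, keepdim=True) + eps
    return (x - mean) / std, mean, std

def unscale(pred, mean, std):
    # pred: (batch,) in normalized units, mapped back to original units
    return pred * std.squeeze(1) + mean.squeeze(1)

# Two items with the same ramp but very different offsets
x = torch.arange(9.)[None, :] + torch.tensor([[100.], [-50.]])
x_n, mean, std = scale_item(x)

# After scaling, both rows look the same to the model (up to float error);
# unscaling any normalized value recovers the original units.
restored = unscale(x_n[:, -1], mean, std)
```

Because `mean` and `std` are returned alongside the scaled input, the same statistics can be reused to scale `y` for the loss or to map a prediction back.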

I used to do this with a TfmdDL in create_item. For example:

xb, yb = [], []
for i in [100, 200, 300]:
    x = make_data(l=11, bias = i)
    x, y= x[:,:9], x[:,-1]
    xb.append((x-x.mean())/x.std())
    yb.append((y-x.mean())/x.std())
xb, yb = torch.cat(xb), torch.cat(yb)
(xb-xb.mean())/xb.std(), (yb-xb.mean())/xb.std()

(tensor([[-1.4768, -1.1911, -0.7628, -0.3886,  0.0444,  0.3560,  0.7571,  1.1187,
           1.5430],
         [-1.5020, -1.2109, -0.7504, -0.3387,  0.0050,  0.4018,  0.7681,  1.1345,
           1.4926],
         [-1.4600, -1.1334, -0.7901, -0.3764, -0.0726,  0.3756,  0.7257,  1.1631,
           1.5681]]), tensor([2.2095, 2.3414, 2.2393]))

However, this would not work well for predictions (in production or on Kaggle) because it is not part of the encode/decode process. It also means the data is not shown correctly (only in normalized form). So I do not really like that option.

I’m having trouble implementing any other solution, however. I first wanted to do this with an ItemTransform in after_batch, but it assumes that the last encoded item is the one that needs to be decoded. Another option is to do the scaling inside the model, but this gives other weird bugs.
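For reference, the "scaling inside the model" option could look roughly like the sketch below. It uses plain PyTorch only; the wrapper name `PerItemNorm`, the `eps` parameter, and the `nn.Linear` body are hypothetical choices for this example and are not taken from the gist or from fastai.

```python
import torch
from torch import nn

class PerItemNorm(nn.Module):
    "Hypothetical wrapper: normalize each item, run `body`, map the output back."
    def __init__(self, body, eps=1e-8):
        super().__init__()
        self.body, self.eps = body, eps

    def forward(self, x):
        # x: (batch, length); statistics are computed per row, inputs only
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True) + self.eps
        out = self.body((x - mean) / std)   # (batch, 1), normalized units
        return out * std + mean             # back to original units

model = PerItemNorm(nn.Linear(9, 1))
x = torch.randn(3, 9) * 0.1 + torch.arange(9.) + torch.randn(3, 1) * 100
pred = model(x)             # shape (3, 1)
pred_shifted = model(x + 50.)  # the same items with a larger offset
```

With this design the dataloader keeps the raw values, so anything that shows or decodes the data sees original units. One consequence to be aware of is that the loss is then computed in original units as well.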

My question is what is the best way to handle this? Any other options I’m missing?

I’m aware that this is a rather confusing problem, so any suggestions to make my question clearer are very welcome.

More details are in the gist below:

https://gist.github.com/takotab/09b9b91550bf35fe8afe07c008afea86

normalize batch in model.ipynb

sgugger · February 18, 2020, 2:57pm

Could you elaborate on that? As it’s the solution I would recommend for this use case.

takotab · February 19, 2020, 12:13pm

Thank you for your reaction.

There are places (show_batch, show_results, interpreter) where the encode is not followed by a decode of the same size.

This is in my experience. Now that you call me out on it, I’m starting to doubt. I’m struggling to make a clean example. To be continued.