Prediction Problem with ULMFiT on PyTorch 0.4

Hi, all

I am trying ULMFiT these days. It works fine with the fastai repo and its scripts. However, when I run ULMFiT on the latest version of PyTorch (0.4.0), the finetune and train_clas scripts work fine, but I get a problem at prediction time.

I have 412 test cases. On PyTorch 0.4, the first element of the tuple returned by learner.predict_with_targs() has length 3296 and looks like this: [ 20 48 107 … 200 4 63]

The exact same code run with PyTorch 0.3.1 (the version bundled with the fastai repo) returns an ndarray of length 412, something like the sample below, which is what I want:
[[ 1.00275 -1.00087]
[ 1.25688 -1.56693]
[ 2.83312 -2.40475]
[ 0.38864 -0.38543]
[-4.41717 3.6363 ]
[ 3.51488 -3.07165]
[ 2.77479 -2.56887]
[ 2.07361 -1.77178]
[ 2.4589 -2.47992]
[ 2.2935 -2.06768]
[ 3.62962 -3.14912]

I have also tried learner.predict(is_test=True) but still got this weird prediction result. Everything is the same (including the code and the model) except the PyTorch version, so I am pretty sure it is related to that. I have tried to dig into the fastai source code but didn't find anything useful… Any clues? Thanks!

The prediction code is as below:

# load vocabulary lookup
itos = pickle.load(open(join(PATH, 'tmp/itos.pkl'), 'rb'))
vs = len(itos)

# load data
test_data = np.load(join(PATH, "tmp/tst_ids.npy"))
test_data = np.squeeze(test_data)
test_lbls = np.load(join(PATH, "tmp/lbl_tst.npy"))
test_lbls = np.squeeze(test_lbls)
c=int(test_lbls.max())+1

# build a TextDataset
test_dataset = TextDataset(test_data, test_lbls)

# build a SortSampler
BATCH_SIZE = 4
test_samp = SortSampler(test_data, key=lambda x: len(test_data[x]))

# build a DataLoader
test_loader = DataLoader(test_dataset, BATCH_SIZE, transpose=True, num_workers=1, pad_idx=1, sampler=test_samp, shuffle=False)  # pass the sampler built above (it was sampler=None before, leaving test_samp unused)

# build a TextData instance
md = TextData(PATH, None, None, test_loader)

# build the classifier (exactly as it was in train_clas.py)
opt_fn = partial(optim.Adam, betas=(0.8, 0.99))
bptt = 70     # backpropagation through time
em_sz = 400   # size of embeddings
nh = 1150     # size of hidden
nl = 3        # number of layers

dps = np.array([0.4,0.5,0.05,0.3,0.4])

model = get_rnn_classifer(
            bptt=bptt, 
            max_seq=20*70, 
            n_class=c, 
            n_tok=vs, 
            emb_sz=em_sz, 
            n_hid=nh, 
            n_layers=nl, 
            pad_token=1,
            layers=[
            em_sz*3, # 1200: the pooling head concatenates last hidden state, max pool and mean pool of the final layer (see sketch below)
            50,      # an intermediate compression layer, as in train_clas.py
            c        # number of class labels
            ],
            drops=[dps[4], 0.1],
            dropouti=dps[0],
            wdrop=dps[1],
            dropoute=dps[2],
            dropouth=dps[3]
            )
model.eval()   # put the model in eval mode so dropout is disabled at inference

# build an RNN_Learner
learner = RNN_Learner(
            data=md,
            models=TextModel(to_gpu(model)),     # not sure why this is required
            opt_fn=opt_fn
            )
learner.model.eval()     # eval mode again after wrapping, so dropout stays disabled

# learner.load() reads models/<model_name>.h5 itself, so no separate torch.load is needed
learner.load(model_name)
preds_dist, preds = learner.predict_with_targs()
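
(A side note on the em_sz*3 comment above: as far as I can tell, nh=1150 is only the hidden size of the intermediate LSTM layers; the final layer outputs em_sz=400, and the pooling head concatenates three 400-dim vectors, giving 1200. A rough sketch of what I believe PoolingLinearClassifier does:)

import torch

# my reading of the pooling head, not the actual fastai implementation:
# `output` is the final LSTM layer's output, shape [seq_len, batch_size, em_sz]
def concat_pool(output):
    avg_pool = output.mean(0)      # [batch_size, em_sz]
    max_pool = output.max(0)[0]    # [batch_size, em_sz]
    last = output[-1]              # [batch_size, em_sz]
    return torch.cat([last, max_pool, avg_pool], 1)  # [batch_size, em_sz*3] = [batch_size, 1200]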

I figure it is predicting the language model output instead of the classification output. So I printed out the model structure, and it seems fine: the out_features of the last layer is 2. Any other clues?

[(Embedding(12835, 400, padding_idx=1), LockedDropout(
)), (WeightDrop(
  (module): LSTM(400, 1150, dropout=0.3)
), LockedDropout(
)), (WeightDrop(
  (module): LSTM(1150, 1150, dropout=0.3)
), LockedDropout(
)), (WeightDrop(
  (module): LSTM(1150, 400, dropout=0.3)
), LockedDropout(
)), PoolingLinearClassifier(
  (layers): ModuleList(
    (0): LinearBlock(
      (lin): Linear(in_features=1200, out_features=50, bias=True)
      (drop): Dropout(p=0.4)
      (bn): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True)
    )
    (1): LinearBlock(
      (lin): Linear(in_features=50, out_features=2, bias=True)
      (drop): Dropout(p=0.1)
      (bn): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True)
    )
  )
)]
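
To double-check which output is coming back, one can also push a single batch through the model by hand (a quick sketch, assuming fastai's V() Variable wrapper):

# sanity check: run one batch and inspect the head's output shape
x, y = next(iter(test_loader))
out = learner.model(V(x))    # V() is fastai's Variable wrapper
print(out[0].size())         # expecting torch.Size([batch_size, 2]) from the classifier head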

I had a problem on 0.4 as well; 0.4.1 seems to work. Try upgrading.

I tried it just now, but it doesn't seem to work for me. :sob: I still get an array of length 3296… and have no idea what it means.

One more difference is that I am writing the code in Python 2 instead of Python 3.

Sorry, I thought fastai was built on Python 3; running it on Python 2 is counterintuitive. Also, I note that 3296 is 8 × 412. You perhaps need to take Jeremy's advice: split the lines of code out into individual notebook cells and print out the objects to isolate the problem. The debugger may also be helpful.
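
For example (a minimal sketch): drop into pdb right after the prediction call and inspect what actually came back.

import pdb

preds_dist, preds = learner.predict_with_targs()
pdb.set_trace()   # inspect type(preds_dist), np.shape(preds_dist), preds_dist[:16], etc.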

I know this seems a daunting and very time-consuming task, but that fact just highlights the amount of work Jeremy does to get these modules working.

Thanks for your reply. I worked hard to adapt the code to Python 2, and I succeeded. You are right that 3296 is 8 × 412: it turns out that every 8 integers (each in the range 0 to 255) can be decoded into 2 floats (4 integers each), which is exactly the result we expected.

For those who may also run into this problem:
I think it is related to Python 2 vs. Python 3 rather than the torch version. There is a built-in function bytes in Python 3 which returns a binary representation given a list of integers; in Python 2, however, bytes is just an alias of str.
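
A quick illustration of the difference from the interpreter:

# Python 3: bytes() packs a list of ints into raw bytes
>>> bytes([57, 214, 69, 190])
b'9\xd6E\xbe'

# Python 2: bytes is just str, so the same call stringifies the list
>>> bytes([57, 214, 69, 190])
'[57, 214, 69, 190]'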

Below is the code I am using to turn a list of 4 integers into a float. Basically I convert them to hex, then unpack the hex string as a float.

import struct

def l2f(x):
    # list of 4 ints (0-255) -> one float
    return hex2f(l2hex(x))

def hex2(x):
    # one int (0-255) -> 2-char hex string, zero-padded
    r = hex(x)[2:]
    if len(r) < 2:
        r = '0' * (2 - len(r)) + r
    return r

def l2hex(x):
    # list of ints -> concatenated hex string
    r = ''
    for i in x:
        r += hex2(i)
    return r

def hex2f(x):
    # hex string -> little-endian float32
    # note: str.decode('hex') is Python 2 only; on Python 3 use bytes.fromhex(x)
    return struct.unpack("<f", x.decode('hex'))[0]

l2f([57, 214, 69, 190])
# -0.1932000070810318
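
As an aside, assuming the flat 3296-length integer array is in preds_dist, the whole thing can also be decoded in one shot with numpy (a sketch; the variable name and the little-endian byte order are my assumptions):

import numpy as np

raw = np.asarray(preds_dist, dtype=np.uint8).tobytes()          # 3296 bytes
preds_matrix = np.frombuffer(raw, dtype='<f4').reshape(-1, 2)   # 412 rows of 2 float32s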