I’ve noticed in some comparative experiments with learn.predict() and learn.get_preds() that the results in preds are less accurate, while learn.predict() always gives the same outcome. I’m not 100% sure where to start looking for the cause, but I wanted to bring it up because it’s a serious problem for anyone who wants to run predictions over a large dataset, where learn.predict() just isn’t fast enough. Here is a notebook showing this:
You can see when I call learn.get_preds() on a test dataset, the results were always 0, or <50k. But when I do learn.predict(), they’re not!
Hopefully we can find a solution to this. @sgugger?
learn.get_preds() returns predictions, targets. In the case of the test set, the targets are just an array of zeros, so they are not predictions. You want to use preds.argmax(dim=-1) to get the actual predicted classes.
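To make the point above concrete, here is a minimal sketch in plain Python (no fastai or PyTorch needed). The probabilities and the helper name argmax_rows are made up for illustration; in fastai itself the equivalent step is the preds.argmax(dim=-1) call mentioned above.

```python
# Stand-ins for the two values returned by learn.get_preds() on a test set.
# The numbers are invented for illustration only.
preds = [
    [0.90, 0.10],  # row 0: class 0 is most likely
    [0.20, 0.80],  # row 1: class 1 is most likely
    [0.35, 0.65],  # row 2: class 1 is most likely
]
targets = [0, 0, 0]  # a test set has no labels, so these are placeholder zeros


def argmax_rows(rows):
    """Return the index of the largest value in each row (hypothetical helper)."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


# Plain-Python analogue of preds.argmax(dim=-1)
predicted_classes = argmax_rows(preds)
print(predicted_classes)
print(targets)
```

Reading the second returned value as if it were the predictions would make every test row look like class 0, which matches the "always 0, or <50k" symptom described earlier.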
I am seeing that volatility in the results from learn.validate. Just running it five times, I see accuracy swing between 83.75% and 83.56%, whereas predict() is always the same. I can show the notebook illustrating this in a moment. I understand that the test predictions should be generated on the test set, but if I want to grade them too, I worry about this.
The training dataloader is always shuffled and with drop_last=True. You should make the test set a validation set in your data_test to have consistent results.
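A rough sketch of why shuffling plus drop_last=True makes a metric wobble from run to run, written in plain Python rather than with a real DataLoader. The batches helper, the correctness flags, and the batch size are all assumptions made up for illustration.

```python
import random


def batches(items, batch_size, shuffle, drop_last, rng):
    """Toy batching that mimics the shuffle / drop_last options (hypothetical helper)."""
    items = list(items)
    if shuffle:
        rng.shuffle(items)
    out = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    # drop_last discards the final batch when it is smaller than batch_size
    if drop_last and out and len(out[-1]) < batch_size:
        out.pop()
    return out


def accuracy(batch_list):
    """Fraction of correct predictions over every item that was actually evaluated."""
    flat = [flag for batch in batch_list for flag in batch]
    return sum(flat) / len(flat)


# Made-up per-item results: 1 = predicted correctly, 0 = predicted wrongly.
correct_flags = [1, 1, 1, 0, 1, 1, 0, 1, 1, 0]
rng = random.Random(0)

# Training-style loading: shuffled, incomplete last batch dropped.
train_style = [accuracy(batches(correct_flags, 4, True, True, rng)) for _ in range(5)]
# Validation-style loading: fixed order, every item kept.
valid_style = [accuracy(batches(correct_flags, 4, False, False, rng)) for _ in range(5)]

print(train_style)
print(valid_style)
```

With ten items and a batch size of four, the training-style runs only ever score eight items, and which two are left out depends on the shuffle, so the accuracy can differ between runs. The validation-style runs score all ten items every time.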
I’m attempting to do that now. (I used sklearn’s train_test_split, so I reset the indices.) If I pass in the following:
.split_by_idx(test.index.to_list())
I get the error "index 0 is out of bounds for axis 0 with size 0".
So then I tried starting at 1, e.g.:
lis = test.index.to_list()[1:]
That list has all but one of the indices, so one row should go into the training set, but I received an index error in index_row(a, idxs):
IndexError: too many indices for array
The broken workaround to get it to work is that I have to do the following:
.split_by_idx(list(range(1, len(test)-1)))
Which is not what we want, as we still drop the first and last rows.
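The index arithmetic behind that workaround can be checked in plain Python. The sizes below are made up. The second half is only one possible alternative and is an assumption, not something confirmed in this thread: if the labelled test rows were appended after the training rows in a single table, their positions could serve as the validation indices without leaving the training split empty.

```python
n_test = 6  # hypothetical number of rows in the test set

# The workaround from the post: range(1, len(test) - 1)
workaround = list(range(1, n_test - 1))
missing = sorted(set(range(n_test)) - set(workaround))
print(workaround)  # indices that end up in the validation split
print(missing)     # indices that are silently left out

# Assumed alternative layout: training rows first, test rows appended after them.
n_train = 10  # hypothetical number of training rows
valid_idx = list(range(n_train, n_train + n_test))
train_idx = list(range(n_train))
print(valid_idx)   # positions of the test rows in the combined table
```

In this layout every test row is covered and the training split is never empty.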
Although I have used ordered=True, I have the same problem with get_preds(). I am very new to fastai and would appreciate any help.
I made a language model and then: