How to get_preds for text classification

kuroyuli · July 19, 2021, 9:35am

I’m trying to do transfer learning with ‘Readnet’ from GitHub.

I successfully trained the model after making these changes:

In def get_dls(bs): replaced the original train.csv with my own train.csv.
In class GloveEmbedding(nn.Module): and class GloveTokenizer: downloaded and specified the GloVe embeddings (glove.6B.100d.txt).
In def get_model(): changed the dimensions from d_model=200 to d_model=100.
In learn = Learner(dls=get_dls(32), model=get_model(), loss_func=MSELossFlat()): specified the batch size.

Please show me how to run get_preds on the texts in my test.csv. I tried:

test_df = pd.read_csv('test.csv')
learn.dls.test_dl(test_df, 32)
learn.get_preds()

This just gave me (None, None).

meanpenguin · July 20, 2021, 5:26pm

Maybe try …

dl = learn.dls.test_dl(test_df, 32)
probs,_ = learn.get_preds(dl=dl)
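
The difference from the first attempt is that test_dl returns a new DataLoader instead of modifying the learner, and get_preds called without dl falls back to the validation set. The toy classes below are hypothetical stand-ins (not fastai code) that only mimic this return-and-pass pattern, with a text's length used as a dummy prediction:

```python
# Toy stand-ins for illustration only; these are NOT fastai classes.
class ToyLoaders:
    def __init__(self, valid_items):
        self.valid = list(valid_items)

    def test_dl(self, items):
        # Returns a new loader and leaves self untouched.
        return list(items)


class ToyLearner:
    def __init__(self, dls):
        self.dls = dls

    def get_preds(self, dl=None):
        # With no dl given, fall back to the validation items.
        items = self.dls.valid if dl is None else dl
        return [len(text) for text in items]  # dummy "prediction": text length


learn = ToyLearner(ToyLoaders(valid_items=["aa", "bbb"]))

learn.dls.test_dl(["test text"])       # result discarded, nothing changes
preds_default = learn.get_preds()      # still uses the validation items

dl = learn.dls.test_dl(["test text"])  # keep the returned loader
preds_test = learn.get_preds(dl=dl)    # predictions for the test items
```

This sketch does not explain the (None, None) result itself; it only shows why the test rows were never used in the first attempt.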

kuroyuli · July 20, 2021, 11:02pm

Thank you very much for your reply, meanpenguin.
I changed the code according to your suggestion.

test_df = pd.read_csv('test.csv')
test_txts = test_df.text.tolist()
test_cut_txts = prepare_txts_cut(test_txts, tokenizer)
test_dl = learn.dls.test_dl(test_cut_txts, 32)
preds,_  = learn.get_preds(dl=test_dl)

Then I got this error:

in forward(self, x, feats_sent, feats_doc)
20 feats_doc = feats_doc.cuda()
21 x = self.embed(x)
---> 22 b, d, s, m = x.shape
23 x = x.reshape(b * d, s, m)
24 sents_enc = self.sent_block(x, feats_sent.reshape(b * d, -1)) # (b*d, m)

ValueError: not enough values to unpack (expected 4, got 3)

Please help me solve this error.
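
The traceback says that the embedded batch x has three dimensions where forward unpacks four (b, d, s, m). Here is a minimal sketch of that failure, using plain tuples in place of tensor shapes. The sizes are made up for illustration, and the axis meanings in the comments are only a guess from the variable names:

```python
# Shapes as plain tuples; the numbers are illustrative, not from the real model.
shape_4d = (32, 16, 49, 100)  # guessed meaning: batch, sentences, tokens, embedding
shape_3d = (16, 49, 100)      # one axis missing

b, d, s, m = shape_4d         # unpacks fine

try:
    b, d, s, m = shape_3d     # same kind of statement as line 22 of forward
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)
```

So the thing to check is how the test items are batched, since an input with one axis fewer than the training batches would produce exactly this message.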

kuroyuli · July 25, 2021, 11:32pm

I solved this problem myself.

The code that works is:

test_df = pd.read_csv('test.csv')
test_txts = test_df.excerpt.tolist()
test_cut_txts = prepare_txts_cut(test_txts, tokenizer)
test_cut_txts_zip = zip(test_cut_txts, [0, 0, 0, 0, 0, 0, 0])  # one placeholder value per sample (test_df has 7 samples)
test_dl = learn.dls.test_dl(test_cut_txts_zip, 32)
preds,_  = learn.get_preds(dl=test_dl)
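
The hard-coded list of seven zeros ties the snippet to a seven-row test set. Here is a minimal sketch of the same pairing for any length. The name with_dummy_targets is a hypothetical helper, and it assumes (as the working code suggests, though the thread does not confirm why) that each item has to be an (input, target) pair:

```python
def with_dummy_targets(items, dummy=0):
    """Pair every item with a placeholder target, one pair per item."""
    items = list(items)
    return list(zip(items, [dummy] * len(items)))


# Stand-in strings instead of the real output of prepare_txts_cut.
pairs = with_dummy_targets(["first text", "second text", "third text"])
print(pairs)
```

Returning a list rather than a bare zip object means the pairs can be iterated more than once.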