I’m interested in using LanguageLearner for extracting features from text. For example, for one text I can use:

```python
def get_model_outputs(learner, text):
    input_tensor, __ = learner.data.one_item(text)
    return learner.model[0](input_tensor)
```

How can I encode several texts at once?
The problem with using `one_item` is that I can’t just concatenate the results for several texts, because they will have different lengths. Should I use another `dataset_type`, as in `one_item`'s code? Or should I pad the values before I feed them to `one_item`?
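For context on the padding option: the length mismatch can be handled by padding all token-id sequences to a common length before stacking them into a batch. A minimal sketch in plain Python — the pad id of 1 and left-side padding mirror what fastai v1 language models conventionally use, but treat both as assumptions rather than confirmed API behavior:

```python
def pad_batch(token_id_lists, pad_id=1):
    """Left-pad variable-length token-id lists to a common length.

    pad_id=1 is assumed to be the padding token's index (fastai's
    default `xxpad`); adjust it to match your actual vocab.
    """
    max_len = max(len(ids) for ids in token_id_lists)
    # prepend pad tokens so every row reaches max_len
    return [[pad_id] * (max_len - len(ids)) + ids for ids in token_id_lists]

# two tokenized texts of different lengths become one rectangular batch
batch = pad_batch([[5, 6, 7], [8, 9]])
```

Once the rows have equal length, the batch can be turned into a single tensor and passed through the encoder in one call.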
To encode several texts at once, you should put them in a test dataset and then loop through `learn.data.test_dl`.
That sounds confusing. Does that mean I have to override the learner’s dataset? Could you by any chance point me to a documentation page that might be helpful for doing that?
I got this far:

```python
def encode_texts(texts, vocab):
    df = pd.DataFrame({'text': texts + texts})  # fastai breaks when I tried to use validation size 0...
    df.to_csv('/tmp/df.csv')
    lm_data_bunch = fastai.text.TextLMDataBunch.from_csv(
        '/tmp', 'df.csv', valid_pct=0.5, val_bs=len(texts), vocab=vocab)
    return lm_data_bunch.one_batch('Test')[0][:len(texts)]
```

But it seems the results are nondeterministic, and I don’t see a way to disable shuffling…
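Since the data loader may reorder items, one workaround (a sketch, not fastai-specific) is to control the batching order yourself: sort the texts by length for efficient padding, encode them in that fixed order, then invert the permutation so each result lands back at its caller-visible index. The `encode_fn` argument here is a hypothetical stand-in for whatever per-item encoder you use:

```python
def encode_in_order(items, encode_fn):
    """Encode items sorted by length, then restore the original order.

    encode_fn is a hypothetical per-item encoder; sorting by length is
    only an efficiency choice, determinism comes from the un-sort step.
    """
    order = sorted(range(len(items)), key=lambda i: len(items[i]))
    encoded_sorted = [encode_fn(items[i]) for i in order]
    # invert the permutation: place each result at its original index
    results = [None] * len(items)
    for pos, i in enumerate(order):
        results[i] = encoded_sorted[pos]
    return results
```

Because the output order matches the input order regardless of how the items were batched internally, repeated calls with the same inputs give the same result layout.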