I don’t know for sure, but `process_doc` is intended for a single string, given its use of the `one_item` method. If speed isn’t a constraint in your use case, you could just do a list comprehension like: `[encode_doc(doc) for doc in my_dataset]`
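As a rough sketch of that one-at-a-time approach (with `encode_doc` stubbed out, since its real body depends on your model, this is just to show the shape of the loop):

```python
# Sketch of the one-document-at-a-time approach. `encode_doc` here is a
# stand-in: a real version would tokenize the string and run it through
# the AWD LSTM encoder, returning a fixed-size vector.
def encode_doc(doc):
    # placeholder encoding so the sketch is runnable
    return [float(len(doc.split()))]

my_dataset = ["first document", "a second, longer document", "third"]
# one forward pass per document - simple, but slow on large datasets
encodings = [encode_doc(doc) for doc in my_dataset]
```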
I’m sure there’s a much more efficient way to do it if you can get all of your docs into one batch and pass that through the AWD LSTM - you’d have to change `encode_doc` to return the entire batch rather than just the first item (in my case I was assuming the batch held a single document).
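One way to sketch the batched variant - `encode_batch` here is hypothetical (a real version would pad and tokenize the whole batch, then run one forward pass through the encoder), but the chunking pattern is the point:

```python
def chunks(items, batch_size):
    """Yield successive batches from a list of documents."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def encode_batch(docs):
    # hypothetical stand-in: a real version would pad/tokenize the batch
    # and return one encoding per document from a single forward pass
    return [[float(len(d.split()))] for d in docs]

my_dataset = ["doc one", "doc two", "doc three", "doc four", "doc five"]
encodings = []
for batch in chunks(my_dataset, batch_size=2):
    # each iteration encodes a whole batch instead of a single document
    encodings.extend(encode_batch(batch))
```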
Not sure if someone is still struggling with this; I have been and couldn’t find what I was looking for on the forums.
This is what I’ve come up with, and it made processing a fairly large dataset pretty painless (<3 minutes sending data in batches vs. ~2 hours sending one document at a time).
I load my classification model, then add the data I want scored as a test set: