Variable-length strings in a structured dataset

Hi,

I am trying to build a neural net that classifies personal expenses based on the expense date, the amount and a descriptive text. The descriptive text is a variable-length string, like “mortgage payment”, “phone bill” or “dinner with friends from university”.
How could I handle this variable-length string field? I would like to tokenize its contents, then normalize and sort the tokens, so that each entry ends up with something like a variable-length list of tags.
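
To make the idea concrete, here is a rough sketch of the preprocessing I have in mind. The function name `to_tags` and the stopword list are just placeholders I made up for illustration, and lowercasing plus dropping duplicates is only one possible way to normalize:

```python
import re

# Hypothetical stopword list, only for illustration
STOPWORDS = {"with", "from", "the", "a", "of"}

def to_tags(description):
    """Lowercase the text, keep runs of letters, drop stopwords and duplicates, sort."""
    tokens = re.findall(r"[a-z]+", description.lower())
    return sorted(set(tokens) - STOPWORDS)

for text in ["mortgage payment", "phone bill", "dinner with friends from university"]:
    print(to_tags(text))
```

With this, the first two descriptions give two tags each and the last one gives three, so the tag lists have different lengths from entry to entry.
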
But then I have no idea how to model these tags, given that every dataset entry would have a different number of them.

Do you have any advice on how to handle this kind of data?

Thanks in advance
Tommaso
