Hi everyone, I've been out of the whole ML scene for a few years now and am trying to get back in, which is harder than I thought with all the changes between fastai v1 and v2. I flew through the new course and checked out the new notebooks, but I can't quite seem to find what I need.

My current project is about extracting data from text publications. The texts come in various formats and are therefore hard to handle with regex and similar rule-based approaches, so I thought it might be possible to extract the contents with ML instead.

So far I have picked a single text field as the label and trained a simple classifier, which quickly got to 98% accuracy. That is already better than the regex solution. But that was the easy case: a single target with only 4 possible outcomes. Overall I would have around 20 labels/datapoints, some of which are not finite. Is it possible to address this kind of task with ML? If yes, how would I approach it: one model per label, or is it possible to have one model for all labels? And how would extraction work for non-finite labels like names?
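To make the question more concrete, here is a rough sketch of the shape of data I have in mind. The text, the field names (`notice_type`, `person_name`) and the `Record` class are all made up for illustration and are not my real data or code. The point is only to show the two kinds of targets: finite ones that pick a class, and non-finite ones that point at a piece of the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Record:
    """One publication with both kinds of targets (hypothetical layout)."""
    text: str
    # Finite targets: each value comes from a small, fixed set of classes.
    categorical: Dict[str, str] = field(default_factory=dict)
    # Non-finite targets: (start, end) character offsets into `text`.
    spans: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def span_text(self, name: str) -> str:
        """Return the substring of `text` that a span target points at."""
        start, end = self.spans[name]
        return self.text[start:end]


def make_example() -> Record:
    # Invented sample text, not taken from a real publication.
    text = "Notice: Jane Doe was appointed director of Example GmbH."
    name = "Jane Doe"
    start = text.index(name)  # locate the name instead of hard-coding offsets
    return Record(
        text=text,
        categorical={"notice_type": "appointment"},
        spans={"person_name": (start, start + len(name))},
    )


example = make_example()
print(example.categorical["notice_type"])
print(example.span_text("person_name"))
```

The `categorical` part is what my current classifier already handles for one field. The `spans` part is what I don't know how to model, since a name can be any string rather than one of a few classes.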

tl;dr

Can I classify text against multiple targets? If yes, how? And how do I handle non-finite labels?

Thanks for the help!