ULMFiT multi-label: weird behaviour

Hi everyone, I’m using ULMFiT for a multilabel classification task on a dataset of my own. My code is based on the great imdb_scripts, which I modified to accommodate my needs.

The goal of my project is not to get the best “accuracy” on the whole dataset, but to get very high accuracy on some subsets of it. I have an initial set of categories, plus a trash category that collects the data without any category. The dataset is almost multiclass, by which I mean that few elements have more than one label. My first approach, to test ULMFiT with minimal effort, was to reduce it to a multiclass problem. I took the label combinations that occur in large quantity, added each one to the label pool as a new label, and put everything else in the trash category. (For example, if label 1 and label 2 together make up 10% of the dataset, I add label1-2 as a new label; but if label 2 and label 3 together make up less than 1% of the dataset, every element with labels 2 and 3 goes into the trash category.)
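For readers who want to reproduce this preprocessing step, here is a minimal sketch of the idea. It is not the code from my scripts: the function name, the `trash` string and the `min_share` threshold are made up for illustration, and the actual cut-off between "large quantity" and "too rare" is a judgment call.

```python
from collections import Counter

TRASH = "trash"  # hypothetical name for the catch-all category


def collapse_to_multiclass(label_sets, min_share):
    """Map each element's set of labels to one class name.

    Single labels are kept as they are. A combination of labels becomes
    a new class only if it covers at least `min_share` of the dataset.
    Everything else (rare combinations, elements with no label) goes to TRASH.
    """
    combos = [tuple(sorted(labels)) for labels in label_sets]
    counts = Counter(combos)
    total = len(combos)
    keep = {
        combo
        for combo, n in counts.items()
        if len(combo) == 1 or n / total >= min_share
    }
    return ["-".join(c) if c and c in keep else TRASH for c in combos]


# Toy data: label1+label2 is frequent, label2+label3 is rare.
label_sets = [
    {"label1"},
    {"label1", "label2"},
    {"label1", "label2"},
    {"label2", "label3"},
    set(),
    {"label3"},
]
classes = collapse_to_multiclass(label_sets, min_share=0.2)
print(classes)
```

Each element ends up with exactly one class, so the result can be fed to an ordinary multiclass classifier.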

I then trained and managed to get 75% accuracy on the multiclass problem (and around 90-95% accuracy in each category taken separately).
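To make the two kinds of numbers concrete, here is a small sketch of how they could be computed from multiclass predictions. It assumes that "each category taken separately" means a one-vs-rest reading of the multiclass output (is the element in this category or not); the function names and the toy class names are illustrative only.

```python
def overall_accuracy(y_true, y_pred):
    """Share of elements whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_accuracy(y_true, y_pred, category):
    """Share of elements for which 'is it `category`?' is answered correctly."""
    hits = sum((t == category) == (p == category) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Toy predictions with made-up class names.
y_true = ["a", "a", "b", "trash"]
y_pred = ["a", "b", "b", "trash"]

overall = overall_accuracy(y_true, y_pred)
acc_a = one_vs_rest_accuracy(y_true, y_pred, "a")
acc_trash = one_vs_rest_accuracy(y_true, y_pred, "trash")
print(overall, acc_a, acc_trash)
```

Under this reading, every element that is correctly recognised as not belonging to a category counts as a hit for that category, which is one reason per-category figures can sit above the overall multiclass accuracy.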

My biggest category, which makes up about 20% of the dataset, had one of the lowest accuracies, at around 80%.

I decided to try another approach, in a one-vs-all fashion. For each element I made an array whose size is the number of categories, with a 1 for each label the element has. I then trained a separate classifier for each category, hoping to get better performance. To my great surprise, the performance was very bad. For instance, on my biggest category I could only reach 60% accuracy this way, while I could get 85% by taking the result from the multiclass approach and restricting it to that class. I tried tweaking the parameters, but I couldn’t get a much better result.
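As an illustration of the encoding described above, here is a minimal sketch in plain Python. The helper names are hypothetical and this is not taken from imdb_scripts; it only shows the multi-hot array, the binary target for one category, and two ways of scoring a binary classifier.

```python
def multi_hot(label_sets, categories):
    """One row per element, one column per category, 1 where the label is present."""
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for labels in label_sets:
        row = [0] * len(categories)
        for label in labels:
            row[index[label]] = 1
        rows.append(row)
    return rows


def binary_targets(rows, column):
    """Targets for the one-vs-all classifier of a single category."""
    return [row[column] for row in rows]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of true positives that were found (assumes at least one positive)."""
    found = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(found) / len(found)


categories = ["label1", "label2", "label3"]
label_sets = [{"label1"}, {"label1", "label2"}, {"label3"}, set(), {"label2"}]

rows = multi_hot(label_sets, categories)
y_label1 = binary_targets(rows, 0)

# A degenerate classifier that always answers "not in this category".
always_negative = [0] * len(y_label1)
baseline_accuracy = accuracy(y_label1, always_negative)
baseline_recall = recall(y_label1, always_negative)
print(rows, baseline_accuracy, baseline_recall)
```

With a category that covers about 20% of the data, a classifier that always answers "not in this category" would already score about 80% binary accuracy while finding none of the positives. It may therefore be worth checking that the 60% and 85% figures are measured the same way (accuracy versus recall on the category, for instance) before comparing them.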

So the weird behaviour is the following: I get good accuracy on one of my categories when training on the multiclass problem, but much worse accuracy when training on that category alone. Any thoughts on this?