Loss on hierarchical categories


I’m working with a dataset that has hierarchical categories. The target label looks something like: category_level_1, category_level_2, and so on. There can be up to 4 levels, but some examples have just one level. Each category_level_1 has a different number of distinct level_2 categories. Concretely, the target labels form a tree. For example, I could have: “Desserts”, “Desserts / Cakes”, “Desserts / Chocolate”, “Salad”, “Salad / Fruit”, “Salad / Pasta”, “Salad / Potato”.
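To make the structure concrete, here is a minimal sketch of how labels like these could be parsed into a tree. It is only an illustration: the `build_tree` helper and the " / " separator are assumptions based on the example strings, not part of any real pipeline.

```python
# Example label strings from the post; " / " is assumed to separate levels.
paths = [
    "Desserts",
    "Desserts / Cakes",
    "Desserts / Chocolate",
    "Salad",
    "Salad / Fruit",
    "Salad / Pasta",
    "Salad / Potato",
]


def build_tree(paths, sep=" / "):
    """Build a nested dict: each key is a category, each value its children."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split(sep):
            # Create the child on first sight, then descend into it.
            node = node.setdefault(part, {})
    return tree


tree = build_tree(paths)

# Number of distinct level_2 categories under each level_1 category.
children_per_root = {name: len(children) for name, children in tree.items()}
print(children_per_root)  # {'Desserts': 2, 'Salad': 3}
```

The differing child counts (2 versus 3 here) are what make the level_2 output size vary from one level_1 category to another.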

What loss function could I use on this data?
On the level_1 category, a simple softmax cross entropy would be fine, but beyond that I’m a bit confused: the number of valid category_level_2 classes depends on the level_1 category, so the output size would differ from one example to the next within a minibatch.
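One possible way around the varying size, sketched here under several assumptions, is to keep a single fixed-size level_2 output covering every level_2 class, restrict the softmax to the children of the true level_1 category, and skip the level_2 term for examples that stop at level 1. This is not the only option. The names `CHILDREN`, `cross_entropy` and `hierarchical_loss` are hypothetical, and only two levels are shown.

```python
import math

LEVEL1 = ["Desserts", "Salad"]
LEVEL2 = ["Cakes", "Chocolate", "Fruit", "Pasta", "Potato"]
# Assumed mapping: level_1 index -> indices of its valid level_2 children.
CHILDREN = {0: [0, 1], 1: [2, 3, 4]}


def cross_entropy(logits, target, allowed=None):
    """Softmax cross entropy computed only over the `allowed` indices."""
    allowed = list(range(len(logits))) if allowed is None else list(allowed)
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits[i] for i in allowed)
    log_z = m + math.log(sum(math.exp(logits[i] - m) for i in allowed))
    return log_z - logits[target]


def hierarchical_loss(l1_logits, l2_logits, l1_target, l2_target=None):
    """Level_1 loss plus, when a level_2 label exists, a masked level_2 loss."""
    loss = cross_entropy(l1_logits, l1_target)
    if l2_target is not None:
        # Only siblings under the true parent compete in the softmax.
        loss += cross_entropy(l2_logits, l2_target, CHILDREN[l1_target])
    return loss


# "Salad / Pasta": both levels contribute to the loss.
loss_two_levels = hierarchical_loss([0.0, 0.0], [0.0] * 5, l1_target=1, l2_target=3)
# "Desserts" alone: the level_2 term is skipped.
loss_one_level = hierarchical_loss([0.0, 0.0], [0.0] * 5, l1_target=0)

# Minibatch loss as the mean over examples; every example has the same
# logit shapes, so batching is not affected by the tree structure.
batch_loss = (loss_two_levels + loss_one_level) / 2
```

With all-zero logits, the two-level example gives log(2) + log(3) and the one-level example gives log(2), which shows that the mask, rather than the output size, is what changes per example. In a framework such as PyTorch or TensorFlow the same idea would typically be expressed by adding a large negative value to the disallowed logits before the usual cross entropy.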