Working with related multilabel prediction

I have two somewhat related questions.

    1. I am working with multilabel classification, but my data falls within exclusive categories, so I want to limit the possible predictions to always predict the same number of categories
    1. Also, some of my data is related. What I mean by that is that some labels ought only occur, when other labels occur. The default approach to this of course is to let the learner figure out the relation, but what to do when the learner is not getting the right idea?

As a simple example for the first case: let’s say I have photos where there is always one fruit and one vegetable present in the picture. I then need to have labels for all kinds of fruit and all kinds of vegetables in order for the model to predict the correct fruit and the correct vegetable, but obviously the model should never predict two fruits or two vegetables, since this will never occur. So I somehow want to tell the model, that the outcome falls within two categories, but I want to stick with multilabel since both categories have to be present.

And for the second queston: let’s say that a cup of coffee can only be present in the picture if there is already a fruit present. How to do this? Again of course maybe the model will figure it out from the data, but what if it doesn’t.

Since these are situations that are so regulated, the model really shouldn’t waste energy on figuring out, I assume there must be a way to help the model predict based on a couple of rules.

I think I have an idea on how to solve this. I could do what one could call “multi-multiclass”, that is: multiple classifications of each image where each classification classifies within each own group of labels, so that the image ends up having multiple labels but not two labels within the same exclusive class. This means I would have to actually build several models that each predict between the the most likely outcome within their group.

Multi-Label Classification to find genre based on plot summary. the prediction of multiple labels simultaneously associated with a single instance.