Train unconditioned then fine-tune for conditioning?

I have a situation where I need good output quality from a latent diffusion model, but where I expect the data available for conditioning to be highly variable. My concern is that the standard approach of training a conditioned base model could be problematic if the fine-tuning datasets don’t (or can’t) match the detail and depth of the original training data. What I mean is that the “base” model could be trained with fairly long, detailed prompts, while future fine-tuning datasets might provide much less information.

What I’m wondering is whether it would make sense to train an unconditioned base model first (or perhaps a very generic or “lightly” conditioned one—e.g., based on class labels only), and then add conditioning during fine-tuning. Does that make any sense? The intuition I’m following is that the base model might learn a “smoother” latent space when trained without conditioning (which could otherwise act as a kind of bias), and conditioning could then provide guidance on top of that space, case by case.
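For what it’s worth, here is one way this idea is often made concrete (a minimal sketch, not from your post — all class and parameter names are illustrative): wrap the pretrained unconditional denoiser with a zero-initialized conditioning projection, so that at the start of fine-tuning the wrapped model reproduces the base model exactly, and conditioning is learned as a residual on top. Dropping the condition (passing `None`) also recovers unconditional behavior, in the spirit of classifier-free guidance training.

```python
import torch
import torch.nn as nn

class BaseDenoiser(nn.Module):
    """Stand-in for a pretrained unconditional denoiser (toy MLP)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t, t_emb):
        return self.net(x_t + t_emb)

class ConditionedDenoiser(nn.Module):
    """Adds a conditioning branch to a frozen-or-finetuned base model."""
    def __init__(self, base, cond_dim=8, dim=16):
        super().__init__()
        self.base = base
        self.cond_proj = nn.Linear(cond_dim, dim)
        # Zero-init: the conditioning branch contributes nothing at step 0,
        # so fine-tuning starts exactly from the unconditional model.
        nn.init.zeros_(self.cond_proj.weight)
        nn.init.zeros_(self.cond_proj.bias)

    def forward(self, x_t, t_emb, cond=None):
        if cond is None:
            # Condition dropout path: behaves unconditionally.
            return self.base(x_t, t_emb)
        return self.base(x_t + self.cond_proj(cond), t_emb)

base = BaseDenoiser()
model = ConditionedDenoiser(base)
x, t, c = torch.randn(4, 16), torch.randn(4, 16), torch.randn(4, 8)
with torch.no_grad():
    # At initialization, conditioned and unconditional outputs are identical.
    assert torch.allclose(model(x, t, c), base(x, t))
```

The zero-init trick matters because it means fine-tuning can never start off worse than the base model; the conditioning signal only bends the learned space where the data supports it.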

Any thoughts?