Multi-image input to a DataBunch

I’m trying to re-implement this model, which was originally implemented in Keras.

It implements this 2018 Nvidia paper on image infilling (inpainting).

In the original paper, the model learns to convert masked images to unmasked images.

However, to calculate the loss, they use three images: the prediction from the model, the original image, and the mask. fastai custom loss functions, on the other hand, only receive the prediction and the original image.
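
To make the three-image loss concrete, here is a minimal sketch in plain PyTorch. It assumes the mask is packed into the target as a fourth channel so that a two-argument loss function can still see it. The function name `masked_l1_loss`, the `hole_weight` parameter, and the convention that 1 means a valid pixel and 0 means a hole are all my own assumptions for illustration, not something taken from the paper or the Keras code.

```python
import torch
import torch.nn.functional as F


def masked_l1_loss(pred, target_with_mask, hole_weight=1.0):
    """L1 loss with separate terms for valid and hole pixels.

    pred:             (N, 3, H, W) model output
    target_with_mask: (N, 4, H, W) original RGB image plus the mask as the
                      last channel (assumed: 1 = valid pixel, 0 = hole)
    hole_weight:      hypothetical weighting of the hole term
    """
    target = target_with_mask[:, :3]
    # Slice with 3:4 to keep the channel dimension so the mask broadcasts over RGB.
    mask = target_with_mask[:, 3:4]
    valid = F.l1_loss(pred * mask, target * mask)
    hole = F.l1_loss(pred * (1 - mask), target * (1 - mask))
    return valid + hole_weight * hole
```

With this layout the loss keeps the usual `(prediction, target)` signature, at the cost of the target no longer being a plain image.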

I wanted to fix this by adding the mask to the input of the model. I think this might also improve the results.
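
Roughly what I have in mind, as a plain PyTorch sketch: stack the mask onto the masked image as a fourth channel, and widen the first convolution so it accepts four channels. The helper names `stack_image_and_mask` and `widen_first_conv` are made up for this example, the code assumes an ordinary `nn.Conv2d` with `groups=1`, and it does not show where that first convolution sits inside a model built by `unet_learner`.

```python
import torch
import torch.nn as nn


def stack_image_and_mask(image, mask):
    """Concatenate a (N, 3, H, W) image and a (N, 1, H, W) mask into (N, 4, H, W)."""
    return torch.cat([image, mask], dim=1)


def widen_first_conv(conv, extra_channels=1):
    """Return a copy of `conv` that accepts extra input channels.

    The original weights are kept and the weights for the new channels start
    at zero, so the widened layer initially ignores the mask channel.
    Assumes groups=1.
    """
    new_conv = nn.Conv2d(
        conv.in_channels + extra_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight.zero_()
        new_conv.weight[:, :conv.in_channels] = conv.weight
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```

Zero-initialising the new channel means pretrained weights for the RGB channels are not disturbed at the start of training; the mask weights are then learned from scratch.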

My question is: how do I create a unet_learner with a DataBunch that maps two input images to one output image?