Custom pixel transform

I have written a custom transform that places random white rectangles over an image and draws text inside them. I want the model to learn to remove the text, so its output needs to be compared against the image with the white rectangles but without the text.

What would be the best way of doing this? I could preprocess the images and generate them before training (one set for each epoch I plan to train), but I was wondering whether there is a way to keep it as a transform applied on the fly during training.
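
To make the on-the-fly idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes images are NumPy `uint8` arrays of shape (H, W) or (H, W, C) with sides of at least 8 pixels, and it uses a map-style dataset interface (`__len__` / `__getitem__`). All names (`TextRemovalPairs`, `random_rects`, `draw_placeholder_text`) are made up for illustration, and the "text" is just a placeholder of dark strokes standing in for my real text renderer. The key point is that the rectangles are sampled once per item, so the input and the target share the same boxes.

```python
import numpy as np


def random_rects(rng, height, width, n_rects=3, min_size=8, max_size=32):
    """Sample (top, left, h, w) boxes that fit inside a height x width image."""
    rects = []
    for _ in range(n_rects):
        h = int(rng.integers(min_size, min(max_size, height) + 1))
        w = int(rng.integers(min_size, min(max_size, width) + 1))
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        rects.append((top, left, h, w))
    return rects


def draw_placeholder_text(img, rect):
    """Hypothetical stand-in for the real text renderer: dark strokes in the box."""
    top, left, h, w = rect
    img[top + 2 : top + h - 2 : 4, left + 2 : left + w - 2] = 0
    return img


class TextRemovalPairs:
    """Yields (input, target) pairs built from the same random rectangles.

    target = image + white rectangles
    input  = target + text drawn inside those rectangles
    """

    def __init__(self, images, draw_text=draw_placeholder_text, n_rects=3, seed=None):
        self.images = images
        self.draw_text = draw_text
        self.n_rects = n_rects
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = self.images[idx]
        rects = random_rects(self.rng, image.shape[0], image.shape[1], self.n_rects)

        # Target: the original image with white rectangles, no text.
        target = image.copy()
        for top, left, h, w in rects:
            target[top : top + h, left : left + w] = 255

        # Input: same rectangles, with text drawn on top of them.
        noisy = target.copy()
        for rect in rects:
            noisy = self.draw_text(noisy, rect)
        return noisy, target


# Every access draws fresh rectangles, so each epoch sees new pairs.
images = [np.full((64, 64, 3), 128, dtype=np.uint8) for _ in range(4)]
pairs = TextRemovalPairs(images, seed=0)
noisy, target = pairs[0]
```

Since new rectangles are drawn on every access, nothing would need to be generated ahead of time per epoch. One thing I am unsure about: if this were wrapped in a PyTorch `DataLoader` with several workers, each worker would presumably get a copy of the same random generator state, so the generator would likely need to be reseeded per worker.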