Classifying every pixel in an image as 0 or 1 vs multi-object detection

I am trying to detect swimming pools in satellite imagery. An image can contain anywhere from zero to several pools, and I want to know where the pools are in the image (not only whether there are any).

So far I have had decent results using hand-written rules on the red, green, and blue values of each pixel. However, I think deep learning could improve the results.
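To give an idea of what I mean by rules on the colour values, here is a minimal sketch in NumPy. The function name `pool_mask_from_rgb` and all three thresholds are made up for illustration; they are not the values I actually use, and the image is assumed to be a uint8 array in RGB order.

```python
import numpy as np

def pool_mask_from_rgb(image, blue_min=150, red_max=120, blue_green_margin=10):
    """Return a boolean (H, W) mask, True where a pixel looks like pool water.

    image: uint8 array of shape (H, W, 3) in RGB order.
    The thresholds are illustrative placeholders, not tuned values.
    """
    rgb = image.astype(np.int16)  # avoid uint8 wrap-around in the subtraction
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # A pixel passes if it is bright in blue, dark in red, and bluer than green.
    return (b >= blue_min) & (r <= red_max) & (b - g >= blue_green_margin)

# Tiny 2x2 example: one cyan-blue pixel and three non-pool pixels.
image = np.array(
    [[[60, 170, 220], [120, 120, 120]],
     [[40, 140, 50], [200, 190, 180]]],
    dtype=np.uint8,
)
mask = pool_mask_from_rgb(image)
print(mask)
```

Only the top-left pixel passes all three rules in this toy example.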

I have been looking into multi-object detection (a very interesting project using fastai), but the approach seems complex. Do you think an approach where the output layer is 0 for every pixel without a pool and 1 for every pixel with a pool in it would make sense? Or would multi-object detection be the preferred approach?
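To make the per-pixel idea concrete, this is roughly what I have in mind, sketched in plain NumPy rather than a real network. I am assuming the model produces one raw score (logit) per pixel, that the loss is binary cross-entropy averaged over pixels, and that the final mask comes from thresholding the sigmoid at 0.5. The names `pixelwise_bce` and `predict_mask` and the example numbers are hypothetical.

```python
import numpy as np

def pixelwise_bce(logits, target_mask, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    logits: float array (H, W), raw per-pixel scores from some model.
    target_mask: array (H, W) with 1 for pool pixels and 0 elsewhere.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))    # sigmoid, applied per pixel
    probs = np.clip(probs, eps, 1.0 - eps)   # keep log() finite
    t = target_mask.astype(np.float64)
    return float(-np.mean(t * np.log(probs) + (1.0 - t) * np.log(1.0 - probs)))

def predict_mask(logits, threshold=0.5):
    """Turn per-pixel scores into a 0/1 mask."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return (probs >= threshold).astype(np.uint8)

# Hand-labelled 2x3 target: the first two pixels of the top row are pool.
target = np.array([[1, 1, 0], [0, 0, 0]], dtype=np.uint8)
# Stand-in for the model output on the same image (made-up numbers).
logits = np.array([[4.0, 3.0, -5.0], [-4.0, -6.0, -3.0]])

loss = pixelwise_bce(logits, target)
pred = predict_mask(logits)
```

With this setup the training target is simply a mask with the same height and width as the image, which is why the labelling effort worries me.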

PS: I do have to admit that creating the pixel-wise training dataset will be tedious.