I am trying to implement MCNN for crowd counting, based on this paper.
I can find a PyTorch implementation online, but I want to use fastai as an exercise.
What is unusual about this work is that the output of the MCNN network is a density map, not a regular “label”. I am not sure how I should create the data bunch and specify where the labels are.
I am considering the options below:

1. Create a databunch without specifying any label information (is this allowed?) and write a custom loss function. Both the input image and its corresponding density map need to go through the same transformation, and I am not sure how this can be done.

2. Write a custom ItemList that handles the transformation for both the input image and the density map, like the segmentation example does. I tried to model this problem as a regression problem: generate the labels with label_from_func (just the sum of the density map as a single number, i.e. the crowd count). If I do it this way, I probably need to modify the architecture by adding a node at the end (the count)?

3. Use PyTorch to prepare the data and create the loss function, and use fastai only to create a Learner and train the model. Is it even possible to use only the Learner part of fastai?
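To make the "same transformation" requirement concrete, here is a minimal NumPy sketch of what I have in mind. It is not fastai code. The function names `paired_random_flip_crop` and `crowd_count` are made up for illustration, and the toy density map simply puts one unit of mass at each of two hypothetical head positions. The idea is that the random flip and crop parameters are drawn once and then applied to both arrays, and that the regression label would be the sum of the density map.

```python
import numpy as np


def paired_random_flip_crop(image, density, crop_hw, rng):
    """Apply ONE random horizontal flip and ONE random crop to both arrays.

    image:   array of shape (H, W, C)
    density: array of shape (H, W)
    crop_hw: (crop_height, crop_width)
    """
    h, w = density.shape
    ch, cw = crop_hw

    # Draw the random parameters once so both arrays stay aligned.
    flip = rng.random() < 0.5
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))

    if flip:
        image = image[:, ::-1]
        density = density[:, ::-1]

    image = image[top:top + ch, left:left + cw]
    density = density[top:top + ch, left:left + cw]
    return image, density


def crowd_count(density):
    """The scalar label for the regression variant: total mass of the map."""
    return float(density.sum())


rng = np.random.default_rng(0)

# Toy example: a 64x96 RGB image and a density map with two "heads".
image = rng.random((64, 96, 3))
density = np.zeros((64, 96))
density[10, 20] = 1.0
density[40, 70] = 1.0

aug_image, aug_density = paired_random_flip_crop(image, density, (32, 48), rng)
print(aug_image.shape, aug_density.shape, crowd_count(aug_density))
```

Note that a crop changes the count (heads outside the crop are lost), so in the regression variant the label would have to be recomputed from the transformed density map rather than stored up front.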
I am not sure which direction to go (or whether these are even valid solutions), also considering the learning curve. I don’t know how to write a custom ItemList yet, so I am leaning towards the 1st or 3rd option.
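For the 3rd option, the PyTorch side might look roughly like the sketch below. Everything in it is a placeholder: `DensityMapDataset` is a name I made up, the tensors are random stand-ins for real images and density maps, and the single convolution stands in for the actual MCNN architecture. I use a pixel-wise MSE loss here as an assumption about what a reasonable density-map loss looks like. As far as I understand, a fastai Learner is built from a data object, a model, and a loss function, so these pieces would be passed to it instead of running the manual step at the end.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class DensityMapDataset(Dataset):
    """Returns (image, density_map) pairs; the density map is the target."""

    def __init__(self, images, densities):
        assert len(images) == len(densities)
        self.images = images
        self.densities = densities

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.densities[idx]


# Random stand-in data: 8 RGB images and 8 single-channel density maps.
images = torch.rand(8, 3, 32, 32)
densities = torch.rand(8, 1, 32, 32)

train_ds = DensityMapDataset(images, densities)
train_dl = DataLoader(train_ds, batch_size=4, shuffle=True)

# Placeholder model that keeps the spatial size; NOT the real MCNN.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Pixel-wise loss between predicted and ground-truth density maps.
loss_func = nn.MSELoss()

# One manual forward/backward step, just to check that the shapes line up.
xb, yb = next(iter(train_dl))
pred = model(xb)
loss = loss_func(pred, yb)
loss.backward()
print(pred.shape, loss.item())
```

If the real network downsamples its output, the ground-truth density maps would need to be resized to match (and rescaled so the sum still equals the count) before the loss is computed.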
I would like to hear your thoughts. Thanks!
Yan