I’m trying to solve a problem where I have a dataset of images of shape (224, 224, 2) and want to map each one to a vector of 512 continuous values between 0 and 2 * pi. I’ve already created a dataset of 10,000 images and their corresponding target vectors, and I started experimenting with VGG-16. However, I have some concerns:
The images are sparse by nature, as they represent the presence (or absence) of particles in space. Each particle is annotated by a 5x5-pixel area in the image. On channel 1, wherever there is a particle, that area is white; everywhere else is black. On channel 2, wherever there is a particle, the area's intensity ranges from white to black depending on how close or far the particle is from the observer (its position in 3D); everything else is black as before. My concern is how a CNN like VGG-16 will behave on such sparse data. It would be great if you could give me some intuition on that, i.e. whether it's totally pointless to approach the problem this way.
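To make the encoding concrete, here is a minimal sketch of how one such image could be synthesized. The function name, the particle representation, and the depth-to-intensity mapping (closer = brighter) are all illustrative assumptions, not my actual pipeline:

```python
import numpy as np

def make_image(particles, size=224, patch=5):
    """Render particles into a (size, size, 2) float image.

    particles: list of (row, col, depth) tuples with depth in [0, 1].
    Channel 0 marks presence (white 5x5 patch on black background);
    channel 1 encodes depth as intensity (assumed: closer -> brighter).
    """
    img = np.zeros((size, size, 2), dtype=np.float32)
    half = patch // 2
    for row, col, depth in particles:
        r0, r1 = max(row - half, 0), min(row + half + 1, size)
        c0, c1 = max(col - half, 0), min(col + half + 1, size)
        img[r0:r1, c0:c1, 0] = 1.0          # presence: white patch
        img[r0:r1, c0:c1, 1] = 1.0 - depth  # depth: white (near) to black (far)
    return img

img = make_image([(10, 10, 0.2), (100, 200, 0.9)])
```

With only a handful of particles per image, the vast majority of pixels are exactly zero, which is the sparsity I'm worried about.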
To start, I will use torch.nn.MSELoss to minimize the error between the 512 predicted and true values for each image. Does that make sense? Do you have anything else to suggest?
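A minimal sketch of the training setup I have in mind, with a tiny linear model standing in for the VGG-16 backbone (the stand-in is only to keep the example self-contained; the loss wiring is the part in question):

```python
import torch
import torch.nn as nn

# Stand-in for the real backbone: any model mapping
# (N, 2, 224, 224) images to 512 continuous outputs.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(2 * 224 * 224, 512),
)

criterion = nn.MSELoss()

images = torch.rand(4, 2, 224, 224)          # batch of images
targets = torch.rand(4, 512) * 2 * torch.pi  # phases in [0, 2*pi)

preds = model(images)
loss = criterion(preds, targets)
loss.backward()  # gradients flow back through all 512 outputs
```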
My 512 outputs are phases, meaning the true targets are continuous values between 0 and 2 * pi. Is there a way to add something like an activation function that performs the mod 2 * pi operation, so that my predictions always stay within that range, while remaining differentiable? I know tanh is also an option, but it will tend to push most values toward the boundaries.
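One candidate I've considered is `torch.remainder`, which wraps values into [0, 2 * pi) and whose gradient with respect to the input is 1 almost everywhere (it is only non-differentiable at the wrap points), so it can be used like an activation. A quick sketch of what I mean, not a confirmed solution:

```python
import math
import torch

def mod_2pi(x):
    # torch.remainder(x, 2*pi) wraps x into [0, 2*pi); its gradient
    # w.r.t. x is 1 almost everywhere, so backprop passes through.
    return torch.remainder(x, 2 * math.pi)

x = torch.tensor([-1.0, 7.0], requires_grad=True)
y = mod_2pi(x)       # -1.0 -> 2*pi - 1, 7.0 -> 7 - 2*pi
y.sum().backward()   # x.grad is all ones: the wrap is gradient-transparent
```

My remaining doubt is whether pairing this with plain MSELoss is sound, since values just below 2 * pi and just above 0 are close as phases but far apart numerically.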