I was wondering what to do if my prediction target is not a binary class, a multiclass label, or a continuous value, but a matrix.

For example, predicting the RGB color matrix from a black-and-white picture, or predicting the next image frame from the previous one.

I am actually working on time series sales data, and I have an idea to encode the data into an image. The method I can think of right now is as follows.

Let the x and y axes of a matrix represent time and product, and let each element be the quantity sold for that product at that time. The matrix can then be viewed as one channel of an image, with each pixel taking the value of the corresponding matrix element.
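A minimal sketch of this encoding, assuming the raw data is a list of (time slot, product, quantity) records. The record values, the sizes, the row/column orientation, and the helper name `records_to_matrix` are all made up for illustration:

```python
import numpy as np

# Hypothetical sales records: (time_slot, product_id, quantity_sold).
records = [
    (0, 0, 5), (0, 1, 2),
    (1, 0, 3), (1, 2, 7),
    (2, 1, 4), (2, 2, 1),
]

n_time_slots = 3   # rows: time axis (assumed orientation)
n_products = 3     # columns: product axis (assumed orientation)

def records_to_matrix(records, n_time_slots, n_products):
    """Accumulate quantities into a (time, product) matrix."""
    matrix = np.zeros((n_time_slots, n_products), dtype=np.float32)
    for t, p, qty in records:
        matrix[t, p] += qty
    return matrix

sales_matrix = records_to_matrix(records, n_time_slots, n_products)

# Add a trailing channel axis so the matrix looks like a one-channel image.
sales_image = sales_matrix[..., np.newaxis]
print(sales_image.shape)
```

Cells with no matching record stay at zero, which here means "nothing sold" rather than "missing".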

Using the method above, I convert the morning data into the training image and the afternoon data into the target image. After that, I apply convolution and pooling to the training image to extract features. The learned features are then reshaped into a dense 1D vector, and that vector is transformed into the model outputs through a fully connected layer.
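The pipeline described above can be sketched as a single forward pass in plain NumPy. This is only an illustration with random, untrained weights; the 6 x 6 sizes, the 3 x 3 kernel, the 2 x 2 pooling window, and the helper names `conv2d_valid` and `max_pool_2x2` are assumptions, not part of my actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation (the usual CNN 'convolution')."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; odd trailing rows/columns are dropped."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))

# Hypothetical sizes: 6 time slots x 6 products for morning and afternoon.
morning = rng.random((6, 6))
kernel = rng.standard_normal((3, 3))

# Convolution gives a 4x4 map, pooling reduces it to 2x2.
features = max_pool_2x2(conv2d_valid(morning, kernel))
flat = features.reshape(-1)            # dense 1-D feature vector, length 4

# Fully connected layer with one output unit per afternoon cell.
n_outputs = 6 * 6
W = rng.standard_normal((flat.size, n_outputs)) * 0.1
b = np.zeros(n_outputs)
prediction = (flat @ W + b).reshape(6, 6)   # back to the target's matrix shape
print(prediction.shape)
```

A real model would use a deep learning library with several filters and trained weights; the point here is only how the shapes flow from image to flat vector to a matrix-shaped output.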

I am confused about the last part: what kind of activation function should I use for the model output so that I can compare it with my ground truth (in this case, the afternoon matrix)? One way I can think of is to flatten the target afternoon matrix into a 1D vector, so that I can just use a linear activation on the model output. The cost function can then be the usual MSE.
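To make the idea concrete, here is a small sketch of a linear (identity) output compared against the flattened target with MSE. The numbers and the helper name `mse` are made up for illustration:

```python
import numpy as np

def mse(prediction, target):
    """Mean squared error averaged over every element."""
    return float(np.mean((prediction - target) ** 2))

# Hypothetical 2 x 3 afternoon matrix (ground truth).
afternoon = np.array([[4.0, 0.0, 2.0],
                      [1.0, 3.0, 5.0]])

# Hypothetical raw values from 6 output units, one per matrix cell.
raw_output = np.array([3.0, 1.0, 2.0, 1.0, 5.0, 4.0])

# Linear (identity) activation: the output is the raw value, left unbounded.
prediction_vector = raw_output

# Compare against the flattened target...
loss_flat = mse(prediction_vector, afternoon.reshape(-1))
# ...or reshape the prediction back into a matrix; the loss is the same.
loss_matrix = mse(prediction_vector.reshape(afternoon.shape), afternoon)
print(loss_flat, loss_matrix)
```

Because MSE is an element-wise average, flattening the target or reshaping the output gives the same loss, so the flattening step is just bookkeeping.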

I know that an RNN would be better for this problem, since this is sequence data. But I wonder: if I turn the time series forecasting problem into a normal image problem, like image classification, can I carry the success of CNNs on images over to this problem (since CNN architectures and hyperparameters have seen some amazing breakthroughs)?

Thank you.