I’m wondering if it’s possible to build a model that, for example, takes an input corresponding to a higher scale and reads it through an RNN, and pauses to read in a sequence through another RNN for values on a finer scale associated with that the state of the higher scale. Let me elaborate.
sequence_1 = [s1_1, s1_2, …, s1_p]
sequence_2 = [ [s2_1_1, s2_1_2, … s2_1_q], [s2_2_1, … s2_2_q], …, [s2_p_1, …, s2_p_q] ]
If I were to break it into separate models down I would calculate the output of a model on sequence_1, feed it into sequence_2, train on sequence_2 for that ‘batch’, and then I would want to train the model_1 based on the error of the model_2 associated with the input of the model_1. Will I have to calculate the gradient to the input that is the output of model_1 and use that gradient and the error of model_2 to calculate the error of the model_1, or is there a way to make this all operate as one model?