I have been studying RNNs as I try to tackle a related time-series problem at work. I found the paper @jeremy introduced (https://arxiv.org/pdf/1508.00021.pdf) very interesting; however, I am confused and stuck on one key concept. My confusion is with this statement in the paper:
Furthermore, in order to help the network to better identify short-term dependencies (such as the velocity of the taxi as the difference between two successive data points), we also considered a variant in which the input of the RNN is not anymore a single GPS point but a window of 5 successive GPS points of the prefix. The window shifts along the prefix by one point at each RNN time step.
It turns out that this window-shifted variant of the RNN performs incredibly well on this problem.
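To make sure I'm reading the quote correctly, here is how I would sketch the windowed input in numpy. This is my own illustration, not code from the paper; `windowed_prefix` is a name I made up, and I'm assuming each GPS point is a (lat, lon) pair:

```python
import numpy as np

def windowed_prefix(points, window=5):
    """Turn a prefix of GPS points, shape (T, 2), into overlapping windows.

    At RNN step t the input is points[t:t+window] flattened, so each step
    sees `window` successive (lat, lon) pairs instead of a single point.
    The window shifts along the prefix by one point per step.
    """
    steps = len(points) - window + 1
    return np.stack([points[t:t + window].reshape(-1) for t in range(steps)])

# toy prefix of 8 fake GPS points
prefix = np.arange(16, dtype=np.float32).reshape(8, 2)
X = windowed_prefix(prefix)  # shape (4, 10): 4 RNN steps, 10 features each
```

So instead of a sequence of 2-dimensional inputs, the RNN receives a (shorter) sequence of 10-dimensional inputs, each covering 5 consecutive points.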
I thought that the hidden state inside the RNN would make feeding a "window of observations" redundant. Isn't one of the fundamental abilities of an RNN the ability to capture interactions between time steps? It's confusing to me why the RNN isn't able to learn these interactions on its own, and why feeding a window of time at each step works better.
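For reference, my mental model of the recurrence is the vanilla RNN update, where the hidden state at step t depends on every earlier input. A minimal sketch with toy dimensions (these weights and sizes are made up for illustration, nothing here is from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 2))  # input-to-hidden weights (toy sizes)
U = 0.1 * rng.normal(size=(3, 3))  # hidden-to-hidden weights

h = np.zeros(3)                    # initial hidden state
inputs = rng.normal(size=(6, 2))   # six single-point inputs x_1..x_6
for x in inputs:
    # h_t = tanh(W x_t + U h_{t-1}): h_t is a function of all x_1..x_t,
    # so in principle the network can learn velocity-like differences
    # between consecutive points without seeing a window.
    h = np.tanh(W @ x + U @ h)
```

That's exactly why I'd expect the hidden state to carry the short-term information that the window is providing explicitly.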
Where has my understanding of RNNs gone wrong here? Any help would be greatly appreciated.
Right, but are you saying an RNN cannot capture interactions between consecutive time steps unless you feed in windows of time? My question is more about how RNNs work and whether my understanding is correct.
@jeremy Towards the end of Lecture 14, you mention that one of the interns is working on reproducing the solution for the Taxi Destination Prediction problem. I took part in this Kaggle competition a year ago, and now I want to implement the solution by Bengio's team. Is there any chance the implementation is available to look at?