The use case you described above falls under (multipoint) time series forecasting. The case treated in Rossmann is a regression problem: it is a kind of single-point forecasting. Many deep learning models are used for time series forecasting: some are listed here.
You don’t need to load all your data into RAM at once, and you can mix continuous data with categorical data (also called covariates, such as day-of-week, hour-of-day, promo dates, etc.).
You can use lazy loading techniques to load only the chunks you need to build each batch, then train and update your model batch by batch. As illustrated below, the model only needs access to 2 small windows: 1) the context window, also called the lookback window (green rectangle), and 2) the prediction window, also called the forecast window (cyan rectangle).
**z_{i,t}**: the curve that we would like to forecast (e.g. energy demand, sales). The forecast starts at the end of the time series.
x_{i,1,t} and u_{i,2,t}: feature time series, also called covariates (these can be categorical or continuous data).
u_{i,2,t}: represents the day of the week in this example. It is categorical data in this case, so an embedding is used when training a given model.
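
To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: `WindowSampler`, `DayOfWeekEmbedding`, and `step_input` are hypothetical names chosen for illustration. The sampler only slices out the context and prediction windows it is asked for, so `series` could just as well be a memory-mapped array or an on-disk dataset (anything that supports `len()` and slicing). The embedding is shown as a bare lookup table; in a real model its rows would be trainable parameters.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Window = Tuple[Sequence[float], Sequence[float]]


class WindowSampler:
    """Yields (context, prediction) window pairs from a long series on demand.

    Only the requested slices are read, so `series` never has to be fully
    materialized in memory if it is backed by a file (assumption: the
    backing object supports len() and slicing).
    """

    def __init__(self, series: Sequence[float], context_len: int, pred_len: int):
        if len(series) < context_len + pred_len:
            raise ValueError("series is shorter than one context + prediction window")
        self.series = series
        self.context_len = context_len
        self.pred_len = pred_len

    def __len__(self) -> int:
        # Number of valid window start positions.
        return len(self.series) - self.context_len - self.pred_len + 1

    def __getitem__(self, start: int) -> Window:
        if not 0 <= start < len(self):
            raise IndexError(start)
        split = start + self.context_len
        context = self.series[start:split]                  # lookback window
        target = self.series[split:split + self.pred_len]   # forecast window
        return context, target

    def batches(self, batch_size: int, seed: int = 0) -> Iterator[List[Window]]:
        # Only the start indices are shuffled and held in memory, not the data.
        rng = random.Random(seed)
        starts = list(range(len(self)))
        rng.shuffle(starts)
        for i in range(0, len(starts), batch_size):
            yield [self[s] for s in starts[i:i + batch_size]]


class DayOfWeekEmbedding:
    """Lookup table with one dense vector per day of the week (0..6).

    The rows are randomly initialized here; a training loop would update them.
    """

    def __init__(self, dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(7)]

    def __call__(self, day: int) -> List[float]:
        if not 0 <= day < 7:
            raise IndexError(day)
        return self.weight[day]


def step_input(z_prev: float, x_cont: float, day: int,
               emb: DayOfWeekEmbedding) -> List[float]:
    """Concatenate continuous values with the embedded categorical covariate."""
    return [z_prev, x_cont] + emb(day)
```

For example, with `series = list(range(10))`, `WindowSampler(series, context_len=4, pred_len=2)[0]` returns `([0, 1, 2, 3], [4, 5])`. In practice you would probably wrap the same logic in your framework's dataset and embedding classes instead of writing your own.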