Estimating distribution parameters using RNNs


I’m working on a problem where I am trying to forecast unit sales of products (so integers) at different locations. So what I have is a lot of time series of unit sales for each product at each location. I’ve found a great paper where they do exactly this, and I am trying to implement it, but I’m a bit confused about something fundamental in the paper, which is:

Instead of predicting sequences of numbers, they use an RNN with LSTM cells to return sequences of *estimates* of the mean and variance of a Gaussian distribution, or the mean and dispersion of a negative binomial distribution. I am particularly interested in the negative binomial case because I am also predicting count data.
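The way I picture this, the network emits two unconstrained real numbers per time step and maps them through something like softplus so both parameters come out strictly positive. A minimal numpy sketch of that transform (the raw values and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always > 0.
    return np.logaddexp(0.0, x)

# Hypothetical raw outputs from the network head, shape (horizon, 2):
# column 0 -> mean, column 1 -> dispersion, both unconstrained reals.
raw = np.array([[0.5, -1.0],
                [2.0,  0.3]])

mu    = softplus(raw[:, 0])   # positive mean of the negative binomial
alpha = softplus(raw[:, 1])   # positive dispersion parameter
```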

Most of the ML tutorials and everything I’ve read typically teach loss functions in terms of predicting values: compare the predicted value with the training target, then adjust the network weights and biases to minimize the loss. That intuitively makes sense to me. In this paper, though, they frame it in terms of estimating the parameters of a distribution for the following time point.
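Concretely, I think the loss in this framing is the negative log-likelihood of the observed counts under the predicted distribution, rather than a squared error against point predictions. A sketch with scipy, assuming a mean/dispersion parameterization where the variance is mu + alpha*mu^2 (the conversion to scipy’s (n, p) parameters is my assumption, not taken from the paper):

```python
import numpy as np
from scipy.stats import nbinom

def nb_nll(y, mu, alpha):
    """Negative log-likelihood of counts y under a negative binomial
    with mean mu and dispersion alpha (variance = mu + alpha * mu**2)."""
    n = 1.0 / alpha               # scipy's number-of-successes parameter
    p = 1.0 / (1.0 + alpha * mu)  # scipy's success-probability parameter
    return -nbinom.logpmf(y, n, p).sum()

y = np.array([4, 6, 5, 5])        # toy observed counts

# A predicted mean near the data gives a lower loss than one far from it,
# which is what gradient descent on this objective exploits.
close = nb_nll(y, mu=5.0, alpha=0.3)
far   = nb_nll(y, mu=20.0, alpha=0.3)
```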

The way I understand it: the inputs are sequences (time-lagged series) and the targets are the non-lagged series; the input gets fed into the LSTM cells, and the network returns a sequence with the length of whatever horizon you want to predict and a width of 2 (mean and dispersion in the negative binomial case).

My question then is how to turn these estimates into counts in order to assess how accurate the model is on test data. My intuition is that, given the mean and dispersion of the distribution, you could calculate its mode (floor(mean * ((dispersion - 1) / dispersion))), and because it is a discrete distribution you would get a series of integers whose accuracy you could then measure.
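For what it’s worth, that mode formula checks out if “dispersion” means the NB size parameter r (so mean = r(1-p)/p): the textbook mode floor((r-1)(1-p)/p) rearranges to floor(mean * (r - 1) / r) for r > 1, and is 0 otherwise. A quick brute-force check against scipy’s pmf (parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import nbinom

def nb_mode(mean, r):
    # Mode of a negative binomial with mean `mean` and size r:
    # floor(mean * (r - 1) / r) for r > 1, else 0.
    if r <= 1:
        return 0
    return int(np.floor(mean * (r - 1) / r))

mean, r = 7.3, 3.0
p = r / (r + mean)                  # recover scipy's p from mean and size
ks = np.arange(200)
brute = int(ks[np.argmax(nbinom.pmf(ks, r, p))])  # argmax of the pmf
```

Sampling from the predicted distribution and scoring quantiles is another common way to evaluate, but the mode does give you a clean integer point forecast.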

Does anyone have experience with something similar to this? Modeling count data, estimating distribution parameters using neural networks?

Any input would be great! :slight_smile:

Hi Tim, I am also working on count time series data of vehicle demand. I previously tried the tscount library in R, but I saw that its performance is only as good as using historical averages, so I am currently trying to use an LSTM.

We model probabilities because we want confidence intervals on our predictions, and because there are no clearly visible autoregressive patterns in the data. So we prefer to aggregate the counts over the significant periods that show very high or very low demand, and learn the probability distributions in those periods in order to make probabilistic predictions instead of pointwise predictions.

Do you think this is a correct way to approach the problem? I have generated my own data to try to solve this; you can check here.
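For anyone comparing against it, the historical-averages baseline I mention is just the per-period mean over past days; a sketch with made-up toy counts:

```python
import numpy as np

# Hypothetical demand counts: 4 past days x 6 periods per day.
counts = np.array([[3, 7, 12, 11, 6, 2],
                   [4, 8, 13, 10, 5, 3],
                   [2, 6, 11, 12, 7, 2],
                   [3, 7, 12, 11, 6, 3]])

# Historical-average baseline: forecast each period as its mean over past days.
baseline = counts.mean(axis=0)
```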