As a little side project, I’ve been experimenting with how well an RNN can predict whether a team will win a game, given a sequence of the team’s prior games and some other features about the team and the opponent (e.g. offensive efficiency, defensive efficiency, strength of schedule, etc.). I’ve started simple and have organized the data so that the RNN only sees a sequence of five games at a time. Each game has a response variable indicating whether the team won or lost, so my response variable is actually of dimension [number of observations in dataset x 5]. The idea of using an RNN for this type of data is that it could learn something akin to a team’s momentum. Other algorithms that work on structured data would need some clever feature engineering to learn something like momentum, so I was hoping an RNN would shine here.
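To make the layout concrete, here is a minimal sketch of the tensor shapes I'm describing (the sizes and feature count are made up for illustration; `x` and `y` are just placeholder names):

```python
import torch

# Hypothetical sizes: N five-game sequences, each game described by
# F per-game features (efficiencies, strength of schedule, ...).
N, T, F = 1000, 5, 12

x = torch.randn(N, T, F)                  # inputs:  [N, 5, F]
y = torch.randint(0, 2, (N, T)).float()   # win/loss label per game: [N, 5]
```

So the target really is one label per game in the sequence, not one label per sequence.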

I’ve tried a simple RNN architecture, a GRU, and a multi-layer LSTM on this challenge. The models are doing ok, I suppose, but I would imagine that even doing some light feature engineering and then chucking the engineered dataset into a tree-based algo like gradient-boosted trees or random forest would outperform the RNN. I believe this because the RNN can barely beat a very simple benchmark that I have created.
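For reference, the GRU variant looks roughly like this in PyTorch (a sketch, not my exact code; `GameGRU` and the sizes are placeholders, and the head emits one win/loss logit per game in the sequence):

```python
import torch
import torch.nn as nn

class GameGRU(nn.Module):
    """Per-timestep win/loss classifier: one logit for each of the 5 games."""
    def __init__(self, n_features, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: [batch, 5, n_features]
        out, _ = self.gru(x)                 # out: [batch, 5, hidden]
        return self.head(out).squeeze(-1)    # logits: [batch, 5]

model = GameGRU(n_features=12)
logits = model(torch.randn(8, 5, 12))        # logits: [8, 5]
```

The simple-RNN and multi-layer LSTM versions just swap `nn.GRU` for `nn.RNN` or `nn.LSTM(..., num_layers=2)`.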

I’ve been trying to diagnose why the models are not working well on this type of data, and the first clue, which has left me a bit bewildered, is that if I compute the classification error (or cross-entropy) for each of the games in the five-game sequences, the error increases monotonically from the first game to the last. That is, the classification error for the first game across all of the five-game sequences is lower than that for the second game, the second is lower than that for the third, etc. This implies that the hidden state that is combined with each game’s input is actually corrupting the information in the input instead of augmenting it, right?
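The diagnostic I'm describing is simple to compute; here is a sketch, using random tensors in place of my actual validation logits and labels:

```python
import torch
import torch.nn.functional as F

# Stand-ins for the model's validation outputs and the true labels.
logits = torch.randn(1000, 5)                     # one logit per game
y = torch.randint(0, 2, (1000, 5)).float()        # win/loss per game

# Cross-entropy and classification error, broken out by position in
# the five-game sequence (averaged over all sequences).
ce_per_pos = F.binary_cross_entropy_with_logits(
    logits, y, reduction="none").mean(dim=0)      # shape [5]
err_per_pos = ((logits > 0).float() != y).float().mean(dim=0)  # shape [5]
```

With my trained models, both vectors increase from index 0 to index 4; with the random stand-ins above they would of course be flat in expectation.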

With that as background, I’m looking for suggestions on what additional tests I could run to help determine whether an RNN could be useful for learning from this type of data. I’m sort of stuck at the moment and thinking about abandoning the RNN approach, but the major reason I took up this project was to get more exposure to RNNs and learn a bit about fastai and PyTorch in the process…so I’d like to keep trying.

Thanks in advance for reading and any comments you have.

Cheers!