Wikipedia Bi-directional model


In part 10 Jeremy mentioned an improvement on top of the Wikipedia model: also using a reversed (backward) language model. What’s the right way to combine the two models?
Maybe I’m wrong, but I don’t think I can just take np.mean of the two output vectors, because I want to add more layers on top of the combined output (a classification layer, for example).
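
To make the question concrete, here is a minimal sketch of one possible option: concatenating the two output vectors along the feature axis and feeding the result to a further layer. This is an assumption for illustration, not necessarily what the lecture does. The arrays fwd_out and bwd_out are random stand-ins for the outputs of the forward and reversed models, and the sizes and the linear layer (W, b) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, hidden, n_classes = 4, 8, 3

# Random stand-ins for the output vectors of the two language models.
fwd_out = rng.normal(size=(batch, hidden))  # forward model
bwd_out = rng.normal(size=(batch, hidden))  # reversed model

# Concatenate along the feature axis so later layers see both
# representations separately: shape (batch, 2 * hidden).
combined = np.concatenate([fwd_out, bwd_out], axis=1)

# Hypothetical linear classification layer on top of the combined vector.
W = rng.normal(size=(2 * hidden, n_classes)) * 0.1
b = np.zeros(n_classes)
logits = combined @ W + b

# Numerically stable softmax over the class axis.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

print(combined.shape, probs.shape)
```

Compared with np.mean, concatenation keeps the two directions separate, so the next layer can learn how to weight each one. The cost is that the input to that layer is twice as wide.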