Wikipedia Bi-directional model

In part 10 Jeremy mentioned an improvement on top of the Wikipedia model: also using the reversed (backward) language model. What's the right way to combine the 2 models?
Maybe I'm wrong, but I don't think I can just take np.mean of the 2 output vectors, because I want to add more layers on top (a classifier, for example).

Thanks!

You would probably want to concatenate the output vectors and let the layers you add on top learn how to weight each one. Using a mean is like hard-coding the weight of each output vector to 0.5.
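
To make the difference concrete, here is a minimal NumPy sketch of both options. Everything in it is an assumption for illustration: `fwd_out` and `bwd_out` are random stand-ins for the final output vectors of the forward and reversed language models, and the sizes and the untrained linear head (`W`, `b`) are made up. It is not the code from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
batch, hidden, n_classes = 4, 8, 3

# Random stand-ins for the output vectors of the forward and reversed models.
fwd_out = rng.normal(size=(batch, hidden))
bwd_out = rng.normal(size=(batch, hidden))

# Option 1: average the two outputs. Each model gets a fixed weight of 0.5.
mean_feats = np.mean([fwd_out, bwd_out], axis=0)            # (batch, hidden)

# Option 2: concatenate along the feature axis. Later layers see both outputs.
concat_feats = np.concatenate([fwd_out, bwd_out], axis=1)   # (batch, 2 * hidden)

# A linear classification head on the concatenated features
# (weights are random here; in practice they would be trained).
W = rng.normal(size=(2 * hidden, n_classes))
b = np.zeros(n_classes)
logits = concat_feats @ W + b                               # (batch, n_classes)

# The mean is one particular linear map of the concatenation:
# stacking two identity matrices and scaling by 0.5 reproduces it exactly.
W_mean = 0.5 * np.vstack([np.eye(hidden), np.eye(hidden)])
assert np.allclose(concat_feats @ W_mean, mean_feats)
```

The last check shows that a linear layer on the concatenation can reproduce the mean if that turns out to be the best combination, while remaining free to learn a different weighting per feature.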