Near the end of this notebook, in the section heading ‘Test’, we have this line:

f = Model([inp_story, inp_q], match)
Shortly thereafter, we call predict on it.


This is somewhat interesting – how can we predict on a model that hasn’t been fit?

Looking at the model f, I noticed that its fitted weights are the same as the fitted weights in the answer model. Clearly we are sharing weights. But how / where / when was this sharing specified? It seems too magical. Thanks.

The functional API lets you share weights like that by calling the same layer more than once.
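To make that concrete, here is a minimal sketch of calling one layer on two different inputs. The layer size, input shapes and variable names are made up for illustration and are not from the notebook; it assumes the Keras that ships with TensorFlow.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# One Dense layer object (hypothetical size), called on two separate inputs.
shared = Dense(2)
a = Input(shape=(3,))
b = Input(shape=(3,))
out_a = shared(a)
out_b = shared(b)

model = Model([a, b], [out_a, out_b])

# Feeding the same data to both inputs gives identical outputs,
# because both branches go through the same kernel and bias.
x = np.random.rand(5, 3).astype("float32")
out_a_val, out_b_val = model.predict([x, x], verbose=0)
```

Even though the layer is used twice, model.trainable_weights holds only one kernel and one bias.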

@mattobrien415 In Keras, a model can be seen as a wrapper around layers (in fact, models are themselves layers). As a result, what matters is the actual layers, not the model itself.

When the earlier model “answer” was built, all the layers from inp_story/inp_q through to match were connected. The new model “f” simply reuses that existing information, i.e. all the layers, tensors, connections etc. And because “answer” was trained with all these layers (and more), the weights in those layers already exist. The new model “f” is just a wrapper around the existing layers, so it retains all their weights.
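A toy version of the same situation may help. This is only a sketch: the single input, the Dense layers and the random data are stand-ins I made up, not the notebook's actual architecture. The names answer, f and match mirror the ones above.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Hypothetical stand-in graph: input -> "match" layer -> output layer.
inp = Input(shape=(4,))
match = Dense(3, activation="relu", name="match")(inp)
out = Dense(1, name="out")(match)

# The full model is the one that gets trained.
answer = Model(inp, out)
answer.compile(optimizer="sgd", loss="mse")
x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
answer.fit(x, y, epochs=1, verbose=0)

# A second model over the same tensors: no new layers are created.
f = Model(inp, match)

# Both models hold the very same layer object, hence the same weights.
shares_layer = f.get_layer("match") is answer.get_layer("match")
same_weights = all(
    np.array_equal(w_f, w_a)
    for w_f, w_a in zip(f.get_layer("match").get_weights(),
                        answer.get_layer("match").get_weights())
)

# predict works without ever compiling or fitting f.
preds = f.predict(x, verbose=0)
```

Here f never sees fit, yet its “match” layer already carries the weights learned while training answer.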

If you look back at lesson 9, this is exactly how we reused the VGG model in the ImageNet style transfer exercise: we built a new model with VGG’s style/feature layers as its outputs, while keeping all the weights VGG had been trained with.
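Roughly, that pattern looks like the sketch below. It uses the stock VGG16 from keras.applications rather than the course's own VGG code, and the two layer names are just examples I picked. I also pass weights=None here so nothing is downloaded, which means the weights are random; with weights="imagenet" the new model would carry the pretrained weights instead.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model

# Assumption: weights=None (random init) to keep the sketch offline.
vgg = VGG16(include_top=False, weights=None, input_shape=(64, 64, 3))

# Pick some intermediate layers to expose as outputs (example choice).
layer_names = ["block1_conv1", "block2_conv1"]
outputs = [vgg.get_layer(name).output for name in layer_names]

# New model over the existing VGG graph; it reuses VGG's layers as-is.
features = Model(vgg.input, outputs)

img = np.random.rand(1, 64, 64, 3).astype("float32")
feats = features.predict(img, verbose=0)
```

The features model contains the same layer objects as vgg, so whatever weights vgg holds are what features uses.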


Thanks. I’m starting to get an intuitive feel for how this happens. Network architecture is pretty interesting stuff.

So what would be the difference between these two architectures? If any?