Parameter Prediction Networks in Keras?

(Tait Larson) #1

I have been reading up a little on parameter prediction networks (networks that predict the weight matrices for other networks) in this paper and this paper. A very simplified example of this would include two networks:

  • a base network with one weight matrix containing a very large number of parameters (call it W).
  • an auxiliary network which outputs a matrix that is the same shape as W

The output from the auxiliary network replaces the weight matrix in the base network. Both networks are trained at the same time. In Keras this would look like one network, potentially with multiple inputs and outputs.
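In plain numpy terms, a single forward pass of this two-network setup would look roughly like the following (all sizes and variable names here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.standard_normal(16)           # input to the base network
aux_in = rng.standard_normal(8)       # input to the auxiliary network
V = rng.standard_normal((16 * 4, 8))  # the auxiliary network's own (trainable) weights

# The auxiliary network's output, reshaped to the shape of W
W = (V @ aux_in).reshape(16, 4)

# The base network's forward pass uses the predicted W, not a stored weight matrix
h = x @ W
print(h.shape)  # (4,)
```

The point being that only `V` is a stored parameter; `W` is recomputed from the auxiliary network's output on every forward pass, and gradients flow through `W` back into `V`.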

Just for fun, I wanted to see if I could write up a parameter prediction network in Keras. Assuming I train the base network and auxiliary network at the same time, I wasn't sure how to tie them together. More specifically, I can grab the output tensor of the auxiliary network, but I then need to use that tensor as the weight matrix of a Keras layer. From looking around at Keras, it appears possible to set the initial values of a weight matrix, but it doesn't appear possible to wrap a layer around an existing tensor and use that tensor as the layer's weight matrix. I'm also unsure whether the shape of the output tensor will match the shape of the weight matrix used in the base network.

Assuming end-to-end training, is it possible to grab the output tensor from one layer or model in Keras and use it as a weight matrix in another layer? And assuming I can somehow load a tensor into a layer, will I run into tensor shape issues? Any help with this would be much appreciated.
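For concreteness, here is a toy sketch of the kind of wiring I have in mind, using a `Lambda` layer to consume one branch's output as the other branch's "weights." All layer sizes and names are invented, and I don't know whether this is the idiomatic way to do it in Keras:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n_in, n_hidden = 8, 4

# "auxiliary" branch: maps its own input to an (n_in, n_hidden) matrix
aux_in = keras.Input(shape=(n_in * n_hidden,), name="aux_input")
W_pred = layers.Reshape((n_in, n_hidden))(aux_in)

# "base" branch: treats the auxiliary output as its first-layer kernel
base_in = keras.Input(shape=(n_in,), name="base_input")

def apply_predicted_weights(tensors):
    x, w = tensors
    # per-sample matrix-vector product: w is (batch, n_in, n_hidden), x is (batch, n_in)
    return tf.linalg.matvec(w, x, transpose_a=True)

h = layers.Lambda(apply_predicted_weights)([base_in, W_pred])
out = layers.Dense(2, activation="softmax")(h)

model = keras.Model([aux_in, base_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")

x_aux = np.random.rand(3, n_in * n_hidden).astype("float32")
x_base = np.random.rand(3, n_in).astype("float32")
print(model.predict([x_aux, x_base], verbose=0).shape)  # (3, 2)
```

Since both branches live in one functional graph, training the combined model end to end should push gradients through the predicted weights into the auxiliary branch, which is exactly the behavior I'm after.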

(Craig) #2

If you’ve made any progress doing this, I’d be really interested also!

(Tait Larson) #3

@CraigG sadly I haven’t picked this one back up.

I did find two repos related to the Diet Networks paper on github.

On my first read through the gokceneraslan/dietnet repo, it didn't seem like they'd implemented the shared weight matrix in a way that really reduces the number of parameters, but I have to admit I read it rather quickly.

In the other repo, the code by the author of the Diet Networks paper felt more like the implementation described in the HyperNetworks paper than the simplified architecture she described in her own paper.

Hope this helps!

(Tait Larson) #4

I spent a little more time on this and have a simple auxiliary network and base network that need review and testing.

These are modeled after the networks described in the Diet Networks paper.

#imports assumed by the snippets below
from keras.layers import Input, Embedding, Reshape, Dense, Lambda
from keras.models import Model
from keras import backend as K

#helper function for the lambda layer: X is (small, big), We is (big, h1_units)
def doDot(matrices):
    X, We = matrices
    return K.dot(X, We)

#auxiliary network
inp = Input(batch_shape=(big, small), name="Xt_input") #use batch shape here?
emb = Embedding(vocab_size, embed_size)(inp)
rs  = Reshape((small * embed_size,))(emb)
We  = Dense(h1_units)(rs)
mdl = Model(inp, We)

#base network
in2 = Input(batch_shape=(small, big), name="X_input")
h1  = Lambda(doDot, output_shape=(small, h1_units))([in2, mdl.output])
out = Dense(10, activation='softmax')(h1)
md2 = Model([inp, in2], out)
These models compile but I haven’t had a chance to run them yet.

Some notes:

  • I don’t know how you’d train this in any sort of mini-batch fashion.
  • I wasn’t totally confident in how I handled reshaping the Embedding layer outputs in the auxiliary network. I’d love feedback.
  • Obviously, I used a Lambda layer to pass the output of the auxiliary network, We, in as the weight matrix of the base network's first layer. I believe this works, but again, I'd love a second opinion.
  • If you do full-batch learning (which the current code does) you likely don't need two different inputs. I believe you can just reuse and transpose the original input inside the graph before it reaches doDot.
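On that last point, a sketch of the single-input, full-batch variant (transposing X inside the graph instead of feeding Xt as a second input) might look like the following. Names and sizes are made up, and I haven't verified this is the right way to handle the batch dimension:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n_samples, n_features, h1_units = 6, 10, 4

# single input holding the full batch X: (n_samples, n_features)
X_in = keras.Input(batch_shape=(n_samples, n_features), name="X_input")

# auxiliary branch: transpose X inside the graph, then predict We
Xt = layers.Lambda(lambda x: tf.transpose(x))(X_in)  # (n_features, n_samples)
We = layers.Dense(h1_units)(Xt)                      # (n_features, h1_units)

# base branch: use We as the first hidden layer's weight matrix
h1 = layers.Lambda(lambda ts: tf.matmul(ts[0], ts[1]))([X_in, We])  # (n_samples, h1_units)
out = layers.Dense(3, activation="softmax")(h1)

model = keras.Model(X_in, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# full-batch training: the batch size must equal n_samples
X = np.random.rand(n_samples, n_features).astype("float32")
y = np.random.randint(0, 3, size=(n_samples,))
model.fit(X, y, batch_size=n_samples, epochs=1, verbose=0)
print(model.predict(X, batch_size=n_samples, verbose=0).shape)  # (6, 3)
```

The fixed `batch_shape` is what makes the in-graph transpose possible, and it's also what ties this to full-batch training: with a variable batch size the transposed tensor's shape wouldn't be known ahead of time.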