The models and initializers should all be exactly the same, shouldn't they? I don't use any of the v2 library when initializing/generating the model; I used the original repository's implementation instead.
(So at bare minimum I can scratch off the forward being an issue.)
To check that, I dumped both to text documents. There are slight variations. For example:
v1:
[[ 3.7118e-01, 2.4901e-01, 3.4924e-01],
[ 6.1605e-02, -1.8064e-01, -5.7317e-01],
[-4.0040e-02, 1.2741e-01, -1.8215e-01]]
v2:
[[-2.9317e-01, -5.0387e-01, -2.1530e-01],
[-3.7322e-01, -2.0616e-02, 6.7522e-02],
[ 1.2084e-01, 2.0487e-01, -2.5438e-01]]
These are all somewhat close to zero, though, so can we count them as the same?
If so, the initialized layers are all exactly the same (found via scrolling a very large text document). For those wanting to go through the same exercise, here is what I did:
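One way to answer that without eyeballing: two random inits will essentially never match elementwise, even from the same init scheme, unless the RNG is seeded identically before each one. A minimal sketch (the layer shape here is illustrative, not the actual model):

```python
import torch
import torch.nn as nn

# Seed the global RNG before each init so the draws are elementwise identical.
torch.manual_seed(42)
a = nn.Conv2d(3, 3, kernel_size=3, bias=False)

torch.manual_seed(42)
b = nn.Conv2d(3, 3, kernel_size=3, bias=False)

print(torch.allclose(a.weight, b.weight))  # same seed -> identical weights
```

So if v1 and v2 are seeded the same way and the layers still differ, the init scheme itself differs; if they aren't seeded the same, small variations like the ones above are expected.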
learn = Learner(...)
with open("params.txt", "w") as text_file:
    params = list(learn.model.parameters())
    for item in params:
        text_file.write("%s\n" % item)
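Instead of scrolling a very large text document, the same check can be done programmatically. A sketch of a tensor-by-tensor comparison (the two models would be the v1 and v2 `learn.model`; the helper name is mine):

```python
import torch

def params_match(model_a, model_b):
    """Compare two models' parameters tensor-by-tensor."""
    pa, pb = list(model_a.parameters()), list(model_b.parameters())
    if len(pa) != len(pb):
        return False  # different number of parameter tensors
    return all(
        x.shape == y.shape and torch.allclose(x, y)
        for x, y in zip(pa, pb)
    )
```

Running `params_match(learn_v1.model, learn_v2.model)` returns a single True/False instead of a diff of text dumps.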
Now for directly addressing your list:
- Init - Verified all are the same
- Conv
- Linear
- BN
- For each of the above: bias and weights
- Whether bias is on or off for each layer - Biases are off on both models until the last layer
- The implementation of forward() since that doesn't appear in the model output - Both are the same architecture, so the same forward
- (I may have missed things too)
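The bias on/off check in the list above can also be done without reading the printed model. A sketch (layer types and the helper name are my choices, adjust for the actual architecture):

```python
import torch.nn as nn

def bias_report(model):
    """Map each Conv/Linear/BN layer name to whether it has a bias."""
    report = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear, nn.BatchNorm2d)):
            report[name] = module.bias is not None
    return report
```

Comparing `bias_report(...)` for the v1 and v2 models confirms the "biases are off until the last layer" observation in one dict comparison.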
The next thing I'm going to do is test the optimizers (just to make sure). I'll baseline Adam in v1 and v2.
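For that optimizer test, a minimal sketch of what I mean by baselining: take one deterministic step with plain torch.optim.Adam and save the resulting parameters, so the v1 and v2 runs can be diffed against the same reference (the model and data here are illustrative, not the actual ones):

```python
import torch
import torch.nn.functional as F

# Fixed seed so the init and data are identical across runs.
torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One deterministic step on fixed data.
x, y = torch.randn(8, 3), torch.randn(8, 1)
loss = F.mse_loss(model(x), y)
loss.backward()
opt.step()

# Save the post-step params; repeat in the other version and compare.
torch.save(model.state_dict(), "adam_baseline.pt")
```

If the saved state dicts match after a step in both versions, the optimizer step itself can be ruled out too.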