Migrate ULMFiT weights trained using fastai 0.7 to fastai 1.0

Hey there,

I’m a proud fastai user and I really love what you are doing for the community.

I have implemented a few text models using fastai 0.7 - ULMFiT.
As fastai 1.0 seems to be designed much better I’d like to migrate from 0.7 to 1.0.
Code wise it’s pretty straightforward.

I have a problem with migrating the weights that I’ve trained in the past.
See below to understand the issue.

The old language model’s definition:

    SequentialRNN(
      (0): RNN_Encoder(
        (encoder): Embedding(30002, 400, padding_idx=1)
        (encoder_with_dropout): EmbeddingDropout(
          (embed): Embedding(30002, 400, padding_idx=1)
        )
        (rnns): ModuleList(
          (0): WeightDrop(
            (module): LSTM(400, 1150)
          )
          (1): WeightDrop(
            (module): LSTM(1150, 1150)
          )
          (2): WeightDrop(
            (module): LSTM(1150, 400)
          )
        )
        (dropouti): LockedDropout()
        (dropouths): ModuleList(
          (0): LockedDropout()
          (1): LockedDropout()
          (2): LockedDropout()
        )
      )
      (1): LinearDecoder(
        (decoder): Linear(in_features=400, out_features=30002, bias=False)
        (dropout): LockedDropout()
      )
    )

While the new version:

    SequentialRNN(
      (0): RNNCore(
        (encoder): Embedding(30002, 400, padding_idx=1)
        (encoder_dp): EmbeddingDropout(
          (emb): Embedding(30002, 400, padding_idx=1)
        )
        (rnns): ModuleList(
          (0): WeightDropout(
            (module): LSTM(400, 1150, batch_first=True)
          )
          (1): WeightDropout(
            (module): LSTM(1150, 1150, batch_first=True)
          )
          (2): WeightDropout(
            (module): LSTM(1150, 400, batch_first=True)
          )
        )
        (input_dp): RNNDropout()
        (hidden_dps): ModuleList(
          (0): RNNDropout()
          (1): RNNDropout()
          (2): RNNDropout()
        )
      )
      (1): LinearDecoder(
        (decoder): Linear(in_features=400, out_features=30002, bias=True)
        (output_dp): RNNDropout()
      )
    )

I’ve noticed a difference in WeightDropout: it requires one extra matrix - '0.rnns.0.module.weight_hh_l0' - which is 0.rnns.0.weight_hh_l0_raw after dropout. Also, I’m not sure whether LockedDropout and RNNDropout are the same thing.
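If I read it correctly, the old hidden-to-hidden matrix could simply be duplicated into both keys. A minimal sketch with a stand-in tensor (the key names are taken from my printouts above and not verified against the actual state dicts):

```python
import torch

# Hypothetical sketch: in v1's WeightDropout, `weight_hh_l0_raw` seems to be
# the master copy, and `weight_hh_l0` is rewritten from it with dropout on
# each forward pass - so the old trained matrix can go into both slots.
# Stand-in for torch.load(...); LSTM hh weight for hidden size 1150 is (4*1150, 1150).
old_wgts = {'0.rnns.0.module.weight_hh_l0': torch.randn(4600, 1150)}

w = old_wgts['0.rnns.0.module.weight_hh_l0']
new_wgts = {
    '0.rnns.0.module.weight_hh_l0': w.clone(),
    '0.rnns.0.weight_hh_l0_raw': w.clone(),
}
```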

My main focus here is migrating the language models, as I’ve trained them on specific sources and it took days…

If possible, I’d also like to migrate the classifiers, but I guess that isn’t doable.
Old model’s head:

    (1): PoolingLinearClassifier(
      (layers): ModuleList(
        (0): LinearBlock(
          (lin): Linear(in_features=1200, out_features=50, bias=True)
          (drop): Dropout(p=0.48)
          (bn): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (1): LinearBlock(
          (lin): Linear(in_features=50, out_features=6, bias=True)
          (drop): Dropout(p=0.1)
          (bn): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )

New model’s head:

    (1): PoolingLinearClassifier(
      (layers): Sequential(
        (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (1): Dropout(p=0.4)
        (2): Linear(in_features=1200, out_features=50, bias=True)
        (3): ReLU(inplace)
        (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (5): Dropout(p=0.1)
        (6): Linear(in_features=50, out_features=3, bias=True)
      )
    )

Here the order of the BN and Linear layers is different, so the weights should be different as well.

I’m curious whether any of you have a script to migrate a language model?

All best,
Mateusz


I remember someone writing a script, but I can’t find it. For both models, you just have to map the old names of the weights to the new ones. Note that:

  • in language models, there is a bias in the decoder in fastai v1 that you probably won’t have
  • in the classifier, the order you see for the layers is artificial (it’s the PyTorch representation, which lists modules in the order they were defined in __init__ when not using Sequential), but the two models (old and new) apply batchnorm, dropout and linear in the same order
  • tokenizing is done differently in fastai v1, so you may have to fine-tune your models again (we add an xxmaj token for words beginning with a capital for instance)
  • for weight dropout, you want to put the weights you have in both '0.rnns.0.module.weight_hh_l0' and '0.rnns.0.weight_hh_l0_raw' (the second one is copied to the first with dropout applied anyway)
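To make the mapping concrete, here is a rough sketch of what such a conversion could look like for the language model. The key names are guesses read off the two printouts above, so check them against `old_wgts.keys()` and your v1 model’s `state_dict().keys()` before trusting it:

```python
import torch

def convert_lm_weights(old_wgts):
    """Sketch of a v0.7 -> v1 key mapping for the ULMFiT language model.

    Assumptions (verify against your actual state dicts):
    - only `encoder_with_dropout.embed` was renamed, to `encoder_dp.emb`;
    - v1's WeightDropout wants a second, raw copy of each hh matrix;
    - v1's decoder has a bias the old model lacks.
    """
    new_wgts = {}
    for name, w in old_wgts.items():
        name = name.replace('encoder_with_dropout.embed', 'encoder_dp.emb')
        new_wgts[name] = w
        # Duplicate the hidden-to-hidden matrix into the raw slot; dropout is
        # re-applied from the raw copy on every forward pass anyway.
        if '.module.weight_hh_l0' in name:
            new_wgts[name.replace('.module.weight_hh_l0', '.weight_hh_l0_raw')] = w.clone()
    # The old decoder had no bias; start the new one at zero.
    if '1.decoder.bias' not in new_wgts and '1.decoder.weight' in new_wgts:
        n_vocab = new_wgts['1.decoder.weight'].size(0)
        new_wgts['1.decoder.bias'] = torch.zeros(n_vocab)
    return new_wgts
```

After converting, you would load the result with `learn.model.load_state_dict(new_wgts)` (possibly with `strict=False` the first time, to see which keys still mismatch).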