Strange dimensions of encoder output

Dear fastai community,

I'd like to use the encoded output of a QRNN text classifier. Training the classifier was no problem; the setup is more or less a copy of ULMFiT. Here is the learner (a rough sketch of the setup code follows the printout):

Path: ., model=SequentialRNN(
  (0): MultiBatchEncoder(
    (module): AWD_LSTM(
      (encoder): Embedding(15000, 400, padding_idx=1)
      (encoder_dp): EmbeddingDropout(
        (emb): Embedding(15000, 400, padding_idx=1)
      )
      (rnns): ModuleList(
        (0): QRNN(
          (layers): ModuleList(
            (0): QRNNLayer(
              (linear): WeightDropout(
                (module): Linear(in_features=800, out_features=4650, bias=True)
              )
            )
          )
        )
        (1): QRNN(
          (layers): ModuleList(
            (0): QRNNLayer(
              (linear): WeightDropout(
                (module): Linear(in_features=1550, out_features=4650, bias=True)
              )
            )
          )
        )
        (2): QRNN(
          (layers): ModuleList(
            (0): QRNNLayer(
              (linear): WeightDropout(
                (module): Linear(in_features=1550, out_features=4650, bias=True)
              )
            )
          )
        )
        (3): QRNN(
          (layers): ModuleList(
            (0): QRNNLayer(
              (linear): WeightDropout(
                (module): Linear(in_features=1550, out_features=1200, bias=True)
              )
            )
          )
        )
      )
      (input_dp): RNNDropout()
      (hidden_dps): ModuleList(
        (0): RNNDropout()
        (1): RNNDropout()
        (2): RNNDropout()
        (3): RNNDropout()
      )
    )
  )
  (1): PoolingLinearClassifier(
    (layers): Sequential(
      (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (1): Dropout(p=0.12, inplace=False)
      (2): Linear(in_features=1200, out_features=50, bias=True)
      (3): ReLU(inplace=True)
      (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (5): Dropout(p=0.1, inplace=False)
      (6): Linear(in_features=50, out_features=309, bias=True)
    )
  )
), opt_func=functools.partial(<class 'torch.optim.adam.Adam'>, betas=(0.9, 0.99)), loss_func=FlattenedLoss of BCEWithLogitsLoss(), metrics=[<function accuracy_thresh at 0x7f65e8fbbf28>, <function precision at 0x7f652ac99d90>, <function recall at 0x7f652ac99d08>], true_wd=True, bn_wd=True, wd=0.01, train_bn=True, path=PosixPath('.'), model_dir='models', callback_fns=[functools.partial(<class 'fastai.basic_train.Recorder'>, add_time=True, silent=False)], callbacks=[RNNTrainer
learn: ...
alpha: 2.0
beta: 1.0], layer_groups=[Sequential(
  (0): Embedding(15000, 400, padding_idx=1)
  (1): EmbeddingDropout(
    (emb): Embedding(15000, 400, padding_idx=1)
  )
), Sequential(
  (0): QRNN(
    (layers): ModuleList(
      (0): QRNNLayer(
        (linear): WeightDropout(
          (module): Linear(in_features=800, out_features=4650, bias=True)
        )
      )
    )
  )
  (1): RNNDropout()
), Sequential(
  (0): QRNN(
    (layers): ModuleList(
      (0): QRNNLayer(
        (linear): WeightDropout(
          (module): Linear(in_features=1550, out_features=4650, bias=True)
        )
      )
    )
  )
  (1): RNNDropout()
), Sequential(
  (0): QRNN(
    (layers): ModuleList(
      (0): QRNNLayer(
        (linear): WeightDropout(
          (module): Linear(in_features=1550, out_features=4650, bias=True)
        )
      )
    )
  )
  (1): RNNDropout()
), Sequential(
  (0): QRNN(
    (layers): ModuleList(
      (0): QRNNLayer(
        (linear): WeightDropout(
          (module): Linear(in_features=1550, out_features=1200, bias=True)
        )
      )
    )
  )
  # (1): RNNDropout()
# ), Sequential(
#   (0): PoolingLinearClassifier(
#     (layers): Sequential(
#       (0): BatchNorm1d(1200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#       (1): Dropout(p=0.12, inplace=False)
#       (2): Linear(in_features=1200, out_features=50, bias=True)
#       (3): ReLU(inplace=True)
#       (4): BatchNorm1d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#       (5): Dropout(p=0.1, inplace=False)
#       (6): Linear(in_features=50, out_features=309, bias=True)
#     )
#   )
)], add_time=True, silent=False)
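
For context, the learner was set up roughly like this. This is only a sketch: the config values (emb_sz=400, n_hid=1550, n_layers=4, qrnn=True) are read off the printout above, data_clas stands for my TextClasDataBunch, the encoder name 'fine_tuned_enc' is a placeholder, and the dropout and learning-rate numbers are not the exact values I used.

from fastai.text import *

# sketch of the classifier setup; hyperparameters are placeholders
config = awd_lstm_clas_config.copy()
config.update(dict(emb_sz=400, n_hid=1550, n_layers=4, qrnn=True))

learn = text_classifier_learner(data_clas, AWD_LSTM, config=config,
                                drop_mult=0.5, pretrained=False)
learn.load_encoder('fine_tuned_enc')  # encoder fine-tuned on my corpus, as in ULMFiT
learn.fit_one_cycle(4, 1e-2)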

I want to use the encoded output up to the part I commented out above. In other words, I'm not interested in the predicted labels, only in the output of the last QRNN layer.

Going by this last layer, if I take a text xb and run it through learn.model[0](xb), I would expect a tensor of shape (1, 1200), since 1200 is the output size of the last QRNN layer. Instead I get something like (1, 1384, 1550), and I frankly don't know how a text input of shape (1, 24834), i.e. a 2-d tensor, ends up as a 3-d tensor, or where the 1384 and the 1550 come from (1200 would have made sense). Does anybody have an explanation for this? A minimal sketch of the probe is below.
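
Here is a minimal sketch of that probe. The helper and variable names are mine; how xb is built from the raw text is omitted, it is simply the (1, 24834) LongTensor of token ids.

import torch

encoder = learn.model[0]   # the MultiBatchEncoder, i.e. everything before the pooling head
encoder.reset()            # clear the hidden state before feeding a new document
out = encoder(xb)          # xb: token ids, shape (1, 24834)

# print the shape(s) of whatever comes back, whether it is a single tensor
# or a nested tuple/list of tensors
def shapes(x):
    return tuple(x.shape) if torch.is_tensor(x) else [shapes(o) for o in x]

print(shapes(out))         # this is where I see (1, 1384, 1550) instead of the expected (1, 1200)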

Best wishes, Phillip