Extract second to last layer for clustering

Hey there!

I recently found an interesting post about the idea of using an intermediate layer's output as input for clustering.
I know that embeddings can be used for this as well (NLP: Using fastai word embeddings to cluster unlabeled documents), but I would be interested in doing something similar to the idea shown here (https://github.com/botkop/mnist-embedding/blob/master/notebooks/mnist-embedding-autoencoder.ipynb).

The author of the notebook flattens the MNIST data so that each n×n image matrix becomes a single row of n*n columns in a table, plus one additional column for the target.
After that he fits a network to the data using a tabular_learner. All of this was done with fastai v1 (there are still DataBunches), but it is not hard to translate to v2.
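
To make that setup concrete, here is roughly how I picture the flattening step in fastai v2. This is just an untested sketch: the arrays images/labels and the px column names are placeholders I made up, not the notebook's actual code.

# Untested fastai-v2 sketch of the flattening idea; data and column names are made up.
import numpy as np
import pandas as pd
from fastai.tabular.all import *

n = 28                                                     # e.g. MNIST-sized images
images = np.random.rand(1000, n, n).astype(np.float32)     # stand-in for the real images
labels = np.random.randint(0, 10, 1000)                    # stand-in for the real targets

# each n x n image becomes one row of n*n continuous columns plus a target column
df = pd.DataFrame(images.reshape(len(images), n * n),
                  columns=[f'px{i}' for i in range(n * n)])
df['target'] = labels

to = TabularPandas(df, procs=[Normalize],
                   cont_names=[c for c in df.columns if c != 'target'],
                   y_names='target', y_block=CategoryBlock(),
                   splits=RandomSplitter()(range_of(df)))
dls = to.dataloaders(bs=64)

learn = tabular_learner(dls, layers=[200, 100], metrics=accuracy)
learn.fit_one_cycle(3)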

There are two parts where following this idea is kind of hard:
First, in cell [30], his structure is flat, while a tabular_learner nowadays creates nested structures such as

Sequential(
  (0): LinBnDrop(
    (0): Linear(in_features=114, out_features=200, bias=False)
    (1): ReLU(inplace=True)
    (2): BatchNorm1d(200, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.1, inplace=False)
  )
  (1): LinBnDrop(
    (0): Linear(in_features=200, out_features=100, bias=False)
    (1): ReLU(inplace=True)
    (2): BatchNorm1d(100, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.2, inplace=False)
  )
  (2): LinBnDrop(
    (0): Linear(in_features=100, out_features=1, bias=True)
  )
  (3): SigmoidRange(low=0, high=0.8)
)

A way around this is to use learner.model._modules or flatten_model, so that is not the biggest problem, I think (even though it took some time to find these).
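
For concreteness, this is the kind of thing I mean, continuing from the sketch above. The slice index is only illustrative (it refers to the structure printed above), and gluing the flat list back into a plain Sequential only makes sense if there are no categorical embeddings that would have to be applied in parallel:

# Continuing from the sketch above: ways to get at the individual pieces.
import torch.nn as nn

learn.model._modules               # OrderedDict of the top-level sub-modules
learn.model.layers                 # the nested Sequential of LinBnDrop blocks shown above
flat = flatten_model(learn.model)  # flat list of leaf modules (Linear, ReLU, BatchNorm1d, ...)

# Illustrative only: drop the last two leaf modules (the final Linear and the SigmoidRange
# in the structure printed above) and glue the rest back together. The exact slice depends
# on your model, and this assumes no categorical embeddings are involved.
feature_extractor = nn.Sequential(*flat[:-2]).eval()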

But whenever I attempt to do something akin to cell [45] via learner.get_preds, the system tells me:

Using a target size (torch.Size([64])) that is different to the input size (torch.Size([6400])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size

RuntimeError: The size of tensor a (6400) must match the size of tensor b (64) at non-singleton dimension 0

And this is where I am a bit stumped.

To recap:

  • I am using other data, thus my input is slightly different
  • I am using the standard layers ([200, 100]) in a tabular_learner
  • I trained it with batch_size 64, that worked fine
  • I cut off the last two layers ((2) and (3)) as well as the ReLU/Dropout part of the second block ((1)).

Expected behaviour: put data in, run it through the remaining parts, and get 100 features per row (module 1.0, out_features=100).
Observed behaviour: the learner expects to hand back a batch-sized tensor and cannot deal with the current result (which should be a 64x100 tensor, I hope).

So I can either run the model's forward(…) myself for all batches (roughly as sketched below), or maybe someone would not mind helping me :wink:
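
By doing it manually I mean something like this untested sketch, with feature_extractor being the truncated Sequential from above, and assuming a purely continuous tabular batch (which, as far as I can tell, fastai yields as (x_cat, x_cont, y)):

# Untested sketch of the manual loop; assumes there are no categorical columns,
# so the truncated model only ever sees the continuous tensor.
import torch

feature_extractor.eval()                       # no dropout, use BatchNorm running stats
feats = []
with torch.no_grad():
    for x_cat, x_cont, y in learn.dls.train:
        feats.append(feature_extractor(x_cont))
features = torch.cat(feats).cpu().numpy()      # hopefully (n_rows, 100)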
Anything is appreciated!

Cheers!

In case anyone comes across this thread looking for information:

Yes, cutting off the embeddings (if there are any), the SigmoidRange, and the last Linear layer with the help of flatten_model works.
You can then continue as in the notebook pointed to above (the second link).
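
Roughly, the whole thing looks like this for me, continuing from the earlier snippets. The break condition refers to the 100-feature Linear of my model, it assumes there are no categorical embeddings that are actually needed, and KMeans is just one clustering option:

# Rough version of what ended up working; module boundaries depend on the concrete model.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

keep = []
for m in flatten_model(learn.model):
    if isinstance(m, nn.Embedding):            # cut the embeddings, if there are any
        continue
    keep.append(m)
    if isinstance(m, nn.Linear) and m.out_features == 100:
        break                                  # stop at the Linear that yields the 100 features
feature_extractor = nn.Sequential(*keep).eval()

feats = []
with torch.no_grad():
    for x_cat, x_cont, y in learn.dls.train:   # continuous columns only
        feats.append(feature_extractor(x_cont))
features = torch.cat(feats).cpu().numpy()

clusters = KMeans(n_clusters=10).fit_predict(features)   # KMeans is just one option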

I am still testing whether this is useful for my data, but if you want some fancy clustering, there you go.
