This is a really interesting idea.
Something similar has been explored in DSSM.
The difference is how the embeddings are being generated.
However, there seems to be some confusion regarding the terminology, which I'd like to attempt to clarify.
FastAI LM vs Classifier
To make this happen we’re gonna use neither the language model nor the classifier as-is.
I’m a little unclear on whether calling this an autoencoder is correct (my opinion is that it's not), so I’m not gonna comment on that.
In the fastai context:
First, what’s the difference between a language model and classifier?
Ans: Only the last set of layers.
If you try comparing the models (you can do that by comparing `learn_lm.model` and `learn.model`), the part that's the same is the `MultiBatchEncoder`; the LM uses the `LinearDecoder` and the classifier uses the `PoolingLinearClassifier`. Now, just out of curiosity, if you wanna understand the essential difference between the two, try looking at the last entry in the `LinearDecoder` and the `PoolingLinearClassifier`.
`PoolingLinearClassifier`:

```
(6): Linear(in_features=50, out_features=2, bias=True)
```
What this means is it’s taking in a vector of size 50 and giving an output of size two.
`LinearDecoder`:

```
(decoder): Linear(in_features=400, out_features=60000, bias=True)
```
Taking input of size 400 and outputting a vector of size 60000.
This is where the softmax is applied. In the former, the output is one of two classes (e.g. positive or negative). In the latter, the output is the next word: one word out of a vocab of 60K.
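To make that concrete, here's a minimal sketch of the two heads in plain PyTorch. The sizes come from the printouts above; the 50-dim "pooled" input to the classifier head is just a stand-in tensor, not the real pooling output:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Sizes taken from the layer printouts above.
lm_head = nn.Linear(400, 60000)  # last layer of the LinearDecoder
clf_head = nn.Linear(50, 2)      # last layer of the PoolingLinearClassifier

# The LM head takes a 400-dim encoder output and scores every word
# in the 60K vocab; softmax turns that into next-word probabilities.
encoded = torch.randn(1, 400)
next_word_probs = torch.softmax(lm_head(encoded), dim=-1)
print(next_word_probs.shape)  # torch.Size([1, 60000])

# The classifier head takes a (pooled, reduced) 50-dim vector and
# scores just the two classes.
pooled = torch.randn(1, 50)
class_probs = torch.softmax(clf_head(pooled), dim=-1)
print(class_probs.shape)  # torch.Size([1, 2])
```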
Coming back
Coming back to how I think this kind of thing can be implemented.
Inspiration from the `fastai.collab` module can be used here.
First, let’s check out the kind of data we could be dealing with.
For a set of queries and documents, there's a score for how relevant each document is to each query. A bit like the `collab` module example from the course: how much does a single user like each movie, on a scale of 1-5.
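For illustration, a toy version of such a table might look like this. The column names, documents, and scores are all made up, purely by analogy with the user/movie/rating table from the collab lesson:

```python
import pandas as pd

# Hypothetical query-document relevance table, analogous to the
# user/movie/rating table from the collab lesson.
data = pd.DataFrame({
    "query":    ["cheap flights", "cheap flights", "python tutorial"],
    "document": ["airline_deals.txt", "python_intro.txt", "python_intro.txt"],
    "score":    [0.9, 0.1, 0.8],  # relevance, already normalized to 0-1
})
print(data)
```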
Now, what is the model going to look like?
We just need the `MultiBatchEncoder`. What this gives us is that we can feed a document or a query to the model and get a vector of size 400 for each.
Now what will the forward pass look like:
We take a query-document pair and pass both through the model, getting two vectors of size 400. Here, taking inspiration from the `collab` module, we can take the dot product of the two vectors.
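Here's a sketch of that forward pass in plain PyTorch. `MeanEncoder` is a stand-in I made up (mean of token embeddings) just to show the shapes; in the real thing it would be the `MultiBatchEncoder` plus whatever pooling reduces its output to one 400-dim vector per sequence:

```python
import torch
import torch.nn as nn

class MeanEncoder(nn.Module):
    """Stand-in for the MultiBatchEncoder: embeds tokens and averages
    them into a single 400-dim vector per sequence."""
    def __init__(self, vocab_size=100, emb_dim=400):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)

    def forward(self, token_ids):
        return self.emb(token_ids).mean(dim=1)  # (batch, 400)

class DotProductScorer(nn.Module):
    """Encode query and document with the *same* encoder, then score
    the pair with a dot product, as in the collab module."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, query_ids, doc_ids):
        q = self.encoder(query_ids)  # (batch, 400)
        d = self.encoder(doc_ids)    # (batch, 400)
        return (q * d).sum(dim=-1)   # (batch,) one score per pair

model = DotProductScorer(MeanEncoder())
queries = torch.randint(0, 100, (4, 7))  # 4 queries, 7 tokens each
docs = torch.randint(0, 100, (4, 9))     # 4 documents, 9 tokens each
scores = model(queries, docs)
print(scores.shape)  # torch.Size([4])
```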
The Loss
The dot product will give us one score.
Now, given that the actual and predicted scores are normalized, we want the predicted score to be close to the actual score. The `collab` module uses `MSELoss`, which is the way to go in my opinion.
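A minimal sketch of the loss computation, with made-up scores:

```python
import torch
import torch.nn.functional as F

# Made-up dot-product outputs vs. actual normalized relevance scores.
predicted = torch.tensor([0.8, 0.2, 0.5])
actual = torch.tensor([0.9, 0.1, 0.8])

# Same criterion the collab module uses (MSELoss).
loss = F.mse_loss(predicted, actual)
print(loss.item())  # mean of (0.1^2 + 0.1^2 + 0.3^2), about 0.0367
```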
So what does the big picture look like (TL;DR)?
This is exactly like the `collab` module, but we've replaced the randomly initialized embeddings in `collab` with the `MultiBatchEncoder` from the `text` module. And of course we first train the language model on all of the documents.
Conclusion
It should be interesting work.
I don’t have enough time to implement this on my own, so if anyone’s looking to collab-orate, I’d like to.
As far as I understand, the trickiest part is gonna be the `DataBunch`.
Working with fastai, the databunch is somehow always the hardest part to get through.