This is a really interesting idea.
Something similar has been explored in DSSM.
The difference is how the embeddings are being generated.
However, there seems to be some confusion regarding the terminology, which I'd like to attempt to clarify.
FastAI LM vs Classifier
To make this happen we're not gonna use the language model or the classifier as-is.
I'm a little unclear whether calling this an autoencoder is correct (my opinion: it's not), so I'm not gonna comment on that.
In the fastai context:
First, what’s the difference between a language model and classifier?
Ans: Only the last set of layers.
If you try comparing the two models, the part that's the same is the MultiBatchEncoder. On top of it, the LM uses the LinearDecoder and the classifier uses the PoolingLinearClassifier. Now, just out of curiosity, if you wanna understand the essential difference between the two, try looking at the last entry in each. For the classifier:
(6): Linear(in_features=50, out_features=2, bias=True)
What this means is it's taking in a vector of size 50 and giving an output of size two. And for the language model:
(decoder): Linear(in_features=400, out_features=60000, bias=True)
Taking input of size 400 and outputting a vector of size 60000.
This is where softmax is being applied. In the former, the output is one of two classes (e.g. positive or negative); in the latter, the output is the next word: one word out of a vocab of 60K.
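To make the shape difference concrete, here's a minimal numpy sketch of the two heads. The weights are random and biases are omitted; this only illustrates the output sizes, not the actual fastai layers:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Classifier head: Linear(in_features=50, out_features=2)
W_clf = rng.standard_normal((50, 2))
probs_clf = softmax(rng.standard_normal(50) @ W_clf)
print(probs_clf.shape)  # (2,) -- a probability for each of the 2 classes

# LM decoder: Linear(in_features=400, out_features=60000)
W_dec = rng.standard_normal((400, 60000))
probs_dec = softmax(rng.standard_normal(400) @ W_dec)
print(probs_dec.shape)  # (60000,) -- a probability for each word in the vocab
```

Either way the softmax output sums to 1; the only real difference is how many things it's a distribution over.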
Coming back to how I think this kind of thing can be implemented.
Inspiration from fastai.collab can be used here.
First, let’s check out the kind of data we could be dealing with.
For a set of queries and documents, there's a score for how relevant each document is to each query. A bit like the collab module example from the course: how much does a single user like each movie on a scale of 1-5.
Now, what is the model going to look like?
We just need the MultiBatchEncoder. What this results in is that we can feed a document or a query to the model and get a vector of size 400 for each.
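As a rough sketch of what "feed text in, get a 400-d vector out" means, here's a toy stand-in encoder in numpy. Mean-pooling random embeddings is purely illustrative; the real MultiBatchEncoder runs the trained AWD-LSTM over the tokens:

```python
import numpy as np

rng = np.random.default_rng(42)

EMB_DIM = 400      # matches the 400-d output mentioned above
VOCAB_SIZE = 1000  # toy vocab; the fastai example above has ~60K words

# Random token embeddings, standing in for a trained encoder's weights
emb = rng.standard_normal((VOCAB_SIZE, EMB_DIM))

def encode(token_ids):
    """Toy stand-in for the MultiBatchEncoder: mean-pool the token
    embeddings into a single 400-d vector."""
    return emb[token_ids].mean(axis=0)

query_vec = encode([3, 17, 256])        # a "query" as token ids
doc_vec = encode([3, 90, 411, 77])      # a "document" as token ids
print(query_vec.shape, doc_vec.shape)   # (400,) (400,)
```

The point is just the interface: variable-length text in, fixed-size vector out, same encoder for queries and documents.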
Now, what will the forward pass look like?
We take a query-document pair and pass both through the model, getting two vectors of size 400. Here, taking inspiration from the collab module, we can take a dot product of the two vectors.
The dot product gives us one score.
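The scoring step itself is tiny; here's a sketch with random vectors standing in for the two encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two 400-d encoder outputs
query_vec = rng.standard_normal(400)
doc_vec = rng.standard_normal(400)

score = float(query_vec @ doc_vec)  # one scalar relevance score

# The same idea scores one query against a whole batch of documents:
docs = rng.standard_normal((5, 400))
scores = docs @ query_vec           # shape (5,), one score per document
```

This is the same trick the collab module's dot-product model uses for user and item embeddings.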
Now, given that the actual and predicted scores are normalized, we want the predicted score to be close to the actual score. The collab module uses MSELoss, which is the way to go in my opinion.
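For completeness, MSELoss is just the mean squared difference between the predicted and actual scores; a plain-numpy equivalent of what torch's nn.MSELoss computes with its default "mean" reduction:

```python
import numpy as np

def mse_loss(predicted, actual):
    """Mean squared error between predicted and actual scores."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return float(np.mean((predicted - actual) ** 2))

# Toy normalized relevance targets vs predicted dot-product scores
actual = [0.8, 0.2, 1.0]
predicted = [0.7, 0.4, 0.9]
loss = mse_loss(predicted, actual)  # ≈ 0.02
```

Minimizing this pushes each query-document dot product toward its relevance label, exactly as the collab model pushes user-movie dot products toward the ratings.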
So what does the big picture look like (TL;DR)?
This is exactly like the collab module, but we've replaced the randomly initialized embeddings in collab with the MultiBatchEncoder from the text module. And of course we first train the language model on all of the documents.
It should be interesting work.
I don't have enough time to implement this on my own, so if anyone's looking to collab-orate, I'd love to.
As far as I understand, the trickiest part is gonna be the DataBunch. Working with fastai, somehow the DataBunch is really the hardest part to get through.