This is a really interesting idea.

Something similar has been explored in DSSM.

The difference is how the embeddings are being generated.

However, there seems to be some confusion regarding the terminology, which I'd like to attempt to clarify.

**FastAI LM vs Classifier**

To make this happen we're going to use neither the language model nor the classifier as-is.

I'm not sure whether calling this an autoencoder is correct (my opinion is that it isn't), so I won't comment on that.

In the fastai context:

First, what's the difference between a language model and a classifier?

Ans: Only the last set of layers.

If you compare the models (you can do that by printing `learn_lm.model` and `learn.model`), the part that's the same is the `MultiBatchEncoder`; the LM uses the `LinearDecoder` and the classifier uses the `PoolingLinearClassifier`. Now, just out of curiosity, if you want to understand the essential difference between the two, look at the last entry in the `LinearDecoder` and the `PoolingLinearClassifier`.

`PoolingLinearClassifier`:

`(6): Linear(in_features=50, out_features=2, bias=True)`

This means it takes in a vector of size 50 and gives an output of size two.

`LinearDecoder`:

`(decoder): Linear(in_features=400, out_features=60000, bias=True)`

It takes an input of size 400 and outputs a vector of size 60000.

This is where softmax is applied. In the former, the output is one of two classes (e.g. True or False). In the latter, the output is the next word: one word out of a vocab of 60K.
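As a toy illustration of those two final layers (plain NumPy, not fastai code; the vocab is shrunk from 60000 to 600 just to keep it light):

```python
import numpy as np

rng = np.random.default_rng(0)

# LinearDecoder-style head: maps a 400-d hidden state to one logit per
# vocab word. The real vocab is 60000; 600 here keeps the toy example light.
W_decoder = rng.standard_normal((600, 400))
hidden = rng.standard_normal(400)
next_word_logits = W_decoder @ hidden      # shape (600,): a score per vocab word

# PoolingLinearClassifier-style final layer: maps a pooled 50-d feature
# vector to 2 class logits (e.g. True / False).
W_classifier = rng.standard_normal((2, 50))
pooled = rng.standard_normal(50)
class_logits = W_classifier @ pooled       # shape (2,): one score per class
```

Softmax over `next_word_logits` gives a distribution over the vocab; softmax over `class_logits` gives a distribution over the two classes.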

**Coming back**

Coming back to how I think this kind of thing can be implemented.

We can take inspiration from the `fastai.collab` module here.

**First, let’s check out the kind of data we could be dealing with.**

For a set of queries and documents, there's a score for how relevant each document is to each query. A bit like the `collab` module example from the course: how much does a single user like each movie, on a scale of 1-5.
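For concreteness, the data could be a table of (query, document, relevance) triples, directly analogous to the (user, movie, rating) triples in `collab`. The names and scores below are entirely made up:

```python
# Toy (query, document, relevance) triples; relevance normalized to [0, 1].
# Analogous to (user, movie, rating) in the collab module.
data = [
    ("cheap flights to tokyo",    "doc_airfare_deals",     0.9),
    ("cheap flights to tokyo",    "doc_tokyo_restaurants", 0.2),
    ("python list comprehension", "doc_python_tutorial",   0.8),
]
```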

**Now, what is the model going to look like?**

We just need the `MultiBatchEncoder`. This means we can feed a document or a query to the model and get a vector of size 400 for each.

**Now what will the forward pass look like:**

We take a query-document pair and pass both through the model, getting two vectors of size 400. Then, taking inspiration from the `collab` module, we take the dot product of the two vectors.
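A minimal sketch of that forward pass in NumPy. The random vectors below stand in for the encoder outputs; in the real setup each would be the 400-d vector produced by the `MultiBatchEncoder` for one piece of text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the encoder outputs: in the real setup each of these
# would come from passing a text through the MultiBatchEncoder.
query_vec = rng.standard_normal(400)
doc_vec = rng.standard_normal(400)

# The collab-style scoring step: a single dot product -> one relevance score.
score = query_vec @ doc_vec
```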

**The Loss**

The dot product will give us one score.

Now, given that the actual and predicted scores are normalized, we want the predicted score to be close to the actual score. The `collab` module uses `MSELoss`, which is the way to go in my opinion.
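With toy numbers (assuming the dot-product scores have already been normalized), the loss computation looks like:

```python
import numpy as np

# Toy predicted scores (normalized dot products) vs. actual relevance labels.
predicted = np.array([0.8, 0.1, 0.5])
actual    = np.array([0.9, 0.2, 0.4])

# Mean squared error, the same quantity nn.MSELoss computes.
mse = np.mean((predicted - actual) ** 2)   # ≈ 0.01
```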

**So what does the big picture look like (TL;DR)?**

This is exactly like the `collab` module, but we've replaced the randomly initialized embeddings in `collab` with the `MultiBatchEncoder` from the `text` module. And of course, we first train the language model on all of the documents.
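Putting it all together, here's a rough sketch of the big picture. The bag-of-words encoder below is a stand-in so the sketch runs; in the real thing it would be the fine-tuned `MultiBatchEncoder`, and the class name is made up:

```python
import numpy as np

def toy_encode(text, dim=400):
    """Stand-in for the fine-tuned encoder: hash each word into a
    fixed-size bag-of-words vector. Only here so the sketch runs."""
    vec = np.zeros(dim)
    for word in text.split():
        vec[hash(word) % dim] += 1.0
    return vec

class DotProductRanker:
    """Collab-style model: encode query and document, score by dot product."""

    def __init__(self, encode):
        self.encode = encode  # any callable: text -> fixed-size vector

    def forward(self, query, document):
        return self.encode(query) @ self.encode(document)

ranker = DotProductRanker(toy_encode)
# Shared words ("cheap", "flights") give a higher score than disjoint text.
relevant = ranker.forward("cheap flights", "flights to tokyo are cheap")
irrelevant = ranker.forward("cheap flights", "history of jazz music")
```

Training would then nudge the encoder (via MSE on these scores) so that relevant pairs score high and irrelevant ones score low.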

**Conclusion**

It should be interesting work.

I don't have enough time to implement this on my own, so if anyone's looking to `collab`-orate, I'd love to.

As far as I understand, the trickiest part is going to be the `DataBunch`.

Working with fastai, the databunch somehow always ends up being the hardest part to get through.