Lesson 5 Advanced Discussion ✅

That’s pretty cool. Do you know of anyone using this in practice?

Thank you @jeremy for the excellent lecture. I have been thinking a lot about collaborative filtering and have a few questions to ask.

  1. In the example given, the items in the collaborative filtering dataset do not necessarily need to be movies. They can be anything with a rating attached. Potentially one could put demographic or other user data in there (e.g. if the person is male, an entry “male” could be added in the item column with a rating of 5 to capture this), and the model could learn how these data influence the user embedding. This may avoid requiring a separate tabular model to use demographic data for the cold-start problem. I wonder if anyone has tried this approach? (See the sketch after this list.)

  2. The more interesting idea is to use collaborative filtering in medical diagnostics. Most patients only have a few diseases, and most diseases only affect a small proportion of patients, so this is not dissimilar to the movie recommendation problem. Potentially, if there were a dataset of a large number of patients with their diagnostic coding, a collaborative filtering system could figure out which diseases a patient may be most susceptible to given their previous diagnoses.

  3. Collaborative filtering can also be used to impute missing values. This is again of importance in medicine, as most patients will not have had all the tests / investigations that are available. Potentially, as more information becomes known about a patient, a system could impute and predict the results of tests that have not been performed.
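Regarding point 1, here is a minimal sketch of what I mean (the column names and the tiny DataFrames are made up for illustration): demographic attributes are simply appended as extra pseudo-items with a fixed rating, so the collaborative filtering model learns an embedding for them too.

import pandas as pd

# assumed layout: ratings has columns [user, item, rating]; users has columns [user, gender]
ratings = pd.DataFrame({'user': [1, 1, 2],
                        'item': ['movie_a', 'movie_b', 'movie_a'],
                        'rating': [4, 5, 3]})
users = pd.DataFrame({'user': [1, 2], 'gender': ['male', 'female']})

# turn each demographic attribute into a pseudo-item with a fixed "rating" of 5
demo = users.assign(item='gender_' + users['gender'], rating=5)[['user', 'item', 'rating']]

# the combined table gives the model extra pseudo-items whose embeddings
# capture how the attribute influences the user embedding
ratings_plus_demo = pd.concat([ratings, demo], ignore_index=True)
print(ratings_plus_demo)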

Does anyone know if a collaborative filtering approach has been used in medical research / diagnostics?

Hi! I’m trying to implement a net as discussed in this lesson, but I’m getting the following error:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-30-36d1490bd9f3> in <module>()
----> 1 losses = [update(x,y,lr) for x,y in data.train_dl]

/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1404     if input.dim() == 2 and bias is not None:
   1405         # fused op is marginally faster
-> 1406         ret = torch.addmm(bias, input, weight.t())
   1407     else:
   1408         output = input.matmul(weight.t())

RuntimeError: Expected object of scalar type Float but got scalar type Double for argument #4 'mat1'

I have checked out my datatypes and everything that I can find is a float, there are no doubles, so I’m confused as to what could be causing this error. Thoughts?

If it helps, I’ve figured out (via source diving), that mat1 here is the bias.

I’ve figured out my problem: when I run

torch.tensor(some_array)

I get back a DoubleTensor when what I want is a FloatTensor. What confused me at first is that if you print that tensor, its dtype shows as float64, which is just another name for double. Anyway, I’ve gotten around this by calling torch.tensor(some_array).float().
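For anyone hitting the same thing: NumPy arrays default to float64, and torch.tensor keeps that dtype, while nn.Linear weights are float32, hence the Float vs Double mismatch. A quick sketch of the check and the fix (the array name is just an example):

import numpy as np
import torch

some_array = np.random.rand(4, 3)      # NumPy defaults to float64
t = torch.tensor(some_array)
print(t.dtype)                         # torch.float64, i.e. a DoubleTensor

t = torch.tensor(some_array).float()   # cast to float32 to match nn.Linear weights
# or create it with the right dtype in the first place:
t = torch.tensor(some_array, dtype=torch.float32)
print(t.dtype)                         # torch.float32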

Sorry for the late reply.
Those should be imported from utils.py (check out this file in the same repo):

from utils import *

BTW, here is the post for the notebook that I referred to with the link in the 2nd update:

I am uploading a link to my hand-written notes on things you can do to improve your neural networks. I wrote these notes while doing the deeplearning.ai specialization taught by Andrew Ng. Hope you find them useful. They cover in detail topics such as regularization, weight decay, Adam optimization, momentum, etc., and go well with Lesson 5, where Jeremy teaches about the Adam optimization algorithm, weight decay, RMSprop, etc.

https://drive.google.com/open?id=1dXjZ2boL5pqvxSB7J-zLEdCbTo7XL8MO


THIS.
Jeremy skims through most concepts, going straight to the objective truths accepted by today’s standards. Although Adam is essentially RMSprop plus momentum, it was difficult to build a mental picture/intuition of how each one specifically works. So, although it is supplementary, I recommend everyone read through (i.e. google) anything remotely blurry, because there are underlying concepts best explored in texts and publications rather than in classrooms. :grin:
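For anyone who wants the update rules written out, this is roughly how momentum, RMSprop and Adam fit together (standard textbook form; the simplified implementations in this thread drop the bias-correction terms):

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t && \text{(momentum: moving average of the gradient)}\\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 && \text{(RMSprop: moving average of the squared gradient)}\\
\theta_t &= \theta_{t-1} - \mathrm{lr}\cdot \frac{m_t}{\sqrt{v_t} + \epsilon} && \text{(Adam: use the first to steer, the second to scale)}
\end{aligned}
$$

Full Adam additionally divides $m_t$ by $1-\beta_1^t$ and $v_t$ by $1-\beta_2^t$ to correct their bias towards zero during the first few steps.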

In Lesson 5 Jeremy advises us to write our own Adam optimizer. Here is how I implemented it:

# Initialising the momentum and RMS state, one tensor per parameter
mom = {}
rms = {}
i = 0
for p in model.parameters():
    mom[i] = torch.zeros(p.shape)
    rms[i] = torch.zeros(p.shape)
    i += 1

def update(x, y, lr, wd=0.03, beta1=0.9, beta2=0.999, epsilon=1e-08):
    y_hat = model(x)
    # L2 penalty: add wd * sum of squared weights to the loss
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    loss = loss_func(y_hat, y) + wd*w2
    loss.backward()
    i = 0
    with torch.no_grad():
        for p in model.parameters():
            # exponentially weighted averages of the gradient and the squared gradient
            mom[i] = beta1*mom[i] + (1-beta1)*p.grad
            rms[i] = beta2*rms[i] + (1-beta2)*(p.grad**2)
            p.sub_(lr * (mom[i] / ((rms[i] + epsilon)**0.5)))
            p.grad.zero_()
            i += 1
    return loss.item()

Hi @a_bhimany_u, I could not get your solution to work; please see below. As far as I understood, the problem is that mom[i] is a CPU variable while the model has been loaded to CUDA.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-37-36d1490bd9f3> in <module>
----> 1 losses = [update(x,y,lr) for x,y in data.train_dl]

<ipython-input-37-36d1490bd9f3> in <listcomp>(.0)
----> 1 losses = [update(x,y,lr) for x,y in data.train_dl]

<ipython-input-36-5e0bf8c84769> in update(x, y, lr, wd, beta1, beta2, epsilon)
     18     with torch.no_grad():
     19         for p in model.parameters():
---> 20             mom[i] = beta1*mom[i] + (1-beta1)*p.grad
     21             rms[i] = beta2*rms[i] + (1-beta2)*(p.grad**2)
     22             p.sub_(lr * (mom[i]/((rms[i] + epsilon)**0.5)))

RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

Could you please share your working solution?
Thanks

Hi @gabrielfior, I also got the same type of error while running the code. To troubleshoot this, I changed my runtime type from GPU to CPU in Colab and initialised my model object as model = MyModel() instead of model = MyModel().cuda().
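For reference, another way around it (not what I did, just a sketch) is to create the optimizer state on the same device and dtype as the parameters, so the same code also runs when the model is on CUDA:

mom, rms = {}, {}
for i, p in enumerate(model.parameters()):
    # zeros_like creates the state tensor on the same device (and dtype) as the parameter
    mom[i] = torch.zeros_like(p)
    rms[i] = torch.zeros_like(p)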

Hi @hwasiti,
Any update on how to do it?
I’ve downloaded the weights of a resnet50 pretrained on Places365 (http://places2.csail.mit.edu/models_places365/resnet50_places365.pth.tar) and tried to follow Jeremy’s advice, but I get an error (screenshot not shown here). Any ideas on how to solve it?

Thanks in advance

This really helped! Literally just made an account to thank you :smile:

Hi all,
Thanks for the excellent lecture. I’m a Swift developer, and as practice, and to understand it better, I’ve been implementing the things presented during the lecture in Swift for TensorFlow. This time I implemented MNIST SGD with weight decay. However, when I plot the loss alongside the implementation without weight decay, they look almost the same. Here is my implementation: https://gist.github.com/jkrukowski/1b40ef7fd3c12cd9c70fa44477644f48
I’d be grateful if anyone could verify whether my implementation is correct, thanks!
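For comparison, the PyTorch update with weight decay from the lesson looks roughly like this (a sketch from memory, so treat the exact values as assumptions); this is what I tried to mirror in Swift:

def update(x, y, lr, wd=1e-5):
    y_hat = model(x)
    # weight decay as an L2 penalty added to the loss
    w2 = 0.
    for p in model.parameters(): w2 += (p**2).sum()
    loss = loss_func(y_hat, y) + wd*w2
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)
            p.grad.zero_()
    return loss.item()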

Neat!!

I have the same question. Have you found a solution?
Thanks.

Hi,
One observation: in the last version of update() (see below), the Adam optimiser is initialised for every mini-batch. But Adam is a stateful optimiser, so by resetting it on every mini-batch we significantly reduce its performance. I moved the line opt = optim.Adam(model.parameters(), lr) outside update() and achieved much faster convergence.

def update(x,y,lr):
    opt = optim.Adam(model.parameters(), lr)
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

from https://nbviewer.jupyter.org/github/fastai/course-v3/blob/master/nbs/dl1/lesson5-sgd-mnist.ipynb
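For comparison, here is a sketch of the change I described: the optimizer is created once, outside update(), so its moving averages persist across mini-batches (names as in the notebook):

opt = optim.Adam(model.parameters(), lr)   # created once; Adam's state now persists

def update(x, y, lr):                      # lr is now only used when creating opt above
    y_hat = model(x)
    loss = loss_func(y_hat, y)
    loss.backward()
    opt.step()
    opt.zero_grad()
    return loss.item()

losses = [update(x, y, lr) for x, y in data.train_dl]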

That’s a good question!
When I implemented my home-cooked Adam optimizer class, I naturally initialized it outside of the loop, but I did not compare it to the initial Adam; I just went on to implementing nn.Linear.
And looking at the source code of PyTorch’s Adam, I don’t see anything special that would justify it.
https://pytorch.org/docs/stable/_modules/torch/optim/adam.html#Adam
So it might be a mistake…

I finished the part where Jeremy talks about embeddings and it is really awesome. I googled some more information about it, and it seems easiest for me to summarize embeddings like this:

  1. We can reduce the dimensionality of a categorical input vector (e.g. user and movie ids) by using embeddings.
  2. Similar categories are going to be close in the embedding space. I think that’s what Jeremy meant when he said that the dot product between User A and Movie B is going to be a high number if the user likes it and the movie is good. We could also see this in the example of the German supermarkets. (A minimal sketch of this dot product follows below.)
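Here is that sketch for point 2 (the ids and sizes are made up), showing the dot product between a user embedding and a movie embedding:

import torch
import torch.nn as nn

n_users, n_movies, n_factors = 10, 20, 5
user_emb = nn.Embedding(n_users, n_factors)
movie_emb = nn.Embedding(n_movies, n_factors)

user_a = torch.tensor([3])     # made-up user id
movie_b = torch.tensor([7])    # made-up movie id

# the predicted affinity is the dot product of the two embedding vectors;
# training pushes it up when the user tends to like movies like this one
score = (user_emb(user_a) * movie_emb(movie_b)).sum(dim=1)
print(score)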

Does anyone know why, when doing transfer learning, the last layer is replaced by two layers (instead of just a single layer with the correct dimensions for the new number of classes)?
