Lesson 7 official topic

Hey! Has anyone been able to do Multi-Target analysis on the Titanic dataset?

I tried but wasn’t able to do it. Would love some help!

I’m struggling to fit the collaborative filtering embeddings example from this lesson into my understanding of how a deep neural net works. I’m sure my mental model is wrong somewhere, and it would be really helpful if someone could point out where I’m making a mistake.

When I think about forward propagation in a neural net I think about this equation: output = activation_function(weight * input + bias). What I can’t figure out in the collaborative filtering embedding example is whether the embeddings (users & movies) are the weights or the inputs.

My initial reaction is that they are the weights, since we are using SGD to find them. But if they are the weights, then what is the input, and where is the input multiplied by the weights (as per the forward function I outlined above)? All I see is the dot product of users and movies, but those are the weights, and I was expecting something like y = wx + b.
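
For reference, the model from the lesson that I'm puzzling over looks roughly like this (quoting from memory; it relies on fastai's Module and Embedding):

class DotProduct(Module):
    def __init__(self, n_users, n_movies, n_factors):
        self.user_factors = Embedding(n_users, n_factors)
        self.movie_factors = Embedding(n_movies, n_factors)

    def forward(self, x):
        users = self.user_factors(x[:,0])    # x[:,0] holds the user ids
        movies = self.movie_factors(x[:,1])  # x[:,1] holds the movie ids
        return (users * movies).sum(dim=1)   # just a dot product, no visible wx + b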

Would really appreciate it if anyone can help clarify this!

Hey,

The embedding layer has the weights. The inputs to the model (and to that layer) are the user/movie ids. You can think of each id as a one-hot-encoded vector that gets multiplied by the embedding weights matrix.
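
Here's a toy sketch of that equivalence (sizes made up):

import torch

n_users, n_factors = 5, 3
emb = torch.nn.Embedding(n_users, n_factors)  # the weights: a 5x3 matrix

user_id = torch.tensor([2])  # the input: just an index
one_hot = torch.nn.functional.one_hot(user_id, n_users).float()

# An embedding lookup gives the same result as multiplying the
# one-hot input vector by the weights matrix:
assert torch.allclose(emb(user_id), one_hot @ emb.weight)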

Oh that makes a lot of sense - thanks Ben! And does this model only have one hidden layer?

Hi all. I was trying to understand how the Google paper mentioned in chapter 9 of the book implements the dot product model (from chapter 8 of the book) as part of its “wide” network.

They pass features through a “cross-product transformation”, and then take a dot product with a vector of learnable weights. The cross-product transformation seems to multiply two or more binary categorical features together. How is this process the same as our “dot product” approach for collaborative filtering? This is confusing to me because we don't multiply two or more binary variables together anywhere in our dot product approach.
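
For concreteness, here's my reading of the cross-product transformation (the feature names are made up):

# A cross-product feature multiplies a chosen subset of binary features,
# so it is 1 only when every feature in the subset is 1 (a logical AND).
gender_female = 1
language_en = 1
cross_feature = gender_female * language_en  # 1 only if both are 1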

Any help in understanding would be super useful. Thanks!

Help!

Working my way through the Kaggle notebook Scaling Up: Road to the Top, Part 3 and running into a few issues. It looks like convnext_small_in22k is no longer included in the timm library, which is to be expected given my late arrival to the party!

Tried something that looked similar, convnext_small.fb_in22k, but got this error:

I checked the official docs and ran !pip install huggingface-hub but keep getting the above error. Any help would be much appreciated 🙂

I have a question.

For a visual multilabel classification task I am trying to implement a multi-target model based on Lesson 7, Multi-target: Road to the Top, Part 4.
In the section ‘Replicating the disease model’, cell 8, Jeremy shows a way to explicitly pass a loss and error function to the learner.

def disease_err(inp, disease, variety): return error_rate(inp, disease)
def disease_loss(inp, disease, variety): return F.cross_entropy(inp, disease)

Now this works fine for me if I use a metric like accuracy_multi. But once I try to pass a metric like F1ScoreMulti from the fastai.metrics package in the same way, fine-tuning the learner runs into errors.

What I do know is that for a standard single-target multilabel classification task I need to instantiate a metric like F1ScoreMulti before I can pass it to the learner, like

f1_macro = F1ScoreMulti()

But with the multi-target approach proposed in Lesson 7, Multi-target: Road to the Top, Part 4, cell 8, this does not work.

Any suggestions as to what I'm doing wrong?

I'm not familiar with F1ScoreMulti, but I noticed two differences between your functions and the code in cells 15, 16, and 17:

  • “all_metrics” is a tuple, so try: metrics=(f1Score_Multi)
  • Cell 15 passes a subset of the input, input[:,:10], to error_rate()

The first few hits in the following search also look promising…
“F1ScoreMulti classification metrics can’t handle a mix of multi-label-indicator and continuous-multioutput targets”

Note that that’s not a tuple either! You need a comma: metrics=(f1Score_Multi,)
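
A quick way to see the difference in plain Python (same hypothetical f1Score_Multi name as above):

metrics = (f1Score_Multi)   # parentheses only: this is just f1Score_Multi itself
metrics = (f1Score_Multi,)  # the trailing comma makes a one-element tuple
metrics = [f1Score_Multi]   # a list also works and avoids the gotcha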

The links for “Label Smoothing Explained using Microsoft Excel” from the forum post and course website are outdated; here's the updated link:

Thanks. I've updated the link at the top of this post now.

Has anyone tried Multi-Target classification with the Titanic dataset? I am not able to correctly pass in a customized loss function when trying to predict Survived and Pclass at the same time. Any notebooks where this has been tried and works?
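
For context, here's the kind of combined loss I'm attempting, following the Part 4 pattern. The layout is my assumption: a model head with 5 outputs, the first 2 logits for Survived and the last 3 for Pclass, and survived_pclass_loss is just my name for it:

import torch.nn.functional as F
from fastai.metrics import error_rate

def survived_pclass_loss(pred, survived, pclass):
    # slice the head's outputs per target, as cell 15 of the notebook does with input[:,:10]
    return F.cross_entropy(pred[:, :2], survived) + F.cross_entropy(pred[:, 2:], pclass)

def survived_err(pred, survived, pclass):
    return error_rate(pred[:, :2], survived)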

I got the same error. The issue was that a newer timm version was being installed that didn't seem to be compatible with the notebook. Pinning timm to the exact version fixed the issue for me (though the session has to be restarted for it to take effect; just rerunning this code won't overwrite the library with the older version).

Change:
path = setup_comp(comp, install='fastai "timm>=0.6.2.dev0"')

to:
path = setup_comp(comp, install='fastai "timm==0.6.2.dev0"')

Great job fixing this! I was struggling with it this morning and wondering why Kaggle wasn't playing nice.

@jeremy is it worth updating the Kaggle notebooks for the course?

Hi there,

I'm running the Kaggle notebook for Road to the Top, Part 3 and hitting an issue with the 'swinv2_large_window12_192_22k' model.

train('swinv2_large_window12_192_22k', 192, epochs=1, accum=2, finetune=False)
report_gpu()

Getting the below error after running this cell:

RuntimeError: running_mean should contain 12 elements not 3072

Any help would be much appreciated!

Many thanks,
Ross

I am having the same error right now. I installed wwf following some advice from GitHub, but it didn't work. What should I do?

Hi guys,
I'm in lesson 7, and I've finally encountered an error. I'm in the Scaling Up, Part 3 notebook and can't get the memory usage with the code Jeremy provided.
I'm running WSL2 with a mamba venv.

My error is with print(torch.cuda.list_gpu_processes())

TypeError                                 Traceback (most recent call last)
Cell In[43], line 1
----> 1 print(torch.cuda.list_gpu_processes())

File ~/mambaforge/envs/fastaienv/lib/python3.10/site-packages/torch/cuda/memory.py:598, in list_gpu_processes(device)
    596     lines.append("no processes are running")
    597 for p in procs:
--> 598     mem = p.usedGpuMemory / (1024 * 1024)
    599     lines.append(f"process {p.pid:>10d} uses {mem:>12.3f} MB GPU memory")
    600 return "\n".join(lines)

TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'

When I check, torch.cuda.is_available() returns True
and torch.cuda.current_device() returns 0.
Any ideas?

When I modify the code it runs fine, except I see no efficiency gains with gradient accumulation.
So I'm wondering: is something off with my setup, or is something not being calculated correctly?

Here's the modified code that runs:

import gc
import torch

def report_gpu_updated():
    # report memory figures in MB (1024**2 bytes)
    print(f'Total GPU Memory: {torch.cuda.get_device_properties(0).total_memory / (1024**2):.2f} MB')
    print(f'Used GPU Memory: {torch.cuda.memory_allocated(0) / (1024**2):.2f} MB')
    print(f'Cached GPU Memory: {torch.cuda.memory_reserved(0) / (1024**2):.2f} MB')
    gc.collect()
    torch.cuda.empty_cache()

report_gpu_updated()

convnext_small_in22k:
accum=1: Used GPU Memory: 16.25 MB
accum=2: Used GPU Memory: 814.71 MB
accum=4: Used GPU Memory: 800.23 MB

For vit_large_patch16_224 with accum=2:
Used GPU Memory: 4654.58 MB

Does this look right?

Thanks in advance

I also got an error for the last 2 models:

swinv2_large_window12_192_22k
RuntimeError: running_mean should contain 12 elements not 3072

and
swin_large_patch4_window7_224
RuntimeError: running_mean should contain 14 elements not 3072

Did you figure out what was wrong?

@jeremy Why did you choose 5e-3 as the learning rate in fit_one_cycle() in the “Collaborative filtering deep dive” notebook? This number seems magical to me. Can you help me understand how you arrived at it?
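
For what it's worth, the one way I know to sanity-check a learning rate myself is fastai's LR finder (a sketch, assuming the dls from the notebook):

learn = collab_learner(dls, n_factors=50, y_range=(0, 5.5))
learn.lr_find()  # plots loss vs. learning rate; pick a value on the steep downward slope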

Another question I have about the “Collaborative filtering deep dive” notebook is:
Why does the collaborative filtering model built with fastai's collab_learner outperform the deep learning model built with the CollabNN class? It seems to perform even worse when I add extra hidden layers, like so:

learn = collab_learner(dls, use_nn=True, y_range=(0, 5.5), layers=[100,50])
learn.fit_one_cycle(5, 5e-3, wd=0.1)
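
For reference, the CollabNN class from the notebook that I'm comparing against is roughly this (quoting from memory; it relies on fastai's Module, Embedding, and sigmoid_range):

class CollabNN(Module):
    def __init__(self, user_sz, item_sz, y_range=(0, 5.5), n_act=100):
        self.user_factors = Embedding(*user_sz)
        self.item_factors = Embedding(*item_sz)
        self.layers = nn.Sequential(
            nn.Linear(user_sz[1] + item_sz[1], n_act),
            nn.ReLU(),
            nn.Linear(n_act, 1))
        self.y_range = y_range

    def forward(self, x):
        embs = self.user_factors(x[:,0]), self.item_factors(x[:,1])
        x = self.layers(torch.cat(embs, dim=1))  # concatenate embeddings, then MLP
        return sigmoid_range(x, *self.y_range)   # squash output into the rating range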