Lesson 3 In-Class Discussion ✅

Hello,

I’m getting a syntax error when I try to run the following on Windows:

! mkdir -p ~/.kaggle/

! mv kaggle.json ~/.kaggle/

For Windows, uncomment these two commands

! mkdir %userprofile%\.kaggle

! move kaggle.json %userprofile%\.kaggle

Any suggestions please?

Cheers.

During the segmentation topic (CamVid), I didn’t understand why Jeremy used acc_camvid as the accuracy metric instead of accuracy. Any suggestions?
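
For reference, this is roughly the metric the notebook defines (written from memory, so the exact cell may differ, and the codes list is abbreviated here):

codes = ['Animal', 'Archway', 'Bicyclist', 'Void']  # abbreviated; the real list comes from CamVid's codes.txt
name2id = {v: k for k, v in enumerate(codes)}
void_code = name2id['Void']

def acc_camvid(input, target):
    # input: (bs, n_classes, H, W) predictions; target: (bs, 1, H, W) class ids
    target = target.squeeze(1)
    mask = target != void_code  # drop pixels labeled 'Void' before comparing
    return (input.argmax(dim=1)[mask] == target[mask]).float().mean()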

Hi all,

I am having trouble visualizing where the Kaggle data for the Planet Amazon exercise lives on my Salamander instance. In the previous example, I could navigate with Jupyter Notebooks to see the images of bears/teddy bears, but I cannot see the same for the planet example. What am I missing?

Thanks,

Luke

Hi, I’m trying to run the planets notebook, and I’m getting a memory error when trying to find the learning rate. I tried reducing the batch size to 32 by running data = (src.transform(tfms, size=256).databunch(bs=32).normalize(imagenet_stats)), but that didn’t work, so I also tried batch sizes of 16, 4, 2, and 1, always getting the same memory error. I’ve searched the forum and found others on this thread having the same problem, but changing the batch size seemed to work for them.
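
For reference, here’s roughly what I’m running, following the lesson notebook (my exact cell may differ slightly):

# rebuild the DataBunch with a smaller batch size
data = (src.transform(tfms, size=256)
        .databunch(bs=32)   # also tried 16, 4, 2, and 1
        .normalize(imagenet_stats))

# point the existing learner at the new data before finding the learning rate
learn.data = data
learn.lr_find()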

Am I changing the batch size incorrectly, or is there something else I should try?

Thanks for your insight!

If “train loss > valid loss” means underfitting and “train loss < valid loss” means overfitting, does this mean a good model should have similar levels of training loss and validation loss?

Another thing that bothers me is the way he uses the learning rate. In some videos he uses two different numbers for the same kind of learning-rate plot: Lesson 1 - slice(1e-3, 1e-5) and Lesson 2 - slice(3e-3, 3e-5).

Hope someone can answer this.

Hi, @ghubs. I found this post on learning rates very helpful: Determining when you are overfitting, underfitting, or just right?

Regarding the notation for the learning rate: are you familiar with scientific notation? If not, Khan Academy is a fantastic free resource, with a good series on the topic: https://www.khanacademy.org/math/pre-algebra/pre-algebra-exponents-radicals#pre-algebra-scientific-notation

If you’re already familiar with scientific notation, you might be used to seeing it notated as something times 10 to a power, and this notation using e is actually equivalent. The ‘e’ represents “10 to the power of,” not the number e! https://en.wikipedia.org/wiki/Scientific_notation#E-notation

So 1e-3 == 1 * 10^(-3) == 0.001. If the leading digit needs to be something other than 1, you use that number instead. For instance, if you needed to express 0.05 in this notation, you’d write 5e-2, i.e. “5 times 10 to the power of -2”.
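
You can check this directly in Python, too (plain floats, nothing fastai-specific):

print(1e-3)                 # 0.001
print(5e-2)                 # 0.05
print(1e-3 == 1 * 10**-3)   # True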

I hope this helps!

That is very helpful. Thank you, Laura :slight_smile: The links are amazing, too. Now I understand that choosing the learning rate depends not only on the LR plot but also on the intuition and experience gained from running many different models and datasets.

Can we use slice(1e-5, 1e-3), as explained in Lesson 1, instead of 3e-02?
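
i.e. something like this (my own sketch of the call, not copied from the notebook):

# pass a range of learning rates across the layer groups instead of a single value like 3e-02
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))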

Hi :wave:

I had a quick question regarding accuracy (in a regular classifier, not a multi-label classifier). Jeremy said that basically we use argmax to find the index/label of the predicted class, compare it to the actual label, and then take a mean.

What I can’t figure out is where the mean comes from. What are we taking a mean of? I.e. if we have a predicted value with an index of 4, for example, and an actual value with an index of 5, when we compare those (4 == 5), how can we take a mean of that? :sweat_smile:

I feel like it has something to do with these being vectors/matrices, but I was hoping someone could clear this up a bit. :slight_smile:

Where are you running your Jupyter notebook?

I think the Windows commands assume you are running your notebook locally on your Windows machine or in a Windows VM. If that’s the case, my best guess would be to run the last two commands and not the first two.

If you are running your notebook on Google Cloud or a similar platform, just run the first two, regardless of whether you have a Windows/Mac/Linux machine, since the notebook is running in the cloud. Hopefully that helps!
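
So, roughly, assuming kaggle.json is sitting in the notebook’s working directory:

# running the notebook in the cloud (whatever your local OS): use the Unix commands
! mkdir -p ~/.kaggle/
! mv kaggle.json ~/.kaggle/

# running the notebook locally on Windows: use the Windows commands instead
! mkdir %userprofile%\.kaggle
! move kaggle.json %userprofile%\.kaggle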

Hi, @novarac23! I’m new at this, too, so this might not be correct, but reading the docs for the accuracy function:

def accuracy(input:Tensor, targs:Tensor)->Rank0Tensor:
    "Computes accuracy with `targs` when `input` is bs * n_classes."
    n = targs.shape[0]
    input = input.argmax(dim=-1).view(n,-1)
    targs = targs.view(n,-1)
    return (input==targs).float().mean()

It looks like the == operator is returning not a single True or False but instead a tensor of numeric values, which are then converted to floats, and then it takes the mean of those floats? I’m not sure what it’s doing to generate numeric values instead of the boolean values we’d expect from ==, so hopefully someone else will chime in, too.

It’s a Boolean: if they’re the same, it’s 1; if not, it’s 0. Then we sum up all those correct and incorrect predictions and take the average, which gives us the accuracy: total number of 1’s / total.
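
A tiny made-up example of what that looks like with tensors:

import torch

preds = torch.tensor([4, 5, 2])   # argmax'd predicted class indices
targs = torch.tensor([5, 5, 2])   # actual class indices

print(preds == targs)                   # elementwise comparison: [False, True, True]
print((preds == targs).float())         # tensor([0., 1., 1.])
print((preds == targs).float().mean())  # tensor(0.6667) -> 2 correct out of 3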

Ah, that makes sense! Thanks. :grinning:

Sorry to post this again, but I’m hoping someone will have an idea about what I might try to fix this. I’ve since found that if I restart the kernel and set the batch size to 1 (bs=1), I get a different error:

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 4096])

So that makes sense, and it shows that I’m successfully changing the batch size, but changing the batch size does not resolve the memory error.

Here’s my original post:

I’d deeply appreciate any ideas!

@go_go_gadget thank you for the answer!

@muellerzr so accuracy is total_amount_of_correct_predictions/total_amount_of_data_points?

I’m a bit confused by the part where you said we sum up correct and incorrect predictions. I thought it was only the correct ones :thinking:

That sum goes at the bottom (the denominator). Yes :slight_smile: your logic is right. Think of it the same as how well you did on an exam :slight_smile:

That makes sense! Thank you for answering! :raised_hands:

No problem :slight_smile: Glad I could help!

Just to clarify (sorry, I sometimes can’t let go of these things haha): by “at the bottom” you mean total_amount_of_data_points?

total_amount_of_data_points = correct + incorrect predictions

Sorry to be annoying with questions :grimacing:

Yup! :smile: on the nose! :slight_smile: and not annoying at all :slight_smile:
