Lesson 1 In-Class Discussion ✅

Hello, I am using Colab for Lesson 1. I am trying to upload a dataset from Google Drive and am running into issues doing so. I thought of uploading it to git as well, but my dataset is more than 100 MB.
If anyone has used Colab for the assignment, please share your approach.
In my case the data is not getting loaded onto the Linux machine, so the {PATH} commands are not working.

Hello, I have uploaded my data to a Google Compute Engine VM instance, but when I try to access the data from JupyterLab it gives me an error: PermissionError: [Errno 13] Permission denied: 'path'. How can I give the Jupyter notebook full access to my VM instance?

Almost the same. I'm also creating a class-index-to-label mapping, which might come in handy in many places, and leveraging it to get the class-wise frequencies:

# Invert class2idx so each index maps back to its class label
i2cmapping = {v: k for k, v in data.train_ds.class2idx.items()}
# Per-class frequency of the training labels
pd.Series([i2cmapping[i] for i in data.train_ds.y]).value_counts()

Nice, I didn’t know about class2idx. Then we can make it even simpler (and, since it avoids Python loops like in your example, much faster):

Classes and numbers of training examples per class:

pd.Series(data.class2idx).map(pd.Series(data.train_ds.y).value_counts())

Of course, you could still put

.sort_values(ascending=False)

at the end of the line to get a sorted list.
I've updated my reply above.
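With toy stand-ins for class2idx and the training labels (the class names and counts here are made up, just to illustrate the pattern), it looks like this:

```python
import pandas as pd

# Hypothetical stand-ins for data.train_ds.class2idx and data.train_ds.y
class2idx = {"cat": 0, "dog": 1, "fish": 2}
y = [0, 0, 1, 2, 2, 2]

# Map each class name to how often its index appears among the labels
counts = pd.Series(class2idx).map(pd.Series(y).value_counts())
counts = counts.sort_values(ascending=False)
print(counts)  # fish: 3, cat: 2, dog: 1
```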


@lesscomfortable the JavaScript you added for downloading the URL file for the images is not working:
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

Hello, when collecting my own dataset, say images of insects, if the images in the training set have paper labels (e.g. the species and a size indicator for the insect), would you recommend cropping that area out, especially since images in the test set are not likely to have these paper labels?

I had a similar issue: I had to change the access level of my files with a bash command in a terminal:
chmod 777 /path/to/your_file_name

This gives read/write/execute access on the file to everyone (777 is very permissive; a narrower mode such as 644 is usually enough for reading data).

Thanks, the access control is not a problem now, but when I run
data = ImageDataBunch.from_folder(path)
I get this:
ValueError: num_samples should be a positive integeral value, but got num_samples=0
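For what it's worth, that num_samples=0 error usually means the loader found no images under the folder layout from_folder expects (train/valid folders, each containing one subfolder per class). A quick, library-free sketch to see what is actually on disk (the root path and class names in the example are hypothetical):

```python
from pathlib import Path

IMG_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(root):
    """Count image files in each train/valid class subfolder under root."""
    counts = {}
    root = Path(root)
    for split in ("train", "valid"):
        for cls in sorted((root / split).glob("*")):
            if cls.is_dir():
                n = sum(1 for f in cls.iterdir() if f.suffix.lower() in IMG_EXTS)
                counts[f"{split}/{cls.name}"] = n
    return counts

# e.g. count_images("data/insects") -> {"train/ants": 120, "valid/ants": 30, ...}
```

If every count comes back 0 (or the dict is empty), the path or folder structure is the problem, not fastai.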

Listing a few doubts that I got while working on different experiments. Any help is appreciated.

  • The data and learner pipeline is breaking when we use batch size = 1. Though higher batch sizes are preferred, the pipeline is supposed to work for a batch size of 1 as well, right?
    Error while trying to run learner.fit

    show_batch error with batch size = 1
  • recorder.plot_losses starts plotting validation loss only after 1 epoch. Any specific reason for this?
  • Is there a way to get reproducible results using random.seed() in some way for fastai models?
    The current way of setting np.random.seed() doesn't give reproducible results! I tried a few things from these links as well but couldn't get reproducible results.
  • Is there a way to get back the file names of specific images, e.g. those that are wrongly classified or those having the top losses?
    Since I was working with images generated from audio in one example, plotting the images as such didn't help much, but checking those audio clips for wrong annotations, too much background noise, etc. would have helped.
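On the reproducibility question: a common recipe (not fastai-specific, and the function name here is my own) is to seed every RNG in play before building the data and learner. Even then, some GPU ops are not fully deterministic, so treat this as best-effort:

```python
import random

import numpy as np

def seed_everything(seed=42):
    """Seed the stdlib, NumPy, and (if installed) PyTorch RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # cuDNN autotuning picks algorithms nondeterministically; pin it down
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

seed_everything(42)
a = np.random.rand(3)
seed_everything(42)
b = np.random.rand(3)
assert (a == b).all()  # same seed, same draws
```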

These are the layers appended to the resnet34 model after chopping off some of its end layers.

You can get all of its code by just going into the ConvLearner source code.

??ConvLearner
??create_head
??bn_drop_lin

Diving into just these 3 functions will give you a decent idea of what’s happening.
Even if you don’t get precisely what’s happening in each line of the code, you can simply refer to the docstring to get a big picture of what’s happening in that function.
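For intuition, the first appended layer (AdaptiveConcatPool2d in the source) concatenates global max pooling and global average pooling, which is why the linear block after it sees twice the backbone's channel count (1024 for resnet34's 512 final channels). A NumPy sketch of that pooling step, using a made-up feature map:

```python
import numpy as np

def adaptive_concat_pool(x):
    """Concatenate global max- and average-pooled features.

    x: activations of shape (channels, height, width).
    Returns a vector of length 2 * channels, mirroring what
    AdaptiveConcatPool2d with output size 1 produces.
    """
    mx = x.max(axis=(1, 2))    # global max pool per channel
    avg = x.mean(axis=(1, 2))  # global average pool per channel
    return np.concatenate([mx, avg])

feats = np.random.rand(512, 7, 7)  # shape of resnet34's final feature map
pooled = adaptive_concat_pool(feats)
print(pooled.shape)  # (1024,)
```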


Hi, are you getting a specific error message?
In my case, I disabled my adblocker, ran the code in the console, and it downloaded the URLs as a CSV.


Not quite - if you go back and listen to the lesson at this point, you’ll hear that the first one was simply to show that a poor learning rate choice meant I needed to find a better one.


We’re not looking for the lowest error rate, but the strongest negative slope. So a bit less than 1e-5.
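To make "strongest negative slope" concrete: on the LR-finder plot it is where the loss drops fastest per decade of learning rate, i.e. the most negative gradient of loss with respect to log(lr). A sketch with a synthetic loss curve (the arrays are made up, not real lr_find output):

```python
import numpy as np

# Synthetic LR-finder output: losses fall fastest around lr = 1e-4
lrs = np.logspace(-7, -1, 50)
log_lrs = np.log10(lrs)
losses = 1.0 - np.tanh(2.0 * (log_lrs + 4.0))

# Slope of loss w.r.t. log(lr); the most negative value is the steepest drop
slopes = np.gradient(losses, log_lrs)
best_lr = lrs[np.argmin(slopes)]
print(best_lr)  # close to 1e-4 for this synthetic curve
```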


But MaxPool layers don’t retain relative positions (from what I’ve read on the internet; I’m not sure, though). This is addressed in Capsule Networks, which work more like inverse graphics. This video covers the core content of the paper. It is a very recent area of research.

@jeremy what are your thoughts on Capsule Networks? Are they really better than ConvNets and is it worth learning them?

My understanding was that when we unfreeze and fit, we are trying to see whether we have chosen a good pretrained model, and with this, whether we can go ahead and fine-tune further to get the most out of it.
Thanks for correcting me @jeremy, but now I wonder: is there any guideline for model selection? E.g. we have resnet34, 50, 101 etc. - how to choose one, and when?

@sgugger any comments on this?


There’s actually a thread for that:


@YJP I’m having the same issue. Were you able to resolve this?

@rameshsingh - I still see the same error even after leaving out the extension when using the untar_data function.
path = untar_data('http://www.cs.utoronto.ca/~kriz/cifar-10-python'); path

Also, I am getting the below error when trying a different dataset.

Any help would be appreciated :slight_smile: