@lesscomfortable the JavaScript you added for downloading the URL file for the images is not working.
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
Hello, when collecting my own dataset, say images of insects, if the images in the training set have paper labels (e.g. a species name and size indicator for the insect), would you recommend cropping that area out, especially since images in the test set are not likely to have these paper labels?
Thanks, the access control is not a problem now, but when I run data = ImageDataBunch.from_folder(path)
I get this ValueError: num_samples should be a positive integeral value, but got num_samples=0.
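For context, this is how I am sanity-checking the folder layout (a sketch; 'data/mydata' stands in for my actual path, and as far as I know from_folder expects path/train/<class>/ and path/valid/<class>/ subfolders of images):

from pathlib import Path

path = Path('data/mydata')   # stand-in for my actual dataset root
for sub in ('train', 'valid'):
    n = len(list((path/sub).rglob('*.jpg')))   # count images per split
    print(sub, n)   # a zero here would explain num_samples=0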
Listing a few doubts that I got while working on different experiments. Any help is appreciated.
The data and learner pipeline breaks when we use batch size = 1; the error comes up while trying to run learner.fit. Though higher batch sizes are preferred, the pipeline is supposed to work for a batch size of 1 as well, right?
recorder.plot_losses starts plotting validation loss after 1 epoch. Any specific reason for this?
Is there a way to get reproducible results for fastai models using random.seed() in some way?
The current way of setting np.random.seed() doesn't give reproducible results. I tried a few things from these links as well but couldn't get reproducible results.
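The most complete thing I have tried so far is seeding everything at once (a sketch; seed_everything is just my own helper name, and the cudnn flags trade speed for determinism):

import random
import numpy as np
import torch

def seed_everything(seed=42):
    random.seed(seed)                          # Python's built-in RNG
    np.random.seed(seed)                       # numpy RNG (used by the transforms)
    torch.manual_seed(seed)                    # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)           # PyTorch GPU RNGs
    torch.backends.cudnn.deterministic = True  # force deterministic cudnn kernels
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning

Even with all of these set, some GPU ops can still be non-deterministic, so I am not sure exact reproducibility is achievable.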
Is there a way we can get back the file names of specific images, e.g. those that are wrongly classified or those having the top losses?
Since I was working with images generated from audio in one example, plotting the images as such didn't help much, but checking the corresponding audio clips for annotation mistakes, too much background noise, etc. would have helped.
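This is the kind of thing I am after (a sketch assuming fastai v1 and the usual course imports; I am not sure valid_ds.x.items is the right attribute in every version):

interp = ClassificationInterpretation.from_learner(learn)
losses, idxs = interp.top_losses()   # losses sorted descending, with their indices
# the indices point into the validation set, whose item list holds the file paths
top_paths = [data.valid_ds.x.items[i] for i in idxs[:10].tolist()]
print(top_paths)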
These are the layers appended to the resnet34 model after chopping off some of its end layers.
You can get all of its code by just going into the ConvLearner source code.
??ConvLearner
??create_head
??bn_drop_lin
Diving into just these 3 functions will give you a decent idea of what's happening.
Even if you don't follow precisely what each line of code does, you can simply refer to the docstring to get the big picture of what that function does.
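To make that concrete, here is roughly the kind of head create_head assembles out of bn_drop_lin blocks (a sketch, not the exact source; the sizes assume a resnet34 backbone, whose 512 output channels double to 1024 after the concat pooling, and num_classes is a placeholder):

import torch.nn as nn
from fastai.layers import AdaptiveConcatPool2d, Flatten, bn_drop_lin

num_classes = 2  # placeholder
head = nn.Sequential(
    AdaptiveConcatPool2d(),   # concatenated average- and max-pooling -> 2 * 512 channels
    Flatten(),
    *bn_drop_lin(1024, 512, bn=True, p=0.25, actn=nn.ReLU(inplace=True)),
    *bn_drop_lin(512, num_classes, bn=True, p=0.5, actn=None),
)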
Not quite - if you go back and listen to the lesson at this point, you'll hear that the first one was simply to show that a poor learning rate choice meant I needed to find a better one.
But MaxPool layers don't retain relative positions (from what I've read on the internet; I'm not sure though). This is addressed in Capsule Networks, which work more like inverse graphics. This video covers the core content of the paper. This is a very recent area of research.
@jeremy what are your thoughts on Capsule Networks? Are they really better than ConvNets and is it worth learning them?
My understanding was that when we unfreeze and fit, we are trying to see whether we have chosen a good pretrained model, and whether we can go ahead and fine-tune further to get the most out of it.
Thanks for correcting me @jeremy, but now I wonder: is there any guideline for model selection? E.g. we have resnet34, 50, 101, etc. - how do we choose one, and when?
I have been digging inside the fastai library to find out where the loss function is defined for the learner in Lesson 1. The optimizer is Adam by default and is defined in the Learner class, which ConvLearner inherits from. But I am not able to find where the loss function, which happens to be F.cross_entropy, is defined. Any help is appreciated.
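In the meantime, the closest I can get is inspecting it at runtime (a sketch; the attribute has been spelled loss_func or loss_fn depending on the fastai version):

import torch.nn.functional as F

print(learn.loss_func)                     # the active loss function
print(learn.loss_func is F.cross_entropy)  # True if it is the plain functional form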
Thanks - maybe there is nothing better to do than what you and Michael suggested. I was just hoping there is a way to reduce the human effort. I guess, in addition to checking the validation set, one could train and validate on different subsets and gradually clean up the mislabeled data in the whole dataset. I'd love it if there were any additional automation to reduce the effort in such tasks.
I got this resolved. The issue was the .tar.gz extension: the untar_data function does not treat .tar.gz as a gzip file, even though .tar.gz means the same as .tgz. I managed to get the dataset URL with a .tgz extension, and using this URL without the extension solved my issue.
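In other words, untar_data appends '.tgz' to the URL it downloads, which is why dropping the extension works. A minimal sketch (the URL here is hypothetical):

from fastai.datasets import untar_data

# untar_data fetches '<url>.tgz', so pass the URL without the extension
path = untar_data('https://example.com/datasets/my_dataset')
print(path)  # local directory the archive was extracted to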