Lesson 2 discussion - beginner

arjunrajkumar · November 9, 2017, 6:54am

In the dogs vs cats lesson, we created a sample folder.
But in the dog breed lesson, there was no sample folder.

Curious as to why the sample directory was not created in the dog breed competition?
Is this because we used smaller image sizes to get faster results, and then gradually increased the size?
Is this method of using a smaller image size initially a better/faster alternative to creating a sample folder?

thiago · November 9, 2017, 10:56am

Hi @jeremy! How can I disable the cross-validation? I’d like to use the whole dataset to train, but when I set ‘val_idxs’ to None (the default value) in the function ImageClassifierData.from_csv I get the error:

“Arrays used as indices must be of integer (or boolean) type”

Edit:
And if I use any value smaller than 0.2 in the get_cv_idxs I’m getting the following error in the ConvLearner.pretrained function:

rikiya · November 9, 2017, 5:49pm

@thiago I have exactly the same question, thanks for asking.

jeremy · November 9, 2017, 8:48pm

The sample folder was part of the original download - it wasn’t created automatically. It was created for last year’s course - we don’t use it any more.

jeremy · November 9, 2017, 8:50pm

Ah the problem here is that your dataset size has changed, so your precomputed activations are now the wrong size - so delete your data/dogscats/tmp folder.
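
As a minimal sketch of that cleanup in Python, assuming the precomputed activations sit in a tmp folder directly under the dataset path as described above. The helper name clear_precomputed is made up for illustration, and the demo runs against a throwaway directory rather than real data:

```python
import os
import shutil
import tempfile

def clear_precomputed(data_path):
    """Hypothetical helper: delete the tmp folder under data_path if it exists."""
    tmp_dir = os.path.join(data_path, "tmp")
    existed = os.path.isdir(tmp_dir)
    if existed:
        shutil.rmtree(tmp_dir)
    return existed

# Demo on a throwaway directory that mimics data/dogscats/tmp.
root = tempfile.mkdtemp()
data_path = os.path.join(root, "data", "dogscats")
os.makedirs(os.path.join(data_path, "tmp"))

removed = clear_precomputed(data_path)
still_there = os.path.exists(os.path.join(data_path, "tmp"))
```

After the folder is gone, the activations would presumably be recomputed the next time the learner is created, so the first run after the cleanup is likely to be slower.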

jeremy · November 9, 2017, 8:51pm

I’m not quite sure what your confusion is - can you tell me in more detail your understanding of what’s happening, and what you aren’t sure about?

vikbehal · November 9, 2017, 11:25pm

When the model is being trained (and frozen), do we go through all layers or just last few layers?

radek · November 9, 2017, 11:29pm

We still use all the layers, there are no discontinuities, but the frozen layers stay as they were upon freezing - the gradient flows through them but they do not get updated. The data can flow freely but we don’t train them - what they do doesn’t change as a result of training.

So training only makes sense in the context of having something not be frozen - we can freeze all the earlier layers and just have the last layer not be frozen - the data will flow through the neural net freely, but only the last layer will learn.

Does this answer your question?

jeremy · November 10, 2017, 1:35am

Don’t worry about this too much just yet - we’ll deal with all the theory and details in future lessons. For now, focus on using the notebooks to run your own experiments.

vikbehal · November 10, 2017, 4:06am

Thanks. What’s the contribution of going through those layers? Are updated weights/values generated but used only by the last few layers?

radek · November 10, 2017, 9:29am

NNs in their simplest form are just functions inside a function. Each layer takes what the previous layer gives it, does some computation on it, and hands it to the layer above.

Most layers that do interesting things not only take the data from the layer below, but also contain some trainable parameters specific to that layer. Still, we cannot just remove a layer if we don’t want to train it - the layers up the layer chain depend on them doing their work, performing their calculations. So by freezing, we still keep the earlier layers in place and have them do their computations, but we don’t train them - we do not alter the trainable parameters as a result of seeing data.

So to sum up, all layers perform their calculations, but it is only non-frozen layers that update their parameters based on the data our network sees.
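
The summary above can be illustrated with a toy sketch in plain Python. This is not how fastai or PyTorch implement freezing; it is a two-weight "network" under made-up values, where each layer is a single multiplication and the names forward and train_step are hypothetical:

```python
def forward(x, w1, w2):
    h = w1 * x       # layer 1 (frozen) still computes its output
    return w2 * h    # layer 2 (trainable) consumes layer 1's output

def train_step(x, target, w1, w2, lr=0.1, frozen=("w1",)):
    """One gradient-descent step on the loss 0.5 * (pred - target) ** 2."""
    h = w1 * x
    pred = w2 * h
    err = pred - target
    grad_w1 = err * w2 * x   # gradient for layer 1's weight
    grad_w2 = err * h        # gradient for layer 2's weight
    # Only weights that are not frozen get updated.
    if "w1" not in frozen:
        w1 -= lr * grad_w1
    if "w2" not in frozen:
        w2 -= lr * grad_w2
    return w1, w2

w1, w2 = 2.0, 0.5            # illustrative starting weights
for _ in range(50):
    w1, w2 = train_step(x=1.0, target=3.0, w1=w1, w2=w2)

prediction = forward(1.0, w1, w2)
```

Every forward pass still runs through layer 1, but w1 keeps its starting value while w2 moves until the prediction matches the target.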

pnvijay · November 10, 2017, 11:48am

Hi Sree

Select “Forgot Password?” on the Kaggle website and you’ll receive an email with a few different options. One of the options lets you set up your own Kaggle username/password and connects it to your Google account. You can also go through this wiki page on all things related to kaggle-cli: http://wiki.fast.ai/index.php/Kaggle_CLI

thiago · November 10, 2017, 11:50am

Thanks @jeremy! Deleting the tmp folder allowed me to use smaller validation sets.

But, when I set val_idxs to None (the default value), I’m still getting the error: “Arrays used as indices must be of integer (or boolean) type”

Am I missing something?

naveenmanwani · November 10, 2017, 3:35pm

Hi,
I was wondering why this step was used. Could anyone please explain the intuition behind it?
[Crestle has the datasets required for fast.ai in /datasets, so we’ll create symlinks to the data we want for this competition. (NB: we can’t write to /datasets, but we need a place to store temporary files, so we create our own writable directory to put the symlinks in, and we also take advantage of Crestle’s /cache/ faster temporary storage space.)]
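
For illustration, the idea in the quoted step can be sketched in Python: a symlink gives a writable working directory a pointer to data stored elsewhere, without copying it. The paths below are temporary stand-ins, not Crestle's actual /datasets or /cache layout:

```python
import os
import tempfile

root = tempfile.mkdtemp()

# Stand-in for the shared dataset location (read-only on the real service).
shared = os.path.join(root, "datasets", "competition")
os.makedirs(shared)
open(os.path.join(shared, "labels.csv"), "w").close()

# Our own writable directory, where temporary files can also be stored.
work = os.path.join(root, "work")
os.makedirs(work)

# The symlink makes the shared data appear inside the writable directory.
link = os.path.join(work, "competition")
os.symlink(shared, link)

seen_through_link = os.listdir(link)
```

Code can then treat the work directory as the data path: reads follow the link to the shared files, while anything new is written alongside the link in the writable directory.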

jeremy · November 10, 2017, 4:41pm

No I’ve not tested that; it’s a bug! For now just create a list with a single index, e.g. [0]. I’ll try to fix the bug soonish.
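
A rough sketch of what that workaround amounts to, in plain Python. The helper split_by_val_idxs is hypothetical and only mimics how a list of validation indices partitions a dataset; it is not fastai's implementation:

```python
def split_by_val_idxs(items, val_idxs):
    """Hypothetical helper: partition items into (train, valid) by index."""
    val_set = set(val_idxs)
    train = [item for i, item in enumerate(items) if i not in val_set]
    valid = [item for i, item in enumerate(items) if i in val_set]
    return train, valid

# Made-up file names standing in for the rows of the labels CSV.
filenames = ["img_%d.jpg" % i for i in range(5)]

# The suggested workaround: a validation set with a single index.
val_idxs = [0]
train, valid = split_by_val_idxs(filenames, val_idxs)

# In the library version discussed here, the same list would be passed as
# ImageClassifierData.from_csv(..., val_idxs=val_idxs).
```

With a single validation index, all but one image ends up in the training set, which is close to training on the whole dataset. Validation metrics computed on one image are not meaningful, though.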

mprabhu · November 11, 2017, 12:51pm

How do I get ‘class_labels’ as output instead of ‘class_probability’?
For example, I need to generate a submission file like this:

filename, class
001, dog
002, cat
003, frog

jeremy · November 11, 2017, 3:32pm

data.classes has the class names.
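
As a hedged sketch of how the class names could be used: take the index of the highest probability in each row and look it up in the list of classes. The classes list below stands in for data.classes, and the file names and probabilities are made up for illustration:

```python
import csv
import io

classes = ["cat", "dog", "frog"]       # stand-in for data.classes
filenames = ["001", "002", "003"]      # made-up file names
probs = [                              # made-up per-class probabilities
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.2, 0.6],
]

# For each row, pick the index of the largest probability and map it to a name.
labels = [classes[max(range(len(row)), key=row.__getitem__)] for row in probs]

# Write the submission rows to an in-memory buffer (a real file works the same way).
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["filename", "class"])
writer.writerows(zip(filenames, labels))
submission_text = out.getvalue()
```

With real predictions, probs would be the array of predicted probabilities (or the exponent of log-probabilities, which gives the same ordering), and the column order must match the order of the class names.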

mprabhu · November 11, 2017, 4:44pm

thank you @jeremy

mprabhu · November 12, 2017, 1:13am

While creating an AWS instance, is there any difference between:
1- creating the key pair in the AWS interface itself and importing it to our local machine
2- creating a key pair on the local system and exporting it to AWS

If we choose the 1st option, will AWS charge for it?

jeremy · November 12, 2017, 5:10am

Either approach is fine. I like (2) since you can reuse your key elsewhere.