Maybe I'm just looking in the wrong folder? Can you point me to the source code/file, please? Am I looking in the right place (see link below)?
https://sourcegraph.com/github.com/fastai/fastai@master/-/tree/fastai/models
Adam, like other optimizers with adaptive learning rates, takes a predefined learning rate and adapts it along the way to find the minimum. The responsibility is still on our shoulders to feed the algorithm a reasonable learning rate: too high and it diverges and never reaches the minimum; too low and it gets there very slowly, wasting a lot of computation. A common approach is to manually try different learning rates (most often starting from a faster rate and working down) based on feedback from the loss curve. The approach Jeremy describes is much simpler to execute and saves a lot of compute/human time while staying very simple. So yes, we are following a novel and simple approach to find an optimal learning rate to feed into our optimizer (it can be any optimizer of your choice, not just Adam).
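The idea can be sketched in plain Python (an illustrative sketch, not fastai's actual implementation; `loss_at_lr` is a hypothetical stand-in for "train one mini-batch at this learning rate and return the loss"):

```python
def lr_range_test(loss_at_lr, start_lr=1e-5, end_lr=10, steps=100):
    # Sweep the learning rate geometrically from start_lr to end_lr,
    # recording the loss at each step, and stop once the loss blows up.
    lrs, losses = [], []
    mult = (end_lr / start_lr) ** (1 / (steps - 1))
    lr, best = start_lr, float('inf')
    for _ in range(steps):
        loss = loss_at_lr(lr)
        lrs.append(lr)
        losses.append(loss)
        best = min(best, loss)
        if loss > 4 * best:
            break  # loss is diverging; stop the sweep
        lr *= mult
    return lrs, losses
```

You would then pick a learning rate a bit below the point where the loss curve is falling fastest, as described in the lesson.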
@jeremy One doubt that I have is: during training, are we still using Adam as the optimizer, are we using stochastic gradient descent with restarts, or is it Adam with restarts? And can the library apply restarts to any optimization algorithm?
The first time it is run, the network takes the pre-trained weights from the models/weights folder (if you are not using the fastai AMI, you need to download those weights from http://files.fast.ai/models/weights.tgz and keep them in that folder) and computes activations for your dataset using those weights. I guess the time delay is because of this.
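The compute-once-then-cache pattern might look roughly like this (a hypothetical sketch, not the actual fastai code; `forward_fn` stands in for a forward pass up to the penultimate layer):

```python
import os
import pickle

def precompute_activations(forward_fn, inputs, cache_path):
    # First run: compute activations for every input and cache them to disk.
    # Later runs just load the cache, which is why only the first run is slow.
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    acts = [forward_fn(x) for x in inputs]
    with open(cache_path, 'wb') as f:
        pickle.dump(acts, f)
    return acts
```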
def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):
    self.save('tmp')
    ...
    self.load('tmp')
We're computing the activations of the penultimate layer for our dogs v cats dataset - that's what takes time. (We don't download activations from the internet - they are calculated; we download weights from the internet.)
Thanks!
No, it's just how pytorch (and therefore fastai) works. There's no need to one-hot encode labels if you have just one label per image. (In general, you shouldn't expect any overlap in details of the software libraries between v1 and v2 of the course - they are totally different.)
Strictly speaking, I should say we're jumping out of saddle points - that is, areas which are quite flat and are minima in at least some dimensions. In these areas, training becomes very slow, so it can have a similar impact to being in a local minimum.
We'll be learning about that later.
When I looked in the folder with ls, I saw that all the filenames ended in '.jpg'. Also, that's standard for nearly all photo image files.
Best to think of Adam as a type of SGD. SGD with restarts can be applied to pretty much any SGD variant, including Adam. So yes, we're adding SGDR on top of Adam.
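For intuition, the SGDR schedule itself is just a cosine-annealed learning rate that resets at the start of each cycle; here is a minimal sketch of the schedule (illustrative only, not the fastai implementation):

```python
import math

def sgdr_lr(t, lr_max=1e-3, lr_min=0.0, cycle_len=100):
    # Cosine annealing with warm restarts: within each cycle the LR decays
    # from lr_max to lr_min along a cosine curve, then jumps ("restarts")
    # back to lr_max at the start of the next cycle.
    t_cur = t % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / cycle_len))
```

The same schedule can drive any optimizer's learning rate, which is why restarts compose with Adam (or any other SGD variant).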
link not working
Here is an example from the fish.ipynb notebook (inside the dl1 directory). In this Kaggle competition the important part of the image is a fish. The fish were often not in the middle of the image, so the cropped image missed the most important information.
sz = 350
tfms = tfms_from_model(resnet34, sz, crop_type=CropType.NO)  # resize without center cropping
data = ImageClassifierData.from_csv(PATH, "images", csv_fname, bs, tfms, val_idxs)
thanks @Moody for bringing this up.
@yinterian could you clarify: when we set sz, is the default resize function a center crop? And to avoid this, can we just follow your provided code and pass in crop_type=CropType.NO? Or is this only for augmentation/transforms?
I have a related question on the sz parameter to tfms_from_model. Can we provide a height x width size like (400 x 250), or does it need to be square? Digging into the code a little, it looks like it only expects one value for the sz parameter and resizes to a square image. Anyway, could the size input be accepted as an int or a tuple of (h, w)? Or was that found to be not that useful?
Based on what @Moody mentioned, when cropping is used to convert the images to a square shape and data augmentation is used, does 1) cropping happen first and then augmentation, or 2) data augmentation of the original image and then cropping? It seems like option 2 may retain more data, but option 1 is more computationally efficient.
It needs to be square. It's a limitation of how GPUs are programmed nowadays, but I suspect at some point people will handle rectangular inputs too. For now, no library can do this (AFAIK).
Data augmentation happens before cropping, for the reason you mention (retains more data).
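The ordering can be illustrated with a toy "image" as a nested list (purely illustrative; hflip and center_crop are hypothetical helpers, not fastai functions):

```python
def hflip(img):
    # Horizontal flip augmentation: reverse each row of the "image".
    return [row[::-1] for row in img]

def center_crop(img, sz):
    # Take the central sz x sz window.
    top = (len(img) - sz) // 2
    left = (len(img[0]) - sz) // 2
    return [row[left:left + sz] for row in img[top:top + sz]]

# Augment first, then crop: the flip sees the whole image, so no border
# pixels are discarded before the augmentation runs.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
out = center_crop(hflip(img), 2)
```

With augmentations like rotations or shifts, cropping last means the augmented view can still draw on pixels outside the final crop window, which is the "retains more data" point above.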
OK… The reason I ask is, my images are stock photos with an aspect ratio of 1.5:1 (height to width). I will try the square image. But Torch's transforms do look like they take rectangular values - http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Resize
I can create a pull request if you think it might be a useful feature to add.
I'm also interested in having this feature, as long as it's technically feasible… in my case I'm working with images that are ratio 2:1.
fwiw, I know some people in the recently completed Carvana Kaggle competition seemed to be using varying rectangular shaped images since that dataset contained images that were all ratio 1.5:1