Lesson 2 In-Class Discussion

No, it’s just how PyTorch (and therefore fastai) works. There’s no need to one-hot encode labels if you have just one label per image. (In general, you shouldn’t expect any overlap in the details of the software libraries between v1 and v2 of the course - they are totally different.)
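
For concreteness, here is a minimal PyTorch sketch (the tensor values and shapes are made up for illustration) showing that CrossEntropyLoss takes plain integer class indices rather than one-hot vectors:

import torch
import torch.nn as nn

# CrossEntropyLoss expects integer class indices as targets, not one-hot vectors
logits = torch.randn(4, 3)            # batch of 4 images, 3 classes
targets = torch.tensor([0, 2, 1, 2])  # one label per image, as plain indices
loss = nn.CrossEntropyLoss()(logits, targets)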

Strictly speaking, I should say we’re jumping out of saddle points - that is, areas which are quite flat and are minima in at least some dimensions. In these areas, training becomes very slow, so it can have a similar impact to being in a local minimum.
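
A classic example of a saddle point (my illustration, not from the lesson) is f(x, y) = x^2 - y^2 at the origin: the gradient there is zero, and the point is a minimum along the x axis but a maximum along the y axis, so a gradient-based optimizer can slow to a crawl even though it isn’t at a true local minimum.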

We’ll be learning about that later.

When I looked in the folder with ls I saw that all the filenames ended in ‘.jpg’. Also, that’s standard for nearly all photo image files.

@yinterian perhaps you could provide some example code showing how to do the resize approach?

Best to think of Adam as a type of SGD. SGD with restarts can be applied to pretty much any SGD variant, including Adam. So yes, we’re adding SGDR on top of Adam.
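
As a sketch of what that looks like in the fastai library (assuming data has been set up as in the lesson notebooks; I believe opt_fn is the parameter for swapping in a different optimizer):

from fastai.conv_learner import *
import torch.optim as optim

# Assumes `data` is an ImageClassifierData built as in the lesson notebooks
learn = ConvLearner.pretrained(resnet34, data, precompute=True, opt_fn=optim.Adam)
# cycle_len=1 turns on SGDR: the learning rate follows a cosine annealing
# schedule that restarts at the start of each 1-epoch cycle - here on top of Adam
learn.fit(1e-2, 3, cycle_len=1)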

link not working

Here is an example from the fish.ipynb notebook (inside the dl1 directory). In the case of this Kaggle competition, the important part of the image is a fish. The fish were often not in the middle of the image, so the cropped image missed the most important information.

sz = 350
# crop_type=CropType.NO resizes the whole image to sz x sz (squashing it if
# needed) rather than center-cropping, so nothing around the edges is lost
tfms = tfms_from_model(resnet34, sz, crop_type=CropType.NO)
data = ImageClassifierData.from_csv(PATH, "images", csv_fname, bs, tfms, val_idxs)

Thanks @Moody for bringing this up.

@yinterian could you clarify: when we set sz, is the default resize a center crop? And can we avoid that simply by following your code and passing crop_type=CropType.NO? Or does that apply only to augmentation/transforms?

I have a related question on the sz parameter to tfms_from_model. Can we provide a height x width size like (400 x 250), or does it need to be square? Digging into the code a little, it looks like it only expects a single value for the sz parameter and resizes to a square image.

Anyway, could we allow the size input to be either an int or a tuple of (h, w)? Or was that found not to be that useful?

Based on what @Moody mentioned, when cropping is used to convert the images to a square shape and data augmentation is used, does (1) cropping happen first and then augmentation, or (2) data augmentation of the original image and then cropping? It seems like option 2 may retain more data, but option 1 is more computationally efficient.

It needs to be square. It’s a limitation of how GPUs are programmed nowadays, but I suspect at some point people will handle rectangular inputs too. For now, no library can do this (AFAIK).

Data augmentation happens before cropping, for the reason you mention (retains more data).
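
As an illustration of that ordering (a torchvision analogy of my own, not fastai’s actual internals):

from torchvision import transforms

sz = 224
pipeline = transforms.Compose([
    transforms.RandomRotation(10),  # augmentation runs first, on the full image
    transforms.Resize(sz),          # then scale so the shorter side is sz
    transforms.CenterCrop(sz),      # and finally crop to a sz x sz square
])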

OK. The reason I ask is that my images are stock photos with an aspect ratio of 1.5:1 (height to width). I will try the square image. But the torchvision transforms do appear to take rectangular values: http://pytorch.org/docs/master/torchvision/transforms.html#torchvision.transforms.Resize

I can create a pull request if you think it might be a useful feature to add.

I’m also interested in having this feature as long as it’s technically feasible. In my case I’m working with images that have a 2:1 ratio.

FWIW, I know some people in the recently completed Carvana Kaggle competition seemed to be using rectangular images, since that dataset contained images that were all 1.5:1.

For data augmentation, how do we know the number of new images being created?

Or are we just transforming the existing images?

No new objects are created. But on each epoch, the data loader applies transforms with random parameters (like zoom, shear, shift, etc.) to each image, creating a slightly modified version of the input image so that the network doesn’t overfit to the original images.
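
Here’s a small sketch of that behavior using torchvision (an analogy for what the fastai data loader does internally; the dataset path and transform parameters are made up):

from torchvision import datasets, transforms

# random transforms are re-sampled every time an item is fetched
train_tfms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1), shear=10),
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("data/train", transform=train_tfms)

# fetching the same index twice gives two different augmented versions;
# nothing is written to disk, so the original image is untouched
x1, _ = train_ds[0]
x2, _ = train_ds[0]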

It’s easy enough to do it in the transforms - the issue is that the model itself needs to have a consistent input size. I guess it doesn’t really have to be square - just consistent. But in practice generally we have a mix of landscape and portrait orientation, which means square is the best compromise.

If you have a dataset that’s consistently of a particular orientation, then perhaps it does indeed make sense to use a rectangular input - in which case feel free to provide a PR which allows that (i.e. sz everywhere it’s used would assume square if it’s an int, or rectangle if a tuple).
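
The parsing for that could be as simple as something like this (a hypothetical helper to illustrate the proposed behavior, not existing fastai code):

def parse_sz(sz):
    # hypothetical: int -> square (sz, sz); tuple -> (height, width) rectangle
    return (sz, sz) if isinstance(sz, int) else tuple(sz)

parse_sz(224)         # (224, 224)
parse_sz((400, 250))  # (400, 250)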

But should we be creating new images? I.e., keep the original and create a new transformed version of it as well? I thought that was data augmentation.

I guess if we have a sufficiently large dataset, doing an in-place transformation might be okay. But if we start with less data, it might be better to add. What’s the guidance here?

The new image is always created dynamically, on-the-fly, and then never reused. It’s not being stored anywhere, so the original is always kept.

training becomes very slow, so it can have a similar impact to being in a local minimum.

In practice, can it actually result in getting stuck, as in a local minimum?