Thanks! But I have another thought on this.
In lesson 6 Jeremy first trains the model with low-resolution images and then does a second round with higher-resolution ones.
He also mentions this approach of incrementally raising the resolution in lesson 1 or 2 as a way to achieve better training.
If I understood convolutions correctly, the deeper we go into the layers, the more complex the structures that are "recognized" and generate activations: we start with simple edges, and these features then become parts of more complex patterns, like circles, repeating rectangles and so on.
At the end of the network, each channel of the final tensor (at a coarse resolution) contains the activations of one of these particular sets of complex features. We average pool or max pool it down to size 1 in height and width to get the final global level of activation of each feature, independently of its position.
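To make that pooling step concrete, here is a minimal sketch in plain Python (the tensor values and channel meanings are invented for illustration; in practice this would be a single `AdaptiveAvgPool2d`-style call on a real activation tensor):

```python
# Minimal sketch of global average pooling: collapse each channel's
# H x W activation map to a single number, discarding position.
# Toy tensor of shape (C=2, H=2, W=2) as nested lists; values invented.
activations = [
    [[1.0, 3.0],
     [0.0, 4.0]],   # channel 0: activations spread across the map
    [[0.0, 0.0],
     [0.0, 8.0]],   # channel 1: fires strongly in one corner only
]

def global_avg_pool(tensor):
    # Average every value in each channel -> one activation per channel.
    return [sum(v for row in ch for v in row) / (len(ch) * len(ch[0]))
            for ch in tensor]

pooled = global_avg_pool(activations)
print(pooled)  # [2.0, 2.0]
```

Note that both channels pool to the same value even though their activations sit in different places, which is exactly the "independently of position" property.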
The classification is then done by a final fully connected layer, which estimates the best combination of complex features that uniquely defines the target class.
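A toy sketch of that last step, assuming a pooled vector of per-channel activations feeding a single linear layer (all weights and values are invented, one weight vector per class):

```python
# Toy sketch of the final classification step: a fully connected layer
# is just a weighted sum of the pooled per-channel activations plus a
# bias, one weight vector per class. All numbers are invented.

pooled = [0.9, 0.1, 0.7]  # hypothetical global activations of 3 features

class_weights = {
    "car":   [1.0, -0.5, 0.8],  # features 0 and 2 favor "car", 1 counts against
    "plane": [-0.2, 1.2, 0.1],
}

def score(weights, features, bias=0.0):
    # Linear combination of pooled features, i.e. one FC-layer output unit.
    return sum(w * f for w, f in zip(weights, features)) + bias

scores = {cls: score(w, pooled) for cls, w in class_weights.items()}
print(max(scores, key=scores.get))  # prints "car"
```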
But in my understanding, this structure recognition is not invariant with respect to the pixel size of the structures themselves, due to the fixed size of the convolution kernels.
Therefore (I know the following may be an over-simplification), the set of complex features that defines a car occupying 80% of a 640x480 image may be some black round things of about 20 pixels (the tires), some trapezoid glassy-looking things of around 30 pixels (the car windows), and some smaller round things (the front lights).
But a car that occupies 80% of a 2048x1300 or whatever 4k image is probably defined by a different set of features, maybe the structure of the door handles or the form factor of a Ford logo. And maybe the features trained to be relevant for a car in the 640x480 pass will instead match a smaller car in a 4k image.
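A quick back-of-the-envelope check of this scale argument (all numbers are made-up assumptions, not measurements from a real network): a structure covering a fixed fraction of the scene grows linearly in pixels with image width, while a unit's receptive field at a given layer stays fixed in pixels.

```python
# Back-of-the-envelope sketch of the scale mismatch.
# TIRE_FRACTION and RECEPTIVE_FIELD are illustrative assumptions.

def feature_px(image_width, fraction_of_image):
    # Pixel size of a structure covering a fixed fraction of the image width.
    return round(image_width * fraction_of_image)

TIRE_FRACTION = 0.03    # assume a tire spans ~3% of the image width
RECEPTIVE_FIELD = 32    # assume a mid-layer unit "sees" ~32 px

for width in (640, 2048):
    tire = feature_px(width, TIRE_FRACTION)
    print(f"{width}px image: tire is ~{tire}px, "
          f"receptive field covers {RECEPTIVE_FIELD / tire:.1f}x the tire")
```

With these made-up numbers the same tire fits comfortably inside the receptive field at 640px but spills well outside it at 2048px, so the same mid-layer unit cannot respond to it the same way.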
Therefore, training at different resolutions in different iterations matters not because it teaches the network a better visual concept of a generic car, but because it lets the network recognize cars of different sizes relative to the kernel height and width. In that sense it is not so different from the zoom transformation used for data augmentation.
Thanks for reading this long post; please share your thoughts and let me know if I am right or wrong.
F