Lesson 6 - Official topic

They can be anywhere you like, but your get_x function will have to build the path to them.
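For example, a minimal sketch assuming the labels live in a DataFrame with a hypothetical 'fname' column and the images sit in an arbitrary folder img_path:

from fastai.vision.all import *

# Hypothetical layout: a DataFrame `df` with 'fname' and 'label' columns,
# and the image files stored under `img_path`.
img_path = Path('/data/my_images')

def get_x(row): return img_path/row['fname']   # build the full path here
def get_y(row): return row['label']

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_x=get_x,
    get_y=get_y,
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(df)   # df is the DataFrame holding fname/label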

This is sometimes referred to as “lazy loading.” It’s used in a lot of frameworks to make sure that data arrives just-in-time, and, of course, to preserve memory.
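A rough illustration of the idea, assuming a plain PyTorch-style dataset (the class and variable names here are made up):

from torch.utils.data import Dataset
from PIL import Image

class LazyImageDataset(Dataset):
    # Only the paths and labels are kept in memory; each image is read
    # from disk the moment __getitem__ asks for it (just-in-time).
    def __init__(self, paths, labels):
        self.paths, self.labels = paths, labels

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        img = Image.open(self.paths[i])   # loaded lazily, on access
        return img, self.labels[i]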

@sgugger – nevertheless, @giacomov’s suggestion makes sense to me – i.e. for each learning rate, average the loss over several mini-batches.
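Something along these lines, as a sketch (model, loss_fn, opt and train_dl are assumed to already exist; batches_per_lr is the number of mini-batches averaged per learning rate):

def lr_sweep(model, loss_fn, opt, train_dl, lrs, batches_per_lr=4):
    # For each candidate LR, average the training loss over a few
    # mini-batches instead of recording a single noisy value.
    losses, dl_iter = [], iter(train_dl)   # assumes enough batches for the whole sweep
    for lr in lrs:
        for g in opt.param_groups:
            g['lr'] = lr
        batch_losses = []
        for _ in range(batches_per_lr):
            xb, yb = next(dl_iter)
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step(); opt.zero_grad()
            batch_losses.append(loss.item())
        losses.append(sum(batch_losses) / len(batch_losses))
    return losses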

No expert here, but you’re probably going to have to modify the dataloader classes you use to load the X data so they read your additional columns (in addition to the flattened RGB image). For example: a 3x3 pixel image has 9 pixel values between 0-255 for each of 3 (RGB) channels, for a total of 27 x values per row; if you then added time (let’s say), you’d be adding a 28th value to the row.
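As a toy illustration of that row layout (the numbers are arbitrary):

import numpy as np

img = np.random.randint(0, 256, size=(3, 3, 3))      # 3x3 RGB image (H x W x C)
row = img.flatten()                                   # 27 pixel values
time_feature = np.array([0.75])                       # hypothetical extra column
row_with_time = np.concatenate([row, time_feature])   # 28 values per row
print(row_with_time.shape)                            # (28,)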

Hey guys, I could use some help here: I’ve been reading the Cyclical Learning Rates paper, and in describing the triangular learning rate policy this code is given:
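Roughly, in Python (the paper’s original snippet is in Torch/Lua; base_lr and max_lr bound the cycle, as I read it):

import math

def triangular_lr(epochCounter, stepsize, base_lr, max_lr):
    cycle = math.floor(1 + epochCounter / (2 * stepsize))
    x = abs(epochCounter / stepsize - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)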

Now the paper says that the variable epochCounter is the number of epochs of training, which means that the cycle variable will only increase to, say, 2 when the number of epochs is greater than or equal to 2*stepsize. So for a stepsize of 2000, we would have to train for 4000 epochs or more before cycle increases from 1 to 2.

However, I think epochCounter should instead refer to the number of iterations, which would make more sense.
@sgugger @muellerzr

Yeah, the equation is wrong. There’s a post about it

https://forums.fast.ai/t/draft-of-fastai-book/64323/39

But I didn’t see a reply to that person’s post.

Anyone having issues running the notebook for collaborative filtering on the merge part?
ratings = ratings.merge(movies)?

I changed the name, but even with “movies” it won’t work. Any suggestions?

---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
<ipython-input-34-eb862f217676> in <module>
      1 #issue in merge
----> 2 ratings = ratings.merge(scripts)
      3 ratings.head()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pandas/core/frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   7295             copy=copy,
   7296             indicator=indicator,
-> 7297             validate=validate,
   7298         )
   7299 

/opt/conda/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     84         copy=copy,
     85         indicator=indicator,
---> 86         validate=validate,
     87     )
     88     return op.get_result()

/opt/conda/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    618             warnings.warn(msg, UserWarning)
    619 
--> 620         self._validate_specification()
    621 
    622         # note this function has side effects

/opt/conda/envs/fastai/lib/python3.7/site-packages/pandas/core/reshape/merge.py in _validate_specification(self)
   1196                             ron=self.right_on,
   1197                             lidx=self.left_index,
-> 1198                             ridx=self.right_index,
   1199                         )
   1200                     )

MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False

Why are the points, from the chapter 6 regression example, normalized to a value between -1 and 1?

I understand that’s why we use the y_range we do … but I’m not sure why we use that range to begin with.

learn = cnn_learner(dls, resnet18, y_range=(-1,1))

When we work with points, each coordinate is converted to a percentage: -100% (far left) and +100% (far right), with (0, 0) at the very center of the image. Does this help @wgpubs? :slight_smile: It makes augmentation much easier, as everything is now relative, vs other libraries that struggle with point augmentation.
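A rough sketch of that rescaling, assuming an image of width w and height h (the function name is made up, but the mapping mirrors the description above):

import torch

def scale_points(pts, w, h):
    # pts: tensor of shape (n, 2) holding (x, y) pixel coordinates,
    # mapped to [-1, 1] with (0, 0) at the image centre.
    return pts * 2 / torch.tensor([w, h], dtype=pts.dtype) - 1

pts = torch.tensor([[0., 0.], [320., 240.], [640., 480.]])
print(scale_points(pts, 640, 480))
# tensor([[-1., -1.], [ 0.,  0.], [ 1.,  1.]])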

Yup, makes sense. Sounds like the primary reason is mostly the augmentation bits (e.g., moving the points to the right spot, adjusting a bounding box, etc.).

Exactly :slight_smile:

This is really weird.
Can you please print movies.head() and ratings.head()?

Something Jeremy said is that if you see overfitting, instead of taking the best model, you should re-train with n_epochs equal to the epoch where the learner starts to overfit (e.g., retrain the cnn_learner with 8 epochs instead of 12).

Is there a reason for this? Jeremy said something like you want the learner to have a low learning rate at the final steps, but I don’t see how that impacts the performance of the final model. Has anyone done any experiments comparing the SaveModelCallback with re-training at the “ideal” number of epochs?

Practically speaking, if this is the case, that would mean (assuming no time/resource constraints) it would always be better to let the learner train for a large number of epochs, then do one final training at a reduced number of epochs to get the best model possible?
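For reference, a sketch of the two approaches being compared (dls is assumed to already exist; this is an illustration, not a definitive recipe):

from fastai.vision.all import *

# Option A: one long run, keeping the best checkpoint.
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fit_one_cycle(12, cbs=SaveModelCallback(monitor='valid_loss'))  # reloads the best epoch at the end

# Option B: retrain so the one-cycle schedule (and its low final LR)
# actually ends at the epoch where overfitting started.
learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fit_one_cycle(8)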

Suppose I want to detect whether a person is wearing eyeglasses. How would I approach this problem? Is this binary classification? And how should I structure the dataset, e.g. should I get images of people wearing eyeglasses and people without eyeglasses? Thank you!

Yes. See the dog/cat classification models from the book on how to set this up (that would be a good approach in terms of structuring things).

Yes.

And make sure you split your training set so that your validation set contains people not seen in the training set, and also that you have a good representation of folks with and without eyeglasses.
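A rough sketch of that setup, assuming a made-up file layout where each image is named <person_id>_<glasses|no_glasses>_<n>.jpg and valid_people is a set of person ids held out for validation:

from fastai.vision.all import *

path = Path('/data/eyeglasses')          # hypothetical dataset location
valid_people = {'p042', 'p117'}          # people reserved for validation

def label_func(p): return p.name.split('_')[1]          # 'glasses' or 'no_glasses'
def is_valid(p):   return p.name.split('_')[0] in valid_people

dblock = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    get_y=label_func,
    splitter=FuncSplitter(is_valid),     # keeps each person in only one split
    item_tfms=Resize(224),
)
dls = dblock.dataloaders(path)
learn = cnn_learner(dls, resnet18, metrics=accuracy)
learn.fine_tune(3)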

Firstly, note that the graph is smoothed by means of a moving average, so it is “delayed” w.r.t. the true values. Plus, just after the min, the 1st derivative changes its sign (by definition), so you don’t want to pick your LR at the minimum.
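To see where the delay comes from, here is a tiny sketch of that kind of smoothing (an exponential moving average with a made-up beta, bias-corrected):

def smooth(losses, beta=0.98):
    avg, smoothed = 0.0, []
    for i, loss in enumerate(losses, start=1):
        avg = beta * avg + (1 - beta) * loss
        smoothed.append(avg / (1 - beta**i))   # bias-corrected running average
    return smoothed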

They cannot cover all the details in such a course, but the loss surfaces generated by NNs are very peculiar. A lot of local minima are good enough.

See, for example, the paper by Choromanska and LeCun about spin glasses.

Afaik, every implementation around is inspired by Fastai.

You won’t find the “best” one anyhow. And maybe you don’t want to find it… You want to find a minimum which is good enough and generalizes well.

Would you link me to the exact position in the video? Thanks.