Lesson 6 - Official topic

It generally doesn’t happen. When using a pretrained model, the last layers need the largest learning rate since they were randomly initialized, while the middle layers are pretrained and are usually better than random for the dataset you are working on.

Yes. But not all GPUs support mixed precision.

Why isn’t fp16() the default? Or is it?

You’d need to see if it’s supported by your GPU. NVIDIA’s documentation would be the best place to look:

https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html

For multiple labels in one image, don’t we need a bounding box over the objects to identify where the labelled objects are? If not (as in PASCAL VOC), what does the machine actually learn, since it doesn’t know where things are?

Not all GPUs support it. Also, someone correct me if I’m wrong, but I believe truncating to FP16 also affects accuracy slightly?

It does not learn where the objects are, just that the object is somewhere in the picture.
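
To make that concrete, here is a small sketch (shapes and class indices are made up for illustration) of how multi-label targets are typically set up: a per-class sigmoid with binary cross-entropy, so the model only answers “is this class present?”, never “where is it?”:

```python
import torch
import torch.nn as nn

# Hypothetical batch: 4 images, 20 candidate classes (PASCAL VOC has 20).
logits  = torch.randn(4, 20)      # raw per-class scores from the model
targets = torch.zeros(4, 20)      # multi-hot labels: 1 for each class present
targets[0, [3, 7]] = 1.0          # image 0 contains classes 3 and 7

# BCEWithLogitsLoss = sigmoid + binary cross-entropy, applied per class,
# so presence of each class is scored independently -- no localisation involved.
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```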

No, it is not, because some older GPUs don’t support mixed precision.
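
For what it’s worth, turning it on in fastai is explicit and lightweight; a minimal sketch (the dataset here is just a stand-in for your own):

```python
from fastai.vision.all import *

# Stand-in data; swap in your own DataLoaders.
path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)

# to_fp16() attaches fastai's MixedPrecision callback; to_fp32() removes it.
learn = cnn_learner(dls, resnet18, metrics=accuracy).to_fp16()
learn.fit_one_cycle(1)
```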

Yes, it does affect accuracy slightly. But I guess it can also help the model generalise better.

If you mean that it affects the floating-point precision, then it’s true sometimes, but oftentimes you can check the histograms of activations & parameter values and see if you’re in danger of losing precision. Usually it is fine, but there are notable cases (e.g., training GANs) where it causes major problems.

TL;DR: that situation is not expected to occur in practice.

The assumption is that the learning rate should depend inversely on the layer depth, so that it smoothly increases from the deepest layers to the shallowest. In a pre-trained model, the deepest layers correspond to primitive operations like edge detectors and contrast detectors, and are expected to apply to your new image dataset with little change, so you can train them “gently” (i.e. with a relatively small learning rate). The shallower layers are more specialized to the ImageNet dataset (or whatever dataset was used to obtain the pre-trained weights), so you want to train them “harder” (i.e. with a higher learning rate) to find the optimal values for the dataset you are applying the model to.
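
This is what passing a slice of learning rates to fastai does; a minimal sketch (the dataset and epoch counts are placeholders, not from the lesson):

```python
from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, metrics=accuracy)

learn.fit_one_cycle(1)                 # head only; the pretrained body starts frozen
learn.unfreeze()
# slice(1e-6, 1e-4): the deepest layer group trains at 1e-6, the last at 1e-4,
# with the groups in between spaced evenly on a log scale.
learn.fit_one_cycle(1, lr_max=slice(1e-6, 1e-4))
```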

Not sure whether the GTX series supports fp16, but the RTX series does.

Much better worded. Yes, that’s what I intended. It’s hard for me to focus on the lesson and type at the same time. I don’t know how to do the checking you mentioned, though. Can you enlighten me?
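
One way to do that check (a sketch of the general idea, not a specific fastai utility): histogram the magnitudes of the parameters, or of activations captured with hooks, on a log scale and compare them with fp16’s range, whose smallest normal value is about 6e-5 and largest about 65504:

```python
import math
import torch, torch.nn as nn
import matplotlib.pyplot as plt

# Toy model standing in for your real one.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))

mags = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
mags = mags[mags > 0]                 # drop exact zeros before taking logs

plt.hist(mags.log10().numpy(), bins=50)
plt.axvline(math.log10(6e-5), color="red")  # fp16's smallest normal value
plt.xlabel("log10(|parameter value|)")
plt.title("Mass left of the red line risks underflowing to zero in fp16")
plt.show()
```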

If one trains a network and gets good results on validation, how confident should one be that the results at inference time will be equally good? In short, does good validation performance imply good inference performance?

I wrote An intro to Mixed Precision Training and ran some benchmarks (might be outdated) during last year’s course.

I’ve never heard that before, but it’s brilliantly intuitive: we use zip(*b) as a way of transposing some iterable b.
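
A quick illustration of the idiom (b is just an example list of pairs):

```python
b = [(1, 'a'), (2, 'b'), (3, 'c')]   # rows: (x, y) pairs
xs, ys = zip(*b)                     # unpack the rows as args to zip -> columns
print(xs)                            # (1, 2, 3)
print(ys)                            # ('a', 'b', 'c')
```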

Interesting, my 1080 Ti supports fp16(), and it seems to lower GPU RAM usage, but it does not train faster because it lacks Tensor Cores. I wonder if it would be simple to detect fp16() compatibility and turn it on by default, kind of like how fast.ai automatically detects whether a GPU is available to train on.
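
Detecting that is fairly straightforward; a sketch using PyTorch’s device-capability query (Tensor Cores first appeared at compute capability 7.0, i.e. Volta, which is why a 1080 Ti runs fp16 without a speedup):

```python
import torch

def fp16_likely_faster() -> bool:
    """Rough heuristic: fp16 gives a real speedup only with Tensor Cores,
    which require CUDA compute capability >= 7.0 (Volta and later)."""
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major >= 7

print(fp16_likely_faster())
```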

Does lr_find (steepest point) find the unequivocally best learning rate, or might there be reasons why other rates could be more useful in some situations?

Or there is the doc page about it :-p

It certainly gives you something in the aggressive range, since we want training to go fast. In some instances where you need to train very slowly, you might need something lower.
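
For completeness, a minimal lr_find call (the names of the returned suggestions vary by fastai version, so treat the result as a starting point rather than a single “best” value):

```python
from fastai.vision.all import *

path = untar_data(URLs.MNIST_SAMPLE)
dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, metrics=accuracy)

# Mock-trains with an exponentially increasing LR and plots loss vs. LR;
# returns suggested value(s) such as the steepest point on the curve.
suggestions = learn.lr_find()
print(suggestions)
```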
