Non-Beginner Discussion

Hey man! Your messages, as always, are as clear as cutting butter with a warm knife. Thanks!

Your explanation is wonderful, but I had already figured this out by looking at the source code and by reading this:

I’ve read those explanations so many times, but they still don’t feel clear to me. If I don’t call unfreeze(), does that mean the “body” stays untrained?

This is definitely an interesting observation. It’s likely that your dataset closely resembles ImageNet, so you don’t really need to adjust the body of the model. As a side point, make sure your validation metrics are improving, not just your training metrics.

Yes, both the valid_loss and the metrics improve, without overfitting.


epoch train_loss valid_loss ssim psnr time
0 0.236554 0.208531 0.183433 11.753738 02:09
1 0.179139 0.148908 0.236383 13.805398 02:05
2 0.164469 0.179590 0.292767 12.279366 02:06
3 0.144973 0.139101 0.338365 14.361961 02:05
4 0.132566 0.115691 0.383597 15.746257 02:05
5 0.123076 0.140860 0.408780 14.752694 02:05
6 0.124454 0.101365 0.442580 16.926208 02:05
7 0.112748 0.102430 0.499645 16.352867 02:04
8 0.108559 0.099927 0.507152 17.267815 02:04
9 0.108745 0.096709 0.531147 17.573277 02:02
10 0.131972 0.105292 0.546981 16.911779 02:03
11 0.122663 0.106509 0.569845 17.131609 02:06
12 0.117344 0.116483 0.586394 16.417915 02:06
13 0.118717 0.096780 0.601940 17.860203 02:02
14 0.109777 0.108330 0.622874 16.764353 02:02
15 0.107149 0.086118 0.636838 18.321262 02:02
16 0.099475 0.098193 0.657461 17.738472 02:04
17 0.100719 0.132301 0.639843 15.730131 02:04
18 0.111611 0.107994 0.666687 16.818670 02:03
19 0.111105 0.120799 0.679617 16.494234 02:03
20 0.114704 0.207559 0.639382 12.127844 02:04
21 0.099970 0.098086 0.707660 17.849594 02:03
22 0.099835 0.089594 0.700292 18.560604 02:01
23 0.106707 0.138295 0.674177 15.729656 02:04
24 0.099064 0.073961 0.725095 20.033642 02:03
25 0.102635 0.084740 0.737844 18.210724 02:06
26 0.108957 0.155416 0.677125 14.684962 02:01
27 0.100712 0.096362 0.740240 18.032732 02:02
28 0.091303 0.098734 0.757668 17.141289 02:03
29 0.095276 0.082612 0.747881 18.612074 02:01
30 0.090979 0.079308 0.766427 19.219879 02:02
31 0.088959 0.062391 0.781548 21.137724 02:05
32 0.092504 0.070026 0.772093 19.750147 02:01
33 0.087208 0.074354 0.783176 19.588976 02:05
34 0.083044 0.066101 0.794248 19.470947 02:01
35 0.080781 0.075523 0.785750 18.851494 02:04
36 0.076060 0.088846 0.795860 16.493053 02:01
37 0.074469 0.068146 0.805137 19.896648 02:04
38 0.071076 0.070530 0.806764 20.062801 02:03
39 0.075082 0.070611 0.802137 20.234707 02:01
40 0.076525 0.073114 0.810220 18.325096 02:05
41 0.070763 0.070393 0.813122 18.819860 02:05
42 0.068504 0.073804 0.805940 18.706667 02:05
43 0.065610 0.068306 0.816073 19.237278 02:02
44 0.060239 0.064983 0.826524 19.970503 02:04
45 0.064982 0.084969 0.812253 18.099051 02:03
46 0.059241 0.054122 0.831039 21.738705 02:05
47 0.056639 0.056747 0.836223 22.513317 02:02
48 0.057209 0.046792 0.846990 24.197918 02:05
49 0.056328 0.046437 0.846901 23.958132 02:03
50 0.054749 0.047320 0.850570 24.019894 02:03
51 0.051009 0.047988 0.852203 23.820242 02:05
52 0.050929 0.047600 0.850500 24.071835 02:06
53 0.051956 0.043077 0.862234 24.669596 02:03
54 0.050984 0.043307 0.857978 24.458191 02:06
55 0.048370 0.046028 0.858897 23.934223 02:04
56 0.048205 0.044067 0.861460 24.493036 02:06

I eventually got up to ssim=0.90.

I used fit_one_cycle(), and then tried to squeeze out even more by adding the ReduceLROnPlateau() callback, but it doesn’t seem to work together with fit_one_cycle (the lr never gets updated). So I tried fit(), where the callback does work, but the performance is slightly worse than with fit_one_cycle(). What am I doing wrong, then?


Yeah, it means that the “body” remains as the ImageNet-pretrained weights and is not adjusted any further through training.
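
For completeness, a minimal sketch of the mechanics with a generic classification Learner (dls stands in for your DataLoaders; this is not the restoration setup discussed here):

from fastai.vision.all import *

# with a pretrained model, fastai freezes the body, so at first only the new head trains
learn = vision_learner(dls, resnet34, metrics=accuracy)   # `dls` is your DataLoaders
learn.fit_one_cycle(3)                                    # head only; body keeps the ImageNet weights

# after unfreeze(), the pretrained body weights are updated as well
learn.unfreeze()
learn.fit_one_cycle(3, lr_max=slice(1e-6, 1e-4))          # smaller LRs for earlier layer groups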

This is expected: ReduceLROnPlateau() is meant to be used only with fit. It is essentially a (dynamic) LR schedule itself, so it doesn’t combine with other LR schedules like 1cycle. Sounds to me like fit_one_cycle is better for your use case, then.
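
For reference, a minimal sketch of how each would be wired up (assuming an existing Learner called learn; the epoch counts and LRs are placeholders):

from fastai.callback.tracker import ReduceLROnPlateau

# ReduceLROnPlateau pairs with a constant-LR fit(): it divides the LR by `factor`
# whenever `monitor` hasn't improved for `patience` epochs
learn.fit(40, lr=1e-3,
          cbs=ReduceLROnPlateau(monitor='valid_loss', patience=3, factor=5.0))

# fit_one_cycle() already owns the LR schedule, so the callback has nothing to adjust
learn.fit_one_cycle(40, lr_max=1e-3)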

What problem are you working on? Given that you are using SSIM and PSNR metrics, it sounds like maybe an image super-resolution or restoration project?


Thanks for the snippet here. These are sometimes like a grain of salt on a medium-well steak. :wink:

I’ll probably be glad to start using timm models as well as torchvision's.

Yes, restoration project. And soon to be another in super-resolution as well.

I’ve done some homework on the fit_one_cycle function (also known as the Discriminative Learning Rates method), and also read this:

And I decided to give up on fit and ReduceLROnPlateau(). I found that playing with pct_start could really improve my learning-rate results. Now I’m reading the learner tutorial to find more tools for inspiration.
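
Something like this is what I mean by playing with pct_start (a sketch; the numbers are placeholders, not the exact settings I used):

# pct_start is the fraction of the 1cycle schedule spent ramping the LR up
# before it anneals back down (fastai's default is 0.25)
learn.fit_one_cycle(60, lr_max=1e-3, pct_start=0.1)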

epoch train_loss valid_loss ssim psnr time
0 0.235169 0.194132 0.238271 12.296492 02:27
1 0.177023 0.131278 0.410087 14.300444 02:01
2 0.157763 0.180564 0.482123 12.638127 02:02
3 0.156667 0.107851 0.600349 16.760780 02:02
4 0.141088 0.229797 0.584980 10.980514 02:04
5 0.126984 0.136192 0.669805 15.924749 02:02
6 0.111853 0.083819 0.710055 18.934025 02:04

You could also potentially try the Ranger optimizer + fit_flat_cos.
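
Roughly like this (a sketch; dls and model stand in for whatever you already have):

from fastai.vision.all import Learner, ranger

# ranger = RAdam + Lookahead; fit_flat_cos holds the LR flat for most of training
# and then cosine-anneals it, which tends to pair well with Ranger
learn = Learner(dls, model, opt_func=ranger)
learn.fit_flat_cos(20, lr=1e-3)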

I’m trying to use RandomErasing() and apply it only to the input images, but it gets applied to both the input and the target.

Like this person tried to do here:

I found out that this augmentation is being called twice:

  • once for all of the input images in a batch;
  • and another time for all of the target images of the batch.

I need it to be called only once, on the input.

Here’s the source code:

# Cell
def cutout_gaussian(x, areas):
    "Replace all `areas` in `x` with N(0,1) noise"
    chan,img_h,img_w = x.shape[-3:]
    for rl,rh,cl,ch in areas: x[...,rl:rh, cl:ch].normal_()
    return x

# Cell
def _slice(area, sz):
    bound = int(round(math.sqrt(area)))
    loc = random.randint(0, max(sz-bound, 0))
    return loc,loc+bound

# Cell
class RandomErasing(RandTransform):
    "Randomly selects a rectangle region in an image and randomizes its pixels."
    order = 100 # After Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl,self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    def encodes(self,x:TensorRawImage):
        count = random.randint(1, self.max_count)
        _,img_h,img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas)

Maybe the solution is to change the source code somewhere so that it only applies to the input. Any ideas?

I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both. One alternative is to have a separate type for y images so it doesn’t get applied there, but that may be too complicated. I think this solution is probably easier:

The reason it works is that it redefines __call__, which normally checks the type of the data and applies the transform based on type dispatch; with the override, the transform is applied however you define it.


Thanks!
If I use his solution (which I’d also thought of doing), but I still want the RandomErasing mechanism, should I do something like this?

# Cell
def cutout_gaussian(x, areas):
    "Replace all `areas` in `x` with N(0,1) noise"
    chan,img_h,img_w = x.shape[-3:]
    for rl,rh,cl,ch in areas: x[...,rl:rh, cl:ch].normal_()
    return x

# Cell
def _slice(area, sz):
    bound = int(round(math.sqrt(area)))
    loc = random.randint(0, max(sz-bound, 0))
    return loc,loc+bound

# Cell
class RandomErasing(RandTransform):
    "Randomly selects a rectangle region in an image and randomizes its pixels."
    order = 100 # After Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl,self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    def __call__(self, b, **kwargs):
        x,y = b
        count = random.randint(1, self.max_count)
        _,img_h,img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas), y

Or am I still missing something critical here because I don’t fully understand __call__ or encodes()? In other words, can I give up on either one of them?


I think that should work… Try it out and let us know…

Sadly, it didn’t work… It seemed like __call__ wasn’t even being called.

I tried to get smarter with this tutorial:

It explains some crucial things, like how to apply a transform only to the training set and not the validation set.

But it doesn’t really explain how to apply a transform only to the input. :frowning:

I also found this guy’s advice:

It followed what you said:

I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both.

Seems that you were right, and they found a way to solve it by building on that basis.

I also found this topic:

I think they inspired me toward a good solution.
My idea is to add a flag to my TensorRawImage that records whether it is the input or the target. The transform would then check that flag, just like it checks whether it’s a train or valid batch (via the split_idx flag).

I will post here a nice solution if it works.

Edit: Wow, I almost made it. It was very challenging :cold_sweat:
But I ran into a problem that I can’t quite solve.

Huh, that’s interesting; I would have thought overriding __call__ would work.

But yeah, the other solution does what I mentioned before: having a separate type for the y images so the transform doesn’t get applied to them.

That should work, I think I have done it myself in the past too. Let me know how your flag idea goes…
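
For reference, here’s roughly what the “separate type” idea could look like (an untested sketch: TensorRawImageInput / TensorRawImageTarget are hypothetical names, it reuses cutout_gaussian and _slice from your snippet above, and it glosses over wiring the DataBlock so x/y actually get those types):

import math, random
from fastai.vision.all import *

class TensorRawImageInput(TensorImage):  pass   # hypothetical: type used for x only
class TensorRawImageTarget(TensorImage): pass   # hypothetical: type used for y only

class RandomErasingInputOnly(RandTransform):
    "RandomErasing variant whose encodes is registered only for the input type."
    order = 100  # after Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl, self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    # type dispatch: only TensorRawImageInput matches, so the target type
    # (and anything else in the batch tuple) passes through untouched
    def encodes(self, x: TensorRawImageInput):
        count = random.randint(1, self.max_count)
        _, img_h, img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas)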


The thing with the flag is that I can’t pass it from the Image.Image-equivalent object (mine is RawObj) to the TensorImage (mine is TensorRawImage).

If I could, then inside RandomErasing() I would only have to add an if clause like this:

if x.flag == "input":
    < rest of the same code, which computes `areas` >
    return cutout_gaussian(x, areas)
else:
    return x

I can make PILImage inherit something from Image.Image (and equivalently for my new objects), but I can’t make TensorImage inherit from PILImage (and likewise for my objects).

It’s way beyond my knowledge of the PyTorch/fastai source code, but if there were a way to do it, I’d then easily be able to pass that flag from the Image.Image to the TensorImage for each input and target accordingly.

Maybe @Jeremy could enlighten us here… :smiley:

They’re not the same thing. Discrim LR is different LRs for different layers. fit_one_cycle is the 1-cycle scheduler.
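
For example, the two can be combined: a 1cycle schedule with discriminative LRs (a sketch, assuming an existing Learner named learn):

# a 1cycle schedule *and* discriminative LRs: the slice gives the earliest layer
# group the lowest LR, the head the highest, and spreads the rest in between
learn.fit_one_cycle(10, lr_max=slice(1e-5, 1e-3))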

Please don’t at-mention me for non-admin things.

That’s what I’d do.


That’s what I’d do.

Yeah, it works. It’s not the most elegant solution, but hey, it works.

My inner programmer likes to find scalable solutions for further uses (other Transforms, other TensorTypes, fewer ad-hoc declarations, etc.).

Thanks though!

Please don’t at-mention me for non-admin things.

Thanks for letting me know. I only tagged you because my solution could be scalable, but anyway, I solved my case.


Hi Florian,

Thank you very much for sharing your experience. I agree that using CLIP would probably be a more practical approach to the problem, but I was looking for something that could be trained in an unsupervised manner on any dataset (not only ImageNet-like). I confirm that self_supervised is a great library – simple and easy to work with. :slight_smile:

I agree with everything you wrote but I am wondering if there may be a better/simpler way. I found two interesting papers that fueled my interest in this:

  1. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric – they show that all the (un-pooled) multi-scale features are very good predictors of perceptual similarity
  2. On the surprising tradeoff between ImageNet accuracy and perceptual similarity – the worse the model you train, the better its perceptual match to humans

I, just like you, tried several pooling methods and did not find any to be much better than the rest. I then tested the localized ResNet18 features from the 14x14x256 layer and they were surprisingly accurate at finding similar parts of other images. That’s why I suspect that the pooling methods do not make any real sense and in fact actively reduce the capability of the network to figure out the image content (unless we “train around” them by fine-tuning the whole network).
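
To make that concrete, here’s a rough sketch of the kind of comparison I mean (illustrative only, not my exact code; it assumes torchvision’s weights API and 224x224 inputs):

import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
body = torch.nn.Sequential(*list(model.children())[:-3])   # up to and including layer3

@torch.no_grad()
def local_feats(x):
    f = body(x)                           # (B, 256, 14, 14) for 224x224 inputs
    f = f.flatten(2).transpose(1, 2)      # (B, 196, 256): one descriptor per location
    return F.normalize(f, dim=-1)

a, b = torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224)
fa, fb = local_feats(a), local_feats(b)
# for every location in `a`, find its best cosine match in `b`, then average,
# instead of collapsing everything into a single pooled vector first
sim = (fa @ fb.transpose(1, 2)).max(dim=-1).values.mean()
print(sim)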

I remember Jeremy saying in one of the previous courses that you should strive to make the task “easy to learn” for the network – if calculating element-wise avg or max of the local features does not have sensible semantic meaning then I suspect we are making the task harder, not easier.

PS. I think NetVLAD is based on a similar observation but I have yet to dive deeper into this approach.

I want to learn more about the defaults in the fastai library: ideally, how they are set and why.

One topic of interest to me is the optimizer. Reading chapter 16 of the text ( https://colab.research.google.com/github/fastai/fastbook/blob/master/16_accel_sgd.ipynb#scrollTo=W8YIFHdHRGVf) it states: “In fastai, Adam is the default optimizer we use since it allows faster training”

Specific questions:

  • Where in the fastai library can I see that this is the default optimizer?
  • How can I see what other defaults are chosen for us in the fastai library?

I can see the fastai code for the optimizers here: fastai/12_optimizer.ipynb at master · fastai/fastai · GitHub, but I’m not seeing or understanding how the defaults are set up.

I’m not sure if this will be covered in the 2022 version of the course, so this might not even be the right forum to post this type of question.

So, to figure this out, I did a quick grep for the terms adam and Adam as a starting point. That led me to the top-level learner.py, where I found the default value of opt_func set to Adam:
https://github.com/fastai/fastai/blob/master/fastai/learner.py#L85
If you look at the top of the file, you’ll notice it’s generated from nbs/13a_learner.ipynb, which you can then browse further.

Since the learner is an entry point for a lot of “configuration” settings, it’s not a bad place to start looking at the predefined arguments; those are the default settings for the Learner object. Similarly, there are default settings (usually in the form of default argument values) for other functions/objects as well.
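
For example, a quick way to see those defaults without leaving Python (a small sketch; it just inspects the signatures):

import inspect
from fastai.vision.all import Learner, Adam

# the printed signatures expose the defaults directly,
# e.g. opt_func=<function Adam ...> and lr=0.001 on Learner
print(inspect.signature(Learner.__init__))
print(inspect.signature(Adam))   # the optimizer's own default hyper-parameters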

Sometimes the best thing to do is just start searching/grepping in the codebase (given this is a non-beginner discussion). I can recommend tools such as grep, ag, or rg for this, or whatever “Search in project” feature is built into your editor.

I hope this was somewhat helpful.


@suvash - Thanks! I was just using the GitHub search feature and wasn’t able to find that on my own.

I personally think it would be good to identify at a high level all default decisions made by a library. Part of the cool benefit of the fastai library is the work of many people over many years identifying good and sensible defaults that work well for a wide range of problems. Knowing those defaults and the reasons behind them can provide knowledge that may transfer to other deep learning problems or libraries.
