Non-Beginner Discussion

Yes, a restoration project. And soon there will be another one in super-resolution as well.

I’ve done some homework on the fit_one_cycle function (also known as the Discriminative Learning Rates method), and also read this:

I decided to give up on fit and ReduceLROnPlateau(). I found that playing with pct_start could greatly improve my learning-rate results. Now I’m reading the learner tutorial to find more inspiring tools.
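For context, pct_start controls what fraction of the 1cycle schedule is spent increasing the learning rate before it anneals back down; a minimal sketch (with learn assumed to already exist):

learn.fit_one_cycle(10, lr_max=1e-3, pct_start=0.25)   # spend 25% of the steps warming up, then anneal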

epoch  train_loss  valid_loss  ssim      psnr       time
0      0.235169    0.194132    0.238271  12.296492  02:27
1      0.177023    0.131278    0.410087  14.300444  02:01
2      0.157763    0.180564    0.482123  12.638127  02:02
3      0.156667    0.107851    0.600349  16.760780  02:02
4      0.141088    0.229797    0.584980  10.980514  02:04
5      0.126984    0.136192    0.669805  15.924749  02:02
6      0.111853    0.083819    0.710055  18.934025  02:04

You could also potentially try Ranger optimizer + fit_flat_cos
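Something like this rough sketch (dls and model here stand for your own restoration data and architecture):

from fastai.vision.all import *

learn = Learner(dls, model, opt_func=ranger, loss_func=MSELossFlat())
learn.fit_flat_cos(10, lr=1e-3)   # flat LR for most of training, then a cosine anneal at the end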

I’m trying to use RandomErasing() and apply it only to the input images, whereas by default it is applied to both the inputs and the targets.

This guy tried to do the same thing here:

I found that this augmentation is called twice:

  • once for all of the input images in a batch;
  • and again for all of the target images in the batch.

I need it to be called only once, on the inputs.

Here’s the source code:

# Cell
import math, random   # needed by _slice and RandomErasing below
# (RandTransform and store_attr come from fastai; TensorRawImage is my custom tensor type)
def cutout_gaussian(x, areas):
    "Replace all `areas` in `x` with N(0,1) noise"
    chan,img_h,img_w = x.shape[-3:]
    for rl,rh,cl,ch in areas: x[...,rl:rh, cl:ch].normal_()
    return x

# Cell
def _slice(area, sz):
    bound = int(round(math.sqrt(area)))
    loc = random.randint(0, max(sz-bound, 0))
    return loc,loc+bound

# Cell
class RandomErasing(RandTransform):
    "Randomly selects a rectangle region in an image and randomizes its pixels."
    order = 100 # After Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl,self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    def encodes(self,x:TensorRawImage):
        count = random.randint(1, self.max_count)
        _,img_h,img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas)
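For context, the transform sits in the batch_tfms of my DataBlock, and since both x and y are image blocks it gets dispatched to both; a rough sketch of that kind of setup (the blocks/getters here are just placeholders for my actual restoration pipeline):

from fastai.vision.all import *

dblock = DataBlock(blocks=(ImageBlock, ImageBlock),            # both x and y are images
                   get_items=get_image_files,
                   get_y=lambda p: p,                          # placeholder pairing of degraded/clean images
                   batch_tfms=[Normalize.from_stats(*imagenet_stats),
                               RandomErasing(p=1., max_count=2)])
dls = dblock.dataloaders(path)                                 # `path` assumed to point at the image folder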

Maybe the solution is to change the source code somewhere so that it’s applied only to the input. Any ideas?

I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both. One alternative is to have a separate type for y images so it doesn’t get applied there, but that may be too complicated. I think this solution is probably easier:

The reason it works is that it redefines __call__, which normally checks the type of the data and applies the transform based on type dispatch; with the override, the transform is applied however you define it.


Thanks!
If I use his solution (which I had also thought about), but still want the RandomErasing mechanism, should I do something like this?

# Cell
# (cutout_gaussian and _slice are unchanged from the snippet above)

# Cell
class RandomErasing(RandTransform):
    "Randomly selects a rectangle region in an image and randomizes its pixels."
    order = 100 # After Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl,self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    def __call__(self, b, **kwargs):
        x,y = b
        count = random.randint(1, self.max_count)
        _,img_h,img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas), y

Or am I still missing something critical here because of my lack of understanding of __call__ or encodes()? In other words, can I give up on either one of them?


I think that should work… Try it out and let us know…

Sadly it didn’t work… It seemed like __call__ wasn’t even being called.

I tried to get smarter with this tutorial:

It kind of explains crucial things like how to apply a transform only to the training set, but not the validation set.

But it doesn’t really explain how to apply a transform only to the input. :frowning:
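For reference, the mechanism that tutorial relies on is the split_idx attribute; a minimal sketch of a train-only transform (the noise op is just a stand-in):

from fastai.vision.all import *

class TrainOnlyNoise(Transform):
    split_idx = 0       # 0 -> train set only, 1 -> valid only, None -> both
    order = 100         # after Normalize, so x is assumed to already be a float tensor
    def encodes(self, x: TensorImage):
        return x + 0.05 * torch.randn_like(x)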

I also found this guy’s advice:

I followed what you said:

I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both.

It seems you were right, and the other guy found a way to solve it by building on that basis.

I also found this topic:

I think they inspired a good solution.
My idea is to add a flag on my TensorRawImage that holds either “input” or “target”. The transform would then check whether it’s an input or a target, just like it checks whether it’s train or valid (with the split_idx flag).

I will post a nice solution here if it works.

Edit: Wow, I almost made it. It was very challenging :cold_sweat:
But I ran into a problem that I can’t quite solve.

Huh that’s interesting, I would have thought overriding __call__ would have worked.

But yeah the other solution does what I mentioned before: having a separate type for y images so it doesn’t get applied there.
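Something along these lines (a toy sketch – the two tensor type names are made up for illustration, and wiring the DataBlock so x/y actually come out with those types is the part you’d still have to handle):

from fastai.vision.all import *

class TensorImageInput(TensorImage):  pass     # type produced for x
class TensorImageTarget(TensorImage): pass     # type produced for y

class InputOnlyNoise(RandTransform):
    order = 100   # after Normalize, so x is assumed to be a float tensor
    def encodes(self, x: TensorImageInput):            # dispatches only on the input type
        return x + 0.1 * torch.randn_like(x)           # stand-in for the erasing logic
    # no encodes is registered for TensorImageTarget, so targets pass through untouched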

That should work, I think I have done it myself in the past too. Let me know how your flag idea goes…


The problem with the flag is that I can’t pass it from the Image.Image-equivalent object (mine is RawObj) to the TensorImage-equivalent (mine is TensorRawImage).

If I could, then inside RandomErasing() I would only have to add an if clause like this:

if x.flag == "input":
    # ...rest of encodes() as before...
    count = random.randint(1, self.max_count)
    _, img_h, img_w = x.shape[-3:]
    area = img_h*img_w/count
    areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
    return cutout_gaussian(x, areas)
else:
    return x

I can make PILImage inherit something from Image.Image (and equivalently for my new objects), but I can’t make TensorImage inherit from PILImage (and likewise for my objects).

It’s way beyond my knowledge of the PyTorch/fastai source code, but if there were a way to do it, I could easily pass that flag from the Image.Image to the TensorImage for each input and target accordingly.

Maybe @Jeremy could enlighten us here… :smiley:

They’re not the same thing. Discrim LR is different LRs for different layers. fit_one_cycle is the 1-cycle scheduler.
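In code the two look something like this (a sketch, with learn assumed to exist):

learn.fit_one_cycle(5, lr_max=1e-3)                  # 1cycle schedule, one LR for the whole model
learn.fit_one_cycle(5, lr_max=slice(1e-5, 1e-3))     # same schedule, plus discriminative LRs across layer groups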

Please don’t at-mention me for non-admin things.

That’s what I’d do.


That’s what I’d do.

Yeah, it works. It’s not the most elegant solution, but hey, it works.

My inner programmer likes to find scalable solutions for future uses (other Transforms, other tensor types, fewer unknown declarations, etc.).

Thanks though!

Please don’t at-mention me for non-admin things.

Thanks for letting me know. I only tagged you because my solution could be scalable, but anyway, I’ve solved my case.


Hi Florian,

Thank you very much for sharing your experience. I agree that using CLIP would probably be a more practical approach to the problem, but I was looking for something that could be trained in an unsupervised manner on any dataset (not only ImageNet-like). I confirm that self_supervised is a great library – simple and easy to work with. :slight_smile:

I agree with everything you wrote but I am wondering if there may be a better/simpler way. I found two interesting papers that fueled my interest in this:

  1. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric – they show that all the (un-pooled) multi-scale features are very good predictors of perceptual similarity
  2. On the surprising tradeoff between ImageNet accuracy and perceptual similarity – the worse the model you train, the better its perceptual match to humans

Just like you, I tried several pooling methods and did not find any to be much better than the rest. I then tested the localized ResNet18 features from the 14x14x256 layer, and they were surprisingly accurate at finding similar parts of other images. That’s why I suspect that the pooling methods do not make much sense and in fact actively reduce the network’s ability to figure out the image content (unless we “train around” them by fine-tuning the whole network).
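As a rough sketch of what I mean by comparing localized features (the layer slicing, the torchvision weights argument, and the random stand-in images are my assumptions here):

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()              # pretrained weights (recent torchvision API assumed)
body = torch.nn.Sequential(*list(model.children())[:-3])      # up to layer3 -> 14x14x256 maps for 224px inputs

img1 = torch.randn(1, 3, 224, 224)                            # stand-ins for normalized input images
img2 = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    f1, f2 = body(img1), body(img2)                           # each (1, 256, 14, 14)

f1 = F.normalize(f1.flatten(2), dim=1)                        # (1, 256, 196), unit-norm local features
f2 = F.normalize(f2.flatten(2), dim=1)
sim = torch.einsum("bci,bcj->bij", f1, f2)                    # (1, 196, 196) patch-to-patch cosine similarities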

I remember Jeremy saying in one of the previous courses that you should strive to make the task “easy to learn” for the network – if calculating the element-wise avg or max of the local features has no sensible semantic meaning, then I suspect we are making the task harder, not easier.

PS. I think NetVLAD is based on a similar observation but I have yet to dive deeper into this approach.

I want to learn more about the defaults in the fastai library – ideally, how they are set and why.

One topic of interest to me is the optimizer. Chapter 16 of the book ( https://colab.research.google.com/github/fastai/fastbook/blob/master/16_accel_sgd.ipynb#scrollTo=W8YIFHdHRGVf ) states: “In fastai, Adam is the default optimizer we use since it allows faster training”

Specific questions:

  • Where in the fastai library can I see that this is the default optimizer?
  • How do I see what other defaults are chosen for us in the fastai library?

I can see the fastai code for the optimizers here: fastai/12_optimizer.ipynb at master · fastai/fastai · GitHub, but I don’t see or understand how the defaults are set up.

I’m not sure if this will be covered in the 2022 version of the course, so this might not even be the right forum to post this type of question.

To figure this out, I did a quick grep for the terms adam and Adam as a starting point. That led me to the top-level learner.py, where I found the default value of opt_func set to Adam:
https://github.com/fastai/fastai/blob/master/fastai/learner.py#L85
If you look at the top of the file, you’ll notice it’s generated from nbs/13a_learner.ipynb, which you can then browse further.

Since the learner is an entry point for a lot of “configuration” settings, it’s not a bad place to start looking at the arguments that are pre-defined; those are the default settings for the Learner object. Similarly, there are default settings (usually in the form of default argument values) for other functions/objects as well.
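For example, one quick way to see those default argument values from Python (a sketch):

import inspect
from fastai.vision.all import Learner

print(inspect.signature(Learner.__init__))   # opt_func=<function Adam ...> shows up among the defaults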

Sometimes the best thing to do is just start searching/grepping the codebase (given this is a non-beginner discussion). I can recommend tools such as grep, ag or rg for this, or anything built into your editor’s “Search in project” feature.

I hope this was somewhat helpful.


@suvash - Thanks! I was just using the github search feature and wasn’t able to find that on my own.

I personally think it would be good to identify at a high level all default decisions made by a library. Part of the cool benefit of the fastai library is the work of many people over many years identifying good and sensible defaults that work well for a wide range of problems. Knowing those defaults and the reasons behind them can provide knowledge that may transfer to other deep learning problems or libraries.


That would certainly be great to have. It would be thousands of hours of work though – so as folks in the community study the code and learn about fastai, it’s very helpful if they write posts about what they learn. There are already many posts about these topics out there. For instance, @sgugger wrote about why AdamW was chosen as the default here:


Hi Jakub,

you are right, the pooling layers don’t make sense if you want to get the best results. Basically you are losing the spatial information about where in the image a feature is located. You still know which features are present in the image, but not where. But after pooling you only have a 256-dim vector instead of a 50176-dim vector :D.

Did you try just flattening the ResNet18 features and computing the similarities? I guess that should give pretty good results, but it won’t scale that well if you have to store a 50176-dim vector for each image. Probably the way to go is Vision Transformers: no pooling layers, and (depending on the model) a 768-dim output vector. CLIP was trained with both ResNet50 and ViT … afaik ViT gave the better results, so I’d suspect ViT gives better representations when trained correctly (contrastively).
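Roughly what I mean by “just flatten and compare” (a sketch; the random tensors stand in for real 14x14x256 feature maps, i.e. 50176 values per image):

import torch
import torch.nn.functional as F

f1 = torch.randn(1, 256, 14, 14)                          # stand-in feature maps for two images
f2 = torch.randn(1, 256, 14, 14)
sim = F.cosine_similarity(f1.flatten(1), f2.flatten(1))   # one similarity score per image pair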


Thanks for the link – I started reading the post and so far it seems well written and easy to understand at my current level of knowledge.

I didn’t realize that it would take so much effort to document and explain defaults, but after seeing the AdamW post, the amount of effort required seems very large - in many aspects including development, testing, documenting and sharing knowledge with others.

Each of us becoming advocates sounds like the best way to continue the great fastai work!