Yes, a restoration project. And soon there will be another one in super-resolution as well.
I’ve done some homework on the fit_one_cycle function (the 1cycle policy, often combined with discriminative learning rates), and also read this:
I decided to give up on fit and ReduceLROnPlateau(). I found that playing with pct_start could noticeably improve my learning-rate results. Now I’m reading the Learner tutorial to find more inspirational tools.
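For context, a minimal sketch of what playing with pct_start looks like (the dataset and model below are placeholders; pct_start is the fraction of the 1cycle schedule spent increasing the learning rate before annealing, 0.25 by default):

from fastai.vision.all import *

# Placeholder data/model just so the call below is runnable end to end.
path = untar_data(URLs.PETS)/'images'
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda f: f.name[0].isupper(), item_tfms=Resize(224))
learn = cnn_learner(dls, resnet18, metrics=accuracy)

# A smaller pct_start means a shorter warm-up and a longer annealing phase.
learn.fit_one_cycle(5, lr_max=1e-3, pct_start=0.1)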
I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both. One alternative is to have a separate type for y images so it doesn’t get applied there, but that may be too complicated. I think this solution is probably easier:
The reason it works is that it redefines __call__, which normally checks the type of the data and applies the transform via type dispatch; redefined this way, it is applied however you define it.
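A rough sketch of that contrast, assuming fastai v2’s Transform API (the class names and the noise operation are made up purely for illustration):

import torch
from fastai.vision.all import *

class DispatchedNoise(Transform):
    # `encodes` is type-dispatched: when both x and y are TensorImage,
    # it runs on both elements of the (x, y) tuple.
    def encodes(self, x: TensorImage): return x + 0.1 * torch.randn_like(x)

class InputOnlyNoise(Transform):
    # Overriding __call__ bypasses the type dispatch: we receive the whole
    # (x, y) tuple and decide ourselves to touch only the input.
    def __call__(self, b, **kwargs):
        x, y = b
        return x + 0.1 * torch.randn_like(x), y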
Thanks!
If I use his solution (which I also thought about doing), but still want the RandomErasing mechanism, should I do something like this?
# Imports assumed for this snippet (fastai v2 style)
import math, random
from fastai.vision.all import *

# Cell
def cutout_gaussian(x, areas):
    "Replace all `areas` in `x` with N(0,1) noise"
    chan,img_h,img_w = x.shape[-3:]
    for rl,rh,cl,ch in areas: x[...,rl:rh, cl:ch].normal_()
    return x

# Cell
def _slice(area, sz):
    bound = int(round(math.sqrt(area)))
    loc = random.randint(0, max(sz-bound, 0))
    return loc,loc+bound

# Cell
class RandomErasing(RandTransform):
    "Randomly selects a rectangle region in an image and randomizes its pixels."
    order = 100 # After Normalize
    def __init__(self, p=0.5, sl=0., sh=0.3, min_aspect=0.3, max_count=1):
        store_attr()
        super().__init__(p=p)
        self.log_ratio = (math.log(min_aspect), math.log(1/min_aspect))

    def _bounds(self, area, img_h, img_w):
        r_area = random.uniform(self.sl,self.sh) * area
        aspect = math.exp(random.uniform(*self.log_ratio))
        return _slice(r_area*aspect, img_h) + _slice(r_area/aspect, img_w)

    # Overriding __call__ (instead of the type-dispatched `encodes`) so the
    # erasing is applied only to x, and y is passed through untouched.
    def __call__(self, b, **kwargs):
        x,y = b
        count = random.randint(1, self.max_count)
        _,img_h,img_w = x.shape[-3:]
        area = img_h*img_w/count
        areas = [self._bounds(area, img_h, img_w) for _ in range(count)]
        return cutout_gaussian(x, areas), y
Or am I still missing something critical here because of my lack of understanding of __call__ vs encodes()? In other words, could I give up on either one of these?
Sadly it didn’t work… It seemed like __call__ wasn’t even being called.
I tried to learn more from this tutorial:
It explains crucial things like how to apply a transform only to the training set and not to the validation set.
But it doesn’t really explain how to apply a transform only to the input.
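For reference, the train-only behaviour the tutorial covers comes from split_idx; a minimal sketch (the transform itself is just an illustrative example):

import torch
from fastai.vision.all import *

class TrainOnlyNoise(Transform):
    "Illustrative transform that the pipeline applies to the training set only."
    split_idx = 0   # 0 -> training set only, 1 -> validation set only, None -> both
    def encodes(self, x: TensorImage): return x + 0.05 * torch.randn_like(x)

As you say, though, the tutorial offers no equivalent built-in flag for restricting a transform to the input when x and y share the same type.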
I also found this guy’s advice:
I followed what you said:
I think the problem is that the transform is being type-dispatched, so since both x and y are images, it is being applied to both.
It seems you were right, and the other person found a way to solve it by building on that basis.
I also found this topic:
I think they inspired a good solution.
My idea is to add a flag to my TensorRawImage that marks it as either input or target. The transform would then check whether it is the input or the target, just like it checks whether it’s train or valid (with the split_idx flag).
I will post here a nice solution if it works.
Edit: Wow, I almost made it. It was very challenging, but I ran into a problem that I can’t quite solve.
The trouble with the flag is that I can’t pass it from the Image.Image-equivalent object (mine is RawObj) through to TensorImage (mine is TensorRawImage).
If I could do so, then inside RandomErasing() I’d only have to add an if clause that checks:
if x.flag == "input":
    < rest of the same code >
    return cutout_gaussian(x, areas)
else:
    return x
I can make PILImage inherit something from Image.Image (and equivalently with my new objects), but I can’t make TensorImage inherit from PILImage (and likewise with my objects).
It’s way beyond my knowledge of the PyTorch/fastai source code, but if there were a way to do so, I’d then easily be able to pass that flag from the Image.Image to the TensorImage for each input and target accordingly.
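A hedged sketch of what carrying such a flag across the conversion could look like (RawObj, TensorRawImage, and the flag attribute are the hypothetical names from this thread, and the conversion transform is illustrative rather than fastai’s own ToTensor):

from fastai.vision.all import *

class RawObj(PILBase): pass               # PIL-level type from this thread
class TensorRawImage(TensorImage): pass   # tensor-level type from this thread

class RawToTensor(Transform):
    "Illustrative conversion that copies a `flag` attribute across types."
    def encodes(self, o: RawObj):
        t = TensorRawImage(image2tensor(o))
        t.flag = getattr(o, 'flag', None)  # carry the input/target marker over
        return t

The catch is that a plain attribute like this tends to be dropped whenever PyTorch or fastai builds a new tensor from the old one (collation, later transforms), so it only survives as far as the pipeline keeps the exact object, which may well be the wall you hit.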
Thank you very much for sharing your experience. I agree that using CLIP would probably be a more practical approach to the problem, but I was looking for something that could be trained in an unsupervised manner on any dataset (not only ImageNet-like). I confirm that self_supervised is a great library – simple and easy to work with.
I agree with everything you wrote but I am wondering if there may be a better/simpler way. I found two interesting papers that fueled my interest in this:
Just like you, I tried several pooling methods and did not find any to be much better than the rest. I then tested the localized ResNet18 features from the 14x14x256 layer, and they were surprisingly accurate at finding similar parts of other images. That’s why I suspect the pooling methods do not make much sense and in fact actively reduce the network’s ability to figure out the image content (unless we “train around” them by fine-tuning the whole network).
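As a rough sketch of what those localized features look like (the layer3 cut and the cosine comparison below are my assumptions about the setup, not the exact experiment):

import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()
# Keep everything up to and including layer3: for a 224x224 input this
# yields a 256 x 14 x 14 feature map (256 channels over a 14x14 grid).
body = torch.nn.Sequential(*list(model.children())[:-3])

with torch.no_grad():
    feats = body(torch.randn(1, 3, 224, 224))   # -> (1, 256, 14, 14)
local = feats.flatten(2).squeeze(0).T           # -> (196, 256): one vector per grid cell
local = F.normalize(local, dim=1)
sims = local @ local.T                          # cosine similarity between image parts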
I remember Jeremy saying in one of the previous courses that you should strive to make the task “easy to learn” for the network – if calculating element-wise avg or max of the local features does not have sensible semantic meaning then I suspect we are making the task harder, not easier.
PS. I think NetVLAD is based on a similar observation but I have yet to dive deeper into this approach.
So, what I did to figure this out was a quick grep for the terms adam and Adam as a starting point. That led me to the top-level learner.py, where I found the default value of opt_func set to Adam: https://github.com/fastai/fastai/blob/master/fastai/learner.py#L85
If you look at the top of the file, you’ll notice it’s generated from nbs/13a_learner.ipynb, which you can then browse further.
Since the Learner is an entry point for a lot of “configuration” settings, it’s not a bad place to start looking at all the pre-defined arguments; those are the default settings for the Learner object. Similarly, there are default settings (usually in the form of default argument values) for other functions/objects as well.
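One quick way to see those defaults without digging through the source (nothing fastai-specific, just Python introspection):

import inspect
from fastai.learner import Learner

# Prints every Learner parameter with its default value,
# including the opt_func default (Adam).
print(inspect.signature(Learner))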
Sometimes the best thing to do is just to start searching/grepping the codebase (given this is a non-beginner discussion). I can recommend tools such as grep, ag, or rg for this, or whatever “Search in project” feature is built into your editor.
@suvash - Thanks! I was just using the GitHub search feature and wasn’t able to find that on my own.
I personally think it would be good to identify at a high level all default decisions made by a library. Part of the cool benefit of the fastai library is the work of many people over many years identifying good and sensible defaults that work well for a wide range of problems. Knowing those defaults and the reasons behind them can provide knowledge that may transfer to other deep learning problems or libraries.
That would certainly be great to have. It would be thousands of hours of work though – so as folks in the community study the code and learn about fastai, it’s very helpful if they write posts about what they learn. There are already many posts about these topics out there. For instance, @sgugger wrote about why AdamW was chosen as the default here:
You are right, the pooling layers don’t make sense if you want the best results. Basically you are losing the spatial information about where in the image a feature is located. You still know which features are present in the image, but not where. On the other hand, after pooling you only have a 256-dim vector instead of a 50176-dim vector :D.
Did you try just flattening the ResNet18 features and computing the similarities? I guess that should give pretty good results but won’t scale that well if you have to store a 50176-dim vector for each image. Probably the way to go is Vision Transformers: no pooling layers and (depending on the model) a 768-dim output vector. CLIP was trained with ResNet50 and ViT… afaik ViT gave the better results, so I’d suspect ViT gives better representations when trained correctly (contrastively).
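Just to spell out the numbers being compared here (standard ResNet18 / ViT-B shapes; the snippet only illustrates the arithmetic):

# Localized ResNet18 features: a 14x14 grid with 256 channels.
flattened_dim = 256 * 14 * 14   # = 50176 values per image if nothing is pooled
pooled_dim    = 256             # after global average/max pooling
vit_dim       = 768             # e.g. a ViT-B/16 output embedding
print(flattened_dim, pooled_dim, vit_dim)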
Thanks for the link. I started reading the post and so far it seems well written and easy to understand at my current level of knowledge.
I didn’t realize that it would take so much effort to document and explain defaults, but after seeing the AdamW post, the amount of effort required seems very large - in many aspects including development, testing, documenting and sharing knowledge with others.
Each of us becoming advocates sounds like the best way to continue the great fastai work!