This is a wiki post - feel free to edit to add links from the lesson or other useful info.
Lesson 19 and 20 were awesome! I’m still going through the notebooks but really enjoyed both lessons. I especially enjoyed JohnO’s experiments and how he stretched the tools in different ways than I’ve seen them used before.
On my computer, running `torch.rand_like(content_im)` produces the following result:
Running it again produces the following result:
and so on for all subsequent iterations (each sub-image gets divided into nine sub-images).
To solve this, I had to replace `torch.rand_like(content_im)` with `torch.rand(*content_im.shape)`. I thought that others might also find this helpful.
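For anyone wanting to try the same workaround, here is a minimal sketch. `content_im` below is a random stand-in rather than the real image from the notebook, and `make_noise` is a hypothetical helper name used only for illustration. It samples with `torch.rand` on the CPU; moving the result to the original tensor's device and dtype afterwards is an assumption about how you would use it.

```python
import torch

def make_noise(like: torch.Tensor) -> torch.Tensor:
    """Uniform noise in [0, 1) with the same shape as `like`.

    Samples with torch.rand on the CPU instead of torch.rand_like, then
    moves the result to the device and dtype of `like`.
    """
    noise = torch.rand(*like.shape)  # created on the CPU
    return noise.to(device=like.device, dtype=like.dtype)

# Stand-in for the notebook's content image (3 channels, 64x64).
content_im = torch.rand(3, 64, 64)
noise = make_noise(content_im)
print(noise.shape, noise.device)
```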
Also, when I do the style transfer starting from the content image, I get the following result:
I don’t know how to fix this; it could be an issue with randomisation again, but I have no idea where. I’m getting ridiculously large loss values, though: 105,876,664,807,674,273,666,695,168 for the training pass, and 105,876,766,264,766,679,069,229,056 for the evaluation pass.
This is strange. I did not have this issue. Could you share a full set of code to reproduce this, cutting out the unnecessary/unrelated parts?
Is this happening when running the ‘16A_StyleTransfer’ notebook from the course repo without modifications? I ran that notebook without modifications and did not have the issue you’re describing. Can you share your notebook for us to run through and help debug?
Thanks for your reply. Here is a stripped-down version of the notebook, which contains only relevant code (as far as I can tell): Transfer - Dropbox.
I suspect the problem may have to do with the fact that the code is running on an M1 Mac.
Will there be a live lecture today?
I downloaded your notebook and ran it on my machine and everything worked as expected - the results were the same as what I was getting on my other notebooks, not the weird/poor results you were seeing. It’s possible that it’s an issue with the M1/PyTorch combo acting funky. It’s probably worth trying to pull down an updated copy of the repo and also updating all of the relevant packages to the latest versions to see if that helps.
I just wanted to say thank you to Jeremy, Tanishq, and Jonathan for the “extra” videos they produced here over zoom. I’m still making my way through the video for lesson 20. I was actually quite familiar already with the ideas of using the pre-trained CNNs for feature extraction. But new to me was this clever (yet simple) idea of training the model so that the input image features would learn to be more like the extracted features of the target image (style transfer). That is so neat! It’s cool learning little tricks like this because it might open similar or new ideas for other problems we work on.
I see the same - are you running on an M1? (I just saw the other response and see that you are.) The warning “Clipping input data to the valid range for imshow with RGB data ([0…1] for floats or [0…255] for integers)” makes it sound like the values are running away somewhere. And the 3x3 grid of images looks like the image channels and dimensions have somehow been switched.
The rows in the 3x3 grid the M1 produces using torch.rand_like look to be the colour channels:
Aha, maybe there’s an issue with images with values between 0 and 255 being erroneously converted into float arrays! Will have to check this.
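A quick way to test that hypothesis is to look at the dtype and value range before plotting. The sketch below uses a hypothetical helper, `to_unit_float`, which is not from the notebook; it assumes integer images hold values in 0..255.

```python
import torch

def to_unit_float(im: torch.Tensor) -> torch.Tensor:
    """Return `im` as float32 in [0, 1].

    Integer images are assumed to hold 0..255 and are rescaled.
    Float images are only clamped to [0, 1].
    """
    if not torch.is_floating_point(im):
        return im.float() / 255.0
    return im.float().clamp(0.0, 1.0)

raw = torch.tensor([[0, 128, 255]], dtype=torch.uint8)
print(raw.dtype, raw.min().item(), raw.max().item())  # inspect before converting
fixed = to_unit_float(raw)
```

If the printed range of your image is far outside what you expect for its dtype, that would point at the conversion step rather than at the randomisation.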
Hi, it seems we have an easy way to speed up the training: 2x for small batches on fp32. The DDPM_v2 went down from 10 min (on a 2080 Ti) to less than 5 min without changing the batch size.
Apparently for small batches MetricsCB and ProgressBarCB are the bottlenecks, as they synchronise the GPU with the CPU (using to_cpu on each batch). If we make them lazy and read the metrics from the card only after a full epoch, we allow the batches of data to be prepared in parallel with the model execution.
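To illustrate the idea of reading metrics lazily, here is a minimal sketch. It is not the code from the linked notebook: `LazyLoss` is a hypothetical class, and it only shows the pattern of keeping the running total as a tensor on its device and calling `.item()` once per epoch.

```python
import torch

class LazyLoss:
    """Accumulate per-batch losses as tensors; sync with the CPU once per epoch."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.total, self.count = None, 0

    def update(self, loss: torch.Tensor, n: int):
        # detach() keeps the value on its device without forcing a sync.
        weighted = loss.detach() * n
        self.total = weighted if self.total is None else self.total + weighted
        self.count += n

    def compute(self) -> float:
        # .item() is the only point where the device has to be synchronised.
        return (self.total / self.count).item()

metric = LazyLoss()
for loss_value, bs in [(1.0, 2), (4.0, 2)]:  # two fake batches
    metric.update(torch.tensor(loss_value), bs)
epoch_loss = metric.compute()
```

The trade-off is that nothing can be shown in a progress bar mid-epoch without paying for the sync again.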
The speed up is not that large for fp16 as the batch is large enough. But still it goes down from 160s to 111s.
I’ve tried to improve the training even more using CUDA graphs, but they are unfortunately a bit unstable and give almost no improvement.
Here is DDPM_v2 that shows both the original code with execution timing added as well as the improved lazy callbacks.
This is a bug in PyTorch on the MPS device. PyTorch does not correctly handle tensors that were permuted and then put on the MPS device. The permute is done to transform the regular HWC shape to CHW. If put
download_image(url), it will fix the notebook.
Permute on MPS has quite a history if you look at the PyTorch issues, and it explains why you see the image multiple times with distorted colours. We hadn’t noticed the issue before because Fashion-MNIST has only one channel, so the permute is a no-op. This breaks on PyTorch 1.13.1, and on nightly it crashes.
I’ve created a PyTorch issue that summarises what we see: zeros_like / rand_like / randn_like fails on MPS for tensors loaded with pil / torchvision (caused by `.permute`) · Issue #94190 · pytorch/pytorch · GitHub
There is another issue on MPS: ReLU(inplace=True) does not work. The activations are unchanged after the layer. This breaks most of the models from the timm library. Here is a relevant PyTorch issue. Fortunately the fix is easy, just transform vgg16 like this:
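To see why a permuted tensor differs from a freshly created one, it helps to look at strides. The sketch below runs on the CPU, so it does not reproduce the MPS bug; it only shows that `permute` returns a non-contiguous view of the same storage and that `.contiguous()` makes a dense copy. Calling `.contiguous()` before moving a tensor to the device may be one way to sidestep layout bugs like this, but whether that is the exact change meant above is an assumption.

```python
import torch

# A tiny "image" in HWC layout: H=2, W=3, C=3.
hwc = torch.arange(2 * 3 * 3, dtype=torch.float32).reshape(2, 3, 3)
chw = hwc.permute(2, 0, 1)  # a view: same storage, different strides

print(chw.shape, chw.stride(), chw.is_contiguous())

dense = chw.contiguous()    # copies into a fresh, densely packed tensor
print(dense.stride(), dense.is_contiguous())
```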
    def inplace_false(m): m.inplace = False
    vgg16.apply(inplace_false)
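A slightly more defensive variant, as a sketch: it only touches modules that actually have an `inplace` flag and then checks that none are left switched on. The small `nn.Sequential` here is a stand-in for the real `vgg16` so the snippet runs on its own.

```python
import torch
from torch import nn

def inplace_false(m):
    # Only touch modules that actually expose an `inplace` flag.
    if hasattr(m, 'inplace'):
        m.inplace = False

# Small stand-in for the real feature model (assumption: it is also an nn.Sequential).
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(inplace=True),
    nn.Conv2d(8, 8, 3), nn.ReLU(inplace=True),
)
model.apply(inplace_false)

# Any module still reporting inplace=True would show up here.
still_inplace = [m for m in model.modules() if getattr(m, 'inplace', False)]
print(still_inplace)
```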
A content loss class with a method for feature calculation using hooks.
    def register_feature(hook, mod, inp, outp):
        hook.feature = outp

    class ContentLossToTargetWithHooks():
        def __init__(self, feat_model, target_im, target_layers=(18, 25)):
            fc.store_attr()
            self.feat_modules = [feat_model[layer] for layer in target_layers]
            self.target_features = self.get_features(target_im)

        def get_features(self, image, init=True):
            with Hooks(self.feat_modules, register_feature) as hooks:
                f = torch.no_grad() if init else fc.noop
                f(self.feat_model)(normalize(image))
                return [h.feature for h in hooks]

        def __call__(self, input_im):
            self.input_features = self.get_features(input_im, init=False)
            loss = sum((f1-f2).pow(2).mean() for f1, f2 in zip(self.input_features, self.target_features))
            return loss

    loss_function_perceptual = ContentLossToTargetWithHooks(
        vgg16, content_im, target_layers=(1, 6, 18))
A class that can be used for content and style loss, and the functions for calculating features or grams defined outside.
    def register_feature(hook, mod, inp, outp):
        hook.feature = outp

    def get_features(feat_modules, image, init=True):
        # NOTE: `feat_model` is not a parameter here, so this relies on a
        # global `feat_model` (for example `feat_model = vgg16`) being defined.
        with Hooks(feat_modules, register_feature) as hooks:
            f = torch.no_grad() if init else fc.noop
            f(feat_model)(normalize(image))
            return [h.feature for h in hooks]

    def get_grams(feat_modules, image, init=True):
        return L(torch.einsum('chw, dhw -> cd', x, x) / (x.shape[-2]*x.shape[-1])  # 'bchw, bdhw -> bcd' if batched
                 for x in get_features(feat_modules, image, init=init))

    class LossToTargetWithHooks:
        def __init__(self, feat_model, image, calc_func=get_features, target_layers=(18, 25)):
            fc.store_attr()
            self.feat_modules = [feat_model[layer] for layer in target_layers]
            self.target_values = self.calc_func(self.feat_modules, image)

        def __call__(self, input_im):
            self.input_values = self.calc_func(
                self.feat_modules, input_im, init=False)
            loss = sum((f1-f2).pow(2).mean() for f1, f2 in zip(self.input_values, self.target_values))
            return loss

    style_loss = LossToTargetWithHooks(
        vgg16, style_im, calc_func=get_grams, target_layers=(1, 6, 11, 18, 25))
    content_loss = LossToTargetWithHooks(
        vgg16, content_im, target_layers=(1, 6, 25))
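As a sanity check on the einsum used for the Gram matrices, the sketch below compares it with the more explicit "flatten and matrix-multiply" version on a random feature map. The function names `gram_einsum` and `gram_matmul` are made up for this example.

```python
import torch

def gram_einsum(x):
    # x: (c, h, w) feature map -> (c, c) Gram matrix, averaged over spatial positions
    return torch.einsum('chw, dhw -> cd', x, x) / (x.shape[-2] * x.shape[-1])

def gram_matmul(x):
    flat = x.flatten(1)                    # (c, h*w)
    return flat @ flat.T / flat.shape[1]   # same (c, c) matrix, written as a matmul

x = torch.rand(4, 5, 6)
g = gram_einsum(x)
print(g.shape, torch.allclose(g, gram_matmul(x), atol=1e-6))
```

The result has one row and column per channel and is symmetric, which is why it carries no information about where in the image a feature occurred.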
@fmussari I worked on this as well and came up with basically the same solution.
The only change really was that I used
instead of what you do with
f = torch.no_grad() if init else fc.noop
I am getting a runtime error though:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.
But I don’t get the error if I approach it like so:
    def extract_features(self, image, grad=True):
        with torch.set_grad_enabled(grad):
            with Hooks(self.feature_layers, self.hook_func) as hooks:
                _ = self.model(self.norm_func(image))
                return [each.outp for each in hooks]
where I make extract_features part of the class.
I know this error is not uncommon, but in this case I am wondering exactly what is happening here.
Maybe you or someone else knows!
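One common way to hit this error in a setup like this is when the target features are computed with gradients enabled. The target then carries its own graph through the model weights, the first `backward()` frees that graph's saved tensors, and the second optimisation step tries to walk it again. Whether that is what happened in the failing version is an assumption, since that code is not shown. The sketch below reproduces the pattern with a tiny stand-in model; `run_steps` is a hypothetical helper.

```python
import torch
from torch import nn

def run_steps(detach_target: bool, n_steps: int = 2) -> int:
    """Run a few steps against a fixed target; return the number of steps completed."""
    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
    target_im = torch.rand(1, 4)
    if detach_target:
        with torch.no_grad():        # target is a plain tensor, no graph attached
            target = model(target_im)
    else:
        target = model(target_im)    # target keeps a graph through the model weights
    x = torch.rand(1, 4, requires_grad=True)
    done = 0
    for _ in range(n_steps):
        loss = (model(x) - target).pow(2).mean()
        loss.backward()              # frees the saved tensors of every graph it walks
        done += 1
    return done

print(run_steps(detach_target=True))
try:
    run_steps(detach_target=False)
except RuntimeError as e:
    print("second backward failed:", str(e)[:60])
```

With `torch.set_grad_enabled(False)` or `torch.no_grad()` around the target computation, the target is a constant and each step only backpropagates through the freshly built graph for the input image.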
Link to the feature visualization page from class