Fastai v2 chat

On Lesson-3-camvid under “Go big” I get an error:
NameError: name ‘learn’ is not defined

I have installed fastai2 and I am using the course notebooks that come with it. I use two Ubuntu computers at home and get the same error on both.
Suggestions?

I’m not sure… I have thought a bit about it, but nothing seems quite right. If many labels can be present at the same time, how do you determine which one has been confused with which?

We could, perhaps, look at relevant correlations. For example: when this label was not predicted (and should have been), this other label was very likely to have been predicted (and should not have been). That would suggest a likely confusion between those two labels, but only “likely”.
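
A rough numpy sketch of that correlation idea (likely_confusions is just a made-up name, and targs/preds are assumed to be binary arrays of shape (n_samples, n_labels)):

import numpy as np

def likely_confusions(targs, preds):
    "For each label pair (i, j), count how often i was missed while j was spuriously predicted."
    targs, preds = targs.astype(bool), preds.astype(bool)
    missed   = targs & ~preds    # label should have been predicted but wasn't
    spurious = ~targs & preds    # label was predicted but shouldn't have been
    # counts[i, j] high means label i is "likely confused" with label j
    return missed.astype(int).T @ spurious.astype(int)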

What do you think? Are you aware of any published work in this direction?

The simple solution is to make them ordered, which I can do on the back end (for instance, make them alphabetical).

Perhaps something like labels_missing?

Edit: that could work. Restructure it so that you first select the labels that are present, and then look at n_missing relative to those present labels. So if n_missing is 1 and you passed in, say, cloud, stream, and hill, it would look at examples where all three are present and find the cases where the model missed just one of them, or perhaps every combination of them (cloud with stream, etc.).
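
A hypothetical sketch of that labels_missing idea (the name, the binary targs/preds arrays, and the vocab list of label names are all assumptions for illustration):

import numpy as np

def labels_missing(targs, preds, present, vocab, n_missing=1):
    "Indices of samples where all labels in `present` are in the target but exactly `n_missing` of them were not predicted."
    idxs = [vocab.index(l) for l in present]
    t = targs[:, idxs].astype(bool)
    p = preds[:, idxs].astype(bool)
    has_all  = t.all(axis=1)            # e.g. cloud, stream and hill are all present
    n_missed = (t & ~p).sum(axis=1)     # how many of them the model missed
    return np.where(has_all & (n_missed == n_missing))[0]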

What about this? It's inspired by some of our real-world problems, where we may have many labels, but they tend to come in groups.

Imagine “forest + river” and “city + road”. In this type of problem it would be super interesting to find cases (both correct and incorrect) that break the pattern (“forest + road”, “city + river”, “city + forest”).

This would be like translating our problem into a single-label multi-class problem by doing some clustering on the combinations of labels. So instead of looking at individual labels, we look at the most confused types of examples (“cluster 3 is often confused with examples from cluster 7”).
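
A minimal sketch of that clustering idea, assuming binary label vectors and using k-means purely for illustration (the number of clusters is arbitrary):

import numpy as np
from sklearn.cluster import KMeans

def combo_confusion(targs, preds, n_clusters=8):
    "Cluster the label combinations, then build an ordinary confusion matrix between clusters."
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(targs)
    t_cl, p_cl = km.predict(targs), km.predict(preds)
    cm = np.zeros((n_clusters, n_clusters), dtype=int)
    np.add.at(cm, (t_cl, p_cl), 1)      # cm[i, j]: target cluster i predicted as cluster j
    return cm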

Makes sense?

That’s more or less the thought process I was starting to follow, but you made it very clear for me there :slight_smile: I’ll see if I can whip something up today; it should follow this approach.

I can’t wait to see where this goes! Thanks for your fantastic work :slight_smile:

I reinstalled my fastai2 env in conda and I get some errors when I run nbdev_test_nbs.

Is this just me/my machine, or is it something that just hasn't been fixed yet due to the fast pace of development?

I can’t wait to start playing with fastai2 in detail. :slight_smile:

Make sure you also have the latest version of fastcore (a git editable install).

I use a lot of custom transforms and typically create my dataloaders this way:
items -> Datasets with tfms and splits (or TfmdLists when I create the input/output through a single tfm) -> DataLoaders with after_batch
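
A rough sketch of that mid-level pipeline (the path, transforms, split, and batch size below are placeholder assumptions, not the actual setup):

from fastai2.vision.all import *

path   = Path('data/images')                       # hypothetical folder of images sorted by class
items  = get_image_files(path)
splits = RandomSplitter(valid_pct=0.2)(items)
dsets  = Datasets(items,
                  tfms=[[PILImage.create], [parent_label, Categorize()]],
                  splits=splits)
dls    = dsets.dataloaders(bs=64,
                           after_item=[Resize(224), ToTensor()],
                           after_batch=[IntToFloatTensor(), Normalize.from_stats(*imagenet_stats)])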

I notice the creation of Datasets is a bit slow (before I even feed anything to the learner) and takes a few minutes (granted I have about 2M images).

Is there any advantage to going through the DataBlock API (speed-wise or memory-wise), or should it be pretty much equivalent?

It should be exactly the same. The data block API only uses the mid-level API behind the scenes.
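
For comparison, the data block version of a pipeline like the one above could look roughly like this (same placeholder names and transforms as before); it builds the same Datasets/DataLoaders under the hood:

from fastai2.vision.all import *

dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_items=get_image_files,
                   splitter=RandomSplitter(valid_pct=0.2),
                   get_y=parent_label,
                   item_tfms=Resize(224),
                   batch_tfms=Normalize.from_stats(*imagenet_stats))
dls = dblock.dataloaders(Path('data/images'), bs=64)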

I am a bit confused by the losses used for WGAN.

For example, the generator loss is _tk_mean, which just returns fake_pred.mean(). fake_pred is 0 when the critic thinks the image is fake and 1 when it thinks it's real. If we minimize that loss, we push the generated images toward being scored as fake, while we want the critic to think they are real. I would actually have expected the losses to be swapped.

I understand the definition in gan_loss_from_func better, where we want fake_preds to be ones for the generator.

Am I missing something?

No one said the critic outputs 0 for fake and 1 for real in this particular case :wink: It's just a mental construct you may have (there are no 0 and 1 targets here).

Since the critic loss is real_pred-fake_pred, the critic does work against the generator, which is all that matters.
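
For reference, a reconstruction of the two losses being discussed (the helper names follow fastai2's vision.gan as best as memory serves, so treat the exact signatures as approximate):

def _tk_mean(fake_pred, output, target):
    # Generator loss: minimizing it pushes the critic's score on generated images down.
    return fake_pred.mean()

def _tk_diff(real_pred, fake_pred):
    # Critic loss: minimizing it pushes real scores down and fake scores up, i.e. the
    # opposite direction from the generator. Which direction means "real" is just a
    # convention; what matters is that the two losses oppose each other.
    return real_pred.mean() - fake_pred.mean()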

Oh you’re right! That makes total sense now!
I'm pre-training the generator and critic, so I need to pay attention to the target of the critic so that it matches what will happen during GAN training.

I’m working on adding gradient accumulation. The idea is:

  • we calculate the loss (existing callback)
  • at after_loss, we accumulate the loss in another callback GradientAccumulation and do a self.smooth_loss.reset()
  • after a certain number of batches, at after_loss, we set smooth_loss to the accumulated loss and reset our loss from GradientAccumulation to 0.

Does it seem like a reasonable approach?

I don’t see any reason to touch smooth loss (it is detached from the loss so won’t do anything). You just need to skip the step n times by raising a CancelStepException.
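
A minimal sketch of that suggestion (the before_fit/before_step event names and CancelStepException follow later fastai releases; the fastai2 version in this thread may use different names, so treat this as illustrative rather than drop-in):

from fastai2.callback.all import *

class GradAccumSketch(Callback):
    "Skip the optimizer step until gradients from n_acc batches have accumulated."
    def __init__(self, n_acc=4): self.n_acc = n_acc
    def before_fit(self): self.count = 0
    def before_step(self):
        self.count += 1
        if self.count % self.n_acc != 0:
            # Assumption: cancelling the step leaves the gradients untouched,
            # so they keep adding up across batches until a real step happens.
            raise CancelStepException()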

Actually I should have mentioned self.loss instead of self.smooth_loss.
The reason I wanted to do it this way was so that it’s compatible with AdaptiveGANSwitcher which switches based on the loss value.

Your idea is better as I was struggling to find a way to make it compatible with fp16.

For the gan switchers, I guess we can just check for something like self.gradient_accumulation.done before using switch.

So I tried accumulating the loss over several batches, but I ran into a CUDA out-of-memory error after a certain number of batches.

The 2nd approach was to actually accumulate the gradients (instead of the loss) before updating the weights and it works! I’ll propose a PR.

Ah yes, you need to do the backward passes as you go; otherwise every batch's computation graph stays in memory until the final backward, so you're not saving anything.

I’ve replicated the callback BnFreeze for v2 as follows:

from fastai2.callback.all import *

bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)

def set_bn_eval(m:nn.Module)->None:
    "Set bn layers in eval mode for all recursive children of `m`."
    for l in m.children():
        if isinstance(l, bn_types) and not next(l.parameters()).requires_grad:
            l.eval()
        set_bn_eval(l)

class BnFreeze(Callback):
    "Freeze moving average statistics in all non-trainable batchnorm layers."
    def begin_epoch(self):
        set_bn_eval(self.model)
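
Usage would presumably just be passing it as a callback when creating the learner, e.g. with from fastai2.vision.all import * and a dls already built:

learn = cnn_learner(dls, resnet34, cbs=BnFreeze())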

I’ve tested and compared two different frozen models trained on the same dataset, and there’s no mismatch in the batchnorm statistics, so I’m sure this works.

I'm a bit confused as to how exactly I should go about making a PR. Should I create a new notebook? If yes, how should I number it?

I couldn't set nrows and ncols in the show_images function; I get this error: TypeError: subplots() got multiple values for argument 'nrows'

Is this the correct way to use the function?
show_images(img_l,nrows=5,ncols=4)

img_l is a list of images created with PILImage.create.