A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

I'll chime in here for a moment and answer partially (and get to the rest eventually). We are transfer learning, hence the pretrained backbone. You could also assume a pretrained front end, since we continue training after the size increase :slight_smile: Yes, we chose a ResNet34 because it has special cuts defined that the unet can use. If we look at unet_learner we see:

@delegates(Learner.__init__)
def unet_learner(dls, arch, loss_func=None, pretrained=True, cut=None, splitter=None, config=None, n_in=3, n_out=None,
                 normalize=True, **kwargs):
    "Build a unet learner from `dls` and `arch`"
    if config is None: config = unet_config()
    meta = model_meta.get(arch, _default_meta)
    body = create_body(arch, n_in, pretrained, ifnone(cut, meta['cut']))
    size = dls.one_batch()[0].shape[-2:]
    if n_out is None: n_out = get_c(dls)
    assert n_out, "`n_out` is not defined, and could not be infered from data, set `dls.c` or pass `n_out`"
    if normalize: _add_norm(dls, meta, pretrained)
    model = models.unet.DynamicUnet(body, n_out, size, **config)
    learn = Learner(dls, model, loss_func=loss_func, splitter=ifnone(splitter, meta['split']), **kwargs)
    if pretrained: learn.freeze()
    return learn

So if we can run create_body on a model, we can use it here (create_body builds the encoder).
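
For example, a rough sketch of the same steps done by hand (resnet34, n_out=2, and the 224x224 size are only illustrative choices, not from the lesson notebook):

    from fastai2.vision.all import *
    from fastai2.vision.models.unet import DynamicUnet

    # create_body strips the head off the architecture, leaving the pretrained
    # encoder; DynamicUnet then builds the decoder on top of it.
    body = create_body(resnet34, 3, True)       # arch, n_in, pretrained (the cut is found automatically here)
    model = DynamicUnet(body, 2, (224, 224))    # encoder, n_out, img_size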

  1. Yes, the decoder
  2. The decoder gets the same learning rate unless we pass in a slice (see the sketch below)
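
A minimal sketch of that slice form (the values are made up, and learn is assumed to already exist): a single number gives every parameter group the same learning rate, while a slice spreads lower rates over the earlier groups (the pretrained encoder) and higher rates over the later ones (the decoder/head).

    learn.fit_one_cycle(3, lr_max=1e-3)                 # same LR everywhere
    learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-3))    # encoder closer to 1e-5, decoder/head closer to 1e-3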

@foobar8675 exactly what @barnacl said. Ours anneals at that 72% threshold (75% is the default), which is 72% of the total batches, not epochs!
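
Assuming this refers to pct_start in fit_flat_cos (whose default is 0.75), a minimal sketch with made-up values (learn is an existing Learner):

    # pct_start is a fraction of the total number of batches across all epochs,
    # not a number of epochs: the LR stays flat for that fraction of training,
    # then cosine-anneals for the rest.
    learn.fit_flat_cos(5, lr=1e-3, pct_start=0.72)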

It's updated on our mini-batches (so it depends on the batch size).

@muellerzr do we still use layer_groups? I couldn't find it in the docs.

@bwarner just to check if what I'm thinking is correct:
We are using BCEWithLogitsLoss, which is basically BCELoss(sigmoid(raw_scores)).
So the raw predictions can be in any range; after applying sigmoid they are squeezed into the [0, 1] range, and we use this, along with the targets (which are only 0's and 1's), to calculate our BCELoss.
After we finish training, at inference/test time we get the raw scores and apply the sigmoid activation to squeeze them into the 0-to-1 range. We can then apply a threshold: if sigmoid(raw_scores) > threshold, that class is present.
So the threshold is only used for predictions and show; does that mean we choose our threshold based on how our test data is performing? (That doesn't sound right.)
I'm still confused about how to choose the threshold.
@muellerzr could you please shed some light on this?
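
A minimal sketch of what I mean by that last step, in plain PyTorch (the logits and the 0.5 threshold are made up):

    import torch

    raw_scores = torch.tensor([[2.3, -1.1, 0.4]])   # hypothetical logits for 3 labels
    probs = torch.sigmoid(raw_scores)               # squeezed into [0, 1], roughly [0.91, 0.25, 0.60]
    preds = probs > 0.5                             # tensor([[ True, False,  True]]) -> labels 0 and 2 present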

For picking the threshold, I don't believe there's an exact science to it. I think you have to see how the data is performing, like you said. For multi-label classification in v1 last year, it was set at 0.2: https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-planet.ipynb
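
One place the threshold shows up in fastai2 is the multi-label metric; a rough sketch (dls here is a placeholder, and 0.2 just mirrors the v1 planet notebook rather than being a recommendation):

    from functools import partial
    from fastai2.vision.all import *

    # accuracy_multi applies sigmoid to the raw scores and then thresholds them,
    # so changing thresh here is how you experiment with different cut-offs.
    learn = cnn_learner(dls, resnet34, metrics=partial(accuracy_multi, thresh=0.2))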


Sorry, still confused; an example should make it easier to understand.

How would I achieve the following: load 2 images at a time, but only update the weights once every 32 images?

Assuming a total dataset size of 32, we set our batch size to 2. The gradients are fully redone/updated every 2 images, and in one pass over our data that happens 16 times :slight_smile:


But that is not really what I want to achieve with gradient accumulation.

Correct me if I'm wrong, but the idea is not to apply the gradients every 2 images; instead, we calculate the gradients 2 images at a time but only apply them after N steps. This way we can have a batch size of 1 that "feels" like a batch size of N (32, for example). This can have several advantages.

Yes, we accumulate the gradients step by step, and once we hit the end we apply them all in one optimizer step and zero them out.

@lgvaz not sure if this helps, but if you look at step #4 of this example https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py, the bs is set to 4, so there are 4 inputs that go through the forward and backward pass right after the gradients are zeroed out.

    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

I think what @lgvaz is asking (and what was asked in the video) is that you usually don't want to train with bs=1 (it's unstable, batchnorm doesn't work, etc.). So if there were no memory constraints and you could train with bs=8, that should be equivalent to training with bs=1 but accumulating the gradients over 8 images (zeroing the grads only after those 8 images have been dealt with one by one).
In the code you shared, are we updating the gradients after calculating them for one image (since bs=1), or is fastai accumulating the gradients over a certain number of images?
Something like this:

model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors

Only one. And yes, this is due to memory constraints (because our images go through at full size).
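
If your fastai2 version ships the GradientAccumulation callback, that is the built-in way to get the small-batch-that-acts-like-a-big-batch behaviour; a hedged sketch (the callback and its n_acc argument may not exist in older versions, and dls/resnet34 are placeholders):

    # Gradients are accumulated until roughly 32 items have been seen,
    # then a single optimizer step is taken and the gradients are zeroed.
    learn = cnn_learner(dls, resnet34, cbs=GradientAccumulation(n_acc=32))
    learn.fit_one_cycle(1)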


In the CrossValidation nb, with the Stratified K-Fold code, I am running the nb and getting the following error:

ValueError: Found input variables with inconsistent numbers of samples: [9025, 7220]

The error makes sense for the following line:
for _, val_idx in kf.split(np.array(train_imgs), train_labels):
It uses the entire train_imgs, which is 9025 elements long, whereas we only have train_labels for the training-set portion of the training data, which means we need to pass train_imgs[:7220] to that line of code. With that change my nb runs.

Am I doing something screwy, or how was this worked around in the nb I downloaded a couple of days back, which had the StratifiedKFold code I am using?

One more issue I found in that nb: while skf is defined, what is actually used is kf, which is defined earlier in the nb.
skf is: skf = StratifiedKFold(n_splits=10, shuffle=True)
while earlier in the nb kf is: kf = StratifiedKFold(n_splits=5, shuffle=True)

It so happens that the nb uses kf as below in the CV loop:
for _, val_idx in kf.split(np.array(train_imgs), train_labels)
but what we want to use is:
for _, val_idx in skf.split(np.array(train_imgs[:7220]), train_labels)

to get 10 splits rather than 5.
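
A minimal sketch of the corrected loop (train_imgs, train_labels, and the 7220 cutoff come from the nb being discussed):

    import numpy as np
    from sklearn.model_selection import StratifiedKFold

    skf = StratifiedKFold(n_splits=10, shuffle=True)
    # Pass only the labelled training portion so both arguments have 7220 samples.
    for _, val_idx in skf.split(np.array(train_imgs[:7220]), train_labels):
        ...  # build the DataLoaders / train for this fold using val_idx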

@Srinivas originally it was regular KFold and then someone implemented Stratified. I must not have updated the whole thing in the function.

I would love to see the EfficientNet example for image classification. Plus, I heard the bigger versions of it are quite hard to train. Any recommendations for PyTorch courses?


Hi,

I am trying to use the Migrating notebook from fastai2 to run some plain PyTorch code in fastai2.

It seems that everything works until I try fit_one_cycle, when I get an error about pbar. Disabling it does not seem to solve the issue. Here is the code with the error.

Any ideas how to solve the problem? Thanks

P.S.: Since this is only somewhat related, should I open a new post?

@Joan did you install the most recent version of fastprogress?

I think so:

    fastprogress.__version__
    '0.2.2'
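
If that isn't the latest release, I guess upgrading and restarting the kernel would be the first thing to try (assuming a pip-based install):

    # Run in a notebook cell, then restart the kernel so the new version is imported.
    !pip install -U fastprogress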

Unsure, I think this would be better as a separate forum post :slight_smile:
