I’ll chime in here for a moment and answer partially (and get to the rest eventually). We are transfer learning, hence the pretrained backbone. You can then also assume a pretrained front end, since it continues to run after the size increase. Yes, we chose a ResNet34 because the unet has special cuts defined to use for it. If we look at unet_learner we see:
@delegates(Learner.__init__)
def unet_learner(dls, arch, loss_func=None, pretrained=True, cut=None, splitter=None, config=None, n_in=3, n_out=None,
                 normalize=True, **kwargs):
    "Build a unet learner from `dls` and `arch`"
    if config is None: config = unet_config()
    meta = model_meta.get(arch, _default_meta)
    body = create_body(arch, n_in, pretrained, ifnone(cut, meta['cut']))
    size = dls.one_batch()[0].shape[-2:]
    if n_out is None: n_out = get_c(dls)
    assert n_out, "`n_out` is not defined, and could not be infered from data, set `dls.c` or pass `n_out`"
    if normalize: _add_norm(dls, meta, pretrained)
    model = models.unet.DynamicUnet(body, n_out, size, **config)
    learn = Learner(dls, model, loss_func=loss_func, splitter=ifnone(splitter, meta['split']), **kwargs)
    if pretrained: learn.freeze()
    return learn
So if we can call create_body on any model, we can use it here (create_body builds the encoder).
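To illustrate the idea of "cutting" a model into an encoder, here is a simplified sketch in plain PyTorch. This is not fastai's actual create_body implementation, and the tiny model and create_body_sketch helper are made up for illustration; the real function also handles pretrained weights and per-architecture cut points from model_meta.

```python
import torch
from torch import nn

# A tiny stand-in model; cutting it mimics what an encoder cut does:
# keep the layers before `cut`, dropping the pooling/classifier head.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),  # the "head"
)

def create_body_sketch(model, cut):
    """Return the encoder: all children of `model` before index `cut`."""
    return nn.Sequential(*list(model.children())[:cut])

body = create_body_sketch(model, cut=-3)      # drop pool / flatten / linear
features = body(torch.randn(1, 3, 32, 32))    # spatial feature map, not logits
print(features.shape)
```

The encoder keeps spatial dimensions, which is exactly what DynamicUnet needs to build its decoder on top of.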
@bwarner just to check if what I’m thinking is correct:
We are using BCEWithLogitsLoss, which is basically BCELoss(sigmoid(raw_scores)).
So raw predictions can be in any range; after applying sigmoid they are squeezed into the [0,1] range, and we use this for calculating our BCELoss along with the target (which is a tensor of 0’s and 1’s only).
After we finish training, at inference/test time we get the raw_scores and apply the sigmoid activation to squeeze them into the 0 to 1 range. We can then apply the threshold as follows: if sigmoid(raw_scores) > threshold, that class is present.
So the threshold is used only for prediction and display. Does that mean we choose our threshold based on how our test data performs? (That doesn’t sound right.)
Still confused about how to choose the threshold. @muellerzr could you please shed some light on this?
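The thresholding step described above can be sketched in a few lines of plain Python (the raw scores and threshold value here are made up for illustration):

```python
import math

def sigmoid(x):
    """Squash a raw score (logit) into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores for a 4-class multi-label prediction.
raw_scores = [2.0, -1.5, 0.3, -4.0]
threshold = 0.5

probs = [sigmoid(s) for s in raw_scores]        # each in (0, 1)
present = [p > threshold for p in probs]        # class predicted if above threshold
print(present)  # [True, False, True, False]
```

Note that a threshold of 0.5 on sigmoid(x) is equivalent to checking raw_scores > 0, since sigmoid(0) = 0.5; in practice the threshold is usually tuned on the validation set (not the test set), e.g. by picking the value that maximizes a metric like F1.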
Assuming a total dataset size of 32, we set our batch size to 2. Gradients are fully redone/updated every 2 images, so one pass over our data performs 16 updates in total.
But that is not really what I want to achieve with gradient accumulation.
Correct me if I’m wrong, but the idea is to not apply the gradient every 2 images, instead we calculate the gradient 2 images at a time but only apply them after N steps. In this way we can have a batch size of 1 that “feels” like a batch size of N (32 for example). This can have several advantages
for i, data in enumerate(trainloader, 0):
    # get the inputs; data is a list of [inputs, labels]
    inputs, labels = data
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward + backward + optimize
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
I think what @Igvaz is asking (and what was asked in the video) is: you usually don’t want to train with bs=1 (it is unstable, batchnorm doesn’t work, etc.). So if there were no memory constraints and you could train with bs=8, that should be equivalent to training with bs=1 while accumulating the gradients over 8 images (zeroing the grads only after those 8 images have been processed one by one).
In the code you shared, are we updating gradients after each image (since bs=1), or is fastai accumulating the gradients over a certain number of images?
Something like this:
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i+1) % accumulation_steps == 0:             # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
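To see why dividing the loss by accumulation_steps makes the accumulated update match a larger batch, here is a minimal numerical sketch in pure Python. It uses a one-parameter linear model with squared error and a hand-derived gradient; the data and values are made up for illustration.

```python
# Model: y = w * x, loss = (w*x - t)^2, so d(loss)/dw = 2*x*(w*x - t).
def grad(w, x, t):
    return 2 * x * (w * x - t)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w = 0.5
accumulation_steps = len(data)

# "Large batch" gradient: mean of the per-sample gradients.
batch_grad = sum(grad(w, x, t) for x, t in data) / len(data)

# Accumulated gradient: process samples one at a time, dividing each
# per-sample loss (and hence its gradient) by accumulation_steps,
# then summing via repeated backward passes before a single step.
acc_grad = 0.0
for x, t in data:
    acc_grad += grad(w, x, t) / accumulation_steps

print(batch_grad, acc_grad)  # both -22.0: the two are identical
```

This is why the snippet above divides the loss before loss.backward(): backward() sums gradients into the parameter, so scaling each mini-loss by 1/accumulation_steps turns that sum into the mean you would have gotten from one big batch. (Batchnorm statistics are the one thing this does not replicate, as noted earlier.)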
In the CrossValidation nb, with the stratified K-Fold code:
I am running the nb
and getting the following error:
ValueError: Found input variables with inconsistent numbers of samples: [9025, 7220]
The error makes sense, since the following line:
for _, val_idx in kf.split(np.array(train_imgs), train_labels):
uses the entire train_imgs, which is 9025 elements long, whereas we only have train_labels for the training-set part of the training data. That means we need to pass train_imgs[:7220] to that line of code, AND with that change my nb runs.
Am I doing something screwy, or how was this worked around in the nb I downloaded a couple of days back, which had the StratifiedKFold code I am using?
One more issue I found in that nb: while skf is defined, what is actually used is kf, which is defined earlier in the nb.
skf is skf = StratifiedKFold(n_splits=10, shuffle=True)
while earlier in the nb
kf is kf = StratifiedKFold(n_splits=5, shuffle=True)
and it so happens that the nb uses kf as below in the CV loop:
for _, val_idx in kf.split(np.array(train_imgs), train_labels)
but what we want to use is
for _, val_idx in skf.split(np.array(train_imgs[:7220]), train_labels)
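For reference, here is a minimal self-contained sketch of the corrected pattern. StratifiedKFold.split requires X and y to have the same number of samples, which is exactly what the ValueError above complains about; the filenames, labels, and sizes below are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-ins: 20 training items and one label per item.
train_imgs = np.array([f"img_{i}.jpg" for i in range(20)])
train_labels = np.array([i % 2 for i in range(20)])  # two balanced classes

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# X and y passed to split() must be the same length; otherwise sklearn
# raises "Found input variables with inconsistent numbers of samples".
fold_sizes = []
for _, val_idx in skf.split(train_imgs, train_labels):
    fold_sizes.append(len(val_idx))

print(fold_sizes)  # each validation fold holds 1/5 of the data
```

StratifiedKFold also preserves the class balance in each fold, which is the point of using it over plain KFold.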
I would love to see the EfficientNet example on image classification. Plus, I heard the bigger versions of it are quite hard to train. Any recommendations for PyTorch courses?
I am trying to use the Migrating notebook from fastai2 to run some PyTorch code in fastai2.
Everything seems to work until I call fit_one_cycle, when I get an error about pbar. Disabling it does not seem to solve the issue. Here is the code with the error.
Any ideas on how to solve the problem? Thanks!
P.S.: Since this is only tangentially related, should I open a new post?