Fastai v2 chat

Yes, it does. But the above only works on a TfmdDL I believe. I didn’t see this behavior with a regular DataLoader.

I did calculate the entire dataset though, see my ‘hack’ here: https://www.kaggle.com/muellerzr/plant-pathology-fastai2-exploration

Also, Jeremy and Sylvain found that in general one batch is normally enough. If you look at my kernel you’ll see that the full-dataset vs one-batch stats are extremely close.
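For comparison, the one-batch shortcut is roughly this (just a sketch, assuming you already have a DataLoaders called dls and the batch is already a float tensor):

from fastai2.vision.all import *

# Grab a single batch and use its per-channel mean/std as the normalization
# stats; as the kernel above shows, these come out very close to the
# full-dataset numbers
b = dls.one_batch()
batch_mean, batch_std = b[0].mean((0,2,3)), b[0].std((0,2,3))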

1 Like

This looks like a great idea! But doesn’t it cause a MemoryError, since you’re loading the whole dataset at once?
I’ve also worked on calculating the dataset stats, but I use this snippet for the calculation:

from fastai2.vision.all import *
from tqdm import tqdm

# Resize every image and convert it to a tensor
ds = Datasets(fnames, tfms=Pipeline([PILImage.create, Resize(320), ToTensor]))
dl = TfmdDL(ds, bs=32, after_batch=[IntToFloatTensor], drop_last=True)

# Accumulate per-channel stats batch by batch, then average over the batches
mean, std = 0., 0.
for b in tqdm(dl, total=len(dl)):
  mean += b[0].mean((0,2,3))
  std += b[0].std((0,2,3))

mean /= len(dl)
std /= len(dl)

It actually doesn’t (as far as I’ve found) :slight_smile: I ran mine with 4GB of memory just fine on the entire plant pathology dataset. It took quite a bit of time (a minute or two). However, your way works as well, I believe.

@muellerzr do you have any idea about this?

I am using sigmoid in decodes as a workaround for show_batch, since matplotlib requires float values to be in the [0, 1] range; if I don’t scale them as required, matplotlib just clamps the tensor, which is definitely a loss of information.

I agree we should reverse the procedure done in the encodes to undo its effect, but what if I want to look at the image with the change made in the encodes (in the case of pre-processing)?
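For reference, this is roughly the shape of that workaround (the transform and the scaling inside encodes are hypothetical, not my actual notebook code):

from fastai2.vision.all import *

class ScaleTfm(Transform):
  "Hypothetical pre-processing that pushes pixel values outside [0, 1]"
  def encodes(self, x:TensorImage): return x*4. - 2.
  # Squash back into [0, 1] so matplotlib doesn't clamp the values when
  # show_batch plots the decoded image (not a true inverse of encodes)
  def decodes(self, x:TensorImage): return torch.sigmoid(x)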

Typically you want optimal metrics, so it’s better to look at this.

Thanks for replying. After I posted this I dug deeper and found that b[1] is a tuple; b[1][0] had 12 masks of the 1st channel, and so on.
I tried n_inp and it didn’t help.
As for the stacking, I wanted to keep them separated, and even if I wanted them stacked I don’t know how to stack PILMasks into a single tensor. :confused:

I’ve worked with image+masks input recently. I’ll post a tutorial soon, but this is how I did it:

class MultiMask(Tuple):
  ...

  def stack(self):
    # To be used with batch only
    return L(self).stack(dim=1)

To make the stack work, all of your masks need to be the same size, and maybe square (I’m not sure about the square part). So in case the stack fails, just put Resize(<desired size>) before ToTensor. You’ll also be customizing ToTensor for your type:

# Extend ToTensor so it converts a MultiMask into a single stacked TensorMask
@ToTensor
def encodes(self, o:MultiMask): return TensorMask(o.stack())

You can simply pass in a list of PILMasks to this tuple, so your array_to_mask could be modified like so:

@Transform
def array_to_mask(x):
    return MultiMask([PILMask.create(o) for o in x[:4]])

This will give you a single TensorMask object containing all your masks, with the expected shape (4, size, size).

You won’t be able to use show_batch anymore, as TensorMask isn’t designed to show multiple masks at once. This requires you to @typedispatch show_batch for your use case. I’ll share a tutorial soon explaining the whole process.
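In the meantime, here’s a rough sketch of what that dispatch could look like (this assumes the target is the stacked TensorMask of shape (bs, n_masks, h, w) from above, and each decoded sample is an (image, masks) pair; the layout details are just placeholders):

from fastai2.vision.all import *

@typedispatch
def show_batch(x:TensorImage, y:TensorMask, samples, ctxs=None, max_n=4, **kwargs):
  # One row per sample: the image in the first column, each mask channel after it
  n = min(max_n, len(samples))
  n_masks = samples[0][1].shape[0]
  fig, axs = plt.subplots(n, n_masks+1, figsize=(3*(n_masks+1), 3*n), squeeze=False)
  for i, (img, masks) in enumerate(samples[:n]):
    img.show(ctx=axs[i,0])
    for j in range(n_masks):
      axs[i,j+1].imshow(masks[j].cpu(), cmap='gray')
      axs[i,j+1].axis('off')
  return axs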

1 Like

Does this apply to Singular Value Decomposition as well? The covariance matrix of one batch vs the full dataset?

EDIT: I suppose it’s not the covariance matrix that makes the difference, it’s its decomposition.

That may be better addressed by @sgugger (or he can @ Jeremy :slight_smile: ), as I don’t know.

Hi all

Apologies if this info is available somewhere already. Does anyone know the current state of pretrained xresnets? Back in January, it seemed that they weren’t yet ready. I’m seeing better performance with the old pretrained resnets, so I’m assuming this is still the case?

Cheers!

Currently, only xresnet50 is pretrained.
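A quick sketch of how you’d grab it (just the standard constructor call):

from fastai2.vision.all import *

# Per the reply above, only xresnet50 currently has pretrained weights available
model = xresnet50(pretrained=True)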

Ah okay. I’ve been using resnet18 since it was mentioned (I think by you!) that it works better for keypoint regression.

It does, or at least it gets the job done :slight_smile: I’ve just been training the model from scratch with ranger and fit_flat_cos and it works pretty well :slight_smile:
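Roughly this recipe (a sketch, assuming a keypoint DataLoaders called dls already exists; the epoch count and learning rate are placeholders):

from fastai2.vision.all import *

# resnet18 from scratch (no pretrained weights) with the ranger optimizer and a
# flat-then-cosine schedule; y_range keeps the predicted points in [-1, 1]
learn = cnn_learner(dls, resnet18, pretrained=False, y_range=(-1, 1), opt_func=ranger)
learn.fit_flat_cos(5, 1e-3)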

1 Like

I might give that a go too

@muellerzr can you see what’s wrong here? Am I doing anything dumb to get this behaviour?

You haven’t given read access to your notebook, so no one can really help you; please grant it. Also, I wouldn’t want to be @-ing the admins that way if I were you :wink:, it just makes it less likely you’ll get a reply from them.

Cheers,
Tendo

I don’t mind the @ at myself, but yes, we would like read/write access :slight_smile: (I’ll try to look this weekend, @vijayabhaskar)

2 Likes

Sorry, I thought just clicking Share allows people to view the content. Here is the updated link.

I don’t usually @-mention admins, but I believe this might be a bug in fastai2 (if I’m not doing anything dumb), so I thought Sylvain should look into it.

Thanks!

Asking this question here, which was previously posted in this topic:

How can I customize the batch sampling method? In metric learning approaches we need some control over the number of positive/negative examples in a batch; where can I define this logic?

Did you have a look here: https://dev.fast.ai/callback.data ?

There are no docs yet, but maybe something like WeightedDL is what you’re looking for?
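If I remember right you can build one from a Datasets with weighted_dataloaders; a rough sketch (dsets and the is_positive check are placeholders, not a full metric-learning sampler):

from fastai2.vision.all import *
from fastai2.callback.data import *

# Per-item sampling weights, e.g. up-weight the positive class so more positives
# land in each batch (is_positive is a hypothetical per-item label check)
wgts = [2. if is_positive(i) else 1. for i in range(len(dsets))]
dls = dsets.weighted_dataloaders(wgts, bs=32)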

1 Like