This is a bit of a general question. I understand that when we use the high-level DataBlock, ToTensor is applied automatically, but when I look at its definition, it is simply:

class ToTensor(Transform):
    "Convert item to appropriate tensor class"
    order = 5

I then looked through Transform to see where the conversion is done explicitly, but I couldn’t figure out where the magic happens. Any hints as to what I’m missing?

Take a look at this, where the ToTensor is defined for images

Aha! That’s what I was missing. It simply has the Transform base class, and the encodes implementations are then assigned in the submodules. That makes perfect sense. Thanks @scart97 :slight_smile:

(Along with the tensor classes being assigned to each Pillow type)
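
For anyone else puzzled by the empty class body, here is a minimal sketch of the pattern in plain Python. It is not fastai's implementation: ToTensorSketch, register, FakeImage and FakeTensorImage are hypothetical stand-ins, and the registry only imitates the idea of attaching per-type encodes behaviour to a transform from another module.

```python
class FakeImage:
    "Hypothetical stand-in for a Pillow image class"
    def __init__(self, pixels):
        self.pixels = pixels

class FakeTensorImage(list):
    "Hypothetical stand-in for the tensor class paired with FakeImage"

class ToTensorSketch:
    "Declared with no behaviour; per-type encoders are registered later"
    order = 5
    _registry = {}  # maps an input type to its encoding function

    @classmethod
    def register(cls, typ):
        def deco(f):
            cls._registry[typ] = f
            return f
        return deco

    def __call__(self, x):
        # Walk the MRO so subclasses of a registered type are handled too
        for klass in type(x).__mro__:
            if klass in self._registry:
                return self._registry[klass](x)
        return x  # types without a registered encoder pass through unchanged

# This part would live in a separate "vision" module in the analogy
@ToTensorSketch.register(FakeImage)
def _encode_image(img):
    return FakeTensorImage(img.pixels)

to_tensor = ToTensorSketch()
converted = to_tensor(FakeImage([1, 2, 3]))
untouched = to_tensor("some text")
```

The point of the design is that the base module never needs to know about images: each submodule adds the conversion for the types it owns.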

I want to clarify something before I make a PR: does passing get_x and/or get_y to DataBlock only work when n_inp=1?

I’m asking this because of the following lines of code:

if self.get_x: self.getters[0] = self.get_x
if self.get_y: self.getters[1] = self.get_y

Now, let’s say I have 2 inputs and 1 output. If I pass get_y it will be wrongly assigned to the second input.

I think we should do something like:

if self.get_x: self.getters[:n_inp] = self.get_x
if self.get_y: self.getters[n_inp:] = self.get_y

We also need to check that get_x is a list with length 2 in this case.
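
A rough sketch of that idea with the length check included, written as a standalone function so it can be tried without fastai. The name assign_getters and its signature are hypothetical, not part of DataBlock; the sketch assumes getters is a plain list with one slot per input and target.

```python
def assign_getters(getters, n_inp, get_x=None, get_y=None):
    "Return a copy of `getters` with get_x in the first n_inp slots and get_y in the rest"
    getters = list(getters)
    n_out = len(getters) - n_inp

    def as_list(g):
        # Accept either a single callable or a list/tuple of callables
        return list(g) if isinstance(g, (list, tuple)) else [g]

    if get_x is not None:
        xs = as_list(get_x)
        if len(xs) != n_inp:
            raise ValueError(f"get_x has {len(xs)} getters but n_inp={n_inp}")
        getters[:n_inp] = xs
    if get_y is not None:
        ys = as_list(get_y)
        if len(ys) != n_out:
            raise ValueError(f"get_y has {len(ys)} getters but there are {n_out} targets")
        getters[n_inp:] = ys
    return getters

def noop(o): return o
def first(row): return row[0]
def second(row): return row[1]
def label(row): return row[2]

# Two inputs and one target: get_y now lands in the last slot
result = assign_getters([noop] * 3, n_inp=2, get_x=[first, second], get_y=label)
```

With n_inp=1 and single callables this reduces to the current behaviour, so existing code would not need to change.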


Yes, for now, get_x/get_y only work for one input and one target. In other cases, you are supposed to provide getters.
But I like your approach! Feel free to suggest a PR with it!

Me too.

Is there already a loss function that can be used with multi-target classification? In my DataLoaders I end up with a collated tensor of one-hot encoded vectors, but after that the Learner fails on the loss calculation (I tried LabelSmoothingCrossEntropy and CrossEntropyLossFlat). I went through the notebooks but cannot find any example that covers this (only the planet dataset, but it focuses on how to prepare the data). Thanks for your help!

That’s because if you choose MultiCategoryBlock, the proper loss function, BCEWithLogitsLossFlat, is already assigned (if using cnn_learner).
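
To show what a flattened binary cross-entropy with logits computes on one-hot multi-label targets, here is a small plain-Python sketch. It only illustrates the formula and is not fastai's or PyTorch's code; bce_with_logits_flat is a hypothetical name, and nested lists stand in for tensors.

```python
import math

def bce_with_logits_flat(logits, targets):
    "Mean binary cross-entropy over all (sample, label) pairs, taking raw logits"
    # "Flat": treat every label of every sample as one independent yes/no decision
    flat_logits = [z for row in logits for z in row]
    flat_targets = [t for row in targets for t in row]
    total = 0.0
    for z, t in zip(flat_logits, flat_targets):
        # Numerically stable form of -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(flat_logits)

targets = [[1, 0, 1], [0, 1, 0]]         # one-hot rows, several labels may be active
good = [[5.0, -5.0, 5.0], [-5.0, 5.0, -5.0]]
bad = [[-5.0, 5.0, -5.0], [5.0, -5.0, 5.0]]
loss_good = bce_with_logits_flat(good, targets)
loss_bad = bce_with_logits_flat(bad, targets)
```

Because each label is scored on its own, any number of labels can be active for a sample, which is the situation a one-hot multi-label target describes.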


Ah ok, thank you. I need to dig deeper into that bit.

I’m not sure what PointScaler does with TensorBBox.

def encodes(self, x:TensorBBox):
    pnts = self.encodes(cast(x.view(-1,2), TensorPoint))
    return cast(pnts.view(-1, 4), TensorBBox)

def decodes(self, x:TensorBBox):
    pnts = self.decodes(cast(x.view(-1,2), TensorPoint))
    return cast(pnts.view(-1, 4), TensorBBox)

I see the method calling encodes on self, but there’s no encodes implementation in TensorBBox and TensorPoint

EDIT: I found scale_pnts being called for TensorPoint, but I don’t get what it’s doing. Is there any reference to learn about this scaling?
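
A rough sketch of what the reshaping in those two methods amounts to, using plain lists instead of tensors. It assumes the scaling maps pixel coordinates to the range [-1, 1] via x*2/size - 1, which should be checked against the scale_pnts source; scale_points and scale_boxes are hypothetical names.

```python
def scale_points(points, img_size):
    "Map pixel (x, y) coordinates to [-1, 1] (assumed convention)"
    w, h = img_size
    return [[x * 2 / w - 1, y * 2 / h - 1] for x, y in points]

def scale_boxes(boxes, img_size):
    "Scale [x1, y1, x2, y2] boxes by reusing the point scaling"
    # Equivalent of view(-1, 2): every box becomes two corner points
    points = [corner for box in boxes for corner in (box[:2], box[2:])]
    scaled = scale_points(points, img_size)
    # Equivalent of view(-1, 4): regroup consecutive corner pairs into boxes
    return [scaled[i] + scaled[i + 1] for i in range(0, len(scaled), 2)]

boxes = [[0, 0, 128, 128], [32, 64, 96, 128]]
scaled_boxes = scale_boxes(boxes, img_size=(128, 128))
```

So a bounding box is just treated as two points for the purpose of scaling, which is why the box methods delegate to the point version.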

I’m trying to understand what it does to TensorBBox. The code works with Pipeline but not with Datasets:

# PointScaler expects `img_size` in _meta
class AddImsize(Transform):
  def __init__(self, sz=128): self.sz = sz
  def encodes(self, x:TensorBBox):
    # assumed intent: record the size passed as `sz`
    x._meta = {'img_size': self.sz}
    return x
# This works
p = Pipeline([img2bbox.__getitem__, TensorBBox.create, AddImsize, PointScaler]); p(imgs[0])

# But this does not
itfms = [lambda o: path/'train'/o, PILImage.create]
bbtfms = [img2bbox.__getitem__, TensorBBox.create, AddImsize, PointScaler]
ds = Datasets(imgs,[itfms,bbtfms]) 

Causes AttributeError: do_item

Here’s the complete stack trace

Interesting note: when I pass in Pipelines of transforms instead, it surprisingly works

p1 = Pipeline(itfms)
p2 = Pipeline(bbtfms)
ds = Datasets(imgs,[p1,p2])

This transform is not meant to be used on its own in a Datasets, but at the batch level as after_item. It needs the tuple (image, point) to work.

I’m trying to create a TfmdDL subclass, but I need to be able to pass a new collate_fn. Is there any way to do that in fastai2 similar to how it was done in fastai?

If your collate function is just an operation on samples (like padding), you should pass it to before_batch (it expects an array of samples and should return the modified array).
If it’s a very custom collation in itself, create_batch is the function you want to modify (it defaults to fa_collate or fa_convert, depending on whether your data loader has a batch size or not).
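
To illustrate the difference between the two hooks, here is a toy loader in plain Python. It is only a sketch of the idea and not TfmdDL or fastai's DataLoader; MiniLoader, identity and pad_samples are hypothetical names, and the default collation here is simply list.

```python
def identity(samples):
    return samples

class MiniLoader:
    "Toy loader with a per-batch sample hook and a separate collation hook"
    def __init__(self, items, bs, before_batch=identity, create_batch=list):
        self.items, self.bs = items, bs
        self.before_batch, self.create_batch = before_batch, create_batch

    def __iter__(self):
        for i in range(0, len(self.items), self.bs):
            samples = self.items[i:i + self.bs]
            samples = self.before_batch(samples)  # list of samples in, list of samples out
            yield self.create_batch(samples)      # turns the samples into one batch

def pad_samples(samples, pad=0):
    "before_batch-style hook: pad every sample to the longest one in the batch"
    longest = max(len(s) for s in samples)
    return [s + [pad] * (longest - len(s)) for s in samples]

items = [[1], [2, 3], [4, 5, 6], [7]]
batches = list(MiniLoader(items, bs=2, before_batch=pad_samples))
tuple_batches = list(MiniLoader(items, bs=2, create_batch=tuple))
```

Padding fits in before_batch because it only rewrites the samples; changing how the samples are combined (here, tuple instead of list) is a job for create_batch.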

Thanks. I’ve opened a PR for this here

I’m trying to run TTA on the test set, but it uses the validation set instead.
get_preds works fine, though.
I’m using the current fastai2 GitHub version.
Has anyone experienced this?

tst_dl = dls.test_dl([[0]/x for x in testdf.Image.values])

preds[0].shape #==> torch.Size([600, 4])

preds[0].shape #==> torch.Size([3219, 4])

Ah yes, it was using the ds_idx internally and not the dl. Should be fixed now.


I’ve noticed that get_image_files() from data.transforms does not read all the image files.
Investigation showed that image_extensions is not initialized with all possible MIME types.

Fix: call mimetypes.init() before building image_extensions:

image_extensions = set(k for k,v in mimetypes.types_map.items() if v.startswith('image/'))

See what difference this makes.
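
A small standalone sketch of the fix using only the standard library, for anyone who wants to check their own system. The helper is_image is hypothetical and not part of fastai; which extra extensions show up after init() depends on the MIME type files (or registry entries on Windows) available on the machine.

```python
import mimetypes

# Read the system's known MIME type files (the registry on Windows)
# in addition to the defaults built into the module
mimetypes.init()
image_extensions = set(k for k, v in mimetypes.types_map.items() if v.startswith('image/'))

def is_image(fname):
    "True if the file name ends with a known image extension (case-insensitive)"
    dot = fname.rfind('.')
    return dot != -1 and fname[dot:].lower() in image_extensions

names = ['cat.PNG', 'dog.jpg', 'notes.txt', 'README']
images = [n for n in names if is_image(n)]
```

Printing sorted(image_extensions) before and after the init() call is a quick way to see whether any extensions were missing on a given setup.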