A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

If I’m creating a manual list of transforms via a Pipeline, how do I add CUDA to that? I tried doing lambda x: x.cuda() as one of the transforms, but this gets applied to the PILImage and an error is thrown. I think this has to do with the order of the transforms.

EDIT: I also tried to put TensorImage into the Pipeline by doing TensorImage.new with no success

@lgvaz what is your pipeline defined as? Can’t do much without code :wink: Do you call ToTensor()?

This is my current pipeline:

pipe = Pipeline([PILImage.create, ToTensor(), IntToFloatTensor(), Normalize.from_stats(*imagenet_stats, cuda=False)])

I want to add TensorImage and Cuda to that

Try looking here: https://dev.fast.ai/core.transform.html#Pipeline

May answer a few questions. Not for CUDA though (though it may cover that too).
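One thing that might work (an untested sketch: I’m assuming Pipeline sorts transforms by their order attribute, so a bare lambda with the default order of 0 runs before ToTensor and hits the PILImage) is to wrap the device move in a Transform with an explicit order and a TensorImage annotation, so it only ever sees tensors:

from fastai2.vision.all import *  # assumed imports, as in the course notebooks

class ToCuda(Transform):
    "Move a TensorImage to the GPU at the very end of the pipeline"
    order = 100  # assumed value, just needs to be higher than ToTensor/IntToFloatTensor/Normalize
    def encodes(self, x:TensorImage):
        # whether the TensorImage subclass is retained after .cuda() depends on your fastai2/PyTorch version
        return x.cuda()

pipe = Pipeline([PILImage.create, ToTensor(), IntToFloatTensor(),
                 Normalize.from_stats(*imagenet_stats, cuda=False), ToCuda()])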

1 Like

Does anybody know a good reason why we are using the non-stratified K-fold version in the tutorial notebook?

from sklearn.model_selection import KFold

Would it not be better to use stratified splits to make sure that all classes are represented in both the training and validation sets? E.g. with imbalanced datasets we may have the problem that one of the minority classes does not appear in the training examples, which would then raise an error.
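To illustrate what stratification buys us, here is a minimal sklearn-only sketch (items and labels are just placeholder names standing in for the image list and its classes):

from sklearn.model_selection import StratifiedKFold

items  = list(range(20))   # placeholder item list
labels = [0]*15 + [1]*5    # imbalanced classes, ratio 3:1

skf = StratifiedKFold(n_splits=5)
for train_idx, valid_idx in skf.split(items, labels):  # unlike KFold, the labels are required here
    # every validation fold keeps the 3:1 ratio: 3 items of class 0 and 1 item of class 1
    print(len(train_idx), len(valid_idx))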

1 Like

Basically I was only able to get the regular KFold working, @mgloria. If anyone can figure out how to get Stratified working instead, that would be great :wink:

1 Like

I want to catch up with your walk with fastai2, but I also get the “cannot import name 'PILLOW_VERSION' from 'PIL'” error with torch 1.3.1 and Pillow 7.0.0.

Is there another workaround for that problem (I didn’t find one via the forum search)?

You need Pillow 6.2.1, as the directions state above. Basically the way the Pillow version is exposed changed between those two versions.

1 Like

See here: https://github.com/python-pillow/Pillow/issues/4130

PILLOW_VERSION has been removed. Use __version__ instead.

https://pillow.readthedocs.io/en/stable/releasenotes/7.0.0.html#pillow-version-constant

So you can modify the torchvision file to use __version__ instead of PILLOW_VERSION, or you can downgrade Pillow.
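If you go the patching route, the change is roughly along these lines (the exact file and import line depend on your torchvision version; I’m assuming the failing import lives in torchvision/transforms/functional.py):

# torchvision/transforms/functional.py  (assumed location of the failing import)
# Before -- breaks on Pillow 7.0, where PILLOW_VERSION was removed:
from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
# After -- alias __version__ so the rest of the file keeps working unchanged:
from PIL import Image, ImageOps, ImageEnhance, __version__ as PILLOW_VERSION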

2 Likes

Thank you, I was just rereading the posts in detail and I was able to solve it in my conda env with conda install Pillow==6.2.1.

1 Like

Yes, some things do sadly get buried in this thread :frowning: I can’t think of a good way to surface the big things. I know you can view a summary of the thread sorted by likes, but that’s about it.

1 Like

We could add a note at the top? However, maybe that changes soon and won’t be needed anymore. People are super helpful here if this gets asked again, so we will solve it anyway. :slight_smile:

Thank you for the fast help, @muellerzr & @morgan !

2 Likes

I tried a bit on my own now, but I am also running into errors. Nevertheless, I am sharing it since I believe it brings us one important step further:

@mgloria your x’s should be a list of indexes. Perhaps this may help you; this was my v1 implementation:

https://github.com/muellerzr/fastai-Experiments-and-tips/blob/master/K-Fold%20Cross%20Validation/kfold.ipynb

Maybe try len(train_imgs) instead of train_imgs?

I will look into it! A little trick for the others: to open a GitHub ipynb in Colab super fast, do the following:

e.g. https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb

becomes: https://colab.research.google.com/github/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb

So you basically only need to replace https://github.com/ with https://colab.research.google.com/github/
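If you do this a lot, a throwaway helper makes it a one-liner (github_to_colab is just a name made up for this sketch):

def github_to_colab(url):
    "Rewrite a GitHub notebook URL so it opens directly in Colab"
    return url.replace("https://github.com/",
                       "https://colab.research.google.com/github/")

github_to_colab("https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb")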

10 Likes

I had no idea you could do that! Awesome! (I’ll update the links in the first post unless someone beats me :wink: )

Edit: all notebook links in the first post point to Colab :slight_smile: Thanks @mgloria!

It is my understanding that the list of indexes is precisely what is generated (i.e. which images belong to the training / validation set), see the docs. I believe it is solved, check this out @muellerzr :wink:

from sklearn.model_selection import StratifiedKFold
import numpy as np
# (the fastai2 imports and the train_imgs / tst_imgs path lists come from earlier in the notebook)

split_list = [L(range(len(train_imgs))), L(range(len(train_imgs), len(train_imgs)+len(tst_imgs)))]

dsrc = Datasets(train_imgs+tst_imgs, tfms=[[PILImage.create], [parent_label, Categorize]],
                splits=split_list)

# gather the training labels so the folds can be stratified on them
train_labels = L()
for i in range(len(dsrc.train)):
    train_labels.append(dsrc.train[i][1])

skf = StratifiedKFold(n_splits=5)
print(skf.get_n_splits(np.array(train_imgs), train_labels))
for train_index, valid_index in skf.split(np.array(train_imgs), train_labels):
    print("TRAIN:", len(train_index), "VALID:", len(valid_index))
    print("TRAIN:", train_index, "VALID:", valid_index)

These are the outputs (easier to get the idea):

  • PS: I also suggest changing the wording from “test” to “valid” (or “dev”) set in the notebooks, to avoid confusion when doing K-fold.
2 Likes

That’s fantastic @mgloria! Well done! I’ll definitely change to stratified whenever I can, and update the valid/test wording too.

Hi @muellerzr, I’m trying to use fastai2 on https://www.kaggle.com/c/understanding_cloud_organization. It has 4 segmentation masks, so I’m guessing I would need a DataBlock with 4 outputs, but I don’t know how to achieve that. How do I create DataBlocks with multiple inputs and outputs in general?

Look at the Object Detection notebook for an example. It has two outputs.
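Roughly speaking, n_inp tells the DataBlock how many of the blocks are inputs; the remaining blocks become targets, with one get_y entry per target. Here is a rough sketch along the lines of that notebook (get_train_imgs, get_bboxes, get_lbls and path are hypothetical placeholders, not fastai functions):

from fastai2.vision.all import *

block = DataBlock(
    blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),  # 1 input block + 2 target blocks
    get_items=get_train_imgs,                      # hypothetical: returns the image paths
    get_y=[get_bboxes, get_lbls],                  # hypothetical: one getter per target block
    splitter=RandomSplitter(),
    n_inp=1)                                       # the first block is the input, the rest are targets
dls = block.dataloaders(path)                      # path is a placeholder too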

1 Like