A walk with fastai2 - Vision - Study Group and Online Lectures Megathread

If I’m creating a manual list of transforms via a Pipeline, how do I add CUDA to that? I tried doing lambda x: x.cuda() as one of the transforms, but this gets applied to the PILImage and an error is thrown. I think this has to do with the order of the transforms.

EDIT: I also tried to put TensorImage into the Pipeline by doing TensorImage.new with no success

@lgvaz what is your pipeline defined as? Can’t do much without code :wink: Do you call ToTensor()?

This is my current pipeline:

pipe = Pipeline([PILImage.create, ToTensor(), IntToFloatTensor(), Normalize.from_stats(*imagenet_stats, cuda=False)])

I want to add TensorImage and Cuda to that

Try looking here: https://dev.fast.ai/core.transform.html#Pipeline

May answer a few questions. Not for CUDA though (though it may cover that too).
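One thing that might work (an untested sketch: I’m assuming Pipeline sorts transforms by their order attribute, so a bare lambda with the default order of 0 runs before ToTensor and hits the PILImage) is to wrap the device move in a Transform with an explicit order and a TensorImage annotation, so it only ever sees tensors:

from fastai2.vision.all import *  # assumed imports, as in the course notebooks

class ToCuda(Transform):
    "Move a TensorImage to the GPU at the very end of the pipeline"
    order = 100  # assumed value, just needs to be higher than ToTensor/IntToFloatTensor/Normalize
    def encodes(self, x:TensorImage):
        # whether the TensorImage subclass is retained after .cuda() depends on your fastai2/PyTorch version
        return x.cuda()

pipe = Pipeline([PILImage.create, ToTensor(), IntToFloatTensor(),
                 Normalize.from_stats(*imagenet_stats, cuda=False), ToCuda()])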

1 Like

Does anybody know a good reason why we are using the non-stratified K-fold version in the tutorial notebook?

from sklearn.model_selection import KFold

Would it not be better to use stratified splits to make sure that all classes are represented in both the training and validation sets? E.g. with imbalanced datasets we may have the problem that one of the minority classes does not appear in the training examples, which would then raise an error.
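To illustrate what stratification buys us, here is a minimal sklearn-only sketch (items and labels are just placeholder names standing in for the image list and its classes):

from sklearn.model_selection import StratifiedKFold

items  = list(range(20))   # placeholder item list
labels = [0]*15 + [1]*5    # imbalanced classes, ratio 3:1

skf = StratifiedKFold(n_splits=5)
for train_idx, valid_idx in skf.split(items, labels):  # unlike KFold, the labels are required here
    # every validation fold keeps the 3:1 ratio: 3 items of class 0 and 1 item of class 1
    print(len(train_idx), len(valid_idx))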

1 Like

Basically I was only able to get the regular KFold working, @mgloria. If anyone can figure out how to get Stratified working instead, that would be great :wink:

1 Like

I want to catch up with your walk with fastai2, but I also get the “cannot import name 'PILLOW_VERSION' from 'PIL'” error with torch 1.3.1 and Pillow 7.0.0.

Is there another workaround for that problem (I didn’t find one via the forum search)?

You need Pillow 6.2.1, as the directions state above. Basically the way the Pillow version is exposed changed between those two versions.

1 Like

See here: https://github.com/python-pillow/Pillow/issues/4130

PILLOW_VERSION has been removed. Use __version__ instead.

https://pillow.readthedocs.io/en/stable/releasenotes/7.0.0.html#pillow-version-constant

So you can modify the torchvision file to use __version__ instead of PILLOW_VERSION, or you can downgrade Pillow.
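If you go the patching route, the change is roughly along these lines (the exact file and import line depend on your torchvision version; I’m assuming the failing import lives in torchvision/transforms/functional.py):

# torchvision/transforms/functional.py  (assumed location of the failing import)
# Before -- breaks on Pillow 7.0, where PILLOW_VERSION was removed:
from PIL import Image, ImageOps, ImageEnhance, PILLOW_VERSION
# After -- alias __version__ so the rest of the file keeps working unchanged:
from PIL import Image, ImageOps, ImageEnhance, __version__ as PILLOW_VERSION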

2 Likes

Thank you, I was just rereading the posts in detail and I was able to solve it in my conda env with conda install Pillow==6.2.1.

1 Like

Yes, some things do sadly get buried in this thread :frowning: I can’t think of a good way to surface the big things. I know you can view a summary of the thread sorted by likes, but that’s about it.

1 Like

We could add a note at the top? However, maybe that changes soon and won’t be needed anymore. People are super helpful here if this gets asked again, so we will solve it anyway. :slight_smile:

Thank you for the fast help, @muellerzr & @morgan !

2 Likes

I tried a bit on my own now, but I am also running into errors. Nevertheless, I am sharing it since I believe it brings us one important step further:

@mgloria your x’s should be a list of indexes. Perhaps this may help you; this was my v1 implementation:

https://github.com/muellerzr/fastai-Experiments-and-tips/blob/master/K-Fold%20Cross%20Validation/kfold.ipynb

Maybe try len(train_imgs) instead of train_imgs?

I will look into it! A little trick for the others: to open a GitHub ipynb in Colab super fast, do the following:

e.g. https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb

becomes: https://colab.research.google.com/github/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb

So you basically only need to replace https://github.com/ with https://colab.research.google.com/github/
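If you do this a lot, a throwaway helper makes it a one-liner (github_to_colab is just a name made up for this sketch):

def github_to_colab(url):
    "Rewrite a GitHub notebook URL so it opens directly in Colab"
    return url.replace("https://github.com/",
                       "https://colab.research.google.com/github/")

github_to_colab("https://github.com/muellerzr/Practical-Deep-Learning-for-Coders-2.0/blob/master/Computer%20Vision/01_Custom.ipynb")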

10 Likes

I had no idea you could do that! Awesome! (I’ll update the links in the first post unless someone beats me :wink: )

Edit: all notebook links in the first post point to Colab :slight_smile: Thanks @mgloria!

It is my understanding that the list of indexes is precisely what is generated (i.e. which images belong to the training / validation set), see the docs. I believe it is solved, check this out @muellerzr :wink:

from sklearn.model_selection import StratifiedKFold
import numpy as np
# (the fastai2 imports and the train_imgs / tst_imgs path lists come from earlier in the notebook)

split_list = [L(range(len(train_imgs))), L(range(len(train_imgs), len(train_imgs)+len(tst_imgs)))]

dsrc = Datasets(train_imgs+tst_imgs, tfms=[[PILImage.create], [parent_label, Categorize]],
                splits=split_list)

# gather the training labels so the folds can be stratified on them
train_labels = L()
for i in range(len(dsrc.train)):
    train_labels.append(dsrc.train[i][1])

skf = StratifiedKFold(n_splits=5)
print(skf.get_n_splits(np.array(train_imgs), train_labels))
for train_index, valid_index in skf.split(np.array(train_imgs), train_labels):
    print("TRAIN:", len(train_index), "VALID:", len(valid_index))
    print("TRAIN:", train_index, "VALID:", valid_index)

These are the outputs (easier to get the idea):

  • PS: I also suggest changing the wording from “test” to “valid” (or “dev”) set in the notebooks, to avoid confusion when doing K-fold.
2 Likes

That’s fantastic @mgloria! Well done! I’ll definitely change to stratified whenever I can, and update the valid/test wording too.

Hi @muellerzr, I’m trying to use fastai2 on https://www.kaggle.com/c/understanding_cloud_organization. It has 4 segmentation masks, so I’m guessing I would need a DataBlock with 4 outputs, but I don’t know how to achieve that. How do I create DataBlocks with multiple inputs and outputs in general?

Look at the Object Detection notebook for an example. It has two outputs.
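Roughly speaking, n_inp tells the DataBlock how many of the blocks are inputs; the remaining blocks become targets, with one get_y entry per target. Here is a rough sketch along the lines of that notebook (get_train_imgs, get_bboxes, get_lbls and path are hypothetical placeholders, not fastai functions):

from fastai2.vision.all import *

block = DataBlock(
    blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),  # 1 input block + 2 target blocks
    get_items=get_train_imgs,                      # hypothetical: returns the image paths
    get_y=[get_bboxes, get_lbls],                  # hypothetical: one getter per target block
    splitter=RandomSplitter(),
    n_inp=1)                                       # the first block is the input, the rest are targets
dls = block.dataloaders(path)                      # path is a placeholder too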

1 Like