Lesson 3 Advanced Discussion ✅

Yeah it’s less of an issue now, with optimized PIL and our faster augmentations - although you might still want to resize if your images are huge.

The best way to install PIL is using the comment at the bottom here:


In high dimensions there are basically no local minima - at least for the kinds of functions that neural net losses create.


In lesson 3, there was an explanation of U-Net and how to use it now in v1.

But I do remember that in course-v2 and fastai 0.7, @kcturgutlu implemented a way to create Dynamic Unets: U-Net-ish models using any pretrained model as the encoder (resnet, resnext, vgg…).

Are these Dynamic Unets deprecated in v1?

Quite the opposite - that’s what we’re using all the time now! That’s why we were able to automatically create a unet with a given backbone architecture.


You may check https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-camvid.ipynb for how to use create_unet in v1. It’s much faster and much lighter in terms of GPU memory.
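For the record, a minimal sketch of what that looks like (hedged: data is assumed to be the segmentation DataBunch built in the camvid notebook, and the function has been named Learner.create_unet in early v1 releases and unet_learner in later ones):

from fastai.vision import *

# data is assumed to be a segmentation DataBunch, as built in lesson3-camvid.ipynb.
# The chosen backbone (here resnet34) becomes the encoder of the dynamic U-Net.
learn = unet_learner(data, models.resnet34)
learn.fit_one_cycle(5, slice(1e-3))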


Does Jeremy use the ULMFiT language model in the IMDB review task?

Yes, ULMFiT (Universal Language Model Fine-tuning) is the technique.

Thanks, but where does Jeremy specify the ULMFiT model?

language_model_learner() implements an AWD-LSTM RNN behind the scenes, which is what Jeremy and Sebastian used in ULMFiT.
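For context, here is a minimal sketch of how that learner is typically created in fastai v1 (data_lm is an assumed TextLMDataBunch built from the IMDB texts; the exact signature has varied across v1 releases, e.g. older versions took a pretrained_model URL rather than an arch):

from fastai.text import *

# data_lm is assumed to be a TextLMDataBunch.
# language_model_learner wires up the AWD-LSTM and (by default) the pretrained WT103 weights.
learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.3)
learn.fit_one_cycle(1, 1e-2)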


@Kaspar what do you mean by “make your own open_image”?

Hello all,

Since I did not get any response in the lesson-3-discussion thread, I thought I would ask you gurus.

I am trying to get a handle on the data_block API and don’t know what I am doing wrong.
I am working with an established dataset, the Kaggle whale-categorization-playground.
The train and test folders contain jpg images (no sub-folders by class etc.).
train.csv contains ImageId/ClassName pairs for the images in the train folder:

ImageId LabelName
00022e1a.jpg w_e15442c
000466c4.jpg w_1287fbc
00087b01.jpg w_da2efe0
001296d5.jpg w_19e5482

I got as far as:

data = (ImageFileList.from_folder(path)                     # works
        .label_from_csv(path/'train.csv', folder='train')   # works
        .random_split_by_pct(0.2)                           # works
        .datasets()                                         # errors: KeyError on the first Id in train.csv
        .transform(tfms, size=128)
        .databunch()
        .normalize(imagenet_stats))

Can any of you gurus help?

Here is the full trace:

KeyError                                  Traceback (most recent call last)
in <module>()
----> 1 d=c.datasets()

~/Documents/fastai/courses/v3/nbs/dl1/fastai/data_block.py in datasets(self, dataset_cls, **kwargs)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/Documents/fastai/courses/v3/nbs/dl1/fastai/data_block.py in <listcomp>(.0)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in new(self, classes, *args, **kwargs)
     80     def new(self, *args, classes:Optional[Collection[Any]]=None, **kwargs):
     81         if classes is None: classes = self.classes
---> 82         return self.__class__(*args, classes=classes, **kwargs)
     83
     84 class ImageClassificationDataset(ImageClassificationBase):

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in __init__(self, x, y, classes, **kwargs)
     75 class ImageClassificationBase(ImageDatasetBase):
     76     def __init__(self, x:Collection, y:Collection, classes:Collection=None, **kwargs):
---> 77         super().__init__(x=x, y=y, classes=classes, **kwargs)
     78         self.learner_type = ClassificationLearner
     79

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in __init__(self, **kwargs)
     67 class ImageDatasetBase(DatasetBase):
     68     def __init__(self, **kwargs):
---> 69         super().__init__(**kwargs)
     70         self.image_opener = open_image
     71         self.learner_type = ImageLearner

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in __init__(self, x, y, classes, c, task_type, class2idx, as_array, do_encode_y)
     23         else: self.c = len(self.classes)
     24         if class2idx is None: self.class2idx = {v:k for k,v in enumerate(self.classes)}
---> 25         if y is not None and do_encode_y: self.encode_y()
     26         if self.task_type==TaskType.Regression: self.loss_func = MSELossFlat()
     27         elif self.task_type==TaskType.Single: self.loss_func = F.cross_entropy

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in encode_y(self)
     30     def encode_y(self):
     31         if self.task_type==TaskType.Single:
---> 32             self.y = np.array([self.class2idx[o] for o in self.y], dtype=np.int64)
     33         elif self.task_type==TaskType.Multi:
     34             self.y = [np.array([self.class2idx[o] for o in l], dtype=np.int64) for l in self.y]

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in <listcomp>(.0)
     30     def encode_y(self):
     31         if self.task_type==TaskType.Single:
---> 32             self.y = np.array([self.class2idx[o] for o in self.y], dtype=np.int64)
     33         elif self.task_type==TaskType.Multi:
     34             self.y = [np.array([self.class2idx[o] for o in l], dtype=np.int64) for l in self.y]

KeyError: 'w_e15442c'

I had a similar issue with the data block API on a different dataset. Try using the standard API; it worked for me:

data = ImageDataBunch.from_csv(path, folder='train', sep=None, csv_labels='train.csv', valid_pct=0.2, ds_tfms=get_transforms(), size=128)

@miwojc, thank you!!!
I thought it was just me.
I wonder if I should post this as a bug, or just not use v1 since it is “WIP”.
Interestingly, vision.data.ImageClassificationDataset says:
warnings.warn("ImageClassificationDataset is deprecated and will soon be removed. Use the data block API.

I didn’t have time to dig into it so I moved on, but you are right, we should report it as a bug.

What release of v1 are you on?
I am at 1.0.22 (the latest).

@jeremy, is there a reason why for the LM and text classifier learners we used moms=(0.8,0.7)? Did you arrive at that value through trial and error, or is there a general rule of thumb?

Also, reading through the fastai docs, the learners seem to use Adam with fixed weight decay. Is this now the preferred method over SGD with Nesterov momentum? What process do you use to determine the best weight decay values (as it seems like you only use custom values for the last model-fitting stage)?

I update fastai frequently; the issue I had was a week ago, so I’m not sure what version it was at that time…

Sounds like there are classes in valid that aren’t in train. Try creating the full list of classes first and pass them explicitly.
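A sketch of what that might look like for the whale data above (hedged: the column name follows the train.csv header shown earlier, and passing classes through datasets() relies on the kwargs forwarding visible in the traceback; the exact calls may differ between fastai versions):

import pandas as pd

# Build the complete class list from train.csv so train and valid share the same classes.
df = pd.read_csv(path/'train.csv')
classes = sorted(df['LabelName'].unique())   # 'LabelName' as in the header shown above

data = (ImageFileList.from_folder(path)
        .label_from_csv(path/'train.csv', folder='train')
        .random_split_by_pct(0.2)
        .datasets(classes=classes)           # pass the explicit class list through to the dataset
        .transform(tfms, size=128)
        .databunch()
        .normalize(imagenet_stats))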

They experimentally seem to work well for RNNs - we’re planning to make them the default.

Yup.

https://www.fast.ai/2018/07/02/adam-weight-decay/
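For reference, both of these knobs can be set per call in fastai v1: fit_one_cycle accepts a moms tuple and a wd value. A small, purely illustrative sketch:

# learn is any fastai v1 Learner; the values are just the ones discussed above.
learn.fit_one_cycle(10, max_lr=1e-3, moms=(0.8, 0.7), wd=0.1)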

@jeremy,
All the data is as delivered by Kaggle.
train.csv is as delivered by Kaggle (whale-categorization-playground).
It has the image filename (from train) plus the class name.
The train folder (again, as delivered) contains all the images that are listed in train.csv.
There is a test folder (which has images) that are not listed in the csv file, for obvious reasons (train.csv provides labels and test should not have any labels). My expectation was that random_split_by_pct(0.2) would split the images in train into train and valid groups.
(I can imagine that this kind of data representation may be common to many kaggle competitions)

Interestingly, the KeyError points to a class name (KeyError: 'w_e15442c'), while the train directory contains images with filenames (e.g. 00022e1a.jpg) that definitely exist.