Lesson 3 Advanced Discussion ✅


(Ville Holma) #21

In playing with the data block API, I’ve found it to be flexible, yes, but also much slower on big datasets, since it apparently always loads the entire dataset into memory before doing anything else. Or maybe I’m missing something; does anybody know if there’s a way to speed up databunch creation on larger datasets?


#22

I recently read an article on this - it is basically factory scale at this point. You get entire floors of people who work on nothing but labeling pixels.

It is still likely not on the order of millions of segmented images, but tens or hundreds of thousands.

In general, one can get very nice results with segmentation on much smaller datasets though!


(Yash Mittal) #23

We often see people getting a CUDA: out of memory error. When we restart the kernel, it runs fine. What could be the possible reasons for that? It looks like the memory is not getting properly managed.
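Not an answer from the thread, but one common cause (my assumption) is that the notebook session still holds hidden references to GPU tensors, e.g. via the stored traceback or the output history, so the memory cannot be reclaimed until the kernel restarts. A minimal pure-Python sketch of the underlying rule, using a plain object as a stand-in for a tensor:

```python
import gc
import weakref

class FakeTensor:
    """Stand-in for a large GPU tensor."""

big = FakeTensor()
probe = weakref.ref(big)    # lets us observe when the object is actually freed

alias = big                 # e.g. a notebook Out[] entry or a traceback frame
del big
gc.collect()
print(probe() is not None)  # True: the hidden alias keeps it alive

del alias
gc.collect()
print(probe() is None)      # True: with no references left, it can be released
```

In PyTorch terms, deleting the offending variables and then calling gc.collect() and torch.cuda.empty_cache() often frees the memory without a restart (again my assumption; this thread doesn’t confirm it).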


(Kaspar Lund) #24

This promises to be an interesting dataset for NLP & AI in law: https://case.law/


(Jeremy Howard (Admin)) #25

I normally pick the steepest bit of section (1) in your list. But you should try a few and tell us what works best for you! :slight_smile:


(Jeremy Howard (Admin)) #26

Yeah it’s less of an issue now, with optimized PIL and our faster augmentations - although you might still want to resize if your images are huge.

The best way to install PIL is using the comment at the bottom here:


(Jeremy Howard (Admin)) #27

In high dimensions there are basically no local minima - at least for the kinds of functions that neural net losses create.
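A standard heuristic for why (my addition, not from the lecture): a critical point is a local minimum only if every eigenvalue of the Hessian is positive. If each of the $d$ eigenvalues is, roughly, positive or negative independently with probability $1/2$, then

```latex
P(\text{critical point is a local minimum}) \approx \left(\tfrac{1}{2}\right)^{d}
```

so with the millions of parameters in a neural net, almost every critical point is a saddle rather than a true minimum.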


(Fred Guth) #28

In lesson 3, there was an explanation of U-Net and how to use it now in v1.

But I do remember that in course-v2 and fastai 0.7, @kcturgutlu implemented a way to create Dynamic Unets: U-Net-ish models using any pretrained model as the encoder: resnet, resnext, vgg…

Are these Dynamic Unets deprecated in v1?


(Jeremy Howard (Admin)) #29

Quite the opposite - that’s what we’re using all the time now! That’s why we were able to automatically create a unet with a given backbone architecture.


(Kerem Turgutlu) #30

You may check https://github.com/fastai/course-v3/blob/master/nbs/dl1/lesson3-camvid.ipynb for how to use create_unet in v1. It’s much faster and much lighter in terms of GPU memory.


(Md Mofijul Islam) #31

Does Jeremy use the ULMFiT language model in the IMDB review task?


Lesson 3 In-Class Discussion ✅
(Ariel Gamiño) #32

Yes, ULMFiT is the technique: Universal Language Model Fine-tuning.


(Md Mofijul Islam) #33

Thanks, but where does Jeremy specify the ULMFiT model?


(Francisco Ingham) #34

language_model_learner() implements an AWD-LSTM RNN behind the scenes, which is what Jeremy and Sebastian used in ULMFiT.


(Paula Alves) #35

@Kaspar what do you mean by “make your own open_image”?


(Sam) #36

Hello all,

Since I did not get any response in lesson-3-discussion, I thought I’d ask you gurus.

I am trying to get a handle on the data_block API, and I don’t know what I’m doing wrong.
I’m working with an established Kaggle dataset, whale-categorization-playground:
the train and test folders contain jpg images (no sub-folders by class etc.), and
train.csv maps ImageId to ClassName for the train folder, like so:

ImageId LabelName
00022e1a.jpg w_e15442c
000466c4.jpg w_1287fbc
00087b01.jpg w_da2efe0
001296d5.jpg w_19e5482

I got as far as:

data = (ImageFileList.from_folder(path)                    # works
        .label_from_csv(path/'train.csv', folder='train')  # works
        .random_split_by_pct(0.2)                          # works
        .datasets()   # errors: KeyError on first Id in train.csv
        .transform(tfms, size=128)
        .databunch()
        .normalize(imagenet_stats))

Can any of you gurus help?

Here is the full trace:

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 d=c.datasets()

~/Documents/fastai/courses/v3/nbs/dl1/fastai/data_block.py in datasets(self, dataset_cls, **kwargs)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/Documents/fastai/courses/v3/nbs/dl1/fastai/data_block.py in <listcomp>(.0)
    234         train = dataset_cls(*self.train.items.T, **kwargs)
    235         dss = [train]
--> 236         dss += [train.new(*o.items.T, **kwargs) for o in self.lists[1:]]
    237         cls = getattr(train, '__splits_class__', self._pipe)
    238         return cls(self.path, *dss)

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in new(self, classes, *args, **kwargs)
     80     def new(self, *args, classes:Optional[Collection[Any]]=None, **kwargs):
     81         if classes is None: classes = self.classes
---> 82         return self.__class__(*args, classes=classes, **kwargs)
     83
     84 class ImageClassificationDataset(ImageClassificationBase):

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in __init__(self, x, y, classes, **kwargs)
     75 class ImageClassificationBase(ImageDatasetBase):
     76     def __init__(self, x:Collection, y:Collection, classes:Collection=None, **kwargs):
---> 77         super().__init__(x=x, y=y, classes=classes, **kwargs)
     78         self.learner_type = ClassificationLearner
     79

~/Documents/fastai/courses/v3/nbs/dl1/fastai/vision/data.py in __init__(self, **kwargs)
     67 class ImageDatasetBase(DatasetBase):
     68     def __init__(self, **kwargs):
---> 69         super().__init__(**kwargs)
     70         self.image_opener = open_image
     71         self.learner_type = ImageLearner

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in __init__(self, x, y, classes, c, task_type, class2idx, as_array, do_encode_y)
     23         else: self.c = len(self.classes)
     24         if class2idx is None: self.class2idx = {v:k for k,v in enumerate(self.classes)}
---> 25         if y is not None and do_encode_y: self.encode_y()
     26         if self.task_type==TaskType.Regression: self.loss_func = MSELossFlat()
     27         elif self.task_type==TaskType.Single: self.loss_func = F.cross_entropy

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in encode_y(self)
     30     def encode_y(self):
     31         if self.task_type==TaskType.Single:
---> 32             self.y = np.array([self.class2idx[o] for o in self.y], dtype=np.int64)
     33         elif self.task_type==TaskType.Multi:
     34             self.y = [np.array([self.class2idx[o] for o in l], dtype=np.int64) for l in self.y]

~/Documents/fastai/courses/v3/nbs/dl1/fastai/basic_data.py in <listcomp>(.0)
     30     def encode_y(self):
     31         if self.task_type==TaskType.Single:
---> 32             self.y = np.array([self.class2idx[o] for o in self.y], dtype=np.int64)
     33         elif self.task_type==TaskType.Multi:
     34             self.y = [np.array([self.class2idx[o] for o in l], dtype=np.int64) for l in self.y]

KeyError: 'w_e15442c'
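One plausible cause (my guess, not confirmed in this thread): in the whale dataset most whale IDs have only one or two images, so a random 20% split can put a class entirely into the validation set. If the class-to-index mapping is then built from the training labels only, encoding the validation labels raises exactly this kind of KeyError. A self-contained sketch (names are illustrative, not fastai internals):

```python
import random

# Ten images, each with a unique label -- like rare whale IDs.
labels = [(f"img{i:02d}.jpg", f"w_{i:07x}") for i in range(10)]

random.seed(0)
random.shuffle(labels)
cut = int(len(labels) * 0.8)
train, valid = labels[:cut], labels[cut:]

# class2idx is built from *training* labels only.
class2idx = {c: i for i, c in enumerate(sorted({y for _, y in train}))}

try:
    encoded = [class2idx[y] for _, y in valid]  # validation-only classes are missing
    failed = False
except KeyError as e:
    failed = True
    print("KeyError:", e)
```

If that is the cause, passing an explicit class list built from the full CSV, or splitting so that every class appears in the training set, would avoid it.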


#37

I had a similar issue with the data block API on a different dataset. Try using the standard API; it worked for me:

data = ImageDataBunch.from_csv(path, folder='train', sep=None, csv_labels='train.csv', valid_pct=0.2, ds_tfms=get_transforms(), size=128)

(Sam) #38

@miwojc, thank you!!!
I thought it was just me.
I wonder if I should post this as a bug, or not use v1 since it is “WIP”.
Interestingly, vision.data.ImageClassificationDataset says:
warnings.warn("ImageClassificationDataset is deprecated and will soon be removed. Use the data block API.


#39

I didn’t have time to dig into it, so I moved on, but you are right, we should report it as a bug.


(Sam) #40

What release of v1 are you on?
I am at 1.0.22 (the latest).