Wiki: Lesson 1


(Sairam) #288

Thanks Marc!


(Murali Mohana Krishna) #289

Can you let me know how you are downloading images for this exercise?


(Adam) #290

Hi, I am trying to run the code in lesson 1 and am getting a CUDA error. It seems that after fit runs, the CUDA resource is not released. So when I run the cell:

arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 2)

I get the error:

RuntimeError: cuda runtime error (46) : all CUDA-capable devices are busy or unavailable at /opt/conda/conda-bld/pytorch_1518244421288/work/torch/lib/THC/generic/THCStorage.cu:58

How can I solve this?


(Marc Rostock) #291

Hi Adam,
Never seen this myself, but you don’t seem to be the only one with this type of error.

Maybe that helps, see last comment about switching modes of your GPU:

Other than that, do you know whether you are running multiple processes that use the GPU?
I am not sure what to make of this SO-Answer but maybe otherwise keep looking in that direction:

In general, just googling for your error message will often help you find the solution. It’s what I did with your message above.
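To check whether other processes are holding the GPU, something like the sketch below may help. It assumes nvidia-smi is installed and on your PATH; the helper names parse_compute_apps and list_gpu_processes are made up for illustration and are not part of fast.ai or PyTorch.

```python
import shutil
import subprocess


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV rows into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank lines, error messages, unexpected rows
        pid, name, mem = parts
        rows.append({"pid": int(pid), "name": name, "used_memory": mem})
    return rows


def list_gpu_processes():
    """Return the compute processes nvidia-smi reports, or [] if it is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader"],
        stdout=subprocess.PIPE, universal_newlines=True, check=False,
    )
    return parse_compute_apps(result.stdout)


# Print one line per process currently using the GPU (nothing if there are none).
for proc in list_gpu_processes():
    print(proc)
```

If an old notebook kernel shows up in the list, shutting it down is probably the first thing to try.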


(Gang Cheng) #292

This may be because some GPU memory is consumed by driving the display, especially a 4K+ monitor. You may want to put display and compute on different cards.


(Gang Cheng) #293

The image size is tied to your particular problem and the computing power you have. Larger images need more GPU memory and train more slowly. However, if you downsize the images too much, important features may get lost. Medical images typically need higher resolution than other samples.
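As a rough illustration of how memory grows with image size, here is a small sketch. It only counts the input batch itself, assuming square RGB images stored as float32; the activations inside the network also grow with image area and usually take far more memory. The function name is made up for this example.

```python
def input_batch_bytes(batch_size, image_size, channels=3, bytes_per_value=4):
    """Bytes needed to hold one batch of square float32 images (inputs only)."""
    return batch_size * channels * image_size * image_size * bytes_per_value


# Compare a few image sizes at a fixed batch size of 64.
for sz in (224, 299, 448):
    mib = input_batch_bytes(64, sz) / 2**20
    print(f"sz={sz}: {mib:.1f} MiB per batch of 64")
```

Doubling the side length quadruples the memory, which is why a batch size that works at sz=224 may no longer fit at sz=448.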


(Gang Cheng) #294

V2 is very different from V1: it puts more emphasis on learning the high-level picture (a top-down approach) by abstracting more of the implementation into the fast.ai library. It is not a course for learning how to use TF, Keras, etc.


(Kevin Chow) #295

Is there more information (maybe a paper/post and author) about the epithelial/stroma classifier mentioned at 47:18? I’m interested in further reading about this.


(Amal) #296

How do I change the batch size?
I can’t find this variable in the code!


#297

The batch size is set in the get_augs() function as bs.


(Carlos Vouking) #298

You can change your batch size like so:

data = ImageClassifierData.from_paths(path, tfms=tfms, bs=30, …)

Hope this helps.


(Amal) #299

I have a question…
How can the classifier assess itself on the test set when it doesn’t know the correct answers for the test set?
I mean, how can the classifier be sure it will achieve a certain accuracy, say 98 or 99, when it only knows the correct answers for the validation set but not the test set?

How can it be sure that it will achieve exactly the same accuracy on the test set as on the validation set?


(Kofi Asiedu Brempong) #300

Hey guys, check out my first Medium post. It’s based on an image classifier I wrote.
Please let me know your thoughts.


#301

Hi everyone!

I’m having a hard time using the fast.ai library on a Linux machine (Scientific Linux 7) that I SSH into. In short: when building resnet50, the machine is unable to locate the pre-trained model.

I set up fast.ai on the machine by following the instructions on the wiki. When I try to build the fast.ai model:

PATH = 'my_data/hep_images/'
sz = 300
arch = resnet50
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), bs=32)
learn = ConvLearner.pretrained(arch, data, precompute=True)

This results in the following error:

`
FileNotFoundError Traceback (most recent call last)
in ()
2 arch = resnet50
3 data = ImageClassifierData.from_paths(PATH,tfms=tfms_from_model(arch, sz),bs=32 )
----> 4 learn = ConvLearner.pretrained(arch, data, precompute=True)

/mnt/scratch/eab326/fastai/courses/dl1/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, custom_head, precompute, pretrained, **kwargs)
111 pretrained=True, **kwargs):
112 models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
--> 113 ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head, pretrained=pretrained)
114 return cls(data, models, precompute, **kwargs)
115

/mnt/scratch/eab326/fastai/courses/dl1/fastai/conv_learner.py in __init__(self, f, c, is_multi, is_reg, ps, xtra_fc, xtra_cut, custom_head, pretrained)
38 else: cut,self.lr_cut = 0,0
39 cut-=xtra_cut
---> 40 layers = cut_model(f(pretrained), cut)
41 self.nf = model_features[f] if f in model_features else (num_features(layers)*2)
42 if not custom_head: layers += [AdaptiveConcatPool2d(), Flatten()]

/mnt/scratch/eab326/anaconda3/envs/fastai/lib/python3.6/site-packages/torchvision/models/resnet.py in resnet50(pretrained, **kwargs)
186 model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
187 if pretrained:
--> 188 model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
189 return model
190

/mnt/scratch/eab326/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/model_zoo.py in load_url(url, model_dir, map_location)
55 model_dir = os.getenv('TORCH_MODEL_ZOO', os.path.join(torch_home, 'models'))
56 if not os.path.exists(model_dir):
---> 57 os.makedirs(model_dir)
58 parts = urlparse(url)
59 filename = os.path.basename(parts.path)

/mnt/scratch/eab326/anaconda3/envs/fastai/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system

FileNotFoundError: [Errno 2] No such file or directory: '/home/eab326/.torch/models'
`

I’d appreciate any help resolving this issue. Thank you!
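The traceback shows model_zoo.load_url reading the TORCH_MODEL_ZOO environment variable before falling back to the .torch/models folder in the home directory, so one thing that might be worth trying is pointing that variable at a directory you can write to before building the learner. This is only a sketch for the PyTorch version shown in the traceback; other versions may behave differently. The helper name and the temp-directory location are placeholders.

```python
import os
import tempfile


def use_model_dir(path):
    """Create `path` if needed and point TORCH_MODEL_ZOO at it."""
    os.makedirs(path, exist_ok=True)
    os.environ["TORCH_MODEL_ZOO"] = path
    return path


# Placeholder location: replace with a writable directory on your machine.
model_dir = use_model_dir(os.path.join(tempfile.gettempdir(), "torch_models"))
print(os.getenv("TORCH_MODEL_ZOO"))
```

This would need to run in the notebook before the ConvLearner.pretrained call so the download lands in the new directory.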


(Fred Guth) #302

I guess that your question got lost among others and as I didn’t see anyone answer I will try to explain (I guess you know by now, but at least it can be registered for posterity).

So, an epoch is one pass over the whole training set. The batch size is how many images you process at once. Images are represented as tensors (n-dimensional arrays), and you can do the calculations with big tensors or small tensors.

Big tensors are usually faster because you need fewer iterations to finish an epoch. But you also need more memory to hold that big tensor (the batch). In general, you want to keep batches as big as your GPU memory allows.

To set the batch size, try different sizes and keep an eye on your GPU usage (nvidia-smi command). If you set it too high, your code will fail with an out-of-memory runtime error. If that happens, reduce the batch size.
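To make the relation between batch size and iterations concrete, here is a small sketch. The training set size of 23,000 is just a made-up number for illustration, and the function name is mine.

```python
import math


def iterations_per_epoch(n_images, batch_size):
    """Number of batches needed to see every training image once."""
    return math.ceil(n_images / batch_size)


# Hypothetical training set of 23,000 images.
for bs in (16, 64, 256):
    print(bs, iterations_per_epoch(23000, bs))
```

The last batch of an epoch may be smaller than the others, which is why the division is rounded up.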


(Shubham Gupta) #303

Hey guys, check out the next part of the blog on data visualization techniques. Any suggestions, queries and comments are most welcome.


(William Horton) #305

ConvLearner.pretrained should work in Kaggle Kernels now if you turn on the new Internet connected option. I’ve got lesson 1 running in this kernel: https://www.kaggle.com/hortonhearsafoo/fast-ai-lesson-1


(Murali Mohana Krishna) #306

Hello everybody,

I am Murali from India. I feel this course is a perfect complement to Andrew Ng’s deep learning specialization.

I started a series ‘Fast.AI Deep Learnings’ where I would like to practically implement and share my experiences about each topic.

Here is the first post

Please provide your feedback.


(Dov Grobgeld) #307

Just for the record, I had two problems with my computer. The first was that the PSU was broken and would crash the computer. The second was that 16GB was apparently not enough. I have now upgraded to 32GB and got a new PSU (650W), and I can now run all the fastai exercises.


#308

I ran into a problem training on a new dataset in lesson 1.
I downloaded some images of snakes and bears from Google, then tried to train on them by following the lesson 1 notebook.

This is my input in the notebook:

import os

PATH = "data/bearsnake/"
subfolder1 = 'bear'
subfolder2 = 'snake'
sz = 224
os.makedirs('data/bearsnake/models', exist_ok=True)
os.makedirs('data/bearsnake/train', exist_ok=True)
os.makedirs('data/bearsnake/valid', exist_ok=True)

import zipfile

path_to_zip_file = 'data/bearsnake.zip'
directory_to_extract_to = 'data/bearsnake/train'

# Extract the archive; the context manager closes the file automatically
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

Everything above works, and I can display the images:

img = plt.imread(f'{PATH}valid/bear/{files[0]}')
plt.imshow(img);

The problem appears when I try to train the model.

arch=resnet34
data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
learn = ConvLearner.pretrained(arch, data, precompute=True)
learn.fit(0.01, 3)

This is the error message from the notebook:

OSError Traceback (most recent call last)
in ()
1 arch=resnet34
2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz))
----> 3 learn = ConvLearner.pretrained(arch, data, precompute=True)
4 learn.fit(0.01, 3)

~/fastai/courses/dl1/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, custom_head, precompute, pretrained, **kwargs)
112 models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
113 ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head, pretrained=pretrained)
--> 114 return cls(data, models, precompute, **kwargs)
115
116 @classmethod

~/fastai/courses/dl1/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
98 if hasattr(data, 'is_multi') and not data.is_reg and self.metrics is None:
99 self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
--> 100 if precompute: self.save_fc1()
101 self.freeze()
102 self.precompute = precompute

~/fastai/courses/dl1/fastai/conv_learner.py in save_fc1(self)
177 m=self.models.top_model
178 if len(self.activations[0])!=len(self.data.trn_ds):
--> 179 predict_to_bcolz(m, self.data.fix_dl, act)
180 if len(self.activations[1])!=len(self.data.val_ds):
181 predict_to_bcolz(m, self.data.val_dl, val_act)

~/fastai/courses/dl1/fastai/model.py in predict_to_bcolz(m, gen, arr, workers)
15 lock=threading.Lock()
16 m.eval()
---> 17 for x,*_ in tqdm(gen):
18 y = to_np(m(VV(x)).data)
19 with lock:

~/anaconda3/envs/fastai/lib/python3.6/site-packages/tqdm/_tqdm.py in __iter__(self)
929 """, fp_write=getattr(self.fp, 'write', sys.stderr.write))
930
--> 931 for obj in iterable:
932 yield obj
933 # Update and possibly print the progressbar.

~/fastai/courses/dl1/fastai/dataloader.py in __iter__(self)
86 # avoid py3.6 issue where queue is infinite and can result in memory exhaustion
87 for c in chunk_iter(iter(self.batch_sampler), self.num_workers*10):
---> 88 for batch in e.map(self.get_batch, c):
89 yield get_tensor(batch, self.pin_memory, self.half)
90

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.time())

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

~/anaconda3/envs/fastai/lib/python3.6/concurrent/futures/thread.py in run(self)
54
55 try:
---> 56 result = self.fn(*self.args, **self.kwargs)
57 except BaseException as exc:
58 self.future.set_exception(exc)

~/fastai/courses/dl1/fastai/dataloader.py in get_batch(self, indices)
73
74 def get_batch(self, indices):
---> 75 res = self.np_collate([self.dataset[i] for i in indices])
76 if self.transpose: res[0] = res[0].T
77 if self.transpose_y: res[1] = res[1].T

~/fastai/courses/dl1/fastai/dataloader.py in (.0)
73
74 def get_batch(self, indices):
---> 75 res = self.np_collate([self.dataset[i] for i in indices])
76 if self.transpose: res[0] = res[0].T
77 if self.transpose_y: res[1] = res[1].T

~/fastai/courses/dl1/fastai/dataset.py in __getitem__(self, idx)
194 xs,ys = zip(*[self.get1item(i) for i in range(*idx.indices(self.n))])
195 return np.stack(xs),ys
--> 196 return self.get1item(idx)
197
198 def __len__(self): return self.n

~/fastai/courses/dl1/fastai/dataset.py in get1item(self, idx)
187
188 def get1item(self, idx):
--> 189 x,y = self.get_x(idx),self.get_y(idx)
190 return self.get(self.transform, x, y)
191

~/fastai/courses/dl1/fastai/dataset.py in get_x(self, i)
271 super().__init__(transform)
272 def get_sz(self): return self.transform.sz
--> 273 def get_x(self, i): return open_image(os.path.join(self.path, self.fnames[i]))
274 def get_n(self): return len(self.fnames)
275

~/fastai/courses/dl1/fastai/dataset.py in open_image(fn)
249 raise OSError('No such file or directory: {}'.format(fn))
250 elif os.path.isdir(fn) and not str(fn).startswith("http"):
--> 251 raise OSError('Is a directory: {}'.format(fn))
252 else:
253 #res = np.array(Image.open(fn), dtype=np.float32)/255

OSError: Is a directory: data/bearsnake/train/ snake/.ipynb_checkpoints

My question is: what should I do to fix this? Or could you show me the steps to train on a different dataset?

Please see the following PDF for a screenshot of the Jupyter notebook:
lesson1-(diff dataset).pdf (343.7 KB)
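The last line of the traceback says open_image was handed a directory named .ipynb_checkpoints, a hidden folder that Jupyter creates. One possible fix, assuming nothing else is wrong with the folder layout, is to delete those folders under the dataset path before creating the data object. The helper name below is made up for illustration.

```python
import os
import shutil


def remove_checkpoint_dirs(root):
    """Delete every .ipynb_checkpoints directory under `root`; return their paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            target = os.path.join(dirpath, ".ipynb_checkpoints")
            shutil.rmtree(target)
            dirnames.remove(".ipynb_checkpoints")  # do not descend into it
            removed.append(target)
    return removed


# Only runs if the dataset folder from the post exists in the working directory.
if os.path.isdir("data/bearsnake"):
    print(remove_checkpoint_dirs("data/bearsnake"))
```

After the cleanup, rerunning the ImageClassifierData.from_paths cell should pick up only the image files, provided the train and valid folders each contain the bear and snake subfolders.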