Lesson 1 - Non-beginner discussion

This is the topic for any non-beginner discussion around lesson 1. It won’t be actively monitored by Jeremy or me tonight, but we will answer outstanding questions in here tomorrow (if needed).

14 Likes

The questions about fit_one_cycle() made me wonder: Is it still the “best” way to train a network?

On the Imagenette leaderboard I see that Ranger + fit_flat_cos() seems to be leading, at least when training networks from scratch.

@JoshVarty It depends on the optimizer used. Adam + One Cycle and Ranger + Flat Cos Annealing have each performed better or worse in a few different cases. IMO, Ranger is better for quick training (<50 epochs), whereas Adam is better for longer training with properly tuned hyperparameters (I worked on the ImageWoof leaderboard and experimented with both quite a lot with the rest of the team :slight_smile: )

I’ll also add that you can’t mix and switch the two; it won’t work well (Ranger + one_cycle or Adam + Flat Cos).
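
For concreteness, a rough sketch of the two pairings being discussed, using fastai2 names (ranger as the opt_func, fit_flat_cos vs fit_one_cycle); dls is assumed to be an existing DataLoaders, and the model, epochs and learning rate are placeholders:

from fastai2.vision.all import *

# Ranger + a flat LR followed by cosine annealing
learn = cnn_learner(dls, resnet34, opt_func=ranger, metrics=accuracy)
learn.fit_flat_cos(5, 1e-3)

# Adam (fastai2's default opt_func) + the one-cycle schedule
learn = cnn_learner(dls, resnet34, opt_func=Adam, metrics=accuracy)
learn.fit_one_cycle(5, 1e-3)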

4 Likes

In the recent Kaggle competition Bengali.AI, which is an MNIST-like problem but for Bengali script, people reported that training with LR decrease on plateau gives better results than the one cycle policy. So I guess it’s dataset dependent too. I would say one cycle is a very good starting point, somewhat similar to what Jeremy just said about ResNet.
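
For reference, a minimal sketch of that reduce-LR-on-plateau alternative using fastai2’s ReduceLROnPlateau callback (assuming an existing learn; the epochs, LR, monitor and patience values are just placeholders):

from fastai2.vision.all import *

# Reduce the LR by `factor` whenever `monitor` stops improving for `patience` epochs.
learn.fit(20, 1e-3, cbs=ReduceLROnPlateau(monitor='valid_loss', patience=2, factor=10.))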

2 Likes

@sgugger On Windows, the untar_data method gives an error:


RecursionError                            Traceback (most recent call last)
in <module>
----> 1 path = untar_data(URLs.CAMVID_TINY)
      2 dls = SegmentationDataLoaders.from_label_func(
      3     path, bs=8, fnames = get_image_files(path/"images"),
      4     label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
      5     codes = np.loadtxt(path/'codes.txt', dtype=str)

d:\codes\fastai_dev\fastai2\fastai2\data\external.py in untar_data(url, fname, dest, c_key, force_download, extract_func)
    215 def untar_data(url, fname=None, dest=None, c_key='data', force_download=False, extract_func=file_extract):
    216     "Download url to fname if dest doesn't exist, and un-tgz to folder dest."
--> 217     default_dest = URLs.path(url, c_key=c_key).with_suffix('')
    218     dest = default_dest if dest is None else Path(dest)/default_dest.name
    219     fname = Path(fname or URLs.path(url))

d:\codes\fastai_dev\fastai2\fastai2\data\external.py in path(url, c_key)
    136     local_path = URLs.LOCAL_PATH/('models' if c_key=='models' else 'data')/fname
    137     if local_path.exists(): return local_path
--> 138     return Config()[c_key]/fname
    139
    140 # Cell

d:\codes\fastai_dev\fastai2\fastai2\data\external.py in __init__(self)
     14         self.config_path.mkdir(parents=True, exist_ok=True)
     15         if not self.config_file.exists(): self.create_config()
---> 16         self.d = self.load_config()
     17
     18     def __getitem__(self,k):

d:\codes\fastai_dev\fastai2\fastai2\data\external.py in load_config(self)
     34         elif 'version' in config: self.create_config(config)
     35         else: self.create_config()
---> 36         return self.load_config()
     37
     38     def create_config(self, cfg=None):

... last 1 frames repeated, from the frame below ...

d:\codes\fastai_dev\fastai2\fastai2\data\external.py in load_config(self)
     34         elif 'version' in config: self.create_config(config)
     35         else: self.create_config()
---> 36         return self.load_config()
     37
     38     def create_config(self, cfg=None):

RecursionError: maximum recursion depth exceeded while calling a Python object

https://forums.fast.ai/t/source-code-study-group/65755

@init_27 and I want to dig deep into the fastai layers and try and beat the defaults in fastai lesson 1. We will be working on the pets dataset itself.

Please do join us as it would be great to work together on this :slight_smile:

3 Likes

This is a bug that was fixed. You shouldn’t have it with a dev install and the repo up to date. We will make a new release later today in any case.

1 Like

Thank you, I fetched it before the lecture and now the update works like a charm on Windows. I suggest adding a small note for Windows users to pass num_workers=0. It would be nice for newcomers to be able to run the very first example without any problems.
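
For example, a sketch of the lesson 1 cat/dog example with that change (assuming the fastai2 factory method passes num_workers through to the underlying DataLoader, which is the usual advice for Windows):

from fastai2.vision.all import *

path = untar_data(URLs.PETS)/'images'
def is_cat(x): return x[0].isupper()

# num_workers=0 avoids the multiprocessing issues seen on Windows.
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224), num_workers=0)

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)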

I would be interested to hear your experience on Windows when running longer experiments. I used to have issues where it was taking a long time, kind of waiting in between epochs, not doing anything… thanks!

Well, I have used Windows since the very beginning of fastai, and currently I have no serious problems with it. People say it’s slower than Linux, which may be caused by PyTorch and the number of workers. Another issue might be laptops with integrated graphics cards where the GPU isn’t used properly… So reducing the batch size sometimes helps, but overall I’m quite OK. The real problems come in the transition period to new versions of PyTorch and fastai, but those get worked out and solved relatively fast.

1 Like

Just a single anecdotal data point, so take it for what it’s worth: when training regression models (Mish + xresnet) for 40-100 epochs on low-complexity astronomical images, I achieve best results with Ranger + fit_one_cycle() rather than the other pairwise combinations. I know there are lots of confounding variables here but maybe it means that it’s still worthwhile to check how other optimizer/schedulers behave.
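
Roughly what that combination looks like in fastai2 (a sketch only: the dls with continuous targets, the architecture, epochs and LR are all assumptions, not the poster’s actual setup):

from fastai2.vision.all import *

# Mish-activated xresnet regression model, trained with Ranger + one cycle.
learn = Learner(dls, xresnet34(n_out=1, act_cls=Mish), opt_func=ranger,
                loss_func=MSELossFlat(), metrics=rmse)
learn.fit_one_cycle(40, 1e-3)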

1 Like

I love hearing that I’m wrong (seriously, I do!) :slight_smile: That’s a very cool finding!! (I’d give a like but my limit for the day was reached :confused: )

2 Likes

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224))

Something that’s confusing me in the above snippet (from the fastai v2 docs) is why we’re using Resize to make the images 460x460, and then using aug_transforms to make them 224x224. What would be the difference if we used just aug_transforms(size=224) and left out Resize(460)?

In the past we would’ve done something like .transform(tfms, size=64), or just passed size=bla once to the DataBlock API. Wondering what’s different about this.

Excited to get started!

1 Like

To work around these challenges, presizing adopts two strategies:

First, resize images to relatively “large” dimensions, that is, dimensions significantly larger than the target training dimensions.
Second, compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing, rather than performing the operations individually and interpolating multiple times.


It is under the Presizing section.
For a batch transform, you want all the images to be the same size so they can be collated.
So with the item transform you make each image a particular size first.
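
A tiny PyTorch illustration of that collation point (nothing fastai-specific, just two made-up tensors; the 460 size matches the snippet above):

import torch
import torch.nn.functional as F

# Two "images" of different sizes cannot be stacked into one batch tensor...
a, b = torch.rand(3, 300, 400), torch.rand(3, 460, 460)
try:
    torch.stack([a, b])
except RuntimeError as e:
    print("collation fails:", e)

# ...but after an item-level resize to a common size, they collate fine.
a = F.interpolate(a[None], size=(460, 460))[0]
print(torch.stack([a, b]).shape)   # torch.Size([2, 3, 460, 460])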

1 Like

Also, the resizing in aug_transforms is only applied at the batch level. So in order to get to that point, everything must be able to be put into a batch (i.e. everything is the same size).

5 Likes

I am not sure, but from whatever I have understood:

item_tfms --> Applied when each image is read from disk. Uses the CPU, and should be used when you have images of varying sizes. Resizing here is done to help collate the images into batches.

batch_tfms --> I looked into the source of aug_transforms: it internally calls RandomResizedCropGPU, so presumably it will do a random crop if the size passed is less than the actual image size (or the size set in item_tfms), else it will do bilinear interpolation. I guess we should use this when we know in advance that all the images in our dataset are a fixed size, as it uses the GPU for processing (I suppose it would be fast). See the sketch below for a quick way to check this.
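
One quick way to check this yourself is to print what aug_transforms returns; whether RandomResizedCropGPU shows up depends on arguments such as min_scale (the values below are just examples):

from fastai2.vision.all import *

# Inspect which transform classes aug_transforms builds for a given config.
print([type(t).__name__ for t in aug_transforms(size=224)])
print([type(t).__name__ for t in aug_transforms(size=224, min_scale=0.75)])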

2 Likes

Thanks for the quick replies!

So essentially Resize is there to make every image in the dataset the same size, and then, since they’re all the same shape, they can be batch transformed together easily. Additionally, it ensures that less information is lost (if any) during the final resizing. Did I understand that correctly?

3 Likes

I was getting the same error. I had to delete the ~/.fastai directory with the old version of the data and models:
rm -rf ~/.fastai

After this everything worked great.

Do you have a reason why the switched pairings (Ranger + one_cycle or Adam + flat_cos) won’t work? What do you think is happening?

Nope (I’m the wrong one to ask), this is just empirically what I found through testing on ImageWoof. IIRC the flat-cos schedule came from the gradients exploding with one cycle, so we kept the LR high (flat) and then brought it low (which was found through experimentation).