Lesson 2 official topic

orangelmx · May 4, 2022, 1:13am

Thanks, Jeremy,

I was looking at DDP with fastai, for anyone interested.

jeremy · May 4, 2022, 1:19am

Good find! Only works with a very old version of fastai, unfortunately.

But would be a cool (advanced) student project to update it for fastai2, if anyone’s interested.

orangelmx · May 4, 2022, 1:48am

@jeremy can you tell us what to look at and where to start working? I don’t know if this is the right question.

Thanks.

jeremy · May 4, 2022, 1:53am

Not really - it’s a large and advanced project, and if I knew what to do exactly, I’d have done it myself!

VishnuSubramanian · May 4, 2022, 2:14am

Add this IntToFloatTensor(div_mask=255) to your batch_tfms.

You may find this blog useful.

marix1120 · May 4, 2022, 6:37am

Hi, newbie here.

I have never really used GitHub Desktop (only to store my notebooks) or the console, but I am assuming GitHub Desktop is easier to use.
Can someone please help me create the app.py file (I have only used notebooks before) using the GitHub console? I followed the video but it didn’t work.

Thanks.

balnazzar · May 4, 2022, 6:59am

Just a few suggestions:

You don’t want to parallelize your models. That’s a very advanced topic, as Jeremy himself highlighted.
One way to use multiple GPUs is to run a different experiment on each of them, as Jeremy also suggested. It saves quite a lot of time.
If you want to use both (or all) of your GPUs for the same experiment, you can do that quite easily with data (not model!) parallelization.
That comes in quite handy when you want to use a batch size that doesn’t fit into the VRAM of a single GPU.
It amounts to using PyTorch’s DataParallel. Check old threads about this.
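
To make the data-parallel idea concrete, here is a minimal plain-Python sketch of how one batch might be divided among GPUs. It assumes the batch is split along its first dimension into near-equal chunks, the way torch.chunk splits a tensor; split_batch_sizes is a hypothetical helper for illustration, not part of PyTorch or fastai.

```python
import math


def split_batch_sizes(batch_size: int, n_gpus: int) -> list:
    """Return the per-GPU chunk sizes for one batch (hypothetical helper).

    Assumes chunks of ceil(batch_size / n_gpus) items, with the last
    chunk taking whatever is left over.
    """
    if batch_size <= 0 or n_gpus <= 0:
        raise ValueError("batch_size and n_gpus must be positive")
    chunk = math.ceil(batch_size / n_gpus)
    sizes = []
    remaining = batch_size
    while remaining > 0:
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(split_batch_sizes(64, 2))
print(split_batch_sizes(10, 3))
```

With two GPUs, a batch of 64 becomes two chunks of 32, so each card only has to hold the activations for half the batch.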

n-e-w · May 4, 2022, 7:45am

@orangelmx Can verify this experience: multi-GPU training is full of unexpected, non-linear hassles, like race conditions. Very difficult to trace and debug. It’s just too easy to come unstuck in some nonobvious way and run out of talent especially quickly. This is exacerbated if you have two different GPU models; you will experience bottlenecking from the lower-powered GPU and all sorts of other issues.

dhoa · May 4, 2022, 8:42am

Sorry for off-topic but this is a super famous dog in Vietnam, I think he has his own emoji collection :)) He looks so funny

suvash · May 4, 2022, 8:58am

Oh, is it ‘Nguyen van dui’ you’re talking about? I think I’ve seen it on social media every now and then. Now that you mention it, on some of the vids/pics, the background tiles do match.
Thanks for resolving this mystery. Not including links here so as not to go too off-topic

jeremy · May 4, 2022, 9:03am

Wow that’s really cool! Thanks for telling us about this very special dog

RogerS49 · May 4, 2022, 9:03am

I ran this code and it worked fine

Chapter 1 Kaggle

jpc · May 4, 2022, 10:01am

To expand on this answer a little bit: Segmentation in fastai is quite picky about the mask format. The background pixels need to have the value 0 and all the other classes should use consecutive integer values 1, 2, and so forth. This makes inspecting the mask images in an image viewer quite difficult, since all the pixels look almost black.

The clever trick @VishnuSubramanian used helps a lot for binary masks (where pixels are either background – black (0) or foreground – white (255)) where you can get away with dividing the mask by 255 to get only 0 or 1 as outputs.

Btw, it would be great if SegmentationDataLoaders could handle arbitrary colors (with a vocab-like mechanism) or maybe sanity-check the inputs a bit to show better error messages.
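
The mask convention described above can be sketched in plain Python. This is only an illustration on nested lists, assuming a mask is a 2D grid of integer pixel values; binarize_mask, remap_mask and the vocab dict are hypothetical names, not fastai API.

```python
def binarize_mask(mask):
    """Map a 0/255 binary mask to 0/1 class indices by integer-dividing by 255."""
    return [[px // 255 for px in row] for row in mask]


def remap_mask(mask, vocab):
    """Map arbitrary pixel values to class indices through a vocab-like dict.

    Raises ValueError listing any pixel values missing from the vocab,
    which is the kind of sanity check suggested above.
    """
    unknown = sorted({px for row in mask for px in row} - set(vocab))
    if unknown:
        raise ValueError(f"mask contains pixel values missing from vocab: {unknown}")
    return [[vocab[px] for px in row] for row in mask]


# Binary mask: background is 0, foreground is 255.
print(binarize_mask([[0, 255], [255, 0]]))

# Multi-class mask whose classes were saved as arbitrary grey levels.
vocab = {0: 0, 128: 1, 255: 2}  # pixel value -> class index (hypothetical)
print(remap_mask([[0, 128], [255, 128]], vocab))
```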

mike.moloch · May 4, 2022, 11:12am

While using the pet classifier, I noticed that if I used the whole picture for inference, it classified it as a “cat” but if I used the edit functionality (pencil in the top right corner of the picture) and cropped it just to the head, it correctly identified it as a “dog”.

I thought it was interesting that the classifier took into account various body configurations and the relationships that may or may not exist. OTOH, this dog’s ears look very cat-like, so it’s impressive that the classifier still picks it as a dog when shown just the head (cropped from the same picture).

Mattr · May 4, 2022, 11:26am

I’ve been looking at the docs and experimenting tonight but not coming up with a solution yet @sambit. Using Google Colab I can’t generally see the downloaded file in the file explorer. Only if I open the terminal can I locate it in the hidden folder .fastai.

My interim hack is to move files using a shell command in a notebook cell, e.g.:
path = untar_data(URLs.IMDB_SAMPLE)
!mv {path}/'texts.csv' /content/drive/MyDrive/Notebooks/

The biggest issue I am finding with using fastai and Google Colab is that artifacts like trained models are so easily lost. The config.ini folder locations for data, models, archive would work great for a static environment but as far as I understand right now this isn’t possible with Colab. I hope to be proven wrong!

My understanding is that a new instance is a clean slate and you have to start over unless you download or move your artifacts elsewhere and this can be triggered as soon as you close your laptop lid. I’ve used PyTorch Lightning to train models that have been interrupted and restarted using the most recent epoch backup log file, saving time and energy. I am wondering if fastai can or could do this?
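
The resume-from-backup idea can be sketched independently of any training library. The snippet below assumes checkpoints are written once per epoch under a hypothetical naming scheme epoch_<n>.pth; latest_checkpoint is an illustrative helper, not a fastai or PyTorch Lightning function. On Colab the directory would need to live somewhere persistent, such as a mounted Google Drive folder.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(ckpt_dir: Path):
    """Return (epoch, path) for the newest 'epoch_<n>.pth' file, or None."""
    pattern = re.compile(r"epoch_(\d+)\.pth$")
    found = []
    for f in ckpt_dir.glob("epoch_*.pth"):
        m = pattern.match(f.name)
        if m:
            found.append((int(m.group(1)), f))
    # Compare epochs numerically so that epoch 10 beats epoch 2.
    return max(found, key=lambda t: t[0]) if found else None


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    ckpt_dir = Path(tmp)
    for epoch in (0, 1, 2, 10):
        (ckpt_dir / f"epoch_{epoch}.pth").touch()
    epoch, path = latest_checkpoint(ckpt_dir)
    start_epoch = epoch + 1  # training would pick up from here
    print(start_epoch, path.name)
```

A restarted session would load the weights from the returned path and continue from start_epoch instead of epoch 0.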

jpc · May 4, 2022, 12:53pm

I’ve seen people using Google Drive to persist models and data inside Colab. Maybe the library could be extended to do it automatically for you?

mike.moloch · May 4, 2022, 12:54pm

I tend to agree with your observations. I have found Colab’s drive situation to be, at the very least, confusing. This is not so much a function of fastai but just the way the Colab ecosystem is set up. I have not tried to continue with data saved across session boundaries (mostly because of this).

I think this can be mitigated with extra coding and checkpointing and whatnot. I would think that saving it in my Google Drive would at least keep the downloaded stuff (config.ini mappings notwithstanding), but I have not made a whole lot of effort in this regard because I find dealing with Google Drive simply too clunky and unwieldy.

bencoman · May 4, 2022, 2:17pm

Perhaps…

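def relocateFile(fromFile, toDir):
  # Assumes every file in toDir has a purely numeric stem (e.g. 00000012.jpg).
  # Compare stems as integers so that 10 sorts after 9.
  maxFile = max(toDir.glob('*'), key=lambda f: int(f.stem))
  # Next free number, zero-padded to the same width as the current maximum.
  maxFilePlusOne = str(int(maxFile.stem)+1).zfill(len(maxFile.stem))
  toFile = toDir/(maxFilePlusOne + fromFile.suffix)
  fromFile.rename(toFile)
  print('relocated', fromFile, '==>', toFile)

# Move each image the cleaner re-labelled into the folder of its new category.
for idx,cat in cleaner.change(): relocateFile(cleaner.fns[idx], path/cat)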

jona · May 4, 2022, 3:24pm

Hi repeat class takers–I’ve got a brain teaser for you!

If I start by training a model just like the bears model here, except I use a different three classes: [‘dog’, ‘snake’, ‘OTHER’]…

And I only provide images of dogs and snakes in the training and validation datasets (no images ever provided of the third category)…

What would the learner predict if I input an image of a house?
a) [0,0,1]
b) [.3, .3, .3]
c) [.5, .5, 0]
d) something else

(note these numbers have been rounded to demonstrate different regimes)

Reply to this comment with your answer AND why you think that would be the case.
I’ll post the results of my test code after I see 5 guesses!

kurianbenoy · May 4, 2022, 4:44pm

I might well be wrong. I feel that since there is no data for the third category, that category won’t be recognized by the model at all. So option (c) is my best guess.
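
One way to reason about the options is to look at how a softmax turns raw scores into probabilities. The sketch below assumes the learner is a single-label classifier ending in a softmax over the three classes; the logits are invented for illustration and are not the output of jona’s test.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for [dog, snake, OTHER] on a house image.
# If training never rewards OTHER, its score may end up very low.
house_logits = [0.2, 0.1, -6.0]
probs = softmax(house_logits)
print([round(p, 2) for p in probs])
```

Because the outputs always sum to 1, a near-zero OTHER probability forces the rest to be shared between dog and snake. Whether a real model actually behaves this way on a house image is exactly what the experiment would have to show.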