Lesson 2 official topic

From what I can tell, untar_data() is a wrapper around fastdownload, and the fastdownload docs state that you can change ‘base’ to any directory you want. By default, {base} points to ~/.fastdownload or something similar.

EDIT: But I see what you’re saying: base can’t be passed through untar_data().

Instead of get, use download to download the URL without extracting it, or extract to extract the URL without downloading it (assuming it’s already been downloaded to the archive directory). All of these methods accept a force parameter which will download/extract the archive even if it’s already present.

You can change any or all of the base, archive, and data paths by passing them to FastDownload:

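from fastdownload import FastDownload
# archives go to ~/.mypath/downloaded, extracted data to ~/.mypath/extracted
d = FastDownload(base='~/.mypath', archive='downloaded', data='extracted')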

Jeremy then showed the ImageClassifierCleaner, and Nick said it pays to visually inspect the results when using these “open” image searches (post by @JaviNavarro).

I’ve added a section to the README now.

I’m running into difficulty with ImageClassifierCleaner(learn).

delete() works as expected:

for idx in cleaner.delete(): cleaner.fns[idx].unlink()

change(), however, fails:

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
/tmp/ipykernel_27738/4259621786.py in <module>
----> 1 for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

~/anaconda3/lib/python3.9/shutil.py in move(src, dst, copy_function)
    810 
    811         if os.path.exists(real_dst):
--> 812             raise Error("Destination path '%s' already exists" % real_dst)
    813     try:
    814         os.rename(src, real_dst)

Error: Destination path 'four_seasons/summer/00000129.jpg' already exists

The problem appears to occur because the images for each category (found with search_images_ddg()) are downloaded independently, and download_images() numbers each batch starting from 00000000.jpg. This results in duplicate filenames across categories. Accordingly, moving an image from one category to another collides with an existing file of the same name, and shutil.move() raises an error.

I can come up with a workaround, but I wonder whether anyone has already solved this particular problem.

Maybe there is a utility I missed in the fastai code base. I will look more closely.

I haven’t seen a fix for this; suggestions most welcome!

I am a little confused about which chapters we are supposed to read for the week. Just chapter 2?

Hi @mike.moloch, sorry; I totally appreciate that Jeremy must get a lot of @ mentions. Is there a better place to post this? I am taking the course in Brisbane, and I don’t yet have enough intuition with the library to know what causes common errors.

Interestingly, I don’t see that IndexError.
My environment is a self-provisioned VM on GCP: 8 cores, 32 GB of RAM, a T4, a fresh Ubuntu 22.04 install with NVIDIA 470 drivers, Python 3.10, and a virtualenv with a fresh install of fastbook and fastai. I pulled the example notebook mentioned and ran it, and it works fine with this hardware and Python environment. At first it looked like it was allocating the maximum GPU memory, but now that I have lowered the image size, memory usage is well under the limit when the error occurs (see my screenshot above).

To help debug the issue, I included a fresh top-to-bottom run of my notebook in the git link I posted above.

The original notebook has multiple classes and uses a few different functions to detect them and map them to a dict of codes.
Mine appears to have only one class, and I think it’s represented by 255, while void is 0.

Why would 255 be out of bounds? Surely that’s within the 8-bit range?
Do I need to change the mask values so that they are, say, 0 for void and 1 for the first label?

I tried running it without Normalize in the DataBlock (which probably uses the wrong stats), but that didn’t change anything.

Here is the other code which relates to the mask and codes.

Any help would be greatly appreciated. I searched the notebooks and I can see it’s not part of fastbook, so I understand I’m going a little off piste here. :skier: :grimacing:

Three ideas come to mind:

  • Change download_images to accept a new parameter, start_idx, and to return next_idx, so that a subsequent call to download_images can continue numbering where the previous call stopped.
Signature:
download_images(
    dest,
    url_file=None,
    urls=None,
    max_pics=1000,
    n_workers=8,
    timeout=4,
    preserve_filename=False,
    start_idx=0
)

This introduces no breaking changes for existing users. It adds some complexity when downloading multiple categories, but with next_idx returned, not too much.

  • Create a new function to wrap around:
shutil.move(str(cleaner.fns[idx]), path/cat)

This function would check whether fname already exists and, if so, create a unique fname. To the extent that fnames are used for labeling, this could get complicated.

  • Create a utility function, reindex_dir, that would scan an existing directory and rename its files according to the function’s parameters. Perhaps a signature similar to this:
Signature: reindex_dir(path, stem=None, suffix="jpg", start_idx=0)
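The second and third ideas can be sketched in plain Python. Both helpers below are hypothetical (neither exists in fastai). move_unique assumes that appending _1, _2, and so on to a clashing filename is acceptable, and reindex_dir assumes every file in the folder really has the suffix you pass, since it renames without converting anything.

```python
import shutil
from pathlib import Path


def move_unique(src, dest_dir):
    """Move src into dest_dir, appending _1, _2, ... to the stem if the name is taken.

    Hypothetical helper (not part of fastai). Returns the final destination path.
    """
    src, dest_dir = Path(src), Path(dest_dir)
    dest = dest_dir / src.name
    n = 1
    # Keep bumping the counter until the filename is free in dest_dir.
    while dest.exists():
        dest = dest_dir / f"{src.stem}_{n}{src.suffix}"
        n += 1
    shutil.move(str(src), str(dest))
    return dest


def reindex_dir(path, stem=None, suffix="jpg", start_idx=0):
    """Rename every file in path to <stem><8-digit index>.<suffix>, from start_idx.

    Hypothetical helper (not part of fastai). Renames in two passes via
    temporary names so a new name never overwrites a file not yet renamed.
    Returns the next unused index.
    """
    path = Path(path)
    prefix = stem or ""
    files = sorted(p for p in path.iterdir() if p.is_file())
    # Pass 1: move every file to a unique temporary name.
    tmp = [f.rename(path / f".reindex_tmp_{i}") for i, f in enumerate(files)]
    # Pass 2: assign final zero-padded names, same style as 00000000.jpg.
    idx = start_idx
    for t in tmp:
        t.rename(path / f"{prefix}{idx:08d}.{suffix}")
        idx += 1
    return idx
```

With these, the failing loop above could become for idx,cat in cleaner.change(): move_unique(cleaner.fns[idx], path/cat), and reindex_dir(path/cat) can tidy a folder’s names afterwards. If labels come from the parent folder name (as with parent_label), renaming files this way shouldn’t affect them.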

Can someone help me get my two GPUs working with fastai? I looked through the forum threads and searched the internet, but found nothing.

I tried os.environ['CUDA_VISIBLE_DEVICES'] = '0,1', but nothing changed.

Thanks.

That’s a big and challenging topic. I’d recommend avoiding it if possible; generally, I use multiple GPUs by running a different model on each.

But if you really want to train one model on multiple GPUs, this is what you’ll need: PyTorch Distributed Overview — PyTorch Tutorials 1.11.0+cu102 documentation

Thanks, Jeremy.

For anyone interested, I was looking into DDP with fastai.

Good find! Only works with a very old version of fastai, unfortunately.

But would be a cool (advanced) student project to update it for fastai2, if anyone’s interested.

@jeremy, can you tell us what to look at and where to start working? I don’t know if this is the right question.

Thanks.

Not really - it’s a large and advanced project, and if I knew what to do exactly, I’d have done it myself! :wink:

Add this IntToFloatTensor(div_mask=255) to your batch_tfms.

You may find this blog useful.
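As for why 255 is “out of bounds”: as far as I know, fastai’s default segmentation loss is a cross-entropy loss, which treats each mask pixel as a class index, so with two codes (void and your object) the only valid targets are 0 and 1. The 8-bit range doesn’t come into it. IntToFloatTensor(div_mask=255) divides the mask by 255 on the fly; the alternative is to remap the mask files once. A minimal NumPy sketch of that remapping, assuming single-channel masks (remap_mask and check_targets are hypothetical helpers):

```python
import numpy as np


def remap_mask(mask, mapping):
    """Return a copy of mask with raw pixel values replaced by class indices.

    Hypothetical helper: mapping is e.g. {0: 0, 255: 1} for void + one class.
    Raises ValueError if the mask contains a value the mapping doesn't cover
    (for example, stray values left by JPEG compression).
    """
    mask = np.asarray(mask)
    unknown = set(np.unique(mask).tolist()) - set(mapping)
    if unknown:
        raise ValueError(f"unmapped mask values: {sorted(unknown)}")
    out = np.zeros_like(mask, dtype=np.uint8)
    for raw, cls in mapping.items():
        out[mask == raw] = cls
    return out


def check_targets(mask, n_classes):
    """Cross-entropy targets must all lie in [0, n_classes)."""
    return bool(mask.min() >= 0 and mask.max() < n_classes)
```

A raw 0/255 mask fails check_targets(mask, 2), while remap_mask(mask, {0: 0, 255: 1}) passes it.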

Hi, newbie here.

I’ve never really used GitHub Desktop (I only used it to store my notebooks) or the console, but I assume GitHub Desktop is easier to use.
Can someone please help me create an app.py file (I’ve only used notebooks before) using the GitHub console? I followed the video, but it didn’t work.

Thanks.

Just a few suggestions:

  • You don’t want to parallelize your models. That’s a very advanced topic, as Jeremy himself highlighted.
  • One way to use multiple GPUs is to run a different experiment on each of them, as Jeremy also suggested. It saves quite a lot of time.
  • If you want to use both (or all) of your GPUs for the same experiment, you can do that quite easily with data parallelism (not model parallelism!).
  • That comes in quite handy when you want to use a batch size that doesn’t fit into the VRAM of a single GPU.
  • It amounts to using PyTorch’s DataParallel. Check older threads about this.

@orangelmx I can verify this experience: multi-GPU training is full of unexpected, non-linear hassles, like race conditions, which are very difficult to trace and debug. It’s just too easy to come unstuck in some non-obvious way and run out of talent especially quickly. This is exacerbated if you have two different GPU models: you will experience bottlenecking from the lower-powered GPU, and all sorts of other issues.

Sorry for the off-topic post, but this is a super famous dog in Vietnam; I think he has his own emoji collection :)) He looks so funny.
