Lesson 2 official topic

This new approach is now supported in the latest nbdev and will be the default in the future. We’ll be integrating with Quarto, and that’s what they use.

5 Likes

Awesome lecture @jeremy. I know this is a little ahead of the class, but I already know what I want to work on, and it partially involves segmentation. I found a great notebook from Zachary Mueller and thought I would give it a shot.

I think loading the pretrained weights fails, so I commented that out too.

Maybe it’s the Normalization function? I tried commenting that out and it’s the same error.

I got the images and labels to load but I’m getting a GPU error.

I made it so the notebook is easily re-runnable and automatically downloads the dataset it uses.
Warning: it’s medical, so the images are a bit gross.

I played around with resnet sizes and image sizes and I don’t think it’s an OOM. The CamVid notebook from Zachary can run lr_find and fit fine, so I guess it’s my data / accuracy / optimization functions?

I think there’s only one class, which might be an issue, but I have no idea where you would configure that.

Does anyone know what’s causing this issue?
I have a T4 and it looks like the memory is fine.

import torch
torch.cuda.is_available()
# True!!!

Wow, GPU-enabled WSL, that easy! I’ve been battling with CUDA-enabled WSL with Docker and so on for ages, but that method Jeremy showed us was easy as. For those doing it from scratch: I did very close to what he showed us, but had to git clone the repo from within my WSL (Linux) so it didn’t change the UNIX line endings (whatever that means). Then running

source setup-conda.sh

worked. I also followed these CUDA WSL instructions (<=section 3), which are exactly the same except they recommend getting the right cuda-wsl driver from NVIDIA. Then I cloned the book’s repo, but I also had to run sudo apt-get install graphviz in the WSL to get the graphs to display. BUT yeah, local GPU-accelerated fast.ai here we come! Thanks!

7 Likes

Please don’t take this the wrong way (I’ve done this myself in the past), but just a quick reminder that Jeremy has asked us not to at-mention admins unless it’s an admin issue. If you just state the issue clearly, people will be more than happy to help.

Also, it would be really helpful if the screenshot gave some context about what was being executed when the error appeared. All I can tell from it is that you got some kind of generic error. As in 90% of debugging situations, the generic error you see is usually not the real issue.

You’ll find that most people on this forum are quite helpful, but they need the OP to give some context, i.e. “you need to help them help you”. For example, if I were to try to reproduce your problem, here are the things I’d need to know:

  1. What cell are you trying to run that gives that error?
  2. Which environment were you running it in? (Kaggle notebook, Colab, your home machine?)
  3. What is your setup like?

Providing these details up front means that whoever tries to help you doesn’t have to spend an inordinate amount of time asking follow-up questions about specifics that could easily have been included in the original post, and it keeps the demand on their valuable time to a minimum.

HTH

1 Like

BTW, when I import your notebook into Kaggle and run it, I get the following error for learn.lr_find(), which is the 3rd cell from the bottom of the notebook:

> /opt/conda/lib/python3.7/site-packages/torch/nn/functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
>    2822     if size_average is not None or reduce is not None:
>    2823         reduction = _Reduction.legacy_get_string(size_average, reduce)
> -> 2824     return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
>    2825 
>    2826 
> 
> IndexError: Target 255 is out of bounds.

I did not see any GPU exhaustion at all; I think it dies before things get to the GPU (just a guess).
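For anyone curious, this error is reproducible in isolation: cross_entropy requires every target value to be a class index smaller than the number of output channels, so a mask pixel of 255 with only a couple of classes triggers exactly this. A minimal sketch (shapes are illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(1, 2, 4, 4)                        # fake 2-class segmentation output
targets = torch.full((1, 4, 4), 255, dtype=torch.long)  # mask pixels encoded as 255
F.cross_entropy(logits, targets)                        # IndexError: Target 255 is out of bounds.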

1 Like

Question: How to specify a destination folder when calling untar_data()?

This is the function signature:

untar_data(url, archive=None, data=None, c_key='data', force_download=False)

The documentation states:

"Download url to fname if dest doesn’t exist, and extract to folder dest"

However, the documentation and the function signature don’t match. Where do we specify dest?

4 Likes

@sambit

On the right of the documentation is a link to the [source]

Click that and you see…

From that it seems you don’t “specify” dest. It’s implied from the fixed base and the default c_key='data', which produces the path in the documentation example…

If you look up to Line 115 you can see how fname is determined from url.

This is the first time I’ve looked at this code, so I’m only guessing, but your options seem to be…

  1. Use the data from the default download position.
  2. Copy the untar_data() function into your code as my_untar_data() and play with the parameter values used with FastDownload and get (i.e. maybe base??); see the sketch below.

Unless there is a compelling reason, the Option 1 default is probably your path of least resistance.
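For example, here’s a rough sketch of what Option 2 might look like (untested; I’m guessing at the FastDownload parameters from the source, and my_untar_data, the dest path, and the URL are just placeholders):

from fastdownload import FastDownload

def my_untar_data(url, dest='~/my_datasets', force_download=False):
    # Point fastdownload's base at our own directory instead of the fixed default
    d = FastDownload(base=dest, archive='archive', data='data')
    return d.get(url, force=force_download)

path = my_untar_data('https://example.com/dataset.tgz')  # hypothetical URL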

3 Likes

From what I can tell it’s a wrapper around fastdownload, and the fastdownload docs state that you can change base to any directory you want. By default base points to ~/.fastdownload or something.

EDIT: But I see what you’re saying, base can’t be reached from untar_data()

Instead of get, use download to download the URL without extracting it, or extract to extract the URL without downloading it (assuming it’s already been downloaded to the archive directory). All of these methods accept a force parameter which will download/extract the archive even if it’s already present.

You can change any or all of the base, archive, and data paths by passing them to FastDownload:

d = FastDownload(base='~/.mypath', archive='downloaded', data='extracted')

1 Like

Jeremy then showed the ImageClassifierCleaner, and Nick said it pays to visually inspect the results when using these “open” image searches, as noted in @JaviNavarro’s post.

2 Likes

I’ve added a section to the README now.

2 Likes


I’m running into difficulty with ImageClassifierCleaner(learn).

delete() works as expected:

for idx in cleaner.delete(): cleaner.fns[idx].unlink()

change(), however, fails:

for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)
---------------------------------------------------------------------------
Error                                     Traceback (most recent call last)
/tmp/ipykernel_27738/4259621786.py in <module>
----> 1 for idx,cat in cleaner.change(): shutil.move(str(cleaner.fns[idx]), path/cat)

~/anaconda3/lib/python3.9/shutil.py in move(src, dst, copy_function)
    810 
    811         if os.path.exists(real_dst):
--> 812             raise Error("Destination path '%s' already exists" % real_dst)
    813     try:
    814         os.rename(src, real_dst)

Error: Destination path 'four_seasons/summer/00000129.jpg' already exists

The problem appears to occur because search_images_ddg() indexes each category independently, starting with 00000000.jpg. This results in duplicate fnames across categories. Accordingly, attempting to change an image from one category to another results in a collision and the code fails.

I can come up with a workaround, but wonder if anyone has solved this particular problem?

Maybe there is a utility I missed in the fastai code base. I will look more closely.

3 Likes

I haven’t seen a fix for this - suggestions most welcome!

I am a little confused about which chapters we are supposed to read for the week. Just chapter 2?

Hi @mike.moloch, sorry, I totally appreciate that Jeremy must get a lot of @ mentions. Is there a better place to post this? I am taking the course in Brisbane and I don’t yet have enough intuition with the library to know what causes the common errors.

Interestingly, I don’t see that IndexError.
My environment is a self-provisioned VM on GCP.
It’s 8 cores, 32 GB of RAM, and a T4, with fresh Ubuntu 22.04, NVIDIA 470 drivers, Python 3.10, and a virtualenv with a fresh install of fastbook and fastai in the directory. I pulled the example notebook mentioned and ran it, and it works fine with this hardware and Python environment. At first it looked like it was allocating up to the MAX of the GPU, but now that I have lowered the image size, the memory looks well under the limit when the error occurs, as per my screenshot above.

I included a full fresh run from top to bottom of my notebook in the git link I posted above to help debug the issue.

The original notebook has multiple classes and uses a few different functions to detect them and set them against a dict of codes.
For mine there appears to be only one class, and I think it’s represented by 255, while the void is 0.

Why would 255 be out of bounds? Surely that’s within 8-bit range?
Do I need to change the mask values so that they are, say, 0 for void and 1 for the first label, or something?
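Something like this is what I have in mind, if that’s the right direction (a hypothetical helper, assuming the masks are single-channel images where 255 marks the class):

import numpy as np
from PIL import Image

def remap_mask(fn):
    # Hypothetical: map pixel value 255 -> 1 so every target falls inside [0, n_classes)
    m = np.array(Image.open(fn))
    m[m == 255] = 1
    return Image.fromarray(m)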

I tried to run it without the Normalize function in the DataBlock (which probably uses the wrong stats), but that didn’t change anything.

Here is the other code which relates to the mask and codes.

Any help would be greatly appreciated. I searched the notebooks and I can see it’s not part of fastbook, so I understand I’m going a little off piste here. :skier: :grimacing:

1 Like

Three ideas come to mind:

  • Change download_images to include a new parameter start_idx, and have download_images return next_idx so that a subsequent call to download_images can pass a new start_idx
Signature:
download_images(
    dest,
    url_file=None,
    urls=None,
    max_pics=1000,
    n_workers=8,
    timeout=4,
    preserve_filename=False,
    start_idx=0
)

No breaking changes for existing users. Some added complexity for multiple categories, but with the return of next_idx, not too bad.

  • Create a new function to wrap around:
shutil.move(str(cleaner.fns[idx]), path/cat)

This function would check to see whether fname exists and, if so, create a unique fname; see the sketch after this list. To the extent fnames are being used for labeling, this could get complicated.

  • Create a utility function reindex_dir that would analyze an existing directory and recreate fnames as specified by the function. Perhaps a signature similar to this:
Signature: reindex_dir(path, stem=None, suffix="jpg", start_idx=0)
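Here’s a rough sketch of the second idea (untested; safe_move is a made-up name):

import shutil
from pathlib import Path

def safe_move(src, dest_dir):
    # Move src into dest_dir, appending a counter if a file with that name already exists
    src, dest_dir = Path(src), Path(dest_dir)
    dst, i = dest_dir/src.name, 1
    while dst.exists():
        dst = dest_dir/f"{src.stem}_{i}{src.suffix}"
        i += 1
    shutil.move(str(src), str(dst))
    return dst

# then: for idx,cat in cleaner.change(): safe_move(cleaner.fns[idx], path/cat)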

Can someone help me with how to get my two GPUs working with fastai? I looked in the forum threads and found nothing; I looked on the internet and found nothing.

I tried this: os.environ['CUDA_VISIBLE_DEVICES']='0,1'
and nothing.

thanks.

That’s a big and challenging topic. I’d recommend avoiding it if possible - I use multi-GPU by running a different model on each, generally.

But if you really want to train one model on multiple GPUs, this is what you’ll need: PyTorch Distributed Overview — PyTorch Tutorials 1.11.0+cu102 documentation
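For the “different model on each GPU” approach, one simple pattern (illustrative; run each snippet in its own process or notebook) is to restrict which device each process can see:

import os
# In the first process, before importing torch/fastai:
os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # this process only sees GPU 0 and trains model A

# In a second, separate process:
# os.environ['CUDA_VISIBLE_DEVICES'] = '1' # that process only sees GPU 1 and trains model B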

4 Likes

Thanks, Jeremy,

I was looking at DDP with fastai, for anyone interested.

2 Likes

Good find! Only works with a very old version of fastai, unfortunately.

But would be a cool (advanced) student project to update it for fastai2, if anyone’s interested.

1 Like