Live coding 8

Yeah that’s what I’ve been doing so far tbh. I could probably add it to my startup script on the host side and apply it to the folders I mount into the guest container before firing it up. The issue arose when I git cloned a repo onto the mounted disk: the container wouldn’t see it until I stopped the container, chmod-ed on the host side and relaunched. So, if there are any pointers on that I’d appreciate it (nothing fully worked out, just a general pointer in that direction).

P.S. I found this article which sort of explains what I need to do, but it assumes building things up from a Dockerfile (RUN etc.), whereas I actually just run the vanilla container right off of the Paperspace registry… (maybe I can just add these addgroup directives to my run command on the host side)

I have downloaded the latest version of timm. After importing it I can show a list of models (timm.list_models()).
When I try to create a learner using
learn = vision_learner(dls, 'convnext_small_in22k')
I get an error:
NameError: name 'timm' is not defined
at line 169: model = timm.create_model(arch, pretrained=pretrained, num_classes=0, in_chans=n_in)

Any suggestions?

Is there a way to clear out any CUDA memory? I’m using the same server and want to try out a few other architectures, but am getting an error that my memory is full (even though I’m not training anything right now).

Ok, you can simply restart the kernel and that’ll do the job. If anyone knows a command we can run, please do share. :slight_smile: Thanks.
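For reference, one partial option with PyTorch (it only releases cached blocks, not tensors you still hold references to; the learner name here is assumed from the earlier cells):

import gc, torch

del learn                  # drop references to big objects first
gc.collect()               # let Python reclaim the freed objects
torch.cuda.empty_cache()   # release cached blocks back to the GPU driver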

Can’t remember on which one, but this was covered pretty well in one of the recent walkthroughs. The most likely reason is that you might have forgotten to import timm. Python is basically telling you that a name called timm is not available.

If you were able to run timm.list_models(), then this error should not really show up. It could be the case that you restarted the kernel, but did not run the cell with the timm import again?
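For example, a minimal sanity check in the notebook (dls as built in the walkthru) would be:

import timm                      # must be re-run after every kernel restart
from fastai.vision.all import *

print(timm.__version__)          # confirm timm is importable in this kernel
learn = vision_learner(dls, 'convnext_small_in22k', metrics=error_rate)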

Restarting the notebook (and any other notebooks that might still be running & holding on to GPU memory) is a sure way to clear allocated CUDA memory.

If it’s the only notebook you’re running and you keep running into this error even after restarts, then you need to decrease the batch size so that a batch fits in GPU memory.
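For example, with the from_folder pipeline used later in this thread, the batch size is just the bs argument (32 here is an illustrative value; the default is 64):

dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, seed=42,
                                   item_tfms=Resize(224), bs=32)  # smaller bs, less GPU memory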

Talking more about Docker problems here feels like hijacking this topic, so maybe we should continue the conversation elsewhere, but if you want to see a sample directory/structure I’ve set up then you can follow it here.
This one points to the custom entrypoint I have for the uid:gid switch logic, but the directory also contains a bunch of supporting files around it (eg. Dockerfile, compose, makefile etc.)
https://github.com/suvash/nixos-nvidia-cuda-python-docker-compose/blob/main/05-files/bin/entrypoint.sh

Hopefully, this points you in the right direction. Let me know if you have more questions. :raised_hands:

Thanks Suvash, but no luck. timm.list_models() works correctly but the error still occurs.

That sounds odd. Can you share an example of the failure, maybe the whole failing notebook (in a GitHub gist)?

Thanks Suvash, you were right about it sounding odd. Reinstalling timm as mentioned at the start of walkthru 9 did the job.
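For anyone hitting the same thing, that reinstall is just a notebook cell along these lines (walkthru 9 may pin a specific pre-release version):

!pip install -U timm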

Correct me if I’m wrong: linking the predictions (idxs) to file names can be tricky using get_preds, as this method cannot return the actual file names. The decode method can only trace things back to image objects, not file names. If we could somehow obtain the file name from get_preds, there would be no need to sort the test images before calling this method, an approach which is susceptible to a linking error like the one that happened during the first kaggle submission.

We could simply loop through each image and call get_preds to keep track of file names, but this might increase the inference time.

Or we could link idxs with file names in the dls, provided the dls doesn’t shuffle files in the first place.

Or, I don’t know, are there best practices for linking predictions to actual input files?

Any ideas?

I replicated the parallel execution example at [19:17] while I had dmon running, but dmon didn’t show even a blip of activity - i.e. sm=0 mem=0. Is parallel running not monitored? Or did I accidentally start a CPU-only instance, which begs the follow-up question…

  • Is there some way from within an instance to tell what instance-type is running?

As I understand it, parallel does things on the CPU side. On the instance-type question, I would probably just check the number of CPUs and check for the availability of a GPU (nvidia-smi) and extrapolate from that. I’m not sure if Paperspace provides a command for this. Sometimes the /proc filesystem has interesting system-related information in it (on Linux OSes).
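A rough sketch of that from inside a notebook (assuming PyTorch is installed; Paperspace may well provide something nicer):

import os, torch

print('CPUs:', os.cpu_count())
if torch.cuda.is_available():
    print('GPU:', torch.cuda.get_device_name(0))   # which card the instance has
else:
    print('No GPU visible - possibly a CPU-only instance')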

[Edit:] Whoops, that was meant to reply to OP by Mattr.

Different from what you asked, but you reminded me that I always liked delete-inner and delete-around. i.e. starting with… aaa "b|bb" ccc

  • <esc>di" ==> aaa "" ccc
  • <esc>da" ==> aaa ccc

The items attr of a dataset will contain the file names in the order used in the dataloader.

Thanks Jeremy. It worked. I combined dl.items and idxs and then used a mapping dictionary to add the labels column. Perfect, this is what I wanted to achieve.

No need to worry about a linking mismatch. Cool.
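For anyone else reading along, the combination looks roughly like this (learn, tst_dl and dls as built earlier in the thread):

import pandas as pd

probs, _, idxs = learn.get_preds(dl=tst_dl, with_decoded=True)
mapping = dict(enumerate(dls.vocab))                 # idx -> label name
labels = pd.Series(idxs.numpy(), name='label').map(mapping)

# tst_dl.items holds the file paths, in the same order as the predictions
df = pd.DataFrame({'image_id': [p.name for p in tst_dl.items],
                   'label': labels})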

A more detailed note on walkthru 8

00:00 starting with question and answer session

How to get things set up on a local machine?
04:44 - How to set up kaggle on a local machine?

09:47 - Setting up to run on your own GPU server locally and remotely

pathlib

14:17 What is pathlib? From where do we import this library? How do we use Path(), and what can we pass in as parameters? (I need to experiment in a notebook and visualize examples myself)
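A tiny sketch of the bits used in the walkthru (plain pathlib; fastai re-exports Path and adds extras like .ls() on top):

from pathlib import Path

path = Path.home()/'paddy'                  # '/' is overloaded to join path segments
print(path, path.exists())
for p in (path/'train_images').iterdir():   # iterate the directory contents
    print(p.name, p.stat().st_size)         # file name and size in bytes
    break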

check multiple file sizes in normal and parallel ways

15:40

How to time a program in jupyter cell?

Explore fastcore.parallel

16:50

When parallel makes a difference
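A minimal sketch of timing the file-size check serially vs with fastcore.parallel (trn_path assumed from the walkthru):

from fastai.vision.all import get_image_files
from fastcore.parallel import parallel

files = get_image_files(trn_path)
def size(p): return p.stat().st_size

# in a notebook, prefix each of these with %time to compare:
sizes_serial   = [size(p) for p in files]
sizes_parallel = parallel(size, files, n_workers=8)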

How to write a lambda function inside another function

20:52

Build an ImageDataLoaders from folder and do image transform
22:03

How to pick a better model from TIMM

23:02

How to install TIMM and find exact model names

24:28

Don’t forget metrics when building a vision_learner
28:34

Explore learn.fine_tune

30:52

Explore half-precision floating point

32:11

Why use it?

When to use it?

How to use it?

How much better/faster can it get us?
(jump in time 41:29)

Install the latest TIMM on paperspace

33:56

Explore fit_one_cycle

34:32

What does scheduler do?

Why do we start with a very small learning rate, even for pretrained models?

When and how to increase learning rate?

When and why to decrease learning rate again?

How to increase and decrease learning rate? (in a cycle, through cosine)

How to choose a one-cycle policy?
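For orientation, a minimal call looks like this (the epoch count and lr_max are illustrative, not the walkthru’s values):

learn.fit_one_cycle(3, lr_max=1e-3)   # lr ramps up, then decays along a cosine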

Explore learn.lr_find

41:13

How does lr_find differ from fit_one_cycle?

How to read the graph of lr_find in terms of slope, bottom, the suggested lr, etc.?

Why we shouldn’t pick the bottom point for learning rate?

What are the 4 suggested points for learning rate and the ideas behind them?

When to use or not use default learning rate? and why and how?
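In recent fastai versions the four suggestion points can be requested explicitly; a sketch, assuming the learner from above:

from fastai.callback.schedule import minimum, steep, valley, slide

lrs = learn.lr_find(suggest_funcs=(minimum, steep, valley, slide))
print(lrs)   # a namedtuple with one suggested lr per function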

small points: update learner by lr_find and a second tab

48:20

Will we create a new learner when we run learn.lr_find? yes

Why should we make another copy of the notebook when the original is running? to work on the next thing while the model is training

Explore DataLoaders.test_dl

48:57

What should you do when you want to explore a method but don’t remember which class it belongs to?

What does DataLoaders.test_dl do?

How to apply DataLoaders.test_dl to test dataset?

Explore learn.get_preds

51:46

What’s the difference between learn.predict and learn.get_preds?

How to quickly check all the parameters of learn.predict? shift + tab

How to get the kaggle submission format right

52:47

Let’s check with the kaggle submission sample csv file

How to autocomplete filenames when you do things like pd.read_csv("")? just write something and press tab

Where to look in order to be certain about the kaggle submission format? the sample csv and the kaggle site’s evaluation page

Explore learn.get_preds continued

54:13

How to make learn.get_preds give us the specific label answer instead of the probabilities of all labels for each test item?

What can the with_decoded parameter do to get us the label?

How to access each part when learn.get_preds returns 3 parts?

How to access all the labels of the dataset or DataLoaders? dls.vocab

How to map idxs with label vocab with pandas

55:05

How to turn a list or a TensorBase into a pandas Series and add a name to it with the name parameter?

How to use pd.Series.map? (exploration is needed)

How to check the type of dls.vocab? type(dls.vocab)

How to turn dls.vocab into a list? list(dls.vocab)

How to create a dictionary on dls.vocab? 57:53

How to use a dictionary with pd.Series.map (a neat trick of Jeremy’s that no one knows)? 59:46

Why are we not advised to use a function or lambda with pd.Series.map?

How would we do a lambda with pd.Series.map anyhow?

How to add our prediction results into the kaggle submission csv file?

Visually check results and submission format

1:00:17

What does Jeremy normally do for checking results for correctness? learn.show_batch()

How to turn the final result pandas DataFrame into a csv file? ss.to_csv("subm.csv")

How to check the format correctness? do it in terminal with ! head subm.csv

What can index=False do for our submission format?
ss.to_csv("subm.csv", index=False)

How to submit with kaggle CLI

1:02:11

How to use kaggle -h, kaggle competitions -h, kaggle competitions submit -h to learn the command we need to use?

What to do when test dataset get shuffled

1:06:15

Where did the dataset get shuffled? get_image_files()

Can we just sort our files to have the same order as the submission file? tst_files.sorted()

When won’t this kind of sorting work?

How does sort() differ from sorted()? inplace or not

What to do when timm is not defined

1:10:23

You can either import timm again or restart the kernel in the notebook

I’ve lost the earlier reference, but at [41:44] Jeremy mentions again the speed advantage of fp16. So I was curious to try both standard 32-bit and 16-bit for my first kaggle submission, and discovered that both submissions got the same score, while the former was five times slower than the latter…

I’ve double-checked that I didn’t accidentally upload the same subm.csv. Downloading both from kaggle shows some differences in individual lines. For reference, this was the code (excluding imports)…

download = Path('/storage/download/paddy-disease-classification.zip')
Path.BASE_PATH = path = Path.home()/'paddy'
if not download.exists():
    !cd /storage/download && kaggle competitions download -c paddy-disease-classification
if not path.exists():
    nfiles = !unzip -l {download} | wc -l
    !unzip -o {download} -d {path} | pv -l -s {nfiles[0]} > /dev/null
trn_path = path/'train_images'
tst_files = get_image_files(path/'test_images').sorted()

dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, seed=42, item_tfms=Resize(224))
tst_dl = dls.test_dl(tst_files)
# Alternative #1 Full precision
learn32 = vision_learner( dls, 'convnext_small_in22k', metrics=error_rate)
learn32.fine_tune(10)
preds = learn32.get_preds(dl=tst_dl, with_decoded=True)
epoch train_loss valid_loss error_rate time
0 1.424497 0.746180 0.246997 01:58
epoch train_loss valid_loss error_rate time
0 0.760931 0.472575 0.160019 05:43
1 0.598440 0.365380 0.123498 05:41
2 0.450244 0.281377 0.089861 05:41
3 0.339465 0.225487 0.072561 05:40
4 0.254239 0.203521 0.064873 05:40
5 0.191238 0.171012 0.048534 05:40
6 0.145411 0.162492 0.049015 05:40
7 0.118857 0.153959 0.046132 05:40
8 0.096845 0.149728 0.040846 05:40
9 0.095460 0.150064 0.042287 05:40
# Alternative #2 Half precision
learn16 = vision_learner( dls, 'convnext_small_in22k', metrics=error_rate).to_fp16()
learn16.fine_tune(10)
preds = learn16.get_preds(dl=tst_dl, with_decoded=True)
epoch train_loss valid_loss error_rate time
0 1.411435 0.697065 0.234022 00:45
epoch train_loss valid_loss error_rate time
0 0.738877 0.465542 0.156175 00:56
1 0.575676 0.368840 0.121096 00:56
2 0.432381 0.268040 0.088419 00:56
3 0.339179 0.228899 0.069678 00:56
4 0.250381 0.193647 0.060067 00:56
5 0.213334 0.171387 0.047573 00:56
6 0.144941 0.151021 0.041326 00:56
7 0.124611 0.144137 0.040365 00:56
8 0.099918 0.135473 0.037963 00:56
9 0.092904 0.135797 0.038443 00:56

The following cell was run for both alternatives:

probs,_,idxs = preds
idxs = pd.Series(idxs.numpy(), name="idxs")

mapping = {k:v for k,v in enumerate(dls.vocab)}
results = idxs.map(mapping)

ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = results
ss.to_csv('subm.csv', index=False)

!kaggle competitions submit -f subm.csv \
    -m 'initial convnext-small 10 epoch ft sortedXX' paddy-disease-classification

It’s been a while since I’ve done any serious bash automation, so apart from the ML, I’ve been having lots of fun streamlining my pre-run.sh to be able to recreate my paperspace environment after deleting ~/.local and ~/.conda. But it’s now time to put that down, or risk “ongoing development”…

I don’t expect to need to change anything in future, except the four leading variables. I’m not sure it really simplifies things, since it comes with its own complexity, but it was gratifying to get it working.

#!/usr/bin/env bash
# set -x
PERSIST_DIRS="  .local  .conda  .ssh  .kaggle "
PERSIST_FILES=" .bash.history  .bash_aliases  .git.config "
PIP="   kaggle  timm>=0.6.2dev"
CONDA=" universal-ctags  unzip  fzf  pv "

# Ensure persistent storage configuration folder exists
mkdir -p /storage/cfg

# User folder is wiped when machine restarts.
# Restore links from user folder to persistent storage.
# The `mv` is only executed on first run to capture factory setup.

for dir in $PERSIST_DIRS ; do
    [ ! -d /storage/cfg/$dir ] && mkdir -p ~/$dir && mv ~/$dir /storage/cfg/$dir
    ln -sf /storage/cfg/$dir ~/$dir
done
chmod 700 /storage/cfg/.ssh

for file in $PERSIST_FILES ; do
    [ ! -f /storage/cfg/$file ] && touch ~/$file && mv ~/$file /storage/cfg/$file
    ln -sf /storage/cfg/$file ~/$file
done

# Install PIP and CONDA packages into user home.
# Note $? is 1 when grep doesn't find package in list

export PATH=~/.local/bin:~/.conda/bin:$PATH

for pkgver in $PIP ; do
    pkg=$(echo $pkgver | sed 's/[<>=].*//') # strips off version e.g. ">=0.2"
    [ $(pip list --user | grep -q $pkg; echo $? ) -eq 1 ] \
          && pip install -U --user $pkgver
done

for pkgver in $CONDA ; do
    pkg=$(echo $pkgver | sed 's/[<>=].*//')
    [ $(conda list -p ~/.conda | grep -q $pkg; echo $? ) -eq 1 ] \
          && conda install --yes -p ~/.conda -c conda-forge $pkgver
done

I’m not sure I understand how kaggle calculates the score.

Does that mean that for the current score they only mark a random 75% of the lines from our submitted subm.csv, and for the final standings they ignore that 75% and use the 25% they never marked before? Is it the same “random set” of items for every participant?
