Live coding 7

You can remove that nowadays - it's the default.

rsync is the recommended alternative to scp.

At [23:16] we get a quick glimpse at Jeremy’s local .ssh/config file.
Could a redacted version be posted, including commonly used hosts, with a discussion of the options and forwarded ports? I've read the man page, but some background on particular cases would be useful.

This is a summary of what I could see…

global

  • ServerAliveInterval 60
  • ServerAliveCountMax 30
  • StrictHostKeyChecking no

github

  • Port 22 - manpage indicates 22 is default
  • TCPKeepAlive yes - manpage indicates ‘yes’ is default
  • IdentitiesOnly no - manpage indicates ‘no’ is default

personal machines

  • LocalForward ports 8888, 8000, 4000, 3000, 3001

These global ones are actually fairly straightforward, although my personal preference for StrictHostKeyChecking was 'ask', until I found it too tedious.
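To show the ssh_config syntax these options map to, here is a sketch that renders a config file from a Python dict. The host aliases `github.com` and `mybox` and the hostname `mybox.example.com` are hypothetical placeholders; the option values are the ones summarized above. Two details worth knowing: ssh uses the first value it finds for each option, so the general `Host *` block goes last, and `LocalForward` takes `port host:hostport` and may be repeated once per port. With `ServerAliveInterval 60` and `ServerAliveCountMax 30`, ssh tolerates roughly 60 × 30 = 1800 s (30 minutes) of an unresponsive server before disconnecting. If 'ask' is too tedious but 'no' feels too loose, OpenSSH 7.6+ also accepts `StrictHostKeyChecking accept-new`.

```python
# Hypothetical hosts; option values come from the summary of Jeremy's config.
HOSTS = {
    "github.com": {
        "Port": "22",            # already the default
        "TCPKeepAlive": "yes",   # already the default
        "IdentitiesOnly": "no",  # already the default
    },
    "mybox": {  # hypothetical personal machine
        "HostName": "mybox.example.com",  # placeholder hostname
        # Jupyter plus common dev-server ports, each forwarded to the same remote port
        "LocalForward": [f"{p} localhost:{p}" for p in (8888, 8000, 4000, 3000, 3001)],
    },
}

DEFAULTS = {
    "ServerAliveInterval": "60",    # send a keepalive every 60 s
    "ServerAliveCountMax": "30",    # disconnect after 30 unanswered keepalives
    "StrictHostKeyChecking": "no",  # don't prompt for unknown host keys
}


def render_ssh_config(hosts, defaults):
    """Render per-host blocks first and `Host *` last, since ssh keeps the
    first value it sees for each option."""
    lines = []
    for host, options in hosts.items():
        lines.append(f"Host {host}")
        for key, value in options.items():
            # Options such as LocalForward may appear several times
            for v in value if isinstance(value, list) else [value]:
                lines.append(f"    {key} {v}")
        lines.append("")
    lines.append("Host *")
    lines.extend(f"    {key} {value}" for key, value in defaults.items())
    return "\n".join(lines) + "\n"


print(render_ssh_config(HOSTS, DEFAULTS))
```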

I added those many, many years ago! I guess they're not needed now…


I think you've already got the interesting bits, frankly.


This is for Jupyter (8888); the others are the default port numbers for various static site generators, dev web servers, etc. that I've used.
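With a line like `LocalForward 8888 localhost:8888`, a connection to port 8888 on your own machine is tunnelled to port 8888 on the remote one, so a remote Jupyter server shows up at `localhost:8888`. The sketch below (a hypothetical helper, not part of the original setup) checks which of those local ports currently accept TCP connections. Note that ssh binds the local end of a forward as soon as the session opens, so a port showing up here means the tunnel exists, not necessarily that a server is running on the remote side.

```python
import socket

# Jupyter plus the dev-server defaults mentioned above
FORWARDED_PORTS = [8888, 8000, 4000, 3000, 3001]


def listening_ports(ports, host="localhost", timeout=0.5):
    """Return the subset of `ports` that accept a TCP connection on `host`."""
    open_ports = []
    for port in ports:
        try:
            # Succeeds if something (e.g. ssh's local forward) is listening
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:  # refused, timed out, unreachable...
            pass
    return open_ports


if __name__ == "__main__":
    print(listening_ports(FORWARDED_PORTS))
```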


At [38:00] Jeremy indicates that Paperspace's persistent /storage and /notebooks are slow. The reason to care is not the unzipping speed but the time model training spends reading files from disk. I was curious to experiment. First, for comparison, the time to download the dataset…

~:$ time kaggle competitions download -c paddy-disease-classification
real 0m48.779s

~:$ ls -lh
total 1.1G

Then from each of the three filesystems, I ran three operations:

  • single file streaming: time cp ~/paddy-disease-classification.zip .
  • multiple file operations: time unzip paddy-disease-classification.zip
  • directory operations: time rm -r test_images/ train_images/

with the following results (in seconds):

Directory (pwd)    cp       unzip     rm
/notebooks         1.96     43.282    25.236
/storage           1.188    31.793    49.591
~ (Home)           1.021     8.344     0.391
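For anyone wanting to repeat the comparison from Python rather than the shell, here is a minimal sketch of the same three operations. It swaps in `shutil.copyfile`, `zipfile.ZipFile.extractall` and `shutil.rmtree` for `cp`, `unzip` and `rm -r`, so absolute times won't match the table; the function names and the `fs-bench` folder are hypothetical. The OS page cache can also make a second run of the copy look much faster than the first.

```python
import shutil
import time
import zipfile
from pathlib import Path


def time_it(fn, *args):
    """Wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def bench_filesystem(zip_src: Path, workdir: Path) -> dict:
    """Time copy, unzip and recursive delete inside `workdir`.

    `workdir` must not be the folder that already holds `zip_src`,
    otherwise the copy would target the same file.
    """
    workdir.mkdir(parents=True, exist_ok=True)
    zip_copy = workdir / zip_src.name
    out_dir = workdir / "unzipped"

    def unzip():
        with zipfile.ZipFile(zip_copy) as zf:
            zf.extractall(out_dir)

    results = {
        "cp": time_it(shutil.copyfile, zip_src, zip_copy),  # single file streaming
        "unzip": time_it(unzip),                            # many small files
        "rm": time_it(shutil.rmtree, out_dir),              # directory operations
    }
    zip_copy.unlink()  # remove the copied archive
    return results

# Usage sketch:
# src = Path.home()/'paddy-disease-classification.zip'
# for d in ['/notebooks', '/storage', str(Path.home())]:
#     print(d, bench_filesystem(src, Path(d)/'fs-bench'))
```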

Additionally, unzipping with the source on Storage with output files to Home took 8.849s, not much slower than using the Home filesystem alone.

Thus…
For streaming, Storage is 86% the speed of Home.
For unzipping (many small-file operations), Storage is 26% the speed of Home.
Additionally, unzipping from Storage into Home is 94% the speed of Home alone.
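The relative speeds follow from the table by inverting the time ratio: speed of X relative to Home = time on Home / time on X. A small sketch that computes every combination directly from the measured times:

```python
# Times in seconds, copied from the table above
times = {
    "/notebooks": {"cp": 1.96, "unzip": 43.282, "rm": 25.236},
    "/storage": {"cp": 1.188, "unzip": 31.793, "rm": 49.591},
    "~": {"cp": 1.021, "unzip": 8.344, "rm": 0.391},
}


def relative_speed(fs, op, baseline="~"):
    """Speed of `fs` as a percentage of `baseline`: time_baseline / time_fs."""
    return 100 * times[baseline][op] / times[fs][op]


for fs in ("/notebooks", "/storage"):
    for op in ("cp", "unzip", "rm"):
        print(f"{fs:11} {op:6} {relative_speed(fs, op):5.1f}% of Home")

# Unzipping from Storage into Home took 8.849 s, compared with Home alone
print(f"Storage -> Home unzip: {100 * times['~']['unzip'] / 8.849:.0f}% of Home")
```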

So a reasonable strategy might be to store the downloaded zip file on /storage and have the notebook unzip it to the home directory as needed, if it's been cleared away by a machine reboot.
For walkthru 7, I ended up with the following in my notebook…

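from pathlib import Path

# Keep the zip on persistent /storage; extract it to the faster home filesystem
download = Path('/storage/download/paddy-disease-classification.zip')
workpath = Path.home()/'paddy'

if not download.exists():
    download.parent.mkdir(parents=True, exist_ok=True)  # needed on a fresh /storage
    !cd /storage/download && kaggle competitions download -c paddy-disease-classification

# Home is cleared on reboot, so re-extract whenever the folder is missing
if not workpath.exists():
    !unzip -o {download} -d {workpath}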
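If you'd rather not depend on IPython `!` magics (e.g. in a plain .py script), the unzip step can be done with the standard library's `zipfile`. This sketch, using a hypothetical `ensure_unzipped` helper, assumes the zip has already been downloaded. One design choice differs from the cell above: it extracts into a temporary sibling folder and renames it at the end, so an interrupted extraction can't leave a half-filled `paddy` folder that the `exists()` check would then treat as complete.

```python
import shutil
import zipfile
from pathlib import Path


def ensure_unzipped(download: Path, workpath: Path) -> Path:
    """Extract `download` into `workpath` unless `workpath` already exists."""
    if workpath.exists():
        return workpath  # already extracted since the last reboot
    if not download.exists():
        raise FileNotFoundError(f"{download} not found; download it first")
    # Extract to a sibling temp folder, then rename it into place
    partial = workpath.with_name(workpath.name + ".partial")
    if partial.exists():
        shutil.rmtree(partial)  # leftover from an interrupted run
    with zipfile.ZipFile(download) as zf:
        zf.extractall(partial)
    partial.rename(workpath)
    return workpath

# Usage with the paths from the notebook cell above:
# ensure_unzipped(Path('/storage/download/paddy-disease-classification.zip'),
#                 Path.home()/'paddy')
```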

Yes that makes sense. The slowness only occurs when reading or writing lots of files to /storage, so your strategy avoids that nicely.

Hi, so I’m stuck on this PATH issue.

So I installed timm, but then I couldn't import the library. I eventually noticed that timm had been installed into Paperspace's conda directory (/opt). I adjusted my PATH accordingly and restarted the instance, but when I run python, the terminal still runs Paperspace's Python 3.9. How can I begin to troubleshoot this?

Thanks in advance!

Try creating a new instance from scratch. The latest image fixes the Python version issue and also includes mamba.
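For troubleshooting this kind of mismatch in general, a quick check of which interpreter is running and where a package resolves from can help. This is a sketch with a hypothetical `python_env_report` helper; run it with the same `python` you use in the terminal. If `running_interpreter` and `python_on_PATH` differ, or `timm_location` is `None`, pip most likely installed into a different environment than the one you're running, and installing with `python -m pip install timm` ties the install to that interpreter.

```python
import importlib.util
import os
import shutil
import sys


def python_env_report(module="timm"):
    """Show where Python, pip and `module` are being resolved from."""
    spec = importlib.util.find_spec(module)  # None if not importable here
    return {
        "running_interpreter": sys.executable,
        "python_on_PATH": shutil.which("python"),
        "pip_on_PATH": shutil.which("pip"),
        "PATH": os.environ.get("PATH", "").split(os.pathsep),
        f"{module}_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in python_env_report().items():
        print(f"{key}: {value}")
```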

Hi. Somewhere in session 7, I lost the ability to access Git.

When I reopen my session, I get the following error message:

I tried reinstalling Git but this did not help. I went back to the previous sessions but could not see anything I missed. I could not find anything useful on the internet.

Appreciate any suggestions. Thanks.

I wrote a blog post on Live coding 7. In it I'm just setting up Kaggle on Paperspace, but I'll be digging into the paddy competition starting with the next one.
