Platform: Salamander ✅

Unfortunately, during busy periods AWS can reassign our servers, because we use the cheaper “spot instances” to keep the price down. If you need more reliable servers, you’ll need to pay the full price and get them directly from AWS.

Ugh, I’ve been hitting this “random server shutdown” over and over for the past hour. I basically wasted nearly an hour because I would set up, start training, and blam: kernel connection error / connection lost.
So ultimately I wasted an hour of ‘work’ time just setting up, starting training, and getting kicked off.
I thought it was that specific server, so I set up a new server only to hit the same issue.
At least I can see the issue now from the posts here.

It would be nice if you could offer two tiers like Paperspace does (one that is pre-emptible, one that is not), because if I am doing work for a client I really don’t want to mess around with getting booted off over and over.

Is there a way to know when AWS is busy? Are some types of servers better than others?

Good question. If it helps, I’ve been running 2 servers all day today with no issue (dual track training lol).
But certainly knowing when it’s busy or not would be helpful!

Yeah these are great questions. I suspect it depends a lot on the choice of server. Have you tried the g3s instance type?

I have noticed things tend to be very quiet on weekends BTW.

I can confirm availability depends on server type more than any other factor. Switching to a different instance type when your server is offline is the most reliable way to get around this issue.

I’m getting this error while importing libraries:

ImportError: /home/ubuntu/anaconda3/lib/python3.7/site-packages/torchvision/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN2at7getTypeERKNS_6TensorE

Hi, I don’t see fastai as a kernel, and I get an error when trying to run from fastai.vision import *.

Can you help?

Hello, I just bought some credits on Salamander, and from fastai.vision import * gives an error every single time. Very disappointing experience. Can you please help?

Generally, if you’re having trouble with your instance, the easiest fix is to create a new one - we can’t really debug your Linux or Python/PyTorch problems for you. However, often simply updating your fastai and pytorch libs is enough - https://course.fast.ai/update_salamander.html
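
For anyone hitting the undefined symbol error above: that kind of message usually (not always) means the torchvision binary was built against a different PyTorch version than the one currently installed. Before and after updating, it can help to see which versions are actually present. Here is a minimal sketch that reads the installed versions without importing the packages, since importing torchvision is the step that fails. The helper names installed_version and report are made up for this example.

```python
# Sketch: report installed versions of the libraries involved without
# importing them (importing torchvision is what raises the error above).
try:
    from importlib.metadata import version, PackageNotFoundError  # Python 3.8+
except ImportError:  # older Pythons such as 3.7, seen in the traceback path
    version = None


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    if version is not None:
        try:
            return version(package)
        except PackageNotFoundError:
            return None
    import pkg_resources  # fallback; ships with setuptools
    try:
        return pkg_resources.get_distribution(package).version
    except pkg_resources.DistributionNotFound:
        return None


def report(packages=("torch", "torchvision", "fastai")):
    """Map each package name to its installed version (or None)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in report().items():
        print(f"{name}: {ver or 'not installed'}")
```

If the torch and torchvision versions look mismatched, updating both together (or creating a new instance, as suggested above) is the likely fix.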

Although updating the fastai and pytorch libs did not work, creating a new instance did solve the problem. Thank you very much, Jeremy.

Does anyone know how to download a folder from Salamander to the local PC? I want to download my data folder. Thanks

To download a folder you can tar it then use scp (using the terminal).
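
If you would rather do the tar step from a notebook cell, here is a rough sketch using Python's standard tarfile module. The function names make_archive and scp_command are hypothetical helpers for this example, and the user, host, and paths you pass in are placeholders you would replace with your own server details. The scp command is only built as a string here; you run it in a terminal on your local PC.

```python
import tarfile
from pathlib import Path


def make_archive(folder, archive_path):
    """Pack `folder` into a gzip-compressed tar file and return its path."""
    folder = Path(folder)
    archive_path = Path(archive_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname stores only the folder name, not its full absolute path
        tar.add(str(folder), arcname=folder.name)
    return archive_path


def scp_command(user, host, remote_path, local_dir="."):
    """Build the scp command to run on the local PC (not executed here)."""
    return f"scp {user}@{host}:{remote_path} {local_dir}"
```

For example, on the server you might call make_archive("data", "data.tar.gz"), then on your own machine run the string returned by scp_command with your server's user name, its IP address, and the full path to data.tar.gz. Write the archive outside the folder being packed so it does not end up containing itself.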

Just want to say that I also cannot import fastai.vision on Salamander. This is a new problem for me. It has worked in the past.

I also cannot generate keys to SSH in, and have had this problem with another (since deleted) instance. So I am going to lose the notebooks that I’ve created when I destroy this one. The “Generate Keys” just swirls for a while and then gives me this error.

[Image: screenshot of the error message]

Can anyone help me at least generate the keys before I destroy the instance?

I’ll see if I can figure out what’s going on there. Note, however, that you can click New -> Terminal in Jupyter, which gives you a full shell - you can use that to put your ssh credentials in too.
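
As a rough illustration of putting your credentials in by hand: the usual approach is to append your local public key to ~/.ssh/authorized_keys on the server. The sketch below does that from Python (it could equally be done with a text editor in the Jupyter terminal). It assumes a standard OpenSSH setup on the instance, which I have not verified for Salamander, and add_public_key is just a name invented for this example.

```python
import os
from pathlib import Path


def add_public_key(public_key, ssh_dir=None):
    """Append `public_key` to authorized_keys unless it is already listed."""
    # Assumption: the server reads keys from ~/.ssh/authorized_keys
    ssh_dir = Path(ssh_dir) if ssh_dir else Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    auth_file = ssh_dir / "authorized_keys"

    key = public_key.strip()
    text = auth_file.read_text() if auth_file.exists() else ""
    if key not in text.splitlines():
        with auth_file.open("a") as f:
            # Make sure the new key starts on its own line
            if text and not text.endswith("\n"):
                f.write("\n")
            f.write(key + "\n")

    # OpenSSH is strict about permissions on this file
    os.chmod(str(auth_file), 0o600)
    return auth_file
```

You would paste the contents of your local public key file (for example the single line in id_rsa.pub) as the public_key argument. Calling it twice with the same key leaves only one copy in the file.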

I have been unable to ssh either from the notebook to my local computer or from my computer to the notebook. I am new to ssh but it seems fairly straightforward. I have no problem generating the keys on either end.

The ssh-copy-id command is failing on both ends. I have tried using the IP address shown in the terminal (which I have read elsewhere in the forum is actually a local IP address) as well as the IP address in the dashboard. Is there some other address that I have to use?

I also cannot import fastai.vision on Salamander (re: Salamander fastai.vision import error, _C symbol).

I tried restarting the server as well, and it seems like it imports fine but then gives the error from the link.

Hi there,

I recently got approved for AWS Educate and got promotional credit for a personal AWS account. I try to enter the code and I get the message “Invalid Promotion Code”. Did I create the wrong account type (I had a choice between AWS Account or AWS Educate Starter Account) or is this a bug?

Thanks!

Had the same issue (Invalid code) weeks ago, never figured it out…

I have a server that is stuck at ‘Offline - moving storage’. It has been for at least an hour and a half (likely longer, I’m not sure when my server auto-shutdown last night). It’s only a 75 GB SSD, of which I’ve used… I don’t know, I’d be surprised if I’ve used 30 GB, but maybe?

I’d like to get access to that server, it unfortunately has some code and data that I need.

Also, as an aside, I just want to second Mark_F’s problem with generating SSH keys. I was unable to generate SSH keys or upload them. I got timeouts and the same ‘Network error: Unexpected token < in JSON at position 0’
