Automating shutdown of a VM, help needed to bind to errors

I’ve written a small function that forces a VM shutdown, so if everything goes well the VM shuts itself down at the end of training. That works quite well, but when there is an error (like CUDA out of memory) it won’t work, because the cell containing my function is never run.

So how can I bind a callback to the Python error handler or the PyTorch error handler?

Code:

import os

def shutdown():
    os.system('sudo shutdown -h now')  # halt the machine immediately

You can create a shell script. Here’s an example:

#!/bin/bash

echo start script
echo shutdown

Run it with:

bash my_script.sh

If the first command fails, the second one will still be executed, because bash moves on to the next line unless the script uses set -e.
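
The same "run the follow-up no matter what" idea can also be sketched in Python. This is only a minimal sketch and not part of the reply above: run_then is a hypothetical helper name, and the commands in the usage part are harmless placeholders (in practice the first would launch training and the second would be the shutdown command).

```python
import subprocess
import sys

def run_then(main_cmd, followup_cmd):
    """Run main_cmd, then run followup_cmd regardless of main_cmd's exit code.

    run_then is a hypothetical helper used only for illustration.
    Returns both exit codes as a tuple.
    """
    # check=False means a non-zero exit code does not raise an exception,
    # so execution always reaches the follow-up command.
    main = subprocess.run(main_cmd, check=False)
    followup = subprocess.run(followup_cmd, check=False)
    return main.returncode, followup.returncode

if __name__ == "__main__":
    # Placeholder commands: the first fails on purpose with exit code 3,
    # the second stands in for 'sudo shutdown -h now'.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    placeholder_shutdown = [sys.executable, "-c", "print('shutdown')"]
    print(run_then(failing, placeholder_shutdown))
```

Like the shell script, this only helps when training runs as a separate command, not as cells inside a notebook.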


That’s a simple and brilliant idea, but won’t it only work for Python scripts and not for Jupyter notebooks?

Not sure what the behavior will be with notebooks. If an error crashes the process and stops it, it should work too.
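
For notebooks, one option that does not depend on the process crashing is to wrap the training call in try/finally inside the same cell, so the cleanup runs whether or not an exception is raised. A minimal sketch, assuming training is a plain Python callable: run_with_cleanup, failing_train, and fake_shutdown are hypothetical names, and the real shutdown call is replaced by a stand-in so the example is safe to run.

```python
def run_with_cleanup(train_fn, cleanup_fn):
    """Call train_fn and always call cleanup_fn afterwards."""
    try:
        return train_fn()
    finally:
        # Runs on success and on any Python exception (for example a
        # CUDA out-of-memory RuntimeError); the exception still
        # propagates to the caller after cleanup_fn returns.
        cleanup_fn()

calls = []

def failing_train():
    # Stand-in for learn.fit_one_cycle(...) failing midway
    raise RuntimeError("CUDA out of memory (simulated)")

def fake_shutdown():
    # Stand-in for os.system('sudo shutdown -h now')
    calls.append("shutdown")

try:
    run_with_cleanup(failing_train, fake_shutdown)
except RuntimeError as err:
    print("training failed:", err)

print(calls)
```

This only covers errors raised as Python exceptions. If the kernel process itself is killed, the finally block does not get a chance to run.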

An elegant way could be to use a callback. From what I can see, after_fit gets called whether or not an exception occurs.

Here is a minimal example:

from fastai2.data.all import *
from fastai2.callback.all import *
from fastai2.learner import Learner

data = torch.rand((100,2)) 
db = DataBlock(get_x = lambda x:x[:1], get_y=lambda x:x[1:])
dls = db.dataloaders(data)

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(1,1)
    
    def forward(self, x):
        #raise Exception #uncomment to see after_fit still called
        return self.layer(x)

class Shutdown(Callback):
    def after_fit(self):
        print('after fit')
        #os.system('sudo shutdown -h now')
    
learn = Learner(dls, MyModel(), loss_func = F.mse_loss, cbs = Shutdown)
learn.fit_one_cycle(3)

Thank you, this is exactly what I was looking for :)

Nice!