Blanche
(Maria)
June 11, 2020, 11:10am
I’ve written a small function that forces the VM to shut down, so if everything goes well, the VM shuts itself down at the end of training. That works quite well, but when there is some kind of error (like CUDA out of memory), it doesn’t work because the cell containing my function is never run.
So how can I bind a callback to the Python error handler or to a PyTorch error handler?
Code:
import os

def shutdown():
    os.system('sudo shutdown -h now')
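For a script, or for code that runs the whole training inside one call, one way to guarantee the shutdown is a try/finally around the training call. This is a minimal sketch: `run_then_shutdown` and its `on_exit` parameter are hypothetical names, not part of any library, and `on_exit` is injectable so the logic can be tried without powering off the machine.

```python
import os
from typing import Callable


def shutdown() -> None:
    """Power off the VM; assumes the user has passwordless sudo."""
    os.system('sudo shutdown -h now')


def run_then_shutdown(train: Callable[[], None],
                      on_exit: Callable[[], None] = shutdown) -> None:
    """Call `train()`, then `on_exit()` even if `train()` raises.

    The original exception (e.g. CUDA out of memory) is re-raised after
    `on_exit()` returns, so the traceback is not swallowed.
    """
    try:
        train()
    finally:
        on_exit()
```

In a notebook this only helps if the whole run happens inside the wrapped call, e.g. `run_then_shutdown(lambda: learn.fit_one_cycle(3))` with a `learn` object like the one later in this thread.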
You can create a shell script that runs your job and then shuts the VM down. Here’s an example:
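#!/bin/bash
# my_script.sh: run the job, then shut down whatever its exit status
echo start script   # placeholder for the training command, e.g. python train.py
echo shutdown       # placeholder for: sudo shutdown -h now

Run it with:
bash my_script.sh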
Since the script does not use set -e, the second command is still executed even if the first one fails.
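The same "run, then shut down regardless" idea can be driven from Python with `subprocess.run`, which does not raise on a non-zero exit status unless `check=True` is passed. A minimal sketch; `run_then` is a hypothetical helper, and any script name you pass to it (e.g. `train.py`) is your own.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_then(cmd: Sequence[str],
             after: Callable[[Optional[int]], None]) -> Optional[int]:
    """Run `cmd` as a child process, then call `after(returncode)`.

    As in the shell script, `after` runs whether the command succeeds or
    fails. On POSIX, a child killed by a signal reports a negative return
    code. If the command cannot even be started, `after(None)` still runs
    and the error propagates.
    """
    returncode: Optional[int] = None
    try:
        # No check=True, so a failing command does not raise here.
        returncode = subprocess.run(cmd).returncode
    finally:
        after(returncode)
    return returncode
```

For example, `run_then([sys.executable, "train.py"], after=lambda rc: os.system("sudo shutdown -h now"))` would train and then power off, with `train.py` standing in for your script. Passing the exit code to `after` leaves room to log it before shutting down.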
Blanche
(Maria)
June 11, 2020, 12:44pm
That’s a simple and brilliant idea, but won’t it only work for Python scripts and not for Jupyter notebooks?
I’m not sure what the behavior will be with notebooks. If an error crashes and stops the process, it should work too.
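For the notebook case specifically, IPython provides `InteractiveShell.set_custom_exc`, which registers a handler for exceptions raised while running a cell; in a plain script, `sys.excepthook` plays that role. The sketch below picks whichever applies. `install_error_hook` is a hypothetical helper, and the callback would typically be the `shutdown` function from the first post. Only `Exception` subclasses trigger it, so interrupting the kernel (a `KeyboardInterrupt`) does not power off the machine.

```python
import sys
from typing import Callable


def install_error_hook(callback: Callable[[], None]) -> None:
    """Run `callback` after an uncaught Exception has been reported.

    Inside IPython/Jupyter, cell errors never reach sys.excepthook, so the
    handler is registered with IPython's set_custom_exc instead. In a plain
    Python script, sys.excepthook is wrapped.
    """
    try:
        shell = get_ipython()  # noqa: F821 -- injected into builtins by IPython
    except NameError:
        shell = None

    if shell is not None:
        def ipython_handler(self, etype, value, tb, tb_offset=None):
            # Show the normal traceback first, then run the callback.
            self.showtraceback((etype, value, tb), tb_offset=tb_offset)
            callback()

        shell.set_custom_exc((Exception,), ipython_handler)
    else:
        previous_hook = sys.excepthook

        def script_hook(etype, value, tb):
            previous_hook(etype, value, tb)  # print the usual traceback
            if issubclass(etype, Exception):
                callback()

        sys.excepthook = script_hook
```

In a notebook you would run `install_error_hook(shutdown)` once in a cell near the top, before starting training.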
An elegant way could be to use a callback. From what I can see, after_fit gets called whether or not an exception occurs.
Here is a minimal example:
import os
from fastai2.data.all import *
from fastai2.callback.all import *
from fastai2.learner import Learner

data = torch.rand((100,2))
db = DataBlock(get_x=lambda x: x[:1], get_y=lambda x: x[1:])
dls = db.dataloaders(data)

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(1,1)

    def forward(self, x):
        #raise Exception  # uncomment to see that after_fit is still called
        return self.layer(x)

class Shutdown(Callback):
    def after_fit(self):
        print('after fit')
        #os.system('sudo shutdown -h now')  # uncomment to actually power off

learn = Learner(dls, MyModel(), loss_func=F.mse_loss, cbs=Shutdown)
learn.fit_one_cycle(3)
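Why this works can be illustrated with a toy model of the event ordering. This is not fastai’s source: it assumes, as observed above for fastai2, that `after_fit` is fired from a `finally` clause, so it runs on success and on failure while the exception keeps propagating. Event handling may differ between fastai versions, so it is worth checking `Learner.fit` in the version you have installed. `ToyCallback`, `MiniLearner` and `RecordShutdown` are hypothetical names.

```python
from typing import Callable, List


class ToyCallback:
    """Hypothetical stand-in for a fastai-style callback (not fastai's class)."""

    def after_fit(self) -> None:
        pass


class MiniLearner:
    """Toy learner that only models when after_fit fires."""

    def __init__(self, step: Callable[[], None], cbs: List[ToyCallback]):
        self.step = step  # one "epoch" of work; may raise
        self.cbs = cbs

    def fit(self, n_epoch: int) -> None:
        try:
            for _ in range(n_epoch):
                self.step()
        finally:
            # Runs after normal completion and after any exception
            # (e.g. CUDA out of memory); the exception then propagates.
            for cb in self.cbs:
                cb.after_fit()


class RecordShutdown(ToyCallback):
    """Records the call instead of powering off, so it is safe to run."""

    def __init__(self) -> None:
        self.called = False

    def after_fit(self) -> None:
        # A real Shutdown callback would call os.system('sudo shutdown -h now').
        self.called = True
```

For example, `MiniLearner(step=lambda: 1/0, cbs=[cb]).fit(3)` raises `ZeroDivisionError`, yet `cb.called` is `True` afterwards.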
Blanche
(Maria)
June 17, 2020, 2:44pm
Thank you, this is exactly what I was looking for.