I’ve written a small function that forces a VM shutdown, so if everything goes well, the VM shuts itself down at the end of training. That works quite well, but when there is some kind of error (like CUDA out of memory) it won’t work, because the cell with my function will never be run.
So how can I bind a callback to the Python error handler or the PyTorch error handler?
os.system('sudo shutdown -h now')
You can create a shell script. Here’s an example:
echo start script
If the first command fails, the second one will still be executed.
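For anyone who would rather stay in Python, the same "run the second command no matter what" idea can be sketched roughly like this. It is only an illustration: `run_then_always` is a made-up helper name, and `train.py` in the usage comment is a placeholder for your own training script.

```python
import subprocess
import sys


def run_then_always(first_cmd, second_cmd):
    """Run first_cmd, then second_cmd even if first_cmd fails.

    Hypothetical helper, not part of any library. Returns both exit codes.
    """
    first_code = None
    try:
        # No check=True here, so a non-zero exit status does not raise.
        first_code = subprocess.run(first_cmd).returncode
    finally:
        # Runs even if launching first_cmd raised (e.g. file not found).
        second_code = subprocess.run(second_cmd).returncode
    return first_code, second_code


# Usage (assumed script name, left commented out on purpose):
# run_then_always([sys.executable, "train.py"],
#                 ["sudo", "shutdown", "-h", "now"])
```

The shutdown command is passed in as an argument, so you can try the helper with a harmless second command first.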
That’s a simple and brilliant idea, but won’t it work only for Python scripts and not for Jupyter notebooks?
I’m not sure what the behavior will be with notebooks. If the error stops the process, it should work too.
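Inside a notebook, one option that does not depend on the process dying is to put the training call and the shutdown in the same cell with a plain try/finally. Here is a minimal sketch; `run_with_cleanup` is a hypothetical helper, and the `learn.fit(1)` call in the usage comment assumes you already have a learner.

```python
def run_with_cleanup(train, cleanup):
    """Call train(), then cleanup() whether or not train() raised.

    Hypothetical helper: the exception (if any) still propagates
    after cleanup() has run, so the traceback is not lost.
    """
    try:
        return train()
    finally:
        cleanup()


# Usage in a notebook cell (assumed names, left commented out on purpose):
# import os
# run_with_cleanup(lambda: learn.fit(1),
#                  lambda: os.system('sudo shutdown -h now'))
```

This covers ordinary Python exceptions such as a CUDA out-of-memory RuntimeError. It cannot help if the kernel itself is killed, because then no Python code gets a chance to run.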
An elegant way could be to use a callback. From what I see, after_fit gets called whether an exception occurs or not.
Here is a minimal example:
from fastai2.data.all import *
from fastai2.callback.all import *
from fastai2.learner import Learner
data = torch.rand((100,2))
db = DataBlock(get_x = lambda x:x[:1], get_y=lambda x:x[1:])
dls = db.dataloaders(data)
class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(1,1)
    def forward(self, x):
        #raise Exception #uncomment to see after_fit still called
        return self.layer(x)
class Shutdown(Callback):
    def after_fit(self):
        print('after_fit called') # placeholder so you can see the hook fire
        #os.system('sudo shutdown -h now')
learn = Learner(dls, MyModel(), loss_func = F.mse_loss, cbs = Shutdown)
learn.fit(1)
Thank you, this is exactly what I was looking for.