Train big backbone models in the background

I was training a backbone model similar to the one in Part 2 Lesson 10 (IMDb), and it was going to take over ten hours. There was no way I could keep the notebook open that long (I needed to change wifi networks), and I didn’t want to risk a broken connection running up the GPU bill. So I wrote a couple of scripts you can run on your GPU instance. Start the job, and you can then shut down your notebook or close your terminal session and it will keep running.

Here’s a link to my GitHub repo: LM backbone model background trainer. This is hack code, nothing fancy, but I’ve used it successfully several times.
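
For anyone who just wants the general idea, here is a minimal sketch of one common way to detach a long-running job from the session that started it. This is not the code from the repo above, only an illustration. The helper name `launch_detached` and the file names are made up for the example, and it assumes a Linux GPU instance (the `start_new_session` option is POSIX-only).

```python
import subprocess
import sys


def launch_detached(cmd, log_path):
    """Start cmd in its own session and send its output to a log file.

    Hypothetical helper for illustration. Returns the PID of the new process.
    """
    log_file = open(log_path, "ab")
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,   # no terminal input needed
        stdout=log_file,            # progress goes to the log instead of the terminal
        stderr=subprocess.STDOUT,   # errors go to the same log
        start_new_session=True,     # POSIX only: run the child in a new session (setsid)
    )
    log_file.close()  # the child keeps its own copy of the file descriptor
    return proc.pid
```

Usage would look something like `launch_detached([sys.executable, "train_lm.py"], "train.log")`, where `train_lm.py` stands in for whatever training script you have. Because the child runs in its own session with no controlling terminal, closing the terminal or notebook should not send it a hangup signal, and you can check on it later with `tail -f train.log`. Running the script under `nohup`, `tmux`, or `screen` gets you roughly the same result from the shell.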