Share your work here ✅

devforfu · December 9, 2018, 12:00pm

Custom Training Loop Implementation

For the last few weeks, I’ve had a rough time fighting a memory leak while training a model on a large dataset. As a side effect, I’ve written a simple training loop using PyTorch and torchvision.

This custom loop didn’t help much because the leak was still there (this issue is probably already fixed in the most recent version of PyTorch). However, implementing a deep learning model’s training process was quite an interesting experience. It’s much simpler and less interesting than what we have in the fastai library, of course.

It might be interesting for anyone who wants to learn more about PyTorch and Python, so I’ve created a few small projects:

1) A Medium post briefly describing the code.

2) A notebook that contains all the code discussed, plus a bit more information about the solution.

https://github.com/devforfu/pytorch_playground/blob/master/loop.ipynb

3) The same code as in the notebook, but as Python scripts.

Essentially, all these links cover the same thing from different perspectives. There are probably some bugs in these implementations: they were written in a few days, so they’re definitely not something you would want to use in a production environment. The whole thing boils down to the following snippet and a bunch of callbacks.

https://gist.github.com/devforfu/3bdefe1e09470da01216850f43bf0f85

callbacks_train.py

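import torch.nn.functional as F

# default_device is defined elsewhere in the original code (e.g. a torch.device).
def train(model, opt, phases, callbacks=None, epochs=1, device=default_device, loss_fn=F.nll_loss):
    model.to(device)
    
    # callbacks is expected to be a group object exposing hooks such as
    # training_started() and epoch_started(); the default None would
    # raise AttributeError on the next call.
    cb = callbacks
    
    cb.training_started(phases=phases, optimizer=opt)
    
    for epoch in range(1, epochs + 1):
        cb.epoch_started(epoch=epoch)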

(This file has been truncated; the full version is in the gist linked above.)
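Since the snippet above is truncated, here is a minimal, torch-free sketch of how a hook-based loop like this can be organized. It is not the author’s code: only the `training_started` and `epoch_started` hooks come from the snippet, while `Callback`, `CallbacksGroup`, `Phase`, `AverageLoss`, `run_loop`, and the remaining hook names are hypothetical. It shows how a single `cb.<hook>(...)` call in the loop can fan out to every registered callback.

```python
from dataclasses import dataclass, field

# Hypothetical hook names; only the first two appear in the original snippet.
HOOKS = (
    "training_started", "epoch_started", "phase_started",
    "batch_ended", "phase_ended", "epoch_ended", "training_ended",
)


class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""

    def training_started(self, **kwargs): pass
    def epoch_started(self, **kwargs): pass
    def phase_started(self, **kwargs): pass
    def batch_ended(self, **kwargs): pass
    def phase_ended(self, **kwargs): pass
    def epoch_ended(self, **kwargs): pass
    def training_ended(self, **kwargs): pass


class CallbacksGroup:
    """Forwards each hook call to all wrapped callbacks, in order."""

    def __init__(self, callbacks=()):
        self.callbacks = list(callbacks)

    def __getattr__(self, name):
        # Only reached for attributes not found normally; accept hook names only.
        if name not in HOOKS:
            raise AttributeError(name)

        def dispatch(**kwargs):
            for callback in self.callbacks:
                getattr(callback, name)(**kwargs)

        return dispatch


@dataclass
class Phase:
    """A named pass over some batches, e.g. "train" or "valid"."""
    name: str
    batches: list
    losses: list = field(default_factory=list)


class AverageLoss(Callback):
    """Records (epoch, phase name, mean batch loss) at the end of each phase."""

    def __init__(self):
        self.history = []

    def phase_ended(self, epoch, phase, **kwargs):
        mean = sum(phase.losses) / len(phase.losses)
        self.history.append((epoch, phase.name, mean))


def run_loop(phases, step, callbacks=(), epochs=1):
    """Toy loop; step(batch, training) stands in for forward/backward and returns a float loss."""
    cb = CallbacksGroup(callbacks)
    cb.training_started(phases=phases)
    for epoch in range(1, epochs + 1):
        cb.epoch_started(epoch=epoch)
        for phase in phases:
            phase.losses = []
            cb.phase_started(epoch=epoch, phase=phase)
            for batch in phase.batches:
                loss = step(batch, training=(phase.name == "train"))
                phase.losses.append(loss)
                cb.batch_ended(epoch=epoch, phase=phase, loss=loss)
            cb.phase_ended(epoch=epoch, phase=phase)
        cb.epoch_ended(epoch=epoch)
    cb.training_ended()


# Usage: each "batch" is already its loss value, so step just returns it.
phases = [Phase("train", [1.0, 3.0]), Phase("valid", [4.0])]
avg = AverageLoss()
run_loop(phases, step=lambda batch, training: batch, callbacks=[avg], epochs=2)
print(avg.history)
# [(1, 'train', 2.0), (1, 'valid', 4.0), (2, 'train', 2.0), (2, 'valid', 4.0)]
```

Keeping every hook as a no-op in the base class means a callback only overrides the hooks it cares about. In a real PyTorch version, `step` would typically run the model, compute `loss_fn`, and call `opt.step()` only when `training` is true.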

Any advice or remarks about the text or the implementation are much appreciated. I’ve tried to proofread the article, but I guess the notebook still contains some typos. I’ll try to improve it within a few days.