PyTorch training loop vs fastai Learner

So I’ve been trying a vanilla PyTorch training loop, but with one-cycle learning and the LR finder.
Even so, the fastai Learner does much better on the same dataset while using the same two methods. What other major things are happening in the fastai Learner that make it work so well?

I think you will need to go through the fastai source code, compare it with your own training-loop code, and see what differences you spot. For details and citations, a lot of the underlying innovations and tricks are described in this paper.



What kind of problem are you working on?

There are various defaults in fastai that lead to improved performance. Here are some things to check (somewhat specific to pretrained image models):

  • The optimizer used (fastai uses AdamW)
  • The hyperparameters used (fastai has different defaults for betas, weight decay, etc.)
  • Custom model modifications by fastai (e.g., fastai adds a custom head when fine-tuning pretrained image models)
  • One-cycle actually starts decreasing the LR 25% of the way into training (the pct_start default). Other defaults to be aware of include the starting and ending LRs for one-cycle.
  • One-cycle schedules the momentum as well

There are probably a lot of things I am missing, but these are the main ones I can think of (and see in the code).
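If it helps, the defaults above can be approximated in plain PyTorch. This is a sketch based on my reading of the fastai source (the betas, eps, pct_start, div factors, and momentum range are my understanding of its defaults; double-check against your fastai version):

```python
import torch
from torch import nn

model = nn.Linear(10, 1)

# fastai-style optimizer: AdamW with betas=(0.9, 0.99), eps=1e-5, wd=0.01
# (my reading of fastai's Adam defaults, not the PyTorch ones).
opt = torch.optim.AdamW(
    model.parameters(), lr=1e-3, betas=(0.9, 0.99), eps=1e-5, weight_decay=0.01
)

steps = 100
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt,
    max_lr=1e-3,
    total_steps=steps,
    pct_start=0.25,        # fastai's fit_one_cycle default, vs PyTorch's 0.3
    div_factor=25.0,       # start at max_lr / 25, as in fastai
    final_div_factor=1e5,  # fastai anneals much further at the end
    cycle_momentum=True,   # momentum scheduled opposite to the LR
    base_momentum=0.85,    # fastai moms=(0.95, 0.85, 0.95)
    max_momentum=0.95,
)
```

With an Adam-family optimizer, PyTorch's OneCycleLR cycles betas[0] in place of SGD momentum, which matches the momentum scheduling bullet above.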


This is essentially what I’m trying to do: pass an image through an encoder, then take the encoding and pass it through a decoder for reconstruction.
I don’t use anything pretrained.
I need the image encodings so I can use them to find similar images. The notebook here does it in one sweep; I’m trying to do progressive resizing.
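For context, the similarity lookup I have in mind is roughly this (a toy sketch; the encoder and data are placeholders for my real model and dataset):

```python
import torch
from torch import nn
import torch.nn.functional as F

# Placeholder encoder standing in for the trained one; the point is only
# the similarity search over the encodings.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))

images = torch.randn(100, 3, 32, 32)  # fake dataset
with torch.no_grad():
    emb = F.normalize(encoder(images), dim=1)  # unit-norm encodings

query = emb[0]
sims = emb @ query             # cosine similarity of every image to the query
top5 = sims.topk(5).indices    # indices of the 5 most similar images
```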

Yeah, I guess so. I just wanted to see if there are any other big ideas in there, besides the LR finder and one-cycle, that are making big differences.