Reengineering fastai for the Rossmann example in lesson 6

I’ve tried to recreate the Rossmann example by working from the fastai library source, but I don’t get the same results, i.e. my validation loss is 40+ at the end of one cycle. I can only get it down to 0.01 if I cheat by randomizing the training sample to include all date ranges.

Below are the steps I gathered from going through the fastai source code. Can someone point out if I’m missing something?

I’ve made sure that the neural network is exactly the same, i.e. embedding sizes, batchnorms, ReLUs, dropouts, etc.
I’ll be glad to share my code if required (it’s messy).

  1. Preprocessing
    Continuous variables
    For each continuous variable with missing values, create a companion categorical variable that is True for each missing value and False otherwise, then fill the missing values with the column median.
    Categorical variables
    Call .astype('category').cat.as_ordered() so that each unique value gets an integer code.
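Here’s roughly how I understand step 1 in pandas (column names here are made up for illustration; in fastai these are the FillMissing and Categorify transforms, as far as I can tell):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CompetitionDistance": [100.0, np.nan, 300.0],  # continuous, has a missing value
    "StoreType": ["a", "b", "a"],                   # categorical
})

# Continuous: add a boolean *_na flag, then fill missing values with the median.
for col in ["CompetitionDistance"]:
    if df[col].isna().any():
        df[col + "_na"] = df[col].isna()
        df[col] = df[col].fillna(df[col].median())

# Categorical: the ordered category dtype assigns each unique value an integer code.
for col in ["StoreType"]:
    df[col] = df[col].astype("category").cat.as_ordered()
```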

  2. Dataloading
    Categorical variables
    Take the categorical variables’ codes and add 1 to them so that NA values map from -1 to 0; this makes every code a valid (non-negative) embedding index.
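For step 2, the +1 shift looks like this in a small sketch (the embedding width of 4 is arbitrary):

```python
import pandas as pd
import torch

s = pd.Series(["a", None, "b"]).astype("category").cat.as_ordered()

# pandas uses -1 as the code for NaN; shifting by +1 maps it to 0,
# so every value is a valid (non-negative) embedding index.
codes = torch.as_tensor(s.cat.codes.to_numpy().astype("int64")) + 1

# One extra row in the embedding table for the NA slot at index 0.
emb = torch.nn.Embedding(num_embeddings=len(s.cat.categories) + 1, embedding_dim=4)
vectors = emb(codes)  # one 4-dim vector per row, NA included
```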

  3. Training
    After loss.backward() is called, the learning rate and momentum of the optimizer’s param groups are updated as defined by the one-cycle policy: the lr increases while the momentum decreases over the first part, then the lr decreases while the momentum increases over the second part, with the values at each step given by cosine annealing. Each parameter is then scaled by (1 - wd * lr) for weight decay before opt.step() is called, followed by opt.zero_grad().
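My reading of step 3 as a single training step. This is a minimal sketch, not fastai’s actual code: the phase split, div factor, and momentum range below are illustrative hyperparameters I picked, and the weight-decay step is the decoupled form (scale weights by 1 - wd * lr before the gradient step):

```python
import math
import torch

def cos_anneal(start, end, pct):
    # Cosine-anneal from `start` to `end` as `pct` goes 0 -> 1.
    return end + (start - end) / 2 * (1 + math.cos(math.pi * pct))

def one_cycle_step(opt, step, total_steps, lr_max=1e-2, div=25.0,
                   mom_lo=0.85, mom_hi=0.95, pct_up=0.3):
    pct = step / total_steps
    if pct < pct_up:  # phase 1: lr up, momentum down
        p = pct / pct_up
        lr, mom = cos_anneal(lr_max / div, lr_max, p), cos_anneal(mom_hi, mom_lo, p)
    else:             # phase 2: lr down, momentum up
        p = (pct - pct_up) / (1 - pct_up)
        lr, mom = cos_anneal(lr_max, lr_max / div / 1e4, p), cos_anneal(mom_lo, mom_hi, p)
    for g in opt.param_groups:
        g["lr"], g["momentum"] = lr, mom
    return lr, mom

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
wd = 1e-2

x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

lr, mom = one_cycle_step(opt, step=0, total_steps=100)

# Decoupled weight decay: shrink the weights by wd * lr before stepping.
with torch.no_grad():
    for param in model.parameters():
        param.mul_(1 - wd * lr)

opt.step()
opt.zero_grad()
```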

Turns out the answer is to ‘cheat’ not on the validation set but on the training set. After splitting the data into training/validation sets, I permuted the training data only, and that works wonders. I did not look into the device data loader class in fastai, but I surmise some shuffling is going on, since the labels loaded for training in data.train_dl always come in a different order.
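For anyone hitting the same wall: the fix amounts to shuffling only the training DataLoader and keeping the validation order fixed (the tensors below are dummy data just to show the setup):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x = torch.arange(10, dtype=torch.float32).unsqueeze(1)
y = torch.arange(10)
ds = TensorDataset(x, y)

# Training batches get a fresh random order every epoch; validation stays fixed.
train_dl = DataLoader(ds, batch_size=4, shuffle=True)
valid_dl = DataLoader(ds, batch_size=4, shuffle=False)

train_labels = torch.cat([yb for _, yb in train_dl])
valid_labels = torch.cat([yb for _, yb in valid_dl])
```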
