I’ve tried to recreate the Rossmann example using the fastai library as a reference, but I don’t get the same results: my validation_loss is 40+ at the end of one cycle. I only get it down to 0.01 if I cheat by randomizing the training sample to include all date ranges.
Below are the steps I gathered from going through the fastai source code. Can someone point out if I’m missing something?
I’ve made sure that the neural network is exactly the same, i.e. embedding sizes, batchnorms, ReLU, dropouts, etc.
I’ll be glad to share my code if required (it’s messy).

Preprocessing
Continuous variables
For each continuous variable with missing values, create a categorical variable that contains false for each missing value and true otherwise. Fill missing values with the median value.
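
To make the step above concrete, here is a minimal pandas sketch of how I understand it. The function name `fill_missing_with_median` and the toy column are illustrative assumptions, not fastai's actual code, and the flag polarity follows the description above (flip it if the reference uses the opposite convention).

```python
import numpy as np
import pandas as pd

def fill_missing_with_median(df, cont_names):
    """Add a `<name>_na` flag and median-fill each continuous column that has NaNs.

    Hypothetical helper for illustration; not taken from the fastai source.
    """
    df = df.copy()
    for name in cont_names:
        missing = df[name].isna()
        if missing.any():
            # Polarity as described in the post: False where the value was missing.
            df[name + "_na"] = ~missing
            # Median is computed over the non-missing values of this column.
            df[name] = df[name].fillna(df[name].median())
    return df

# Toy data: one continuous column with a single missing value.
df = pd.DataFrame({"CompetitionDistance": [100.0, np.nan, 300.0, 500.0]})
out = fill_missing_with_median(df, ["CompetitionDistance"])
print(out)
```

Presumably the median should come from the training rows only and then be reused for the validation rows, otherwise the two sets are filled with different values.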
Categorical variables
Call .astype('category').cat.as_ordered() to give them an encoding for each unique value.
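
A small sketch of what that call produces, on made-up data. The second half, reusing the training categories for a validation column through `pd.Categorical`, is my own assumption about how to keep the encodings aligned, not something I have confirmed in the fastai source.

```python
import numpy as np
import pandas as pd

# Toy training column with one missing value.
train = pd.Series(["a", "c", np.nan, "b", "a"])
train_cat = train.astype("category").cat.as_ordered()

categories = list(train_cat.cat.categories)  # sorted unique values
train_codes = train_cat.cat.codes.tolist()   # missing values get code -1

# Assumption: encode validation data with the *training* categories so that
# the same value always maps to the same code; unseen values become -1.
valid = pd.Series(["b", "d", "a"])
valid_cat = pd.Categorical(valid, categories=train_cat.cat.categories, ordered=True)
valid_codes = valid_cat.codes.tolist()

print(categories, train_codes, valid_codes)
```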
Dataloading
Categorical variables
Take the codes of the categorical variables and add 1 to them so that all NA values go from -1 to 0. This lets the embedding lookup work, since every index is then non-negative.
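
Here is a sketch of why the shift matters. pandas gives missing values the code -1, so adding 1 maps them to row 0 of the embedding table, which therefore needs one more row than there are categories. The NumPy array below is only a stand-in for a real embedding layer, and the sizes are arbitrary.

```python
import numpy as np
import pandas as pd

cat = pd.Series(["a", "c", np.nan, "b"]).astype("category").cat.as_ordered()

# Shift codes by one: missing (-1) becomes 0, real categories start at 1.
idx = cat.cat.codes.to_numpy().astype(np.int64) + 1

# One extra row so that index 0 is reserved for missing values.
n_rows = len(cat.cat.categories) + 1
emb_dim = 2  # arbitrary size for the example
table = np.random.default_rng(0).normal(size=(n_rows, emb_dim))

# The lookup is plain row indexing: one vector per input row.
vectors = table[idx]
print(idx, vectors.shape)
```

Without the shift, NumPy would silently read the last row for index -1, so missing values would share a vector with the last category.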
Training
After loss.backward() is called, the learning rate and momentum of the optimizer’s param groups are updated as defined by the cycle policy: lr increases while mom decreases during the first part, and lr decreases while mom increases during the second part, with the values determined by cosine annealing. The optimizer’s param data is then updated with p.data.mul_(1 - wd * lr) before opt.step() is called, followed by opt.zero_grad().
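
The schedule and weight decay step described above can be sketched in plain Python (no PyTorch, so it runs anywhere). The names `cos_anneal`, `one_cycle` and `decay_weights`, and all default values (`lr_max`, `div`, `pct_start`, `moms`), are assumptions for illustration and may not match what fastai actually uses.

```python
import math

def cos_anneal(start, end, pct):
    """Cosine interpolation from `start` (pct=0) to `end` (pct=1)."""
    return end + (start - end) / 2 * (math.cos(math.pi * pct) + 1)

def one_cycle(step, total_steps, lr_max=1e-3, div=25.0, pct_start=0.3,
              moms=(0.95, 0.85)):
    """Return (lr, mom) for a 0-based `step` of a two-phase cycle.

    Phase 1: lr rises from lr_max/div to lr_max, mom falls moms[0] -> moms[1].
    Phase 2: lr falls back to lr_max/div, mom rises moms[1] -> moms[0].
    All defaults here are illustrative assumptions.
    """
    warm = max(1, int(total_steps * pct_start))
    lr_low = lr_max / div
    if step < warm:
        pct = step / warm
        return cos_anneal(lr_low, lr_max, pct), cos_anneal(moms[0], moms[1], pct)
    pct = (step - warm) / max(1, total_steps - warm - 1)
    return cos_anneal(lr_max, lr_low, pct), cos_anneal(moms[1], moms[0], pct)

def decay_weights(weights, lr, wd):
    """Decoupled weight decay: scale each weight by (1 - wd * lr)."""
    return [w * (1 - wd * lr) for w in weights]

# Order of operations per batch, with the real calls shown as comments.
total_steps = 10
weights = [1.0, -2.0]  # stand-in for the parameters
schedule = []
for step in range(total_steps):
    # loss.backward() would run here
    lr, mom = one_cycle(step, total_steps)  # update lr / mom on the param groups
    schedule.append((lr, mom))
    weights = decay_weights(weights, lr, wd=0.01)  # p.data.mul_(1 - wd * lr)
    # opt.step() and then opt.zero_grad() would follow here
```

Printing `schedule` shows lr peaking at the end of the first phase while mom bottoms out at the same step, which is what I would compare against the values fastai records during training.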