When we deal with tabular data, we use a TabularList and a TabularLearner, along with a number of pre-processors such as FillMissing, Categorify and Normalize.
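For context, the pipeline is wired up roughly like this in the Lesson 6 Rossmann notebook (a sketch from memory; df, path, cat_vars, cont_vars, valid_idx and dep_var stand in for the lesson's actual variables):-

procs = [FillMissing, Categorify, Normalize]
data = (TabularList.from_df(df, path=path, cat_names=cat_vars,
                            cont_names=cont_vars, procs=procs)
        .split_by_idx(valid_idx)
        .label_from_df(cols=dep_var, label_cls=FloatList, log=True)
        .databunch())

Now, after creating a tabular learner as (taken from Lesson 6):-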
learn = tabular_learner(data, layers=[1000,500], ps=[0.001,0.01], emb_drop=0.04, y_range=y_range, metrics=exp_rmspe)
and then printing out the model with:-
learn.model
we get this:-
TabularModel(
  (embeds): ModuleList(
    (0): Embedding(1116, 81)
    (1): Embedding(8, 5)
    (2): Embedding(4, 3)
    (3): Embedding(13, 7)
    (4): Embedding(32, 11)
    (5): Embedding(3, 3)
    (6): Embedding(26, 10)
    (7): Embedding(27, 10)
    (8): Embedding(5, 4)
    (9): Embedding(4, 3)
    (10): Embedding(4, 3)
    (11): Embedding(24, 9)
    (12): Embedding(9, 5)
    (13): Embedding(13, 7)
    (14): Embedding(53, 15)
    (15): Embedding(22, 9)
    (16): Embedding(7, 5)
    (17): Embedding(7, 5)
    (18): Embedding(4, 3)
    (19): Embedding(4, 3)
    (20): Embedding(9, 5)
    (21): Embedding(9, 5)
    (22): Embedding(3, 3)
    (23): Embedding(3, 3)
  )
  (emb_drop): Dropout(p=0.04)
  (bn_cont): BatchNorm1d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): Linear(in_features=233, out_features=1000, bias=True)
    (1): ReLU(inplace)
    (2): BatchNorm1d(1000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.001)
    (4): Linear(in_features=1000, out_features=500, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.01)
    (8): Linear(in_features=500, out_features=1, bias=True)
  )
)
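As an aside, those embedding sizes look like they come from fastai's default embedding-size heuristic. If I'm reading the v1 source correctly, it is min(600, round(1.6 * n_cat ** 0.56)), applied per categorical variable:-

def emb_sz_rule(n_cat: int) -> int:
    # fastai v1's default rule (as I understand it): grow roughly with
    # n_cat ** 0.56, capped at 600 dimensions
    return min(600, round(1.6 * n_cat ** 0.56))

for n_cat in [1116, 8, 4, 13, 32]:
    print(n_cat, '->', emb_sz_rule(n_cat))   # 81, 5, 3, 7, 11 (matches the printout)

This reproduces the sizes above, e.g. 1116 -> 81 and 32 -> 11. Incidentally, the in_features=233 of the first Linear layer is just the sum of all the embedding widths (217) plus the 16 continuous variables.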
As seen, embeddings are used for the categorical variables, while the continuous variables are first passed through a BatchNorm layer (bn_cont). Batch Normalization, if I understand correctly, works to normalize the activations of a layer: across multiple mini-batches it pushes them towards a similar mean (the learnable beta) and standard deviation (the learnable gamma).
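To make that concrete, here is a minimal sketch (my own toy example, not from the lesson) showing that BatchNorm1d in training mode normalizes each feature over the mini-batch and then applies the learnable gamma (bn.weight) and beta (bn.bias):-

import torch
import torch.nn as nn

x = torch.randn(4, 16) * 3 + 7   # toy mini-batch: 4 samples, 16 un-normalized features
bn = nn.BatchNorm1d(16)          # mirrors the bn_cont layer above
out = bn(x)

# Manual equivalent of the training-mode forward pass:
# per-feature batch statistics, then scale (gamma) and shift (beta)
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
manual = (x - mean) / torch.sqrt(var + bn.eps) * bn.weight + bn.bias

print(torch.allclose(out, manual, atol=1e-6))  # True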
Since we have already normalized the continuous variables using the Normalize pre-processor, why do we need to pass them through a BatchNorm layer and do much the same thing again?