All You Need Is a Good Init, but Not Input Normalization

Normalizing the input is a standard step before feeding data to a neural network. But it is odd to apply a linear function right before feeding the data into a linear layer, isn’t it? Why does nobody fuse the normalization into the first layer’s weights, as is done with BatchNorm?
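Here is a minimal sketch of what I mean, assuming a PyTorch model whose first layer is an `nn.Conv2d`. The function name `fuse_input_norm` is mine; the math is just folding the affine transform `(x - mean) / std` into the conv weights and bias, with an optional channel flip so the network can consume BGR frames directly:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_input_norm(conv: nn.Conv2d, mean, std, flip_channels=False):
    """Fold per-channel input normalization (x - mean) / std into a conv layer,
    so the network accepts raw, unnormalized input.

    If flip_channels=True, also reorder the input channels of the weight
    (e.g. so a network trained on RGB accepts BGR frames from OpenCV).
    Note: this is a sketch, not a library API.
    """
    mean = torch.as_tensor(mean, dtype=conv.weight.dtype)
    std = torch.as_tensor(std, dtype=conv.weight.dtype)
    w = conv.weight  # shape: (out_ch, in_ch, kh, kw)
    if flip_channels:
        w = w.flip(1)                       # reorder input channels
        mean, std = mean.flip(0), std.flip(0)
    # conv((x - m)/s) = conv_scaled(x) + shifted bias, where:
    #   W'[o,c] = W[o,c] / std[c]
    #   b'[o]   = b[o] - sum_{c,i,j} W[o,c,i,j] * mean[c] / std[c]
    shift = (w * (mean / std).view(1, -1, 1, 1)).sum(dim=(1, 2, 3))
    if conv.bias is None:
        conv.bias = nn.Parameter(torch.zeros(w.shape[0], dtype=w.dtype))
    conv.bias -= shift
    conv.weight = nn.Parameter(w / std.view(1, -1, 1, 1))
```

After calling this on the first conv of a pretrained network, `model(raw_images)` gives (up to float rounding) the same output as `model(normalize(raw_images))` did before, and the per-frame normalization pass disappears from the inference pipeline.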

My motivation arose from the need to convert images from BGR to RGB (the case when you have to use OpenCV). I’ve also found that on edge devices the normalization step can be quite time-consuming.
And since normalization parameters are not standardized, no one is safe from mistakes.

I’d like to remedy the situation. Would adding an extra flag to learner creation (next to `pretrained`) be a good way to do it? Can you recommend something?

PS: I’ve written a rather rough article on Medium on this topic; it might help. Maybe Medium is not the best choice, though.