Thanks for sharing @neerjadoshi. In the augmentation article, note that training-time augmentation only helps when you train for multiple epochs. In your case you're training just one epoch, so I'd guess the improvement is just random noise. Also, you need to unfreeze the network; otherwise over-fitting is very unlikely (and therefore augmentation doesn't help).
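To make the freeze/unfreeze point concrete, here's a toy sketch in plain PyTorch (not your actual model; the `backbone`/`head` names are just illustrative) showing that with a frozen backbone only the head's parameters can train:

```python
import torch
from torch import nn

# Toy stand-in for a pretrained backbone plus a new classification head
backbone = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)

# Freeze the backbone: gradients won't flow into it, only the head trains
for p in backbone.parameters():
    p.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # just the head's weights and biases

# Unfreeze for fine-tuning: now the whole network can adapt (and over-fit),
# which is when augmentation starts to matter
for p in backbone.parameters():
    p.requires_grad = True

trainable_after = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable_after)  # every parameter in the model
```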
I'd suggest trying a dataset that's not quite as similar to ImageNet, so it needs more epochs to reach good accuracy; you should be able to show a more compelling difference then.
The initializations post is looking good. I think it would be helpful to show code examples of each concept you're talking about; and where you do show code, put it in a code block, not a picture, so people can copy and paste it and try it out. Also, maybe run some experiments demonstrating how each init impacts training in practice?
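For example, one quick experiment (a sketch in plain PyTorch; `forward_stats` is just a helper name I made up) is to push random data through a deep ReLU stack and compare how the activation scale behaves under a naive small-std init versus Kaiming/He init:

```python
import torch
from torch import nn

torch.manual_seed(0)

def forward_stats(init_fn, depth=20, width=256):
    """Run random data through `depth` ReLU layers and return the final activation std."""
    x = torch.randn(512, width)
    for _ in range(depth):
        w = torch.empty(width, width)
        init_fn(w)
        x = torch.relu(x @ w)
    return x.std().item()

# Naive N(0, 0.01) init: activations shrink toward zero layer by layer
naive = forward_stats(lambda w: nn.init.normal_(w, std=0.01))

# Kaiming/He init: the sqrt(2/fan_in) scaling keeps the activation scale stable
kaiming = forward_stats(lambda w: nn.init.kaiming_normal_(w, nonlinearity='relu'))

print(naive, kaiming)
```

A plot of activation std per layer for each scheme would make this even more visual.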
A link to the papers that introduced each init method might be nice too.