Hi Fabian. I was hoping someone else would pick up your question, so that I could learn something new without exposing a risky opinion. But I did not want to let your good question go without a reply. I have had the same question.
The short answer is that I don’t know why this design decision was made in fastai. I think that your idea of copying the weights and scaling them is entirely reasonable. You can find code examples in these forums that do exactly that. I would encourage you to try your idea and see how well it works vs. initializing with zeros. Maybe your idea of copying the weights will prove to be a better method that can be offered as an improvement.
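The thread doesn't say which layer the original question was about, so the following is only a sketch under an assumption: the common case of adapting a pretrained first `Conv2d` layer to accept more input channels than it was trained on. It compares the two options discussed above: copying the pretrained weights into the new channels and scaling them, versus filling the new channels with zeros. The function name `expand_first_conv` and its `mode` argument are hypothetical illustrations, not fastai's API or fastai's actual code.

```python
import torch
import torch.nn as nn


def expand_first_conv(old: nn.Conv2d, n_in: int, mode: str = "copy_scale") -> nn.Conv2d:
    """Hypothetical helper: build a Conv2d with n_in input channels from a pretrained one.

    Assumes groups == 1 and n_in >= old.in_channels.
    mode="zeros":      keep the pretrained channels, zero the extra ones.
    mode="copy_scale": repeat the pretrained channels cyclically, then scale by
                       c_old / n_in so the weights' sum over input channels is
                       preserved when n_in is a multiple of c_old.
    """
    assert old.groups == 1, "sketch only handles ungrouped convolutions"
    c_old = old.in_channels
    assert n_in >= c_old, "sketch only handles adding channels"

    new = nn.Conv2d(n_in, old.out_channels, kernel_size=old.kernel_size,
                    stride=old.stride, padding=old.padding,
                    dilation=old.dilation, bias=old.bias is not None)
    w_old = old.weight.detach()  # shape: (out_channels, c_old, kH, kW)

    with torch.no_grad():
        if mode == "zeros":
            new.weight.zero_()
            new.weight[:, :c_old] = w_old  # extra channels stay at zero
        elif mode == "copy_scale":
            idx = torch.arange(n_in) % c_old  # e.g. [0, 1, 2, 0, 1, 2] for 3 -> 6
            new.weight.copy_(w_old[:, idx] * (c_old / n_in))
        else:
            raise ValueError(f"unknown mode: {mode}")
        if old.bias is not None:
            new.bias.copy_(old.bias)
    return new
```

With zeros, the extra channels contribute nothing at first and the network's initial behavior on the original channels is unchanged. With copy-and-scale, the new channels start with pretrained filters instead of nothing, and the scaling keeps activations at roughly the original magnitude. Which one trains better is exactly the kind of thing worth measuring on your own data.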
My main issue with the fastai codebase is that you can read the code to find out what it does, but not why. Which other designs were explored or tested before arriving at its many particular choices? Many options may have been evaluated, and many years of personal experience drawn on, but you would never learn that from the code. So I do not think there is “something you are missing”; it’s just that you will rarely find that “something” explained in the code itself.
Of course these are only my personal observations of the codebase, and I have a deep appreciation for the overall clarity and ease that fastai brings. Good luck with your project!
Thank you for your answer.
I agree with you. To use FastAI to its full potential, you essentially have to understand most of the library itself as well. Otherwise, modifications are tough to pull off.
I generally read the source on GitHub (or within a Jupyter notebook) for this.
Is there a better representation of the FastAI source code I could read?
I’ve heard that the library itself is developed in Jupyter notebooks. Maybe those notebooks have better comments explaining the code?
Hi Fabian. I am really not very familiar with the domain of fastai development. The fastai code is indeed developed in and exported from Jupyter notebooks. I looked at a couple of the source notebooks when I wanted a better understanding of the code. They contained unit tests, but no explanations of the reasoning and choices behind the designs. Sorry I can’t help further. Maybe someone else will have better advice.