Recently, I have been attempting to implement some popular ML algorithms from scratch in numpy. For instance: writing the equations for a 2-layer NN, deriving the gradients for the backward pass, and converting the math to numpy code. I plan to move on to some popular techniques such as Batch Norm, Dropout, etc.
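To make the exercise concrete, here is a minimal sketch of what I mean by a 2-layer NN in raw numpy (toy shapes and data are my own for illustration; the gradients follow from the chain rule applied to softmax cross-entropy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 samples, 4 features, 3 classes (hypothetical shapes).
X = rng.standard_normal((5, 4))
y = np.array([0, 2, 1, 0, 2])

# Two layers: X @ W1 + b1 -> ReLU -> @ W2 + b2 -> softmax
W1 = rng.standard_normal((4, 8)) * 0.1
b1 = np.zeros(8)
W2 = rng.standard_normal((8, 3)) * 0.1
b2 = np.zeros(3)

# Forward pass (broadcasting adds each bias row-wise)
z1 = X @ W1 + b1
h = np.maximum(z1, 0)          # ReLU
scores = h @ W2 + b2

# Softmax cross-entropy loss
exp = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(y)), y]).mean()

# Backward pass, with each gradient derived by hand
dscores = probs.copy()
dscores[np.arange(len(y)), y] -= 1   # d(loss)/d(scores) for softmax-CE
dscores /= len(y)
dW2 = h.T @ dscores
db2 = dscores.sum(axis=0)
dh = dscores @ W2.T
dz1 = dh * (z1 > 0)                  # ReLU passes gradient where z1 > 0
dW1 = X.T @ dz1
db1 = dz1.sum(axis=0)
```

Writing out every `d…` variable by hand like this is exactly the part that forces you to connect the chain-rule derivation to the vectorized code.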

I was wondering how one should effectively manage time between understanding the math and converting it to vectorized code. Specifically, I have found that the implementation exercise not only (re)introduces me to familiar techniques such as broadcasting and the bias trick, but also forces me to handle practical issues (such as NaNs and overflow with large numbers) that occur when computing loss functions.
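As one example of the kind of practical issue I mean: exponentiating large scores in a softmax overflows to `inf` and turns the loss into NaN. A small sketch of the standard fix, subtracting the row-wise max before exponentiating (mathematically a no-op, numerically essential):

```python
import numpy as np

def stable_log_softmax(scores):
    # Subtracting the row max keeps np.exp in a safe range;
    # the softmax result is unchanged because the shift cancels.
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Naive np.exp(1000.0) overflows to inf; the stable version stays finite.
scores = np.array([[1000.0, 1001.0, 1002.0]])
log_probs = stable_log_softmax(scores)
```

This is a detail you rarely notice when using a framework's built-in cross-entropy, but it bites immediately in a from-scratch implementation.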

I understand that as practitioners we should spend more time understanding higher-level APIs, but based on my discussions with other students, I feel that coding up vanilla implementations is equally important. Any thoughts on how I can do this more effectively?

The question I ask myself: what bounds my ability to train better models and do interesting things? I don't know a lot of math, but despite that, for me the answer continues to be my ability to express what I want in code. So that is where I spend most of my time.

Here is an example of expressiveness:
Here Jeremy is preprocessing images for VGG16, which requires subtracting the per-channel mean intensity and reversing the RGB channel order. The brevity of the code is impressive, at least to me; broadcasting and numpy methods are at work here.
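For readers without the screenshot, a sketch of that preprocessing in numpy (I am assuming a channels-last batch layout here; the commonly cited ImageNet per-channel means are hardcoded, and both steps together are one line thanks to broadcasting and slicing):

```python
import numpy as np

# Hypothetical batch of RGB images: (batch, height, width, channels).
imgs = np.random.rand(2, 224, 224, 3) * 255

# Commonly cited ImageNet per-channel RGB means used with VGG16.
vgg_mean = np.array([123.68, 116.779, 103.939])

# Broadcasting subtracts the (3,) mean from every pixel's channel triple;
# [..., ::-1] reverses the last axis, turning RGB into BGR.
preprocessed = (imgs - vgg_mean)[..., ::-1]
```

The mean subtraction and the channel reversal each compress a conceptual loop over millions of pixels into a single expression, which is the expressiveness being pointed at.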

Thanks for the note; this is a great example, and code brevity is certainly one of the aspects we are aiming for. At the same time, I am trying to write raw numpy code to arrive at these numbers myself (sure, hardcoded values are available for popular datasets such as ImageNet). But I would like to focus on how we derive these numbers theoretically and convert the math equations to vectorized code.
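Deriving those numbers yourself is itself a nice vectorization exercise. A sketch, assuming an in-memory dataset stacked on the first axis: the per-channel mean is just an average over every axis except the channel axis.

```python
import numpy as np

# Hypothetical dataset: N images of shape (H, W, 3) stacked along axis 0.
data = np.random.rand(100, 32, 32, 3)

# Average over batch, height, and width, leaving one value per channel.
# This is the vectorized form of "sum every pixel intensity per channel,
# then divide by the number of pixels".
channel_mean = data.mean(axis=(0, 1, 2))   # shape (3,)
channel_std = data.std(axis=(0, 1, 2))
```

For a dataset that does not fit in memory, the same statistics can be accumulated as running per-channel sums over batches and divided at the end, which is a good follow-up exercise in translating the same equation into streaming code.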