I read a great introduction to gradient boosting. I sort of understand how it works, but I am struggling with the draft #3 paragraph, specifically:

(sorry, would have copied this but don’t know how to type latex on the forums…)

Do we just backpropagate the error to our parameters? How can you take a step ‘in the direction’ of the gradient of the loss function with respect to the predicted values when the model that produced the loss is a regression tree?!
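
To make the question concrete, here is a minimal sketch of how I currently read that step, which may well be wrong. It assumes squared-error loss L = 1/2 (y - F)^2, so the negative gradient with respect to each prediction is just the residual y - F, and it uses a hand-rolled one-split tree. The names `fit_stump`, `boost`, `mse` and the toy data are made up for illustration and do not come from the article. Nothing is backpropagated into an existing tree here; each round fits a new tree to the negative gradient and adds it to the running prediction.

```python
def fit_stump(x, r):
    """Fit a one-split regression tree (stump) to targets r by least squares."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold between neighbouring x values
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    # The fitted stump predicts the mean target on each side of the split.
    return lambda xi: lm if xi <= t else rm


def boost(x, y, n_rounds=50, lr=0.1):
    """Gradient boosting for squared error (assumed loss for this sketch)."""
    f0 = sum(y) / len(y)      # initial constant prediction
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        # Negative gradient of 1/2 * (y - F)^2 w.r.t. F is the residual y - F.
        neg_grad = [yi - pi for yi, pi in zip(y, pred)]
        # The "step in the direction of the gradient" is a new tree fitted to it.
        tree = fit_stump(x, neg_grad)
        trees.append(tree)
        pred = [pi + lr * tree(xi) for pi, xi in zip(pred, x)]
    return f0, trees, pred


def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)


# Toy data, invented for this example.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
f0, trees, pred = boost(x, y)
print(mse(y, [f0] * len(y)), mse(y, pred))
```

If this reading is right, the gradient is never taken with respect to any tree's parameters, only with respect to the vector of predictions, and the tree is just a way of approximating that vector so it can be applied to new inputs. Is that the correct interpretation?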