Lesson 7 Intro to ML: building Random Forest


(Arun Vishwanathan) #1

I have a basic question about the code in create_tree for building your own RF. I am not sure if this was explained earlier, but why does the professor say in the video that the sampling is random “WITHOUT replacement”?
For each tree we call create_tree, and each time it runs np.random.permutation on a fixed range (len(self.y)). How does this ensure that the sampling happens without replacement? Couldn't rows chosen for one tree also be chosen for another tree? In other words, the random indexes could overlap between trees, correct? Or did I misunderstand the code?
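
To make the question concrete, here is a minimal sketch of the sampling step as I understand it. It assumes create_tree keeps only the first few entries of the permutation; the names sample_rows, sample_sz, n_rows and n_trees are hypothetical and used only for illustration, not taken from the lesson code.

```python
import numpy as np

def sample_rows(n_rows, sample_sz, rng):
    # Shuffle 0..n_rows-1 and keep the first sample_sz indexes.
    # A permutation contains each index exactly once, so this slice
    # cannot contain the same row twice.
    return rng.permutation(n_rows)[:sample_sz]

rng = np.random.RandomState(0)        # fixed seed so the run is repeatable
n_rows, sample_sz, n_trees = 10, 6, 3

# One independent draw per tree, as if create_tree were called n_trees times.
samples = [sample_rows(n_rows, sample_sz, rng) for _ in range(n_trees)]

# Within a single tree: no index is repeated.
within_tree_unique = all(len(set(idxs)) == sample_sz for idxs in samples)

# Across trees: each draw starts from the full range again, so rows can
# recur. Here two samples of 6 out of 10 rows must share at least 2 rows.
overlap = set(samples[0]) & set(samples[1])

print(within_tree_unique, len(overlap))
```

If this sketch matches the real code, then “without replacement” would describe the rows inside one tree's sample, while different trees can still share rows.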