Lesson 5 In-Class Discussion

(Jona) #209

Can someone please confirm that the bug reported in Dec '17 by @Chris_Palmer was fixed? I am still seeing exactly the same bug, so a confirmation would help. Thank you!

(Anubhav Maity) #210

Difference between sub sampling and bootstrapping
With reference to the following code discussed in lesson 5: the code is doing sampling with replacement, since we may get a row that we already received for a previous decision tree. Why, then, is it not bootstrapping?

def create_tree(self):
    rnd_idxs = np.random.permutation(len(self.y))[:self.sample_sz]
    return DecisionTree(self.x.iloc[rnd_idxs], self.y[rnd_idxs], min_leaf=self.min_leaf)

Where am I going wrong in understanding the difference between sub-sampling and bootstrapping?

(Joseph Catanzarite) #211

If your data set consists of N rows (examples), a bootstrap sample is a set of N rows drawn with replacement, while a subsample is a set of fewer than N rows, typically drawn without replacement.
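
To make the distinction concrete, here is a minimal sketch (not the lesson's code). The helper names `subsample_idxs` and `bootstrap_idxs`, the seed, and the sizes are made up for illustration. The first helper mirrors the `np.random.permutation(...)[:sample_sz]` pattern from `create_tree`: within one tree it can never pick the same row twice. A row showing up again in a different tree only means each tree draws its own independent subsample.

```python
import numpy as np

def subsample_idxs(n, sample_sz, rng):
    # Shuffle all n row indices and keep the first sample_sz.
    # Every index appears at most once, so this is sampling WITHOUT replacement.
    return rng.permutation(n)[:sample_sz]

def bootstrap_idxs(n, rng):
    # Draw n indices uniformly WITH replacement, so repeats are expected.
    return rng.integers(0, n, size=n)

rng = np.random.default_rng(0)  # illustrative seed
n = 1000                        # hypothetical number of rows

sub = subsample_idxs(n, 300, rng)
boot = bootstrap_idxs(n, rng)

print("subsample: size", len(sub), "unique rows", len(np.unique(sub)))
print("bootstrap: size", len(boot), "unique rows", len(np.unique(boot)))
```

In the subsample the number of unique rows always equals the sample size. In the bootstrap sample the size is N, but only roughly 63% of the rows are distinct on average, because some rows are drawn more than once and others not at all.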