Lesson 09_tabular: cond = (df.saleYear<2011) | (df.saleMonth<10)

Question about the following cell (approximately the 21st code cell):

cond = (df.saleYear<2011) | (df.saleMonth<10)
train_idx = np.where( cond)[0]
valid_idx = np.where(~cond)[0]

splits = (list(train_idx),list(valid_idx))

How is the condition:

cond = (df.saleYear<2011) | (df.saleMonth<10)

grabbing everything before October 2011? Is this a mistake?
The way I read it, cond = (df.saleYear<2011) OR (df.saleMonth<10), which means that January 2012, for instance, satisfies this condition and would end up in the training set (which we do not want).
Am I mistaken here? Is the | in the condition not an OR?
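To make the concern concrete, here is a minimal sketch that runs the cell's condition on a tiny hand-made DataFrame. The rows are hypothetical (not the real bulldozers data); only the saleYear and saleMonth column names come from the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, NOT the real dataset: (2010-12, 2011-09, 2011-11, 2012-01, 2012-11)
df = pd.DataFrame({
    "saleYear":  [2010, 2011, 2011, 2012, 2012],
    "saleMonth": [12,   9,    11,   1,    11],
})

# The condition exactly as written in the notebook cell
cond = (df.saleYear<2011) | (df.saleMonth<10)
train_idx = np.where( cond)[0]
valid_idx = np.where(~cond)[0]

# Row 3 (January 2012) lands in train_idx because saleMonth<10 is True,
# even though its year is after the intended October 2011 cutoff.
print(df.iloc[train_idx])
```

Because `|` is an elementwise OR, any row with a month before October passes, whatever its year.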


You are not mistaken. It’s a bug. An issue is open on the fastbook repo.
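One way to express "before October 2011" correctly is to require the month check only when the year is 2011. This is a sketch on the same hypothetical toy rows; I have not checked which exact fix was adopted in the fastbook repo. An equivalent alternative compares a single year-and-month key.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, NOT the real dataset: (2010-12, 2011-09, 2011-11, 2012-01, 2012-11)
df = pd.DataFrame({
    "saleYear":  [2010, 2011, 2011, 2012, 2012],
    "saleMonth": [12,   9,    11,   1,    11],
})

# Any sale before 2011, or a 2011 sale before October
cond = (df.saleYear<2011) | ((df.saleYear==2011) & (df.saleMonth<10))

# Equivalent formulation: encode (year, month) as one sortable integer
key = df.saleYear*12 + df.saleMonth
cond_key = key < 2011*12 + 10

train_idx = np.where( cond)[0]
valid_idx = np.where(~cond)[0]
splits = (list(train_idx), list(valid_idx))
```

With this version the January 2012 row moves to the validation set, and only the 2010 and September 2011 rows stay in training.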
