In this notebook, Jeremy writes:
We’ll create dummy variables for Pclass, even although it’s numeric, since the numbers 1, 2, and 3 correspond to first, second, and third class cabins - not to counts or measures that make sense to multiply by.
He also explains it in the video lesson.
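To make the quoted idea concrete, here is a minimal sketch using a handful of made-up Pclass values (not the real Titanic data) and made-up weights. It builds one 0/1 dummy column per class, which is roughly what pandas' `get_dummies` produces, and contrasts that with feeding the raw number into a linear term. The helper name `dummies` and all weights are hypothetical, chosen only for illustration.

```python
# Minimal sketch: hypothetical Pclass values and weights, for illustration only.

pclass = [1, 2, 3, 2, 1]


def dummies(values, prefix):
    """Return one 0/1 column per distinct value (similar in spirit to pandas.get_dummies)."""
    levels = sorted(set(values))
    return {f"{prefix}_{lvl}": [int(v == lvl) for v in values] for lvl in levels}


cols = dummies(pclass, "Pclass")
print(cols)
# {'Pclass_1': [1, 0, 0, 0, 1], 'Pclass_2': [0, 1, 0, 1, 0], 'Pclass_3': [0, 0, 1, 0, 0]}

# Numeric encoding: a linear model adds w * Pclass, so going from class 1 to 2
# always shifts the prediction by exactly the same amount as going from 2 to 3.
w = -0.5  # hypothetical weight
numeric_effect = {c: w * c for c in (1, 2, 3)}

# Dummy encoding: each class gets its own (hypothetical) weight, so the two
# steps are free to differ.
dummy_weights = {"Pclass_1": 0.5, "Pclass_2": 0.1, "Pclass_3": -0.6}
dummy_effect = {c: dummy_weights[f"Pclass_{c}"] for c in (1, 2, 3)}

for name, eff in (("numeric", numeric_effect), ("dummies", dummy_effect)):
    print(name, "step 1->2:", eff[2] - eff[1], "step 2->3:", eff[3] - eff[2])
```

With the numeric column both steps are forced to be equal (here -0.5 each), while the dummy columns let the model learn a different jump between first and second class than between second and third.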
When dealing with Embarked, which is C, Q or S, Jeremy says that coding C, Q, S as 0, 1, 2 or 2, 0, 1 or 1, 0, 2, or any of the 6 possible orderings, won’t make a difference, because there is no order between C, Q and S, and it doesn’t make sense to add or multiply those values. It’s just a way to convert them to numbers. I agree with that; it makes sense.
But when dealing with Pclass, I don’t get it.
Passenger class is an ordered concept from 1 to 3, just like Parch (number of parents or children aboard) is an ordered concept from 0 to 6.
Why would it make sense to add/multiply things like Parch, but not Pclass? Someone with 2 parents/children onboard has more parents/children onboard than someone with Parch 1 and fewer than someone with Parch 3. There is definitely an order, which is why Jeremy left it as a numerical variable.
But in the same way, someone in class 2 has a worse ticket than someone in class 1 and a better one than someone in class 3. That is definitely also an ordered concept, so why not use it as-is, as a number?
Can someone explain what I’m missing?