I thought I’d add an example. Let’s say we have two variables, sex (male/female) and age, and we want to predict whether a person survived or not based on these two variables.
The model could be:
prediction = weight_male * male + weight_age * age + bias
Let’s say the older someone is, the more likely they were to survive the Titanic (not necessarily true but this is only a simple model). Let’s also say women had a higher chance to survive than men (“women and children first!”).
Since high age increases the likelihood of survival, weight_age is some positive number.
Since being male decreases the chance of survival, weight_male would be some negative number.
So maybe the model learns something like this:
prediction = -10 * male + 2 * age
where male is encoded as 1 (and female as 0) and age is in years. (I left off the bias.)
If someone is 30 years old and male, the score would be -10 * 1 + 2 * 30 = -10 + 60 = 50. If someone is 30 years old and female, the score would be 0 + 60 = 60. So in effect, there is a -10 penalty for being male in this model.
(Of course, to get a survival yes/no prediction, we need to turn this number into a probability, maybe using a sigmoid function. But that’s not important right now.)
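To make this concrete, here’s a minimal sketch of that first model in Python (the weights are the made-up ones from above, and the function names `score` and `sigmoid` are just for illustration):

```python
import math

def sigmoid(x):
    # squash a raw score into a probability between 0 and 1
    return 1 / (1 + math.exp(-x))

def score(male, age):
    # male encoded as 1, female as 0; age in years
    # made-up "learned" weights from the example above (no bias term)
    return -10 * male + 2 * age

print(score(1, 30))  # 30-year-old male   -> 50
print(score(0, 30))  # 30-year-old female -> 60
```

With scores this large, the sigmoid saturates to essentially 1 for both, which is another reminder that these weights are only for illustration, not a realistically trained model.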
What if female were encoded as 1 and male as 0? The model might now be:
prediction = 10 * female + 2 * age - 10
This time there is a bias of -10 that penalizes males. Again, a 30-year-old male would score 0 + 60 - 10 = 50, and a 30-year-old female would score 10 + 60 - 10 = 60.
So it doesn’t really matter whether we encode male or female as 1, since the model can absorb the difference into its weights and bias either way.
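A quick sketch to check that the two encodings really are equivalent (again using the made-up weights from above; the function names are hypothetical):

```python
def score_male_encoding(male, age):
    # first encoding: male = 1, female = 0, no bias
    return -10 * male + 2 * age

def score_female_encoding(female, age):
    # second encoding: female = 1, male = 0; the -10 bias absorbs the shift
    return 10 * female + 2 * age - 10

# the same person gets the same score under both encodings
for age in (20, 30, 50):
    assert score_male_encoding(1, age) == score_female_encoding(0, age)  # male
    assert score_male_encoding(0, age) == score_female_encoding(1, age)  # female
```

The only difference between the two models is where the male/female gap lives: in the first it sits entirely in the weight, in the second it is split between the weight and the bias.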