It depends on the machine learning algorithm you’re using. For a decision tree, it’s OK to encode categories using ordinal values (0, 1, 2, 3, etc.). For an algorithm that learns a weight for each variable, it’s not OK.

Let’s say we have a category `animal` with three possible types: cow, goat, and pig. If we were to encode this as:

```
cow 0
goat 1
pig 2
```

then a decision tree could write rules such as:

```
if animal == 0 then
do cow stuff
else if animal == 1 then
do goat stuff
else if animal == 2 then
do pig stuff
```

So there’s no problem there.

However, let’s say we have a logistic regression classifier or a neural network. Now the algorithm learns something like this:

```
prediction = weight * animal + ... + bias
```

In this case, if the animal is a pig, the predicted value will be higher than if it is a goat, and much higher than if it were a cow. The same weight is used for three different things. So here it’s not a good idea to use ordinal values to encode the categories.
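To see the problem concretely, here is a small Python sketch (the weight and bias values are made up for illustration). With one shared weight, the three predictions always lie on a line, so no choice of weight or bias can ever make the goat (code 1) score higher than both the cow (code 0) and the pig (code 2):

```python
# Ordinal codes share one weight, so predictions lie on a line:
# prediction(animal) = weight * code + bias.
codes = {"cow": 0, "goat": 1, "pig": 2}

def predict(code, weight, bias):
    return weight * code + bias

# Whatever the weight's sign or size, the goat (code 1) is always
# stuck between the cow (code 0) and the pig (code 2).
for weight in (-3.0, -0.5, 0.7, 2.0):
    preds = {a: predict(c, weight, bias=0.1) for a, c in codes.items()}
    lo, hi = sorted([preds["cow"], preds["pig"]])
    assert lo <= preds["goat"] <= hi
```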

Instead, we want to use an encoding where the distance between cow, goat, and pig is equal:

```
cow [1, 0, 0]
goat [0, 1, 0]
pig [0, 0, 1]
```

This is one-hot encoding. Note that if you treat each of these as a vector, the distance between each pair of animals is always `sqrt(2)` (for Euclidean or L2 distance), or `2` (for L1 distance).
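You can verify those distances with a few lines of plain Python (no ML library needed). Between any two distinct one-hot vectors, exactly two coordinates differ by 1, so the L2 distance is `sqrt(2)` and the L1 distance is 2:

```python
import math

cow, goat, pig = [1, 0, 0], [0, 1, 0], [0, 0, 1]

def l2(a, b):
    # Euclidean (L2) distance between two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l1(a, b):
    # Manhattan (L1) distance between two vectors
    return sum(abs(x - y) for x, y in zip(a, b))

for a, b in [(cow, goat), (cow, pig), (goat, pig)]:
    assert math.isclose(l2(a, b), math.sqrt(2))
    assert l1(a, b) == 2
```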

What the ML algorithm learns is now:

```
prediction = weight_cow * cow + weight_goat * goat + weight_pig * pig + ... + bias
```

Since only one of these at a time (`cow`, `goat`, or `pig`) can ever be 1, only one weight gets used and we can learn a weight for each individual type of animal.

As I mentioned, we can actually leave out one of these categories:

```
cow [1, 0]
goat [0, 1]
pig [0, 0]
```

The absence of cow and goat implies the thing is a pig. The distance between cow and goat is still sqrt(2) (or 2 if you’re using L1-distance), but between cow and pig it is 1. The square root of 2 is slightly larger than 1, but close enough. Plus it probably won’t matter if you look at what the ML algorithm now learns:

```
prediction = weight_cow * cow + weight_goat * goat + ... + bias
```

Here, the pig does not have its own weight. Is that a problem? No: the pig becomes the *reference* category. Its effect gets absorbed into the bias term, and `weight_cow` and `weight_goat` now express how much being a cow or a goat changes the prediction relative to being a pig. The model can represent exactly the same predictions as before.
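This equivalence is easy to demonstrate: fold the pig weight into the bias, subtract it from the other weights, and the dummy-encoded model gives the same prediction for every animal. A quick Python check, with all weights made up for illustration:

```python
import math

# Full one-hot model: every animal has its own weight.
w_full = {"cow": 0.5, "goat": -0.2, "pig": 0.8}
bias_full = 0.1

# Equivalent dummy-encoded model: fold the pig weight into the bias,
# so the remaining weights measure "difference from pig".
w_dummy = {"cow": w_full["cow"] - w_full["pig"],
           "goat": w_full["goat"] - w_full["pig"]}
bias_dummy = bias_full + w_full["pig"]

def predict_full(animal):
    # The animal's one-hot column is 1, all others are 0.
    return w_full[animal] + bias_full

def predict_dummy(animal):
    # The pig has no column of its own: its indicator is all zeros.
    return w_dummy.get(animal, 0.0) + bias_dummy

for animal in ("cow", "goat", "pig"):
    assert math.isclose(predict_full(animal), predict_dummy(animal))
```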

Anyway… your question was about male vs female. You could encode it as:

```
male [1, 0]
female [0, 1]
```

That would certainly work. The ML algorithm learns:

```
prediction = weight_male * male + weight_female * female + ... + bias
```

But let’s say you’re encoding it as male = 1, female = 0; then what the ML algorithm learns is this:

```
prediction = weight_male * male + ... + bias
```

This is fine, since it can assign a large (positive or negative) weight for when being male is important to the prediction, and a small weight but large bias for when being female is more important than being male.
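The same fold-into-the-bias trick works here too. Since `female = 1 - male`, a two-column model can always be rewritten as a one-column model: the female weight moves into the bias, and the single remaining weight becomes the male-vs-female difference. A small sketch with made-up weights:

```python
import math

# Two-column model: one weight per sex (made-up values).
w_male, w_female, bias = 0.4, -0.3, 0.2

def predict_two_col(is_male):
    male, female = (1, 0) if is_male else (0, 1)
    return w_male * male + w_female * female + bias

# One-column model: since female = 1 - male, the female weight
# folds into the bias and the male weight becomes a difference.
w = w_male - w_female
b = w_female + bias

def predict_one_col(is_male):
    return w * (1 if is_male else 0) + b

for is_male in (True, False):
    assert math.isclose(predict_two_col(is_male), predict_one_col(is_male))
```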

Of course, in a real classifier the formula for the prediction is more complicated (it probably won’t make decisions based on just male/female but the combination of male/female with other features), but the point is that with just two categories, 1 and 0 are enough for the classifier to make a useful distinction. You could also use 1 and -1, or 100 and 0, or 100 and -100, as long as the two values are different.

I hope this makes sense.