I understand what one hot encoding is, but I can’t for the life of me figure out why you’d want to do it. How does it help with anything? I’m all for the categorization of data such as creating a integer based dictionary of words in any text, where each integer represents a word, but why would you take this integer representation and blow it up into one hot encoding? To me, this just adds n amounts of pointless zeroes.

This isn’t a complete answer, but one reason is that label encoding, when passed to most algorithms, assumes that the order of the labels matter. This is a problem when, say, looking at days of the week. Tuesday isn’t twice as much anything as Thursday, but linear models can’t know that. This is less necessary with CART models (that can split out on specific values) and Neural Networks with embedding matrices (because each value gains it’s own mathematical representation.