I'm working through the Titanic competition on Kaggle and noticed that most submissions bin fields like ‘Age’, creating 4-5 buckets of age ranges to use for classification in lieu of the 90 or so distinct values for Age.
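For concreteness, here is a minimal sketch of the kind of binning I mean. The ages are made up (not the competition data), and the edges and labels are hypothetical stand-ins for the hand-picked ones I keep seeing. The `pd.qcut` line is only there as a contrast: it lets pandas choose the edges so that each bin holds roughly the same number of rows. I am not claiming that is what any notebook does.

```python
import pandas as pd

# Made-up ages standing in for the 'Age' column (assumption: not real Titanic data).
ages = pd.Series([2, 9, 15, 22, 27, 31, 38, 45, 54, 63, 71, 80], name="Age")

# Hand-picked ("best guess") edges and labels; both are hypothetical.
# pd.cut uses right-closed intervals by default: (0, 12], (12, 18], ...
manual_edges = [0, 12, 18, 40, 60, 120]
manual_labels = ["child", "teen", "adult", "middle_aged", "senior"]
age_manual = pd.cut(ages, bins=manual_edges, labels=manual_labels)

# Equal-frequency bins: the edges come from the quartiles of the data
# rather than from a guess.
age_quantile = pd.qcut(ages, q=4)

print(age_manual.value_counts(sort=False))
print(age_quantile.value_counts(sort=False))
```

With the manual edges the bin counts come out uneven, while the quantile version is balanced by construction. Neither one uses Survival when choosing the edges.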
My question is: How do you determine the size of your bins?
I haven’t come across any discussion of this in the Titanic notebooks, and it seems like most authors take a best-guess approach after looking at how Age affects Survival. Is that how it’s done? Or is there a statistical approach that can/should be taken instead?