This results in a training set that is OK, but the validation and test sets have some values replaced with #na# where I know there should be real data.
Thanks for your feedback. I am not sure how this can be the case, as I am randomly splitting a single dataframe, so the splits should have identical categories. I have tried various combinations of categorical and continuous variables and stepped through the code, but I cannot see the error. It is not important, as it is just a toy example, but I would have liked to understand it for its own sake.
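One way to test the "identical categories" assumption is to compare the category values in each split directly. The following is only a minimal sketch: the helper `unseen_categories`, the column name `colour`, and the toy data are all made up for illustration, and it assumes (not confirmed in this thread) that the category vocabulary is built from the training rows only, so that any value absent from training would show up as #na# elsewhere.

```python
import pandas as pd


def unseen_categories(train_df, other_df, cat_cols):
    """For each categorical column, collect the values that occur in
    other_df but never occur in train_df (hypothetical helper)."""
    missing = {}
    for col in cat_cols:
        seen = set(train_df[col].dropna().unique())
        extra = set(other_df[col].dropna().unique()) - seen
        if extra:
            missing[col] = extra
    return missing


# Toy data, invented for illustration: the value "c" occurs only once,
# so a split can easily leave it out of the training rows.
df = pd.DataFrame({"colour": ["a", "a", "b", "b", "a", "c"], "x": range(6)})
train_df = df.iloc[:4]   # contains only "a" and "b"
valid_df = df.iloc[4:]   # contains "a" and "c"

print(unseen_categories(train_df, valid_df, ["colour"]))  # {'colour': {'c'}}
```

If this returns an empty dict for the real splits, the random split is probably not the cause and the problem lies elsewhere in the preprocessing.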