Picking the right test set?

Hi everyone, I have a quick question. I am working with sensor data for movements and classifying them. I have 6 movements, each recorded from 7 people. To generate my test set, I tried four different approaches:

All splits were carried out before feature engineering.

  1. Chose the last 10% of the data for each individual person for each individual exercise.
  • Why I think it’s bad: the user may be turning off the watch, or there may be some other ‘anomaly’ in the last bit, and each movement recording is not very long (a minute or two at most), so that could be a noticeable part of the test set.
  2. Chose a random contiguous subsample of 10% from each person for each subset.
  • Should the test set be random? I am unsure of this.
  3. Chose a contiguous subsample of 10% starting ~75% of the way into the data.
  • I feel like this is better as it’s middle of the road and gives a good ‘representation’ of what the data looks like.
  4. Held out one person for all exercises.
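To make the four options above concrete, here is a minimal sketch in plain Python. It assumes each recording is just a time-ordered list of samples and that recordings are keyed by a hypothetical `(person, exercise)` tuple; the function names are made up for illustration and are not from any library.

```python
import random


def contiguous_test_window(n_samples, start_frac, test_frac=0.10):
    """Return (start, stop) indices of one contiguous test window."""
    test_len = max(1, round(n_samples * test_frac))
    start = round(n_samples * start_frac)
    # Clamp so the window never runs past the end of the recording.
    start = max(0, min(start, n_samples - test_len))
    return start, start + test_len


def split_recording(samples, start_frac, test_frac=0.10):
    """Split one (person, exercise) recording into (train, test) lists."""
    start, stop = contiguous_test_window(len(samples), start_frac, test_frac)
    test = samples[start:stop]
    train = samples[:start] + samples[stop:]
    return train, test


def hold_out_person(recordings, test_person):
    """recordings maps (person, exercise) -> samples; one person is the test set."""
    train = {k: v for k, v in recordings.items() if k[0] != test_person}
    test = {k: v for k, v in recordings.items() if k[0] == test_person}
    return train, test


recording = list(range(100))  # stand-in for one time-ordered recording

last_10 = split_recording(recording, 0.90)    # option 1: last 10%
rng = random.Random(0)                        # fixed seed for repeatability
random_10 = split_recording(recording, rng.uniform(0.0, 0.90))  # option 2
late_10 = split_recording(recording, 0.75)    # option 3: 10% starting at ~75%

# Option 4: every exercise from one (hypothetical) person goes to the test set.
recordings = {("p1", "walk"): [1, 2], ("p2", "walk"): [3], ("p2", "run"): [4]}
train_people, test_people = hold_out_person(recordings, "p2")
```

Note that the first three options test on people the model has already seen, while the fourth tests on a person it has never seen, so their accuracies are not measuring quite the same thing.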

I did the last one because these are worn sensors, and there can be some time when the user is trying to turn the device off, etc. Which of these should I go for? The last gives the highest accuracy, but I’m unsure if it’s the ‘best’.

Thanks for any and all insight :slight_smile:


  • Sub note: I split before feature engineering because some of the features are computed from previous values (rolling means, etc.), and I did not want the training set to influence the test set too much.
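As a small illustration of why the order matters, here is a sketch with a hand-written trailing rolling mean (the `rolling_mean` helper and the toy signal are assumptions for this example only, not my real pipeline). Computing the feature after the split keeps training values out of the test features; computing it on the whole signal first lets the last training samples bleed into the first test windows.

```python
def rolling_mean(values, window):
    """Trailing rolling mean; early outputs average over fewer values."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
split_at = 8  # first 8 samples train, last 2 test

# Split first, then engineer features on each part separately.
train_raw, test_raw = signal[:split_at], signal[split_at:]
train_feat = rolling_mean(train_raw, window=3)
test_feat = rolling_mean(test_raw, window=3)

# Engineer first, then split: the test windows include training samples.
leaky_test_feat = rolling_mean(signal, window=3)[split_at:]

print(test_feat)        # [9.0, 9.5]
print(leaky_test_feat)  # [8.0, 9.0]
```

The trade-off is that the first few test windows are computed from fewer values, which is the price of not reaching back into the training data.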

** Sub-Sub note: I had another thought. Rachel mentions in Creating a good validation set that you should use the future as your test set. I am worried about the startup and end of each recording, so I opted for something like this: training is the 70% of the data following the first 10%, and the test set is the following 10%. This should reduce the noise from the startup and shutdown of sensor data gathering.
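Roughly, the trimmed split I have in mind looks like the sketch below. It assumes the percentages are all fractions of the full recording (skip 10%, train on the next 70%, test on the next 10%, and leave the final 10% unused); `trimmed_split` is a made-up name for illustration.

```python
def trimmed_split(samples, skip_frac=0.10, train_frac=0.70, test_frac=0.10):
    """Chronological split that ignores the start and end of a recording."""
    n = len(samples)
    train_start = round(n * skip_frac)                 # drop the startup noise
    train_stop = train_start + round(n * train_frac)
    test_stop = min(n, train_stop + round(n * test_frac))
    train = samples[train_start:train_stop]
    test = samples[train_stop:test_stop]               # strictly after training
    return train, test


recording = list(range(100))  # stand-in for one time-ordered recording
train, test = trimmed_split(recording)

print(train[0], train[-1])  # 10 79
print(test[0], test[-1])    # 80 89
```

The test window always comes after the training window in time, and whatever remains after it (the shutdown part) is never used.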