How to split data at patient level instead of image level?

Hi,

I trained a classifier on patients' DICOM files. Each patient has a list of image files. During training, I found that the validation accuracy was suspiciously high (up to 99%). I realized this is because the data was split by image instead of by patient, so images from the same patient ended up in both the training and validation sets. That leaks information and inflates the validation score.

How to split data at patient level instead of image level? Thanks!

You could have a look at this https://github.com/fastai/fastai/issues/2724
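As a minimal sketch of the general idea (not necessarily the approach discussed in that issue), you can group image indices by patient ID, shuffle the patients rather than the images, and hold out a fraction of patients for validation. This assumes the patient ID can be derived from each file path, for example from the parent folder name. `patient_level_split` and `get_patient_id` are hypothetical names used only for illustration.

```python
import random
from collections import defaultdict


def patient_level_split(image_paths, get_patient_id, valid_pct=0.2, seed=42):
    """Split image indices so each patient's images land in exactly one subset.

    Hypothetical helper: `get_patient_id` is any function mapping a path to a
    patient ID (an assumption about how your files are organized).
    Returns (train_idxs, valid_idxs) as sorted lists of indices into image_paths.
    """
    # Group image indices by patient ID.
    by_patient = defaultdict(list)
    for i, path in enumerate(image_paths):
        by_patient[get_patient_id(path)].append(i)

    # Shuffle the patients (not the images) with a fixed seed for reproducibility.
    patients = sorted(by_patient)
    rng = random.Random(seed)
    rng.shuffle(patients)

    # Hold out roughly valid_pct of the patients, at least one.
    n_valid = max(1, round(len(patients) * valid_pct))
    valid_patients = set(patients[:n_valid])

    # Every image follows its patient into train or validation.
    train_idxs, valid_idxs = [], []
    for pid in patients:
        target = valid_idxs if pid in valid_patients else train_idxs
        target.extend(by_patient[pid])
    return sorted(train_idxs), sorted(valid_idxs)


# Example, assuming a layout like "<patient_id>/<image>.dcm".
files = ["p1/a.dcm", "p1/b.dcm", "p2/a.dcm", "p3/a.dcm", "p3/b.dcm"]
train_idxs, valid_idxs = patient_level_split(files, lambda p: p.split("/")[0], valid_pct=0.34)
print(train_idxs, valid_idxs)
```

In fastai, a `DataBlock` splitter is a function that takes the items and returns the training and validation index lists, so something like `splitter=lambda items: patient_level_split(items, get_patient_id)` should plug in. If you already use scikit-learn, `GroupShuffleSplit` with the patient IDs passed as `groups` does the same kind of grouped split.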
