I am building a model to classify sick patients (0) versus non-sick patients (1) from their gene expression levels. My data is typical health care data, in which the number of variables is far greater than the number of observations (patients). I managed to get a predicted probability for each observation in my test sets. Where do I go from here?
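For concreteness, one possible next step is to turn the probabilities into class labels and tabulate them against the known labels. The following is only a minimal sketch in Python with NumPy: the arrays are made-up placeholders rather than real patient data, the names `y_true` and `p_nonsick` are hypothetical, the probabilities are assumed to be for class 1 (non-sick), and the 0.5 cutoff is an arbitrary starting point rather than a recommended value.

```python
import numpy as np

# Placeholder values for illustration only, not real patient data.
# Encoding follows the question: 0 = sick, 1 = non-sick.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])

# Assumption: each value is the predicted probability of class 1 (non-sick).
p_nonsick = np.array([0.10, 0.20, 0.05, 0.40, 0.60, 0.30, 0.80, 0.45])

# Assumption: 0.5 is an arbitrary cutoff; other thresholds may suit the data better.
threshold = 0.5
y_pred = (p_nonsick >= threshold).astype(int)

# confusion[i, j] counts observations with true class i predicted as class j.
confusion = np.zeros((2, 2), dtype=int)
np.add.at(confusion, (y_true, y_pred), 1)

# Fraction of observations whose predicted label matches the true label.
accuracy = np.trace(confusion) / confusion.sum()

print(confusion)
print(accuracy)
```

Changing `threshold` trades errors on one class for errors on the other, so the confusion matrix is usually more informative than a single accuracy figure.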
Another question: judging by the probabilities, the model seems to do a great job predicting on the test sets. However, I notice that in general there are many more sick patients than non-sick patients. Since the classes are unbalanced, will this affect my model’s predictions on new data sets?
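To illustrate why the imbalance matters when judging performance, here is a small sketch in Python with NumPy. It uses made-up labels (8 sick, 2 non-sick) and a deliberately trivial "model" that always predicts the majority class. None of this comes from the actual data; it only shows how plain accuracy can look good while one class is never predicted correctly.

```python
import numpy as np

# Placeholder labels for illustration only: 8 sick (0) and 2 non-sick (1).
y_true = np.array([0] * 8 + [1] * 2)

# A trivial baseline that always predicts the majority class (sick = 0).
y_pred = np.zeros_like(y_true)

# Number of observations in each class.
class_counts = np.bincount(y_true, minlength=2)

# Plain accuracy: fraction of correct predictions overall.
accuracy = np.mean(y_pred == y_true)

# Recall per class: fraction of each true class that was predicted correctly.
recall_per_class = np.array(
    [np.mean(y_pred[y_true == c] == c) for c in (0, 1)]
)

# Balanced accuracy: the mean of the per-class recalls.
balanced_accuracy = recall_per_class.mean()

print(class_counts, accuracy, recall_per_class, balanced_accuracy)
```

Here the baseline reaches 80% accuracy but only 50% balanced accuracy, because it never identifies a non-sick patient. Comparing a real model against this kind of majority-class baseline is one way to check whether good-looking results partly reflect the class proportions.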