Using RF for protein classification


(Lorenzo Fabbri) #1

Hi everybody. I am trying to apply random forests (RF) to the following problem. I have a set of N genes, and n of them are also part of a putative network. I can build a feature matrix for the genes (the features are actually computed from the protein sequence): composition, short motifs, functional annotation, etc. I then trained an RF on this matrix to predict whether a gene belongs to the network or not.
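To make the "composition" features concrete, here is a minimal sketch of how such a feature vector could be computed. It assumes "composition" means the relative frequency of the 20 standard amino acids; the function name `aa_composition` and the example sequence are made up for illustration and are not the actual pipeline.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every protein
# yields a feature vector with the same column layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Hypothetical helper: non-standard symbols (e.g. 'X') are ignored.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        # Empty or fully non-standard sequence: return an all-zero vector.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up sequence, one row of the feature matrix.
features = aa_composition("MKTAYIAKQR")
print(len(features))
```

Motif and annotation features would be appended as extra columns to the same row.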
The problem is that, no matter which features I use (all of them or only a subset), I always get about the same score: around 0.63 (OOB). This seems strange to me, especially since I was able to get 0.61 using just one feature! Any suggestions?
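One sanity check that might help here, assuming the OOB score is plain accuracy (as it is by default for classifiers in scikit-learn, for example): compare it with the accuracy of always predicting the larger class. The sketch below uses a hypothetical helper `majority_baseline`, and the counts are made up, not the real N and n.

```python
def majority_baseline(n_total, n_positive):
    """Accuracy of a classifier that always predicts the larger class.

    n_total    -- total number of genes (N in the post)
    n_positive -- number of genes in the putative network (n in the post)
    """
    if n_total <= 0 or not 0 <= n_positive <= n_total:
        raise ValueError("need 0 <= n_positive <= n_total and n_total > 0")
    return max(n_positive, n_total - n_positive) / n_total


# Made-up counts for illustration only.
baseline = majority_baseline(n_total=1000, n_positive=370)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

If the baseline comes out close to the OOB score, the forest may be doing little more than predicting the majority class, which would also explain why changing the features barely moves the number.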