Feature importance, redundant features

Hi all

In the ML course (2017), lesson 3, in the rf_interpretation notebook:

I tried plotting the hierarchical clustering without using feature importance first (i.e. without first finding the most effective independent features). I mean, instead of keeping just the 30 most important columns (first image), I used all columns (second image).

Then I got this error: “Distance matrix ‘X’ must be symmetric”.

Why is that?

I’m not sure, but I think the error was raised because one or several columns have some problem,
or because there are too many columns.
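For reference, this is more or less the code I’m running, a minimal sketch following the lesson notebook (df_trn is the frame produced by proc_df, with all columns kept):

import numpy as np
import scipy.stats
import scipy.cluster.hierarchy as hc
import matplotlib.pyplot as plt

# Spearman rank correlation between every pair of columns in df_trn
corr = np.round(scipy.stats.spearmanr(df_trn).correlation, 4)

# Convert correlations into a condensed distance matrix; this squareform
# call is where "Distance matrix 'X' must be symmetric" gets raised
# (my guess: if corr contains NaNs the symmetry check fails, since NaN != NaN)
corr_condensed = hc.distance.squareform(1 - corr)

z = hc.linkage(corr_condensed, method='average')
plt.figure(figsize=(16, 10))
hc.dendrogram(z, labels=df_trn.columns, orientation='left', leaf_font_size=16)
plt.show()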

@sgugger
@PierreO
@init_27

Could you please answer this question?

Full stack trace:
https://github.com/mahdirezaey/as/blob/master/forums.ipynb

Can you share the entire stacktrace?

Sure!
https://github.com/mahdirezaey/as/blob/master/forums.ipynb

This is my guess at what is happening. I’m not sure, though.

I see a lot of “invalid value encountered” warnings in the code block that raised the error. There is a lot of non-numeric data in your data frame. You could try encoding those columns as categorical data.
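Something along these lines might help narrow it down; it’s just a sketch, with df_trn standing in for whatever frame you pass to spearmanr:

import numpy as np

# Which columns are not numeric yet? These would need encoding before
# spearmanr can produce a clean correlation matrix.
non_numeric = df_trn.select_dtypes(exclude=[np.number]).columns
print(non_numeric.tolist())

# Columns that are all NaN or constant are another likely source of the
# "invalid value encountered" warnings (zero variance in the ranks).
print(df_trn.isnull().sum()[df_trn.isnull().sum() > 0])
print(df_trn.columns[df_trn.nunique() <= 1].tolist())

# One simple way to encode the non-numeric columns as categorical codes:
for col in non_numeric:
    df_trn[col] = df_trn[col].astype('category').cat.codes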

@BBloggsbott, I think proc_df will do that automatically:

turning string columns into categoricals,
treating NaN in categorical columns as 0,
and filling missing numeric values with the column median.
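At least, that’s how I understand the old fastai structured module. Roughly (df_raw and the 'target' column name are placeholders for whatever dataset is being used):

from fastai.structured import train_cats, proc_df

# train_cats: turn the string columns into pandas categoricals (in place)
train_cats(df_raw)

# proc_df: replace each categorical with its numeric code + 1 (so NaN
# becomes 0), fill missing numeric values with the column median, and
# add a *_na indicator column for numeric columns that had NaNs
df_trn, y_trn, nas = proc_df(df_raw, 'target')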

I have updated that (it’s one-hot encoded now).
Could you please check this link again: https://github.com/mahdirezaey/as/blob/master/forums.ipynb

And I still get that error.

corr = np.round(scipy.stats.spearmanr(df_trn).correlation, 4)
Based on the docs, scipy.stats.spearmanr returns a correlation matrix, but you’ve accessed a .correlation attribute here. That might be because you are using a different version.
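For what it’s worth, this is how I understand the return value on the scipy version I have (just a sketch, and the exact result type may differ between versions):

import numpy as np
import scipy.stats

X = np.random.rand(100, 5)   # 100 rows, 5 numeric columns

res = scipy.stats.spearmanr(X)
# With more than two columns, .correlation is the full 5x5
# rank-correlation matrix and .pvalue holds the matching p-values
print(res.correlation.shape)   # (5, 5)

# The matrix should be symmetric as long as it contains no NaNs
print(np.allclose(res.correlation, res.correlation.T))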

P.S. I haven’t done the fastai course or used the package before, so I’m just pointing out possible errors.

Thanks anyway!

Do you see the same error if you set labels=None? I’m not sure what’s going on. In my case I have faced the same error.
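I mean something like this, reusing the linkage z from the snippet above (just a sketch):

import scipy.cluster.hierarchy as hc
import matplotlib.pyplot as plt

# Same dendrogram call as before, but without passing the column labels
plt.figure(figsize=(16, 10))
hc.dendrogram(z, labels=None, orientation='left', leaf_font_size=16)
plt.show()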