Problem with Pandas running Lesson 3


(Maxwell McKinnon) #1

No one else seems to be having this issue (I searched the forum and Stack Overflow), so it must be a setup problem. I did a conda env update and a git pull, and both are up to date. Lesson 3 seems to be working fine for others here, and searching for this error brings up nothing on these forums.

pip list | grep pandas
pandas 0.23.0
pandas-summary 0.0.41
sklearn-pandas 1.6.0

Any ideas?

AttributeError: module 'pandas.core.common' has no attribute 'is_numeric_dtype'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-5fc2f9e3ed3e> in <module>()
----> 1 DataFrameSummary(tables[4])

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in _get_stats(self)
     87         # settings types
     88         stats['types'] = ''
---> 89         columns_info = self._get_columns_info(stats)
     90         for ctype, columns in columns_info.items():
     91             stats.ix[columns, 'types'] = ctype

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in _get_columns_info(self, stats)
    109                                         self.EXCLUDE,
    110                                         column_info['constant'].union(column_info['bool']))
--> 111         column_info[self.TYPE_NUMERIC] = pd.Index([c for c in rest_columns
    112                                                    if common.is_numeric_dtype(self.df[c])])
    113         rest_columns = self.get_columns(self.df[rest_columns], self.EXCLUDE, column_info['numeric'])

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in <listcomp>(.0)
    110                                         column_info['constant'].union(column_info['bool']))
    111         column_info[self.TYPE_NUMERIC] = pd.Index([c for c in rest_columns
--> 112                                                    if common.is_numeric_dtype(self.df[c])])
    113         rest_columns = self.get_columns(self.df[rest_columns], self.EXCLUDE, column_info['numeric'])
    114         column_info[self.TYPE_DATE] = pd.Index([c for c in rest_columns

AttributeError: module 'pandas.core.common' has no attribute 'is_numeric_dtype'

(Raynier van Egmond) #2

Hi Maxwell - no solution… just to mention I started lesson 3 and have the same issue… Did you find a solution yet…?


(Nick) #3

It looks like pandas.core.common.is_numeric_dtype was removed in 0.23.
It works OK in 0.22, so you can either downgrade to 0.22 or update pandas-summary, as this was fixed a few days ago: https://github.com/mouradmourafiq/pandas-summary/commit/2a05d97be8e97f50221fb0174bcca8661e187a35
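
To see which situation applies to your own environment, here is a minimal diagnostic sketch. It assumes only that pandas is importable. Note that `pandas.core.common` is a private module, so the result depends on the installed version.

```python
import pandas as pd
import pandas.core.common as common

# pandas.core.common is private, so its contents can change between releases.
has_private_alias = hasattr(common, "is_numeric_dtype")

print("pandas version:", pd.__version__)
if has_private_alias:
    print("common.is_numeric_dtype is still available; older pandas-summary should work")
else:
    print("common.is_numeric_dtype is gone; update pandas-summary or downgrade pandas")
```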


(Raynier van Egmond) #5

A simple downgrade using ‘conda install pandas=0.22’ seems to fix the particular issue where pandas=0.23 causes the code to crash. With the new pandas=0.23 there was also an issue later in the notebook where the use of ‘AfterStateHoliday’ and ‘BeforeStateHoliday’ resulted in NaNs where they were not expected, causing another crash (around original cell 68).

The downgrade seems to have solved that problem too, since the code now runs through the notebook into and past the deep learning sections.

It looks like the downgrade suggested by Nick (“bny6613”) solves all the issues in the lesson 3 notebook, at least for me.

Thanks a million for this solution. These things are hard to solve for noobs like myself.


(Stas Bekman) #6

This fix hasn’t been released yet as a package, only in a git branch, which can be installed as:

pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

instead of downgrading pandas.

Edit: this will work if you use pip to install Python packages. If you use conda, try https://stackoverflow.com/a/50141879/9201239 and let us know whether it worked.


(Khoo) #7

Appreciate this thread on dl1 Lesson 3 Rossman.
I attempted Stas Bekman’s
pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

That did not work, so I then tried the downgrade:
conda install pandas=0.22

That did not work either.
I opened environment.yml in Notepad and noticed panda summary==0.22.
I copied that line into environment-cpu.yml, then closed and restarted Anaconda. I ran cd fastai, activated fastai-cpu, then instinctively ran Stas’ pip install again. Thereafter it worked.

In case anyone has a problem with PATH, I tried this in July 2018:
PATH = Path("data/rossman/")
tables = [pd.read_csv(f'{PATH}/{fname}.csv', low_memory=False) for fname in table_names]
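
An alternative sketch of the same path handling, using the `/` operator from `pathlib` instead of string formatting. The two table names here are placeholders for illustration, not the notebook's full list.

```python
from pathlib import Path

PATH = Path("data/rossman")

# Placeholder subset of table names, for illustration only
table_names = ["train", "store"]

# The / operator joins path components without manual string formatting
csv_paths = [PATH / f"{fname}.csv" for fname in table_names]
print(csv_paths)
```

Each resulting path can be passed directly to `pd.read_csv(path, low_memory=False)`.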

Thank you mmcki RaynierX and Stas Bekman


(NanduRajendran) #8

Hi Stas,
I am also getting similar kind of error as shown below for Lesson 3 Rossmann dataset

TypeError Traceback (most recent call last)
in ()
----> 1 for t in tables: display(DataFrameSummary(t).summary())

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_stats(self)
     83         counts.name = 'counts'
     84         uniques = self._get_uniques()
---> 85         missing = self._get_missing(counts)
     86         stats = pd.concat([counts, uniques, missing], axis=1, sort=True)
     87 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_missing(self, counts)
    101         perc = (count / self.length).apply(self._percent)
    102         perc.name = 'missing_perc'
--> 103         return pd.concat([count, perc], axis=1, sort=True)
    104 
    105     def _get_columns_info(self, stats):

TypeError: concat() got an unexpected keyword argument 'sort'

I tried running the command you suggested, and also:
import pandas_summary
pandas_summary.__file__
pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

but it’s still not working. Can you help me out?

Regards
Nandu


(Stas Bekman) #9

Hmm, I see that pandas_summary with that fix has been released as 0.0.5. https://pypi.org/project/pandas-summary/
So you should be able to install it with just:

pip install -U pandas
pip install -U pandas_summary

I have pandas 0.23.3 and pandas_summary 0.0.5 and it works. I have just re-run the notebook to double check.
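
The `TypeError` about `sort` reported above is consistent with a newer pandas_summary running on an older pandas: `pd.concat` gained its `sort` argument in pandas 0.23, which would explain why upgrading both packages together helps. A minimal sketch for checking this on your own install, assuming only that pandas is importable:

```python
import inspect
import pandas as pd

# True when the installed pd.concat accepts sort=..., which the
# pandas_summary code in the traceback above passes to it.
concat_accepts_sort = "sort" in inspect.signature(pd.concat).parameters

print("pandas version:", pd.__version__)
print("pd.concat accepts sort:", concat_accepts_sort)
```

If this prints `False`, upgrading pandas (or using an older pandas_summary) is the likely way out.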


(Andrew Marshall) #10

Hi all, I had the same problem and fixed it by downgrading pandas. I am using a conda env for fastai.

conda activate fastai
conda install pandas==0.22

Good luck!


(NanduRajendran) #11

Thank you so much, Stas. It’s working now. But towards the end of the lesson I encountered another problem while executing the line of code below.

m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars),
0.04, 1, [1000,500], [0.001,0.01], y_range=y_range)
lr = 1e-3
I got the following error:

AssertionError Traceback (most recent call last)
in ()
1 m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars),
----> 2 0.04, 1, [1000,500], [0.001,0.01], y_range=y_range)
3 lr = 1e-3

/usr/local/lib/python3.6/dist-packages/fastai/column_data.py in get_learner(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, **kwargs)
     76     def get_learner(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops,
     77                     y_range=None, use_bn=False, **kwargs):
---> 78         model = MixedInputModel(emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, self.is_reg, self.is_multi)
     79         return StructuredLearner(self, StructuredModel(to_gpu(model)), opt_fn=optim.Adam, **kwargs)
     80 

/usr/local/lib/python3.6/dist-packages/fastai/column_data.py in __init__(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, is_reg, is_multi)
     90                  y_range=None, use_bn=False, is_reg=True, is_multi=False):
     91         super().__init__()
---> 92         for i,(c,s) in enumerate(emb_szs): assert c > 1, f"cardinality must be >=2, got emb_szs[{i}]: ({c},{s})"
     93         if is_reg==False: assert out_sz >= 2, "arg is_reg==False (classification) requires out_sz>=2"
     94         self.embs = nn.ModuleList([nn.Embedding(c, s) for c,s in emb_szs])

AssertionError: cardinality must be >=2, got emb_szs[16]: (1,1)

and so I went to the following link where you encountered the same sort of issue


As mentioned there, I did
df = train[columns].append(test[columns])
and included
for i,(c,s) in enumerate(emb_szs): assert c > 1, f"cardinality must be >=2, got emb_szs[{i}]: ({c},{s})"
in the class MixedInputModel(nn.Module).

But on executing the lines of code below it’s not working: the learning rate finder stopped at 0%, and the training loss, validation loss and exp_rmspe all show NaN. Can you suggest what should be done?
m.lr_find()

HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))

0%| | 0/2737 [00:00<?, ?it/s]

m.fit(lr, 3, metrics=[exp_rmspe])

HBox(children=(IntProgress(value=0, description='Epoch', max=3), HTML(value='')))

epoch trn_loss val_loss exp_rmspe
0 nan nan nan
1 nan nan nan
2 nan nan nan
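
A minimal sketch for narrowing down both symptoms; this is not the notebook's own code. It assumes `df` is the processed frame passed to the learner and `cat_vars` lists its categorical columns, as in the snippet above. The toy frame and the names `low_cardinality` and `nan_columns` are made up for illustration. How the cardinality in the assertion is derived from a column depends on the notebook, so the threshold used here is an assumption; NaNs left in the inputs are one possible cause of NaN losses, not the only one.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the processed frame; values are illustrative only
df = pd.DataFrame({
    "Store": [1, 2, 3, 4],
    "StateHoliday": [np.nan, np.nan, np.nan, np.nan],  # no usable categories
    "AfterStateHoliday": [12.0, np.nan, 3.0, 7.0],     # partly missing
})
cat_vars = ["Store", "StateHoliday"]

# Distinct non-missing values per categorical column
distinct_counts = df[cat_vars].nunique(dropna=True)
low_cardinality = distinct_counts[distinct_counts <= 1].index.tolist()

# Columns that still contain NaN anywhere
nan_columns = df.columns[df.isna().any()].tolist()

print("categorical columns with at most one distinct value:", low_cardinality)
print("columns containing NaN:", nan_columns)
```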


(Stas Bekman) #12

Can you make sure you use the latest version of the notebook, @Nandu? It should include this fix. Perhaps use git to do the update for you, just to rule out that you inserted the changes in the wrong place.

And in future posts please use the code markdown when you post code sections (see the </> entry in the Reply toolbar). Currently it’s very difficult to read the code sections in your post. Thank you.


(NanduRajendran) #13

Sure, Stas. Thank you so much. I will use the code markdown from now on.

Regards
Nandu


(DenisTrofimov) #14

Hi!
The same error!

for t in tables: display(DataFrameSummary(t).summary())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-5ccf154bd414> in <module>()
----> 1 for t in tables: display(DataFrameSummary(t).summary())

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_stats(self)
     83         counts.name = 'counts'
     84         uniques = self._get_uniques()
---> 85         missing = self._get_missing(counts)
     86         stats = pd.concat([counts, uniques, missing], axis=1, sort=True)
     87 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_missing(self, counts)
    101         perc = (count / self.length).apply(self._percent)
    102         perc.name = 'missing_perc'
--> 103         return pd.concat([count, perc], axis=1, sort=True)
    104 
    105     def _get_columns_info(self, stats):

TypeError: concat() got an unexpected keyword argument 'sort'

Have you fixed it?


(Joe) #15

Does anyone get this deprecation warning when running the lesson3 Rossman notebook?

<UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.>

The learner still seems to work, but it’s difficult to view results in the cell.

I’m also still struggling with the pandas DataFrame summary issue.

Any suggestions welcome.


(Joe) #16

Just an update: downgrading pandas to 0.22 worked for me with the pandas_summary view.


(Joe) #17

Final update on this!

A quick fix: to make the warnings disappear I simply added:

import warnings
warnings.filterwarnings("ignore")

I’m guessing there will be a fix for this in the future?
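
A blanket `filterwarnings("ignore")` also hides warnings you might want to see. Here is a narrower sketch, assuming the message text matches the warning quoted earlier in the thread. The `message` argument is a regular expression matched against the start of the warning text. The `catch_warnings` block is only there to demonstrate the effect.

```python
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything by default
    # Silence only the sigmoid deprecation message
    warnings.filterwarnings(
        "ignore",
        message=r"nn\.functional\.sigmoid is deprecated",
        category=UserWarning,
    )
    warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.", UserWarning)
    warnings.warn("some unrelated warning", UserWarning)

# Only the unrelated warning was recorded
remaining = [str(w.message) for w in caught]
print(remaining)
```

In a notebook, only the `warnings.filterwarnings(...)` call is needed, placed before the training cells.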


#18

I successfully followed this fix - https://stackoverflow.com/questions/50554428/exception-with-pandas-on-secondary-computer

You also need to make this change two lines down:

change: if common.is_numeric_dtype(self.df[c])])
to: if types.is_numeric_dtype(self.df[c])])

It displays the table summaries but precedes it with a Pandas Future Warning.
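
For reference, a minimal sketch of what the patched check amounts to, assuming `types` refers to `pandas.api.types` (the post does not show the import). The frame and its values are made up for illustration.

```python
import pandas as pd
from pandas.api import types  # assumed source of `types` in the patched line

df = pd.DataFrame({
    "Store": [1, 2, 3],
    "Sales": [5263.0, 6064.0, 8314.0],
    "StateHoliday": ["0", "a", "0"],
})

# Same shape as the list comprehension in pandas_summary, using the public API
numeric_columns = pd.Index([c for c in df.columns if types.is_numeric_dtype(df[c])])
print(list(numeric_columns))
```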