Problem with Pandas running Lesson 3

mmcki · May 27, 2018, 4:39pm

No one else seems to be having this issue (I searched these forums and Stack Overflow and found nothing), and lesson 3 seems to be working fine for others here, so it must be a setup issue on my end. I ran conda env update and git pull; both are up to date.

pip list | grep pandas
pandas 0.23.0
pandas-summary 0.0.41
sklearn-pandas 1.6.0

Any ideas?

AttributeError: module 'pandas.core.common' has no attribute 'is_numeric_dtype'

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-5fc2f9e3ed3e> in <module>()
----> 1 DataFrameSummary(tables[4])

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in _get_stats(self)
     87         # settings types
     88         stats['types'] = ''
---> 89         columns_info = self._get_columns_info(stats)
     90         for ctype, columns in columns_info.items():
     91             stats.ix[columns, 'types'] = ctype

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in _get_columns_info(self, stats)
    109                                         self.EXCLUDE,
    110                                         column_info['constant'].union(column_info['bool']))
--> 111         column_info[self.TYPE_NUMERIC] = pd.Index([c for c in rest_columns
    112                                                    if common.is_numeric_dtype(self.df[c])])
    113         rest_columns = self.get_columns(self.df[rest_columns], self.EXCLUDE, column_info['numeric'])

~/anaconda3/envs/fastai/lib/python3.6/site-packages/pandas_summary/__init__.py in <listcomp>(.0)
    110                                         column_info['constant'].union(column_info['bool']))
    111         column_info[self.TYPE_NUMERIC] = pd.Index([c for c in rest_columns
--> 112                                                    if common.is_numeric_dtype(self.df[c])])
    113         rest_columns = self.get_columns(self.df[rest_columns], self.EXCLUDE, column_info['numeric'])
    114         column_info[self.TYPE_DATE] = pd.Index([c for c in rest_columns

AttributeError: module 'pandas.core.common' has no attribute 'is_numeric_dtype'

RaynierX · May 27, 2018, 9:15pm

Hi Maxwell - no solution… just to mention I started lesson 3 and have the same issue… Did you find a solution yet…?

bny6613 · May 27, 2018, 10:58pm

It looks like pandas.core.common.is_numeric_dtype was removed in 0.23.
It works OK in 0.22, so you can either downgrade to 0.22 or update pandas-summary, where this was fixed a few days ago: https://github.com/mouradmourafiq/pandas-summary/commit/2a05d97be8e97f50221fb0174bcca8661e187a35
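
For reference, the same check is also exposed through the public pandas.api.types namespace, which does not depend on the internal pandas.core.common module. A minimal sketch, assuming a small made-up DataFrame (the column names and values are illustrative only, not taken from the lesson's tables):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype  # public API, not pandas.core.common

# Hypothetical toy frame, used only to illustrate the check.
df = pd.DataFrame({"Sales": [5263, 6064, 8314], "StoreType": ["a", "b", "c"]})

# Keep only the columns whose dtype pandas considers numeric.
numeric_cols = [c for c in df.columns if is_numeric_dtype(df[c])]
print(numeric_cols)
```

This only shows the public function; it is not a copy of the change made in the linked pandas-summary commit.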

RaynierX · May 27, 2018, 11:45pm

A simple downgrade using 'conda install pandas=0.22' seems to fix this particular issue, where pandas=0.23 causes the code to crash. With pandas=0.23 there was also an issue later in the notebook, where the use of 'AfterStateHoliday' and 'BeforeStateHoliday' resulted in NaNs where they were not expected, causing another crash (original cell 68-ish).

The downgrade seems to have solved that problem too, since the code now runs through the notebook into and past the deep learning sections.

It looks like the downgrade suggested by Nick ("bny6613") solves all the issues in the lesson 3 notebook, at least for me.

Thanks a million for this solution. It is hard to solve for noobs like myself.

stas · June 12, 2018, 8:46pm

This fix hasn’t been released yet as a package, only in a git branch, which can be installed as:

pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

instead of downgrading pandas.

Edit: this will work if you use pip to install Python packages. If you use conda, try https://stackoverflow.com/a/50141879/9201239 and let us know whether it worked.

Khoo · July 9, 2018, 8:34am

Appreciate this thread on dl1 Lesson 3 Rossman.
I attempted Stas Bekman's
pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

It did not work, so I then tried the downgrade:
conda install pandas=0.22

That did not work either.
I opened environment.yml in Notepad and noticed panda summary==0.22.
I copied that line to replace the corresponding one in environment-cpu.yml.
Then I closed and restarted Anaconda, ran cd fastai and activate fastai-cpu, and instinctively ran Stas' pip install again. Thereafter it worked.

In case anyone has a problem with PATH, this is what I used in July 2018:
PATH = Path("data/rossman/")
tables = [pd.read_csv(f'{PATH}/{fname}.csv', low_memory=False) for fname in table_names]

Thank you mmcki RaynierX and Stas Bekman

Nandu · August 1, 2018, 6:06am

Hi Stas,
I am also getting a similar kind of error, shown below, for the Lesson 3 Rossmann dataset:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 for t in tables: display(DataFrameSummary(t).summary())

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_stats(self)
     83         counts.name = 'counts'
     84         uniques = self._get_uniques()
---> 85         missing = self._get_missing(counts)
     86         stats = pd.concat([counts, uniques, missing], axis=1, sort=True)
     87 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_missing(self, counts)
    101         perc = (count / self.length).apply(self._percent)
    102         perc.name = 'missing_perc'
--> 103         return pd.concat([count, perc], axis=1, sort=True)
    104 
    105     def _get_columns_info(self, stats):

TypeError: concat() got an unexpected keyword argument 'sort'

I tried running the commands you suggested:
import pandas_summary
pandas_summary.__file__
pip install -e git+https://github.com/mouradmourafiq/pandas-summary#egg=pandas-summary

but it's still not working. Can you help me out?

Regards
Nandu
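
When a reinstall seems to have no effect, it can help to confirm which copy of a package Python would actually import. A minimal sketch using only the standard library; the helper name module_location is hypothetical and introduced here for illustration:

```python
import importlib.util

def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

# Print where each package would be loaded from in the current environment.
for name in ("pandas", "pandas_summary"):
    print(name, "->", module_location(name))
```

If the printed path points at a different environment than the one pip or conda just updated, the notebook kernel is probably not using the environment that was changed.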

stas · August 2, 2018, 4:28am

Hmm, I see that pandas_summary with that fix has been released as 0.0.5. https://pypi.org/project/pandas-summary/
So you should be able to install it with just:

pip install -U pandas
pip install -U pandas_summary

I have pandas 0.23.3 and pandas_summary 0.0.5 and it works. I have just re-run the notebook to double check.
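
To compare a local setup against the versions reported above, the installed versions can be read from within Python. A minimal sketch, assuming Python 3.8 or newer, where importlib.metadata is available (the environments in this thread run Python 3.6, where pip list | grep pandas gives the same information); installed_version is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Distribution names as they appear in the pip list output earlier in the thread.
report = {name: installed_version(name) for name in ("pandas", "pandas-summary")}
print(report)
```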

AMarshall · August 5, 2018, 11:59pm

Hi all, I had the same problem and fixed it by downgrading pandas. I am using a conda env for fastai.

conda activate fastai
conda install pandas==0.22

Good luck!

Nandu · August 7, 2018, 6:34am

Thank you so much, Stas. It's working now. But towards the end of the lesson I encountered another problem while executing the line of code below.

m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars),
                   0.04, 1, [1000,500], [0.001,0.01], y_range=y_range)
lr = 1e-3
I got the following error:

AssertionError                            Traceback (most recent call last)
in ()
      1 m = md.get_learner(emb_szs, len(df.columns)-len(cat_vars),
----> 2                    0.04, 1, [1000,500], [0.001,0.01], y_range=y_range)
      3 lr = 1e-3

/usr/local/lib/python3.6/dist-packages/fastai/column_data.py in get_learner(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, **kwargs)
     76     def get_learner(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops,
     77                     y_range=None, use_bn=False, **kwargs):
---> 78         model = MixedInputModel(emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, self.is_reg, self.is_multi)
     79         return StructuredLearner(self, StructuredModel(to_gpu(model)), opt_fn=optim.Adam, **kwargs)
     80 

/usr/local/lib/python3.6/dist-packages/fastai/column_data.py in __init__(self, emb_szs, n_cont, emb_drop, out_sz, szs, drops, y_range, use_bn, is_reg, is_multi)
     90                  y_range=None, use_bn=False, is_reg=True, is_multi=False):
     91         super().__init__()
---> 92         for i,(c,s) in enumerate(emb_szs): assert c > 1, f"cardinality must be >=2, got emb_szs[{i}]: ({c},{s})"
     93         if is_reg==False: assert out_sz >= 2, "arg is_reg==False (classification) requires out_sz>=2"
     94         self.embs = nn.ModuleList([nn.Embedding(c, s) for c,s in emb_szs])

AssertionError: cardinality must be >=2, got emb_szs[16]: (1,1)

So I went to the following link, where you encountered the same sort of issue,

and as mentioned there
I did
df = train[columns].append(test[columns])
and included
for i,(c,s) in enumerate(emb_szs): assert c > 1, f"cardinality must be >=2, got emb_szs[{i}]: ({c},{s})"
in the class MixedInputModel(nn.Module).

But when I execute the lines below it does not work: the learning rate finder stops at 0%, and
the training loss, validation loss and exp_rmspe all show NaN. Can you suggest what should be done?
m.lr_find()

HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))

0%| | 0/2737 [00:00<?, ?it/s]

m.fit(lr, 3, metrics=[exp_rmspe])

HBox(children=(IntProgress(value=0, description='Epoch', max=3), HTML(value='')))

epoch trn_loss val_loss exp_rmspe
0 nan nan nan
1 nan nan nan
2 nan nan nan
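
The assertion in the traceback above rejects any embedding whose cardinality is 1 or less. To see which entries of emb_szs would trip it before building the learner, the same condition can be checked by hand. A minimal sketch; find_bad_embeddings is a hypothetical helper, not part of fastai, and the sample list is made up:

```python
def find_bad_embeddings(emb_szs):
    """Return (index, cardinality, size) for every entry the c > 1 assertion would reject."""
    return [(i, c, s) for i, (c, s) in enumerate(emb_szs) if c <= 1]

# Made-up (cardinality, embedding size) pairs; the middle one mimics the (1,1) from the error.
sample_emb_szs = [(1116, 50), (1, 1), (8, 4)]
bad = find_bad_embeddings(sample_emb_szs)
print(bad)
```

Each reported index corresponds to the position of a categorical variable, so it points at the column whose categories need inspecting.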

stas · August 7, 2018, 6:59am

Can you make sure you use the latest version of the notebook, @Nandu? It should include this fix. Perhaps use git to do the update for you, just to rule out that you inserted the changes in the wrong place.

And in future posts please use code markdown when you post code sections (see the </> entry in the Reply toolbar). Currently it's very difficult to read the code sections in your post. Thank you.

Nandu · August 7, 2018, 7:03am

Sure, Stas. Thank you so much. I will use the code markdown from now on.

Regards
Nandu

DenisTrofimov · September 2, 2018, 9:42pm

Hi!
The same error!

for t in tables: display(DataFrameSummary(t).summary())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-5ccf154bd414> in <module>()
----> 1 for t in tables: display(DataFrameSummary(t).summary())

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in __init__(self, df)
     25         self.df = df
     26         self.length = len(df)
---> 27         self.columns_stats = self._get_stats()
     28         self.corr = df.corr()
     29 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_stats(self)
     83         counts.name = 'counts'
     84         uniques = self._get_uniques()
---> 85         missing = self._get_missing(counts)
     86         stats = pd.concat([counts, uniques, missing], axis=1, sort=True)
     87 

/usr/local/lib/python3.6/dist-packages/pandas_summary/__init__.py in _get_missing(self, counts)
    101         perc = (count / self.length).apply(self._percent)
    102         perc.name = 'missing_perc'
--> 103         return pd.concat([count, perc], axis=1, sort=True)
    104 
    105     def _get_columns_info(self, stats):

TypeError: concat() got an unexpected keyword argument 'sort'

Have you fixed it?

jberry1986 · September 4, 2018, 8:33am

Does anyone get this deprecation warning when running the lesson3 Rossman notebook?

<UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.>

The learner still seems to work, but it's difficult to view results in the cell.

I'm also still struggling with the pandas dataframe summary issue.

Any suggestions welcome.

jberry1986 · September 4, 2018, 8:44am

Just an update: downgrading pandas to 0.22 worked for me with the pandas_summary view.

jberry1986 · September 6, 2018, 1:24pm

Final update on this!

It's a quick fix, but to stop the warnings from appearing I simply added:

import warnings
warnings.filterwarnings("ignore")

I’m guessing there will be a fix for this in the future?
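
A narrower variant is to silence only the one message rather than every warning, so that unrelated warnings stay visible. A minimal sketch using the standard warnings module; the warning is simulated here with warnings.warn so the example runs without PyTorch, and the message text is copied from the post above:

```python
import warnings

def emit_sigmoid_warning():
    # Stand-in for the library call that raises the deprecation warning.
    warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.", UserWarning)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only UserWarnings whose message starts with this text.
    warnings.filterwarnings(
        "ignore",
        message=r"nn\.functional\.sigmoid is deprecated",
        category=UserWarning,
    )
    emit_sigmoid_warning()                      # suppressed by the filter above
    warnings.warn("something else", UserWarning)  # still recorded

remaining = [str(w.message) for w in caught]
print(remaining)
```

In a notebook, the filterwarnings call alone (without the catch_warnings block, which is only here to make the effect observable) would go near the imports.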

lulstrup · October 18, 2018, 12:20am

I successfully followed this fix - https://stackoverflow.com/questions/50554428/exception-with-pandas-on-secondary-computer

You also need to make this change, 2 lines further down:

change: if common.is_numeric_dtype(self.df[c])])
to: if types.is_numeric_dtype(self.df[c])])

It displays the table summaries but precedes it with a Pandas Future Warning.

ketzer · November 5, 2018, 3:23pm

It's caused by the sort argument that concat gained in pandas 0.23.

Add these lines after the fastai modules installation:

!pip uninstall pandas -y && pip install pandas
!pip uninstall pandas-summary -y && pip install pandas-summary
!pip list | grep pandas

import pandas as pd
import pandas_summary

You need pandas > 0.22:

pandas                   0.23.4     
pandas-summary           0.0.5

Then rerun your environment.
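
The TypeError about sort can be reproduced or avoided depending on whether the installed pandas.concat accepts that keyword. A minimal sketch of a guard that passes sort only when it is supported; concat_compat is a hypothetical helper for illustration and is not part of pandas or pandas-summary:

```python
import inspect

import pandas as pd

def concat_compat(objs, axis=0, sort=True):
    """Call pd.concat, passing sort only if this pandas version accepts it."""
    if "sort" in inspect.signature(pd.concat).parameters:
        return pd.concat(objs, axis=axis, sort=sort)
    # Older pandas: concat has no sort keyword, so leave it out.
    return pd.concat(objs, axis=axis)

counts = pd.Series([3, 2], index=["a", "b"], name="counts")
missing = pd.Series([0, 1], index=["a", "b"], name="missing")
stats = concat_compat([counts, missing], axis=1)
print(stats)
```

Upgrading both packages as described above remains the simpler route; the guard only illustrates why the two versions have to match.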

Afef-33 · December 27, 2018, 5:33pm

I tried to run a script in Spyder but I got this error message:

File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\nanops.py", line 23, in
import pandas.core.common as com

AttributeError: module 'pandas' has no attribute 'core'

I tried uninstalling and reinstalling pandas but still have the same problem.
Help please !!