Can't get the rossman_data_clean working

I got everything running up until the following lines of code:
for df in (joined,joined_test):
    df["Promo2Since"] = pd.to_datetime(df.apply(lambda x: Week(
        x.Promo2SinceYear, x.Promo2SinceWeek).monday(), axis=1).astype(pd.datetime))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days

Here is the error I got:

I am running Ubuntu 18.04 on AWS. I can autocomplete pd.datetime just fine and the documentation shows up, but it still doesn’t know what datetime.datetime is. I even tried from datetime import datetime, but that didn’t work either.


I hit this too. I think the solution is to remove the ‘.astype(pd.datetime)’ part. I haven’t gone through lesson 6 yet to confirm, but from reading up it seems like that is redundant as to_datetime should be performing the type conversion already.
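
A minimal sketch of that fix on a toy frame, in case it helps. The data values are made up, and since isoweek may not be installed everywhere, datetime.date.fromisocalendar (Python 3.8+) stands in for Week(year, week).monday() here; that substitution is my assumption, not what the notebook uses.

```python
import datetime

import pandas as pd

# Toy stand-in for the notebook's `joined` frame (values are made up).
joined = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "Promo2SinceYear": [2010, 2014],
    "Promo2SinceWeek": [13, 40],
})


def promo2_monday(row):
    # Stand-in for isoweek's Week(year, week).monday():
    # the Monday of the given ISO week (requires Python 3.8+).
    return datetime.date.fromisocalendar(
        int(row.Promo2SinceYear), int(row.Promo2SinceWeek), 1)


# No .astype(pd.datetime) here: pd.to_datetime already converts the
# column of date objects to a datetime64 column.
joined["Promo2Since"] = pd.to_datetime(joined.apply(promo2_monday, axis=1))
joined["Promo2Days"] = joined.Date.subtract(joined["Promo2Since"]).dt.days

print(joined.dtypes)
```

If Promo2Since shows up with a datetime64 dtype in the printed output, the extra cast was not needed.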


This is correct. The notebook needs updating.

I hit the same error and fixed it the same way. I agree with @RogerS49: the notebook should be corrected. It made me take a long, long look at the stores.csv file, however. Maybe not a bad thing.
Who is authorized to fix something like this?

Hi Mike, since you seem to be running the Rossmann notebook, may I ask a silly question? I can’t find the train_clean data when running the following code. Maybe the directories have changed, but I poked around the directory and didn’t find it. I also didn’t find it on GitHub, nor did I find the CSVs (store, weather, etc.). Could you help? Many thanks.

path = Config().data_path()/'rossmann/'
train_df = pd.read_pickle(path/'train_clean')

Just to be clear, I’m running 2018 part 1, lesson 3.
In that notebook, just after the second code block in the section called ‘Create datasets’, there is a link to a zip file of the data. If you are not using that zip file, anything I say next may not apply to your situation.

I navigated to c:/users/mike01/fastai/data (yep, I’m in Windows) and created a new folder, ‘rossmann’. I moved the tgz file there and unpacked it. There are 8 csv files including ‘store’ and ‘weather’, but no ‘train_clean’.

Since train_clean is not one of the unzipped files, it must be a file created while running one of the code cells in your notebook. When I run my notebook, it creates 2 new folders and 3 new files in the rossmann directory. But I’m doing the 2018 version and it doesn’t seem to use a ‘train_clean’ file.
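
A quick way to see what is actually in the data folder before and after running the notebook cells is sketched below. The helper name list_data_files is hypothetical, and the demo runs on a throwaway temporary directory rather than a real fastai data path, so swap in your own rossmann folder.

```python
import tempfile
from pathlib import Path


def list_data_files(data_dir):
    """Return the sorted names of everything directly inside data_dir."""
    return sorted(p.name for p in Path(data_dir).iterdir())


# Demo on a temporary directory that mimics a freshly unpacked archive
# (only two of the CSVs are created here, just for illustration).
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "rossmann"
    demo_dir.mkdir()
    (demo_dir / "store.csv").touch()
    (demo_dir / "weather.csv").touch()

    found = list_data_files(demo_dir)
    # train_clean is not part of the archive, so it is absent until
    # some notebook cell writes it.
    has_train_clean = (demo_dir / "train_clean").exists()

print(found, has_train_clean)
```

Running the listing again after the data-cleaning cells should show which files the notebook itself created.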

I hope this is clear. If not, please tell me. I’ll try to help.

I also did this and it worked! I might run into issues further ahead (I haven’t checked), but it worked to get the dataset started.

And I downloaded the dataset but couldn’t find it. :confused:

Alternatively, following this, .dt.to_pydatetime() should be applied to the series instead of .astype(pd.datetime) to each element. For example:

joined["CompetitionOpenSince"] = pd.to_datetime(joined.apply(lambda x: datetime.datetime(
    x.CompetitionOpenSinceYear, x.CompetitionOpenSinceMonth, 15), axis=1)).dt.to_pydatetime()

works for me, while

joined["CompetitionOpenSince"] = pd.to_datetime(joined.apply(lambda x: datetime.datetime(
    x.CompetitionOpenSinceYear, x.CompetitionOpenSinceMonth, 15), axis=1).astype(pd.datetime))

gave me the same error as @myazhbin.

Note the position of the parentheses carefully.

I use pandas version 0.24.2.
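
For what it is worth, here is a sketch of a different route that avoids both the per-row lambda and any cast: pd.to_datetime can assemble dates from a frame with year, month and day columns. This is a swapped-in technique, not what the notebook does, and the toy values and the `parts` name are made up.

```python
import pandas as pd

# Toy stand-in for the notebook's `joined` frame (values are made up).
joined = pd.DataFrame({
    "CompetitionOpenSinceYear": [2008, 2012],
    "CompetitionOpenSinceMonth": [9, 11],
})

# pd.to_datetime assembles dates from columns named year/month/day;
# the day is fixed at 15, as in the snippets above.
parts = pd.DataFrame({
    "year": joined.CompetitionOpenSinceYear,
    "month": joined.CompetitionOpenSinceMonth,
    "day": 15,
})
joined["CompetitionOpenSince"] = pd.to_datetime(parts)

print(joined.dtypes)
```

The result is a datetime64 column, so the later date subtraction and .dt accessors should work on it directly.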

Added note.
Eventually I realised that I was working with the wrong notebook, and in the current version on GitHub the .astype(pd.datetime) part has been removed, as you suggested.

This may help others.

While the (zipped) dataset is available at, there is also a notebook which looks like an old version of rossman_data_clean.ipynb (note that names are different).

Don’t get confused: make sure you pick the right notebook.

Added note. The name of the chain store Rossmann takes two n's, so there is a typo in the filename rossman_data_clean.ipynb.