Can't get the rossman_data_clean working

myazhbin · February 3, 2019, 9:14pm

I got everything running up until the following lines of code:
for df in (joined,joined_test):
    df["Promo2Since"] = pd.to_datetime(df.apply(lambda x: Week(
        x.Promo2SinceYear, x.Promo2SinceWeek).monday(), axis=1).astype(pd.datetime))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days

Here is the error I got:

I am running Ubuntu 18.04 on AWS. I can autocomplete pd.datetime just fine and the documentation shows up, but it still doesn’t seem to know what datetime.datetime is. I even tried from datetime import datetime, but that didn’t work either.

peterwalkley · February 4, 2019, 2:53pm

I hit this too. I think the solution is to remove the ‘.astype(pd.datetime)’ part. I haven’t gone through lesson 6 yet to confirm, but from reading up it seems like that is redundant as to_datetime should be performing the type conversion already.
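A minimal sketch of that fix on a toy frame, in case it helps. The values in `joined` are made up, and `iso_week_monday` is a hypothetical helper that stands in for isoweek's `Week(year, week).monday()` so the snippet only needs pandas and the standard library (Python 3.8+ for `date.fromisocalendar`).

```python
import datetime

import pandas as pd

# Toy stand-in for the notebook's `joined` frame (values are made up).
joined = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-31", "2015-07-31"]),
    "Promo2SinceYear": [2010, 2014],
    "Promo2SinceWeek": [13, 40],
})


def iso_week_monday(year, week):
    # Hypothetical helper: Monday of the given ISO week,
    # standing in for isoweek's Week(year, week).monday().
    return datetime.date.fromisocalendar(int(year), int(week), 1)


for df in (joined,):
    # pd.to_datetime already converts the column to a datetime64 dtype,
    # so the trailing .astype(pd.datetime) is simply dropped.
    df["Promo2Since"] = pd.to_datetime(df.apply(
        lambda x: iso_week_monday(x.Promo2SinceYear, x.Promo2SinceWeek), axis=1))
    df["Promo2Days"] = df.Date.subtract(df["Promo2Since"]).dt.days

print(joined.dtypes)
```

If the printed dtype of Promo2Since is a datetime64 type, the conversion happened without the extra cast.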

RogerS49 · February 15, 2019, 8:23pm

This is correct. The notebook needs updating.

mike00 · April 23, 2019, 11:01pm

I hit the same error and fixed it the same way. I agree with @RogerS49, the notebook should be corrected. It made me take a long, long look at the stores.csv file, however. Maybe not a bad thing.
Who is authorized to fix something like this?

angelinayy · April 26, 2019, 11:00pm

Hi Mike, since you seem to be running the Rossmann notebook, may I ask a silly question? I can’t find the train_clean data when running the following code. Maybe the directories have changed, but I poked around the directory and didn’t find it, and I didn’t find it on GitHub either. I also couldn’t find the CSVs (store, weather, etc.). Could you help? Many thanks.

path = Config().data_path()/'rossmann/'
train_df = pd.read_pickle(path/'train_clean')

mike00 · April 27, 2019, 12:25pm

@angelinayy
Just to be clear, I’m running 2018 part 1, lesson 3.
In that notebook, just after the second code block in the section called ‘Create datasets’, there is a link to files.fast.ai/part2/lesson14/rossmann.tgz. If you are not using that zip file, anything I say next may not apply to your situation.

I navigated to c:/users/mike01/fastai/data (yep, I’m in Windows) and created a new folder, ‘rossmann’. I moved the tgz file there and unpacked it. There are 8 csv files including ‘store’ and ‘weather’, but no ‘train_clean’.

Since train_clean is not one of the unzipped files, it must be a file created while running one of the code cells in your notebook. When I run my notebook, it creates 2 new folders and 3 new files in the rossmann directory. But I’m doing the 2018 version, and it doesn’t seem to use a ‘train_clean’ file.

I hope this is clear. If not, please tell me. I’ll try to help.
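For anyone who wants to check their own setup, here is a rough sketch that lists what is in the data folder and reports whether train_clean is there. The folder location is an assumption modelled on my path above, and `describe_data_dir` is a hypothetical helper, not something from fastai.

```python
from pathlib import Path


def describe_data_dir(data_dir, wanted="train_clean"):
    """Return (sorted file names in data_dir, whether `wanted` is among them)."""
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        # Folder does not exist yet: nothing to list.
        return [], False
    names = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    return names, wanted in names


# Assumed location; adjust to wherever the archive was unpacked.
rossmann_dir = Path.home() / "fastai" / "data" / "rossmann"
files, has_train_clean = describe_data_dir(rossmann_dir)
print(f"{len(files)} files in {rossmann_dir}: {files}")
print("train_clean present:", has_train_clean)
```

If train_clean is reported as missing, the cells that write it probably have not been run yet in that environment.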

imago · April 27, 2019, 1:08pm

I also did this and it worked! I might run into issues further ahead (I haven’t checked), but it was enough to get the dataset started.

AjayStark · June 21, 2019, 10:37am

@imago
And I downloaded the dataset but couldn’t find it.

Antoine.C · July 31, 2019, 5:00pm

Alternatively, .dt.to_pydatetime() can be applied to the series returned by pd.to_datetime, instead of calling .astype(pd.datetime) on the result of apply. For example:

joined["CompetitionOpenSince"] = pd.to_datetime(joined.apply(lambda x: datetime.datetime(
    x.CompetitionOpenSinceYear, x.CompetitionOpenSinceMonth, 15), axis=1)).dt.to_pydatetime()

works for me, while

joined["CompetitionOpenSince"] = pd.to_datetime(joined.apply(lambda x: datetime.datetime(
    x.CompetitionOpenSinceYear, x.CompetitionOpenSinceMonth, 15), axis=1).astype(pd.datetime))

gave me the same error as @myazhbin.

Note the position of the parentheses carefully.

I use pandas version 0.24.2.

Added note.
Eventually I realised that I was working with the wrong notebook. In the current version on GitHub, the .astype(pd.datetime) part has been removed, as you suggested.

Antoine.C · July 31, 2019, 5:10pm

This may help others.

While the (zipped) dataset is available at files.fast.ai/part2/lesson14/rossmann.tgz, there is also a notebook files.fast.ai/part2/lesson14/rossmann.ipynb which looks like an old version of rossman_data_clean.ipynb (note that names are different).

Don’t get confused: make sure you pick the right notebook.

Added note. The name of the chain store Rossmann takes two n's, so there is a typo in the filename rossman_data_clean.ipynb.