Rossmann Competition

arjunrajkumar · November 24, 2017, 11:26am

Hey! I’m playing around with the Rossmann code.

I’ve trained the model, but I’m wondering how to make predictions on the test set.
Running TTA gives me an error.

Any suggestions on how to make predictions?

Thanks

Arjun

poppingtonic · November 24, 2017, 1:43pm

Do you mind posting a screenshot of the error you’re getting?

arjunrajkumar · November 24, 2017, 2:39pm

Hi Brian, sure. Sorry about that, I was sending from my phone.
I will check and send a screenshot. Thanks!

arjunrajkumar · November 24, 2017, 5:12pm

Hi @poppingtonic , I trained the model, and am able to get predictions on the validation set.

On the test set, I tried running this:

log_preds = m.predict(is_test=True)
But I’m getting this error.

If you have any suggestions, please do let me know and will try it out…

Thank you!

memetzgz · November 25, 2017, 3:23pm

I’ve gotten this error when I have forgotten to specify where the test images are.

You’re supposed to specify it during the initial data specification, e.g.:

data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch, sz), test_name='test')

You don’t have to redo everything to add the test name in after the fact.

I saved this from another thread, can’t find where so cannot properly give attribution, unfortunately:

To run predictions on the test set when you forgot to specify it:

Rerun data=… with test_name='test':
data = ImageClassifierData.from_csv(PATH, 'train', f'{label_csv}', test_name='test', num_workers=4, val_idxs=val_idxs, suffix='.jpg', tfms=tfms, bs=bs)
Then run: learn.set_data(data)
Then you can run: log_preds,y = learn.TTA(is_test=True)

ecdrid · November 25, 2017, 9:13pm

In order to correct this error…

Head over to where you created the learner…

It’s missing the test_folder_name…

ecdrid · November 25, 2017, 9:14pm

I didn’t know about set_data()!
Thanks…
I wasted an hour training again…

arjunrajkumar · November 26, 2017, 1:36am

Thanks @memetzgz

Going to try this out now.

Had a question:

log_preds,y = learn.TTA(is_test=True)

TTA applies test-time augmentation.
But as this is structured data, how can we apply TTA to it?
Isn’t TTA (showing the model different versions of the test set) meant for images rather than structured data?

Edit: Jeremy has added the test set prediction steps to the Structured Learner notebook.
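In general, TTA means averaging a model's predictions over several augmented versions of each input. Below is a minimal NumPy sketch of that idea only, not fastai's learn.TTA implementation; tta_predict, toy_model and flip are hypothetical names. For tabular rows there is usually no label-preserving augmentation like an image flip, so the list of augmentations is empty and the average reduces to one ordinary prediction.

```python
import numpy as np

def tta_predict(predict_fn, x, augment_fns):
    """Average predictions over the original input and each augmented copy."""
    preds = [predict_fn(x)] + [predict_fn(aug(x)) for aug in augment_fns]
    return np.mean(preds, axis=0)

def toy_model(x):
    # Hypothetical stand-in model: "predicts" the first feature of each row.
    return x[:, 0]

def flip(x):
    # Image-style augmentation: mirror the feature order (like a horizontal flip).
    return x[:, ::-1]

x = np.array([[1.0, 2.0], [3.0, 4.0]])

# With one augmentation, the result is the mean of two predictions.
print(tta_predict(toy_model, x, [flip]))  # [1.5 3.5]
# With no augmentations (the usual tabular case), TTA is just a normal prediction.
print(tta_predict(toy_model, x, []))      # [1. 3.]
```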

pnvijay · November 30, 2017, 8:28am

I have been trying to understand the join_df function.

def join_df(left, right, left_on, right_on=None, suffix='_y'):
    if right_on is None: right_on = left_on
    return left.merge(right, how='left', left_on=left_on, right_on=right_on,
                      suffixes=("", suffix))

joined = join_df(joined, googletrend, ["State", "Year", "Week"])

What does ["State", "Year", "Week"] signify here? Is it left_on?

arjunrajkumar · November 30, 2017, 9:02am

["State", "Year", "Week"] are the columns present in both joined and googletrend; the merge matches rows on them.

Maybe this will help.

This is how weather looked before join_df.

This is how state_names looked before join_df

This is the output file -

You can see the new columns added on the right after merge.
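The same behaviour can be reproduced on tiny frames. In this sketch the key list is passed positionally as left_on, and because right_on is None, join_df reuses it for the right frame too. The toy data is hypothetical, not the real Rossmann files. With how='left', every row of the left frame survives, and rows with no match get NaN in the new columns, which is an easy way to check a join.

```python
import pandas as pd

def join_df(left, right, left_on, right_on=None, suffix="_y"):
    # Left join: every row of `left` is kept; matching `right` columns are appended.
    if right_on is None:
        right_on = left_on
    return left.merge(right, how="left", left_on=left_on, right_on=right_on,
                      suffixes=("", suffix))

# Hypothetical toy frames standing in for joined and googletrend.
joined = pd.DataFrame({"Store": [1, 2, 3],
                       "State": ["HE", "HE", "BY"],
                       "Year": [2015, 2015, 2015],
                       "Week": [31, 31, 31]})
googletrend = pd.DataFrame({"State": ["HE"],
                            "Year": [2015],
                            "Week": [31],
                            "trend": [85]})

joined = join_df(joined, googletrend, ["State", "Year", "Week"])
print(joined)
# Store 3 (State "BY") has no matching trend row, so its trend is NaN.
print(joined[joined["trend"].isnull()])
```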

pnvijay · November 30, 2017, 9:05am

Thanks Arjun! I was doing the same thing you highlighted in your reply in my Jupyter notebook, and I understand this now. But what is the reasoning behind these joins? Is it to create one big data frame that has all of the data from the various CSV files logically correlated together?

pnvijay · November 30, 2017, 9:18am

I am also getting this error when I run the Rossmann notebook. Should I drop the index_col?

pnvijay · November 30, 2017, 9:30am

I removed index_col = 0 and went ahead. But now I am getting this error

Can I proceed even with this error?

arjunrajkumar · November 30, 2017, 9:37am

Exactly.

arjunrajkumar · November 30, 2017, 9:39am

You should check this thread, which has a similar error:

pnvijay · November 30, 2017, 9:47am

Thanks Arjun! 'AfterStateHoliday' is a continuous variable. The error states that ['AfterStateHoliday']: Input contains NaN, infinity or a value too large for dtype('float32'). Not sure how to handle this.

pnvijay · November 30, 2017, 9:54am

Looks like there is a NaN in every row of the AfterStateHoliday column. Should we remove it from all the rows?

pnvijay · November 30, 2017, 12:27pm

I now understand where I went wrong: I was using the same df for both test and train when building the columns. I will fix that and check.
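Before deciding whether to drop or fill, it can help to count the non-finite values per column, since NaN and ±inf are what the error above complains about. A minimal pandas sketch, assuming a hypothetical toy frame rather than the real joined data. As this thread shows, a column that is NaN in every row usually points to an upstream bug (here, reusing the same df for train and test), so filling should come only after that is ruled out; the fillna(0) step is one possible choice, not the notebook's code.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the continuous-variable columns.
df = pd.DataFrame({"AfterStateHoliday": [np.nan, 3.0, np.inf],
                   "CompetitionDistance": [100.0, 250.0, 80.0]})

# Count NaN and +/-inf per column; these are the values a float32 check rejects.
bad = df.apply(lambda s: (~np.isfinite(s)).sum())
print(bad[bad > 0])

# One possible fix (an assumption, not the notebook's code):
# treat inf as missing, fill with 0, then cast to float32.
df = df.replace([np.inf, -np.inf], np.nan).fillna(0).astype("float32")
print(df.dtypes)
```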

pnvijay · December 1, 2017, 6:40am

Thanks all. I completed my submission to the Rossmann challenge and got a score of 0.10964 on the public leaderboard and 0.12644 on the private leaderboard.
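For context on these numbers: the Kaggle Rossmann Store Sales competition is scored with root mean square percentage error (RMSPE), where lower is better and days with zero sales are ignored. A minimal NumPy sketch of that metric (the function name rmspe is our own, not from the notebook):

```python
import numpy as np

def rmspe(y_true, y_pred):
    """Root mean square percentage error, skipping rows where the true value is 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0  # zero-sales days are excluded from scoring
    pct_err = (y_true[mask] - y_pred[mask]) / y_true[mask]
    return np.sqrt(np.mean(pct_err ** 2))

# Two scored rows, each off by 10%; the zero-sales row is ignored.
print(rmspe([100, 200, 0], [110, 180, 50]))
```

A score of about 0.11 therefore corresponds roughly to a typical per-day error of around 11% of actual sales.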

jeremy · December 2, 2017, 5:21am

Very impressive