Kaggle Elo merchant recommendation competition

nikhil.ikhar · December 18, 2018, 5:57pm

Hi,

I'm working on the Elo merchant recommendation competition and trying to replicate the Rossmann notebook.

I have merged several different CSV files to bring all the related data into the train and test DataFrames.
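When merging one-to-many tables (many transactions per card), merging the raw rows can silently duplicate train/test rows. A minimal pandas sketch, assuming the transaction file has `card_id` and `purchase_amount` columns (as the Elo `historical_transactions.csv` appears to); the helper `add_card_aggregates` and the toy values are hypothetical. The idea is to aggregate per card first, then left-join onto train/test:

```python
import pandas as pd

def add_card_aggregates(base: pd.DataFrame, trans: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: summarise transactions per card_id, then left-join onto base.

    Aggregating before merging keeps one row per card; merging the raw
    one-to-many transaction table would duplicate train/test rows.
    """
    agg = (trans.groupby('card_id')['purchase_amount']
                .agg(['count', 'sum', 'mean'])
                .add_prefix('hist_purchase_')
                .reset_index())
    # how='left' keeps every train/test card; cards with no transactions get NaN
    return base.merge(agg, on='card_id', how='left')

# Toy, made-up data for illustration only
train = pd.DataFrame({'card_id': ['C1', 'C2', 'C3'],
                      'feature_1': [5, 4, 2],
                      'target': [-0.8, 0.4, 0.7]})
hist = pd.DataFrame({'card_id': ['C1', 'C1', 'C2'],
                     'purchase_amount': [-0.7, -0.5, 0.1]})

merged = add_card_aggregates(train, hist)
print(merged)
```

The resulting NaNs for cards without transactions are the kind of gap fastai's `FillMissing` proc is meant to handle.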

I am getting an inf value in the RMSE calculation.
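One possible cause (a guess, not confirmed by anything in this thread): merged or engineered features that contain ±inf, for example a ratio feature that divided by zero. As an assumption, fastai's `FillMissing` proc targets NaN, so inf values could reach the model unchanged. A quick pandas check, where `inf_columns` is a hypothetical helper and the data is made up:

```python
import numpy as np
import pandas as pd

def inf_columns(df: pd.DataFrame) -> pd.Series:
    """Count +/-inf values per numeric column, keeping only columns that have any."""
    num = df.select_dtypes(include=[np.number])
    counts = np.isinf(num).sum()
    return counts[counts > 0]

# Made-up example: a ratio feature that divided by zero
df = pd.DataFrame({'ratio': [0.5, np.inf, 2.0],
                   'amount': [1, 2, 3],
                   'feature_1': ['a', 'b', 'c']})
print(inf_columns(df))

# Turn inf into NaN so the usual missing-value handling can deal with it
cleaned = df.replace([np.inf, -np.inf], np.nan)
print(inf_columns(cleaned))
```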

I'm not sure why learn.predict(test) is failing:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3077             try:
-> 3078                 return self._engine.get_loc(key)
   3079             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'feature_1'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-39-af36fc1167bc> in <module>()
----> 1 learn.predict(test)

/opt/conda/lib/python3.6/site-packages/fastai/basic_train.py in predict(self, item, **kwargs)
    249         "Return prect class, label and probabilities for `item`."
    250         self.callbacks.append(RecordOnCPU())
--> 251         batch = self.data.one_item(item)
    252         res = self.pred_batch(batch=batch)
    253         pred = res[0]

/opt/conda/lib/python3.6/site-packages/fastai/basic_data.py in one_item(self, item, detach, denorm)
    146         "Get `item` into a batch. Optionally `detach` and `denorm`."
    147         ds = self.single_ds
--> 148         with ds.set_item(item):
    149             return self.one_batch(ds_type=DatasetType.Single, detach=detach, denorm=denorm)
    150 

/opt/conda/lib/python3.6/contextlib.py in __enter__(self)
     79     def __enter__(self):
     80         try:
---> 81             return next(self.gen)
     82         except StopIteration:
     83             raise RuntimeError("generator didn't yield") from None

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in set_item(self, item)
    464     def set_item(self,item):
    465         "For inference, will replace the dataset with one that only contains `item`."
--> 466         self.item = self.x.process_one(item)
    467         yield None
    468         self.item = None

/opt/conda/lib/python3.6/site-packages/fastai/data_block.py in process_one(self, item, processor)
     72         if processor is not None: self.processor = processor
     73         self.processor = listify(self.processor)
---> 74         for p in self.processor: item = p.process_one(item)
     75         return item
     76 

/opt/conda/lib/python3.6/site-packages/fastai/tabular/data.py in process_one(self, item)
     44     def process_one(self, item):
     45         df = pd.DataFrame([item,item])
---> 46         for proc in self.procs: proc(df, test=True)
     47         if len(self.cat_names) != 0:
     48             codes = np.stack([c.cat.codes.values for n,c in df[self.cat_names].items()], 1).astype(np.int64) + 1

/opt/conda/lib/python3.6/site-packages/fastai/tabular/transform.py in __call__(self, df, test)
     30         "Apply the correct function to `df` depending on `test`."
     31         func = self.apply_test if test else self.apply_train
---> 32         func(df)
     33 
     34     def apply_train(self, df:DataFrame):

/opt/conda/lib/python3.6/site-packages/fastai/tabular/transform.py in apply_test(self, df)
     49     def apply_test(self, df:DataFrame):
     50         for n in self.cat_names:
---> 51             df.loc[:,n] = pd.Categorical(df[n], categories=self.categories[n], ordered=True)
     52 
     53 FillStrategy = IntEnum('FillStrategy', 'MEDIAN COMMON CONSTANT')

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

/opt/conda/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2487         res = cache.get(item)
   2488         if res is None:
-> 2489             values = self._data.get(item)
   2490             res = self._box_item_values(item, values)
   2491             cache[item] = res

/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3078                 return self._engine.get_loc(key)
   3079             except KeyError:
-> 3080                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   3081 
   3082         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

KeyError: 'feature_1'

feature_1 is a categorical variable and it is present in the test DataFrame, but the lookup still fails to find feature_1.
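Reading the traceback, `process_one` wraps the item with `pd.DataFrame([item,item])` and then looks columns up by name; the `Int64HashTable` frames suggest the wrapped frame ended up with integer column labels, i.e. the item did not carry column names. My guess (not verified against this notebook) is that `learn.predict` expects one labelled row, such as a pandas Series from `test.iloc[0]`. A small pandas sketch with a made-up stand-in for `test`:

```python
import pandas as pd

# Toy stand-in for the test DataFrame (made-up values)
test = pd.DataFrame({'card_id': ['C_a', 'C_b'],
                     'feature_1': [3, 5],
                     'feature_2': [1, 2]})

row = test.iloc[0]                  # one row, as a pandas Series
wrapped = pd.DataFrame([row, row])  # mirrors what process_one does with `item`
print(wrapped.columns.tolist())     # the Series index becomes the column labels

# Without labels (e.g. a bare NumPy row), columns become 0, 1, 2, ...
no_labels = pd.DataFrame([row.values, row.values])
print(no_labels.columns.tolist())   # [0, 1, 2]

# With a trained fastai v1 `learn` (not defined here), a single-row call would be:
# learn.predict(test.iloc[0])
```

For the whole test set, batch prediction with `get_preds` on the test dataset is a better fit than calling `predict` row by row.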

My notebook: https://www.kaggle.com/nikhilikhar/elo-fastai-pytorch?scriptVersionId=8606358

zw1991 · December 19, 2018, 3:40pm

Hi,
For #1 (the inf RMSE), I use the following function:

import torch
from torch import Tensor

def rmse(pred:Tensor, targ:Tensor)->Tensor:
    "RMSE between `pred` and `targ`."
    assert pred.numel() == targ.numel(), "Expected same numbers of elements in pred & targ"
    # Flatten both: an (n,1) prediction minus an (n,) target would broadcast to (n,n)
    pred, targ = pred.reshape(-1), targ.reshape(-1)
    var = targ - pred
    return torch.sqrt((var**2).mean())
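Reducing the predictions to one dimension before subtracting matters because a prediction of shape (n, 1) minus a target of shape (n,) broadcasts to an (n, n) matrix, which silently gives a wrong RMSE. A NumPy illustration with made-up numbers (PyTorch follows the same broadcasting rules):

```python
import numpy as np

pred = np.array([[1.0], [2.0], [3.0]])   # shape (3, 1), like a regression head's output
targ = np.array([1.0, 2.0, 4.0])         # shape (3,)

print((targ - pred).shape)               # (3, 3): broadcasting, not element-wise
diff = targ - pred.reshape(-1)           # flatten first: shape (3,)
rmse = np.sqrt((diff ** 2).mean())       # sqrt(mean([0, 0, 1])) = sqrt(1/3)
print(rmse)
```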

For #2 (learn.predict failing), try this:

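from fastai.basic_data import DatasetType

# Pass the DatasetType enum rather than the string 'Test'; a plain string won't equal DatasetType.Test
predictions = learn.get_preds(ds_type=DatasetType.Test)  # needs a test set added to the DataBunch
predictions = predictions[0].numpy()  # get_preds returns [predictions, targets]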
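To turn those predictions into a submission file, here is a minimal sketch assuming the competition expects `card_id` and `target` columns (check the sample submission) and that the predictions come back in the same row order as the `test` DataFrame; `make_submission` is a hypothetical helper and the values are made up:

```python
import numpy as np
import pandas as pd

def make_submission(card_ids, preds) -> pd.DataFrame:
    """Hypothetical helper: pair each test card_id with its prediction.

    Assumes `preds` is in the same row order as `card_ids`.
    """
    preds = np.asarray(preds).reshape(-1)   # flatten an (n, 1) prediction array to (n,)
    if len(preds) != len(card_ids):
        raise ValueError(f"{len(preds)} predictions for {len(card_ids)} card_ids")
    return pd.DataFrame({'card_id': list(card_ids), 'target': preds})

# Made-up example values
sub = make_submission(pd.Series(['C_a', 'C_b']), np.array([[0.1], [-0.3]]))
print(sub)
# sub.to_csv('submission.csv', index=False)
```

The length check is there because a mismatch usually means the predictions came from a different dataset (e.g. validation instead of test).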

Btw, how can I format my code as a code block?

Hope that helps!

nikhil.ikhar · December 20, 2018, 4:25am

Thanks @zw1991, now it is working, although the score is not good.

I think your code is well formatted. If you still face the issue, you can use Markdown syntax to format your code.

zw1991 · December 20, 2018, 9:44am

You’re welcome. Yeah, I figured the formatting out but forgot to erase my question, haha. Thanks.

I also started to experiment with this Elo competition. My score is also not good when I submit my predictions. I am trying to figure out the reason.

nikhil.ikhar · December 20, 2018, 4:54pm

Great, we can help each other.