Add_elapsed_times grouping variables

karma · May 22, 2020, 12:03am

Hi All,

I’ve been trying to use the add_elapsed_times function for some time series data. I was under the impression that this should create elapsed days since the last event in a group (base_field).

The function's source contains this line:
tmp = (work_df[[base_field] + field_names].sort_index(ascending=a)
              .groupby(base_field).rolling(7, min_periods=1).sum())

However, if the base_field is not unique, I get a “Not found in index” error, because the next line in the code tries to drop the grouping variable:

tmp.drop(base_field,1,inplace=True)
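For anyone who wants to poke at this outside the library, here is a minimal sketch of the same groupby/rolling pattern on a toy frame. The `store` column and its values are made up for illustration, and `drop(..., errors="ignore")` is a defensive variant, not what the library code does. It assumes that whether the grouping column survives as a data column after `groupby(...).rolling(...)` can differ between pandas versions.

```python
import pandas as pd

# Toy data: two groups (hypothetical `store` ids) sharing the same five dates.
rng = pd.date_range("2015-02-24", periods=5, freq="D")
df = pd.DataFrame({
    "Date": list(rng) * 2,
    "store": [1] * 5 + [2] * 5,
    "event": [0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

base_field, field_names = "store", ["event"]

# Same shape as the library line: rolling 7-row sum of events within each group.
tmp = (df.set_index("Date")[[base_field] + field_names]
         .sort_index()
         .groupby(base_field)
         .rolling(7, min_periods=1)
         .sum())

# Assumption: the grouping column may or may not be present as a data column
# here, so drop it only if it exists instead of raising a KeyError.
tmp = tmp.drop(columns=base_field, errors="ignore")

# The group key and the date come back from the index as ordinary columns.
tmp = tmp.reset_index()
print(tmp)
```

Inspecting `tmp.columns` just before the drop shows whether the grouping column is actually there in your environment, which is the condition the original `drop` call depends on.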

For example, using this data:
import pandas as pd
import numpy as np

np.random.seed(0)
rng = pd.date_range('2015-02-24', periods=5, freq='D')
df0 = pd.DataFrame({'Date': rng, 'Val': np.random.randn(len(rng))})
df0["event"] = df0["Val"] > 1
df1, df2 = df0.copy(), df0.copy()
df1["category"] = "A"
df2["category"] = "B"
df0 = pd.concat([df1, df2], ignore_index=True)
df0["idx"] = range(10)
df0.head(10)

dfx = add_elapsed_times(df0, ['event'],
                        date_field='Date', base_field='idx')

This works, as event is unique.

dfx = add_elapsed_times(df0, ['category'],
                        date_field='Date', base_field='idx')

This doesn’t work, as category is not unique.

Is this expected behavior? I could not find any mention of it in the documentation.

Many Thanks
Karma

