Given a dataframe with
tmp['date'].values
array([Timestamp('2015-09-01 17:00:00-0700', tz='US/Pacific'),
Timestamp('2015-09-02 17:00:00-0700', tz='US/Pacific'),
Timestamp('2015-09-03 17:00:00-0700', tz='US/Pacific'),
Timestamp('2015-09-07 17:00:00-0700', tz='US/Pacific'),
Timestamp('2015-09-08 17:00:00-0700', tz='US/Pacific')],
dtype=object)
If we run
add_datepart(tmp, 'date')
we get
tmp.dtypes
Year int64
Month int64
Week UInt32
Day int64
Dayofweek int64
Dayofyear int64
Is_month_end bool
Is_month_start bool
Is_quarter_end bool
Is_quarter_start bool
Is_year_end bool
Is_year_start bool
Elapsed object
dtype: object
In v2.0.7, the dtype of the Elapsed column generated by add_datepart is object (strings) rather than a numeric type. Since Elapsed represents a Unix epoch timestamp, should its dtype instead be int64, so that cont_cat_split will identify Elapsed as continuous? The behavior is the same if I change the column values from Timestamp to str.
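For reference, here is a minimal sketch of what I would expect Elapsed to contain, computed with pandas alone. It is not add_datepart's implementation and not a confirmed fix. It assumes the column holds tz-aware Timestamps stored as object, as in the example above, and it rebuilds the tmp frame from those five values.

```python
import pandas as pd

# Rebuild the example column: tz-aware Timestamps stored with dtype=object.
dates = pd.Series(
    [pd.Timestamp(f"2015-09-{d:02d} 17:00:00", tz="US/Pacific") for d in (1, 2, 3, 7, 8)],
    dtype=object,
)
tmp = pd.DataFrame({"date": dates})

# Convert to a proper tz-aware datetime dtype (UTC), then take whole seconds
# since the Unix epoch. Floor-dividing by a one-second Timedelta yields integers.
as_utc = pd.to_datetime(tmp["date"], utc=True)
epoch = pd.Timestamp("1970-01-01", tz="UTC")
tmp["Elapsed"] = ((as_utc - epoch) // pd.Timedelta(seconds=1)).astype("int64")

print(tmp.dtypes)
```

With this, Elapsed comes out as int64 rather than object. The same conversion could be applied to the date column before calling add_datepart, though I have not verified that this changes the dtype add_datepart produces.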