TabularList.from_df runs very slowly when the DataFrame dtypes are np.float32

I use TabularList.from_df to build my own tabular DataBunch. If the dtypes of train_df and test_df are float64, it takes about 15 minutes, but if I change them to np.float32, building the DataBunch takes several hours. For reference, train_df has shape 2,010,000 x 1,140. The code is as follows:

import time
from fastai.tabular import *  # fastai v1: TabularList, Categorify, Normalize

# train_df, test_df, cat_cols, cont_cols and val_idx are defined earlier
since = time.time()
procs = [Categorify, Normalize]
test = TabularList.from_df(test_df, cat_names=cat_cols, cont_names=cont_cols)
data = (TabularList.from_df(train_df, cat_names=cat_cols, cont_names=cont_cols, procs=procs)
        .split_by_idx(val_idx)
        .label_from_df(cols='label')
        .add_test(test)
        .databunch())
end = time.time()
print("elapsed time: ", end - since)

I have the same problem. You can isolate it by running the data block steps separately.
For me, label_from_df is the step that runs extremely slowly. I think this is the step that actually runs the preprocessing.
Try:

tlist = TabularList.from_df(train_df, cat_names=cat_cols, cont_names=cont_cols, procs=procs)
stlist = tlist.split_by_idx(val_idx)

and then run:

llist = stlist.label_from_df(cols='label')
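
To see which step dominates, each call can be wrapped in a small timer. This is only a minimal sketch using the standard library: `timed` and `timings` are hypothetical helper names, and the `time.sleep` calls are stand-ins for the real fastai calls so that the snippet runs on its own.

```python
import time
from contextlib import contextmanager

timings = {}  # step name -> elapsed seconds


@contextmanager
def timed(name):
    """Record the wall-clock time of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start
        print(f"{name}: {timings[name]:.3f} s")


# Stand-ins: replace each sleep with the real call, e.g.
#   tlist = TabularList.from_df(...)
#   stlist = tlist.split_by_idx(val_idx)
#   llist = stlist.label_from_df(cols='label')
with timed("from_df"):
    time.sleep(0.01)
with timed("split_by_idx"):
    time.sleep(0.01)
with timed("label_from_df"):
    time.sleep(0.1)

slowest = max(timings, key=timings.get)
print("slowest step:", slowest)
```

With the real calls in place, the printed timings should show directly whether label_from_df is the bottleneck on your data.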

Any idea, @sgugger, how to speed this up? For reference, I have 400k items, with 10 cat cols and 1,000 cont cols.

The architecture in v1 isn’t accessing things in dataframes smartly, so there is nothing we can do to speed this up AFAIK. v2 will read batches directly from the processed dataframe, and so will be much quicker.
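
In the meantime, going only by the timings reported in the first post (minutes with float64, hours with float32), one possible workaround is to cast the continuous columns back to float64 before building the TabularList. This is a sketch under that assumption, not a confirmed fix: `upcast_float32` is a hypothetical helper, and the tiny frame below is a stand-in for the real train_df.

```python
import numpy as np
import pandas as pd


def upcast_float32(df, cols):
    """Return a copy of df with any float32 columns in `cols` cast to float64."""
    to_cast = [c for c in cols if df[c].dtype == np.float32]
    if not to_cast:
        return df.copy()
    return df.astype({c: np.float64 for c in to_cast})


# Stand-in data; in the thread this would be the real train_df / cont_cols.
train_df = pd.DataFrame({
    "x1": np.array([0.5, 1.5, 2.5], dtype=np.float32),
    "x2": np.array([1.0, 2.0, 3.0], dtype=np.float64),
    "cat": ["a", "b", "a"],
})
cont_cols = ["x1", "x2"]

train_df = upcast_float32(train_df, cont_cols)
print(train_df.dtypes)
```

Note that float64 doubles the memory used by those columns, which matters for frames with over a thousand continuous columns.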