Encoded and continuous tabular data: which ItemList to use?

neevlu · March 7, 2019, 10:49pm

I have a dataset made up of one column of strings I’d like to encode and several columns of continuous values. I can’t figure out which data block type to use (I’m trying MultiCategoryList but am not sure it fits), or whether that’s even the right approach.

About the data: each row in the csv is a different molecule. The first column contains strings (example: ‘OC(=O)C=C’) that I want to encode (I’ve been looking into one-hot encoding and autoencoders). Each of the other columns holds continuous values. I want the model to predict the encoded value.
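
To make "one-hot" concrete, here is a minimal sketch of the character-level encoding I have in mind. It is plain Python and not tied to any fastai class. Only ‘OC(=O)C=C’ comes from my data; the other two strings and the helper names `build_vocab` and `one_hot` are made up for illustration. Treating every character as its own token is an assumption too, since it would split a two-letter atom like Cl into two tokens.

```python
# Hypothetical sample; only the first string is from the real csv.
smiles = ["OC(=O)C=C", "CCO", "C=C"]

def build_vocab(strings):
    """Map every character seen in the strings to an integer index."""
    chars = sorted(set("".join(strings)))
    return {ch: i for i, ch in enumerate(chars)}

def one_hot(s, vocab, max_len):
    """Return a max_len x len(vocab) matrix as a list of lists.

    Row i has a single 1 in the column of the i-th character of s;
    rows past the end of s stay all-zero (padding).
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(s):
        matrix[pos][vocab[ch]] = 1
    return matrix

vocab = build_vocab(smiles)
max_len = max(len(s) for s in smiles)
encoded = [one_hot(s, vocab, max_len) for s in smiles]
```

Each molecule then becomes a fixed-size `max_len` by `len(vocab)` matrix, which is what I would want the model to predict from the continuous columns.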

I’m unclear on whether MultiCategoryList should be applied to the whole dataframe or only to the column that needs to be encoded. Is there a better approach I should take? I would appreciate any help or advice! Thanks.