So, I played around with it for an hour using the official Docker image. In that hour, all I managed to do was test the read_csv function, unsuccessfully.
My impression is that this is still very much alpha, and I would currently waste more time on preparation and bug tracking than the possible GPU acceleration would save me. The error messages are quite unhelpful, as you can see below; I cannot even get simple CSVs to load. Of course that might be my fault, but with pandas all of this is a one-liner that simply works. The documentation is also lacking; I can't even find an official list of the available datatypes. Another small thing: I am currently not able to use pathlib.Path for file access (it errors out because they apparently expect the file path to be of type string).
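For reference, this is the workaround I would expect to need for the pathlib issue: converting the Path to a string before handing it over. The `gdf.read_csv` call here is only a stand-in for the GPU reader; the exact signature is my assumption.

```python
from pathlib import Path

p = Path('/data')                 # hypothetical data directory
csv_path = p / 'ts_testdata.csv'

# gdf.read_csv(csv_path)       # fails: the reader apparently wants a plain str
# gdf.read_csv(str(csv_path))  # workaround: convert the Path explicitly
print(str(csv_path))
```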
One observation: the read_csv function alone currently lacks many of the features that pandas has, which is to be expected. But one of the most annoying missing features is automatic datatype inference. With gdf you have to specify the column names and column datatypes by hand, which means you first have to load everything with pandas to make this a semi-automated process (unless you have only a small number of columns and want to do it by hand). While I do sometimes specify dtypes for pandas as a speed optimization, it is very annoying to have to do this in general. I have read, though, that this feature is in the pipeline.
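For contrast, this is the pandas behaviour I mean: dtypes are inferred automatically on load, and an explicit (even partial) dtype mapping is purely optional. A toy in-memory CSV, just to illustrate:

```python
import io
import pandas as pd

csv = "id,value,label\n1,2.5,a\n2,3.1,b\n"

# pandas infers the dtypes on its own...
df = pd.read_csv(io.StringIO(csv))
print(df.dtypes)   # id: int64, value: float64, label: object

# ...and an explicit dtype spec is optional, e.g. as a speed/memory optimization
df2 = pd.read_csv(io.StringIO(csv), dtype={'id': 'int32', 'label': 'category'})
print(df2.dtypes)  # id: int32, value: float64, label: category
```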
So, I am stopping my experiments for now. I'd be happy to hear if any of you get more interesting results, or have hints about what my problems could be.
If someone wants to try, this is how I semi-automatically created the column-type lists (maybe there is a bug here?):
import pandas as pd

# p is a pathlib.Path pointing at the data directory (defined earlier)
df_raw = pd.read_csv(p / 'ts_testdata.csv', nrows=10)

# one row per column, holding the dtype as a string
typedf = pd.DataFrame(index=df_raw.columns)
typedf['type'] = None
for col in df_raw.columns:
    typedf.loc[col, 'type'] = str(df_raw[col].dtype)

# map pandas 'object' columns to 'category', and mark the time column as a date
typedf.loc[typedf['type'] == 'object', 'type'] = 'category'
typedf.loc['time', 'type'] = 'date'
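The resulting frame can then be turned into the two lists to hand to the GPU reader. The keyword names in the commented-out `gdf.read_csv` call (`names=`, `dtype=`) are an assumption on my part, so check them against whatever version you are running:

```python
import pandas as pd

# stand-in for the typedf built by the snippet above
typedf = pd.DataFrame({'type': ['date', 'float64', 'category']},
                      index=['time', 'value', 'label'])

col_names = list(typedf.index)     # column names in file order
col_types = list(typedf['type'])   # matching dtype strings

# hypothetical call -- exact keywords of the GPU reader are an assumption:
# gdf.read_csv(str(p / 'ts_testdata.csv'), names=col_names, dtype=col_types)
```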