Split_from_df() in fastai2

Hi everyone,

I’m trying to reproduce some results from fastai v1 in fastai2. I have a dataframe with an 'is_valid' column, and in fastai v1 I used to call split_from_df(col='is_valid'). How should I do this in fastai2? Many thanks.

Hey @MichaelScofield, I can’t quite answer your question, as I’m not sure this exists yet (to my knowledge, but I could be wrong!). To get more visibility, perhaps move this thread to the v2 subforum? (It’s more relevant there than in v1.) :slight_smile:

Hi @muellerzr, sure, I’d be happy to. How can I do that?

Done :slight_smile: (once you have a high enough trust level, you can do this yourself by clicking the little pencil icon next to the thread title)

Thank you.

What type of data is it: vision, text, or tabular?

The from_df functionality seems to live in the data-specific files, such as: https://github.com/fastai/fastai_dev/blob/master/dev/09a_vision_data.ipynb

@sgugger, pinging you here since I looked too. The closest thing I can think of would be a FuncSplitter that looks at a column in the CSV?

Yes, for now. We’ll add the ColSplitter soon, but it’s not there yet.

I thought about FuncSplitter too and tried it out, but I don’t quite understand how it works or what object the function is applied to. As far as I can tell, it applies the function to the image_name, not to the dataframe. I must have messed something up.
I’d really appreciate your advice.
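
As far as I know, FuncSplitter calls its function once per *item* (True means validation, False means training), not once on the whole dataframe, so if the items are image names the function will indeed receive image names. One way around that is to build a name-to-flag lookup from the dataframe and have the function consult it. Below is a minimal plain-Python sketch of that idea; `func_splitter`, `rows` and `is_valid_lookup` are hypothetical stand-ins for illustration, not fastai code, and the list of dicts stands in for the dataframe.

```python
def func_splitter(func):
    """Hypothetical re-implementation of function-based splitting:
    `func` is called on each item; True -> validation, False -> training."""
    def _split(items):
        mask = [bool(func(item)) for item in items]
        train_idx = [i for i, is_val in enumerate(mask) if not is_val]
        valid_idx = [i for i, is_val in enumerate(mask) if is_val]
        return train_idx, valid_idx
    return _split

# Stand-in for the dataframe: one record per image (hypothetical data).
rows = [
    {"name": "img_0.jpg", "is_valid": False},
    {"name": "img_1.jpg", "is_valid": True},
    {"name": "img_2.jpg", "is_valid": False},
    {"name": "img_3.jpg", "is_valid": True},
]

# The splitter only ever sees image names, so map each name to its flag first.
is_valid_lookup = {r["name"]: r["is_valid"] for r in rows}
splitter = func_splitter(lambda name: is_valid_lookup[name])

items = [r["name"] for r in rows]
train_idx, valid_idx = splitter(items)
print(train_idx, valid_idx)  # [0, 2] [1, 3]
```

With pandas, the same lookup could be built as `dict(zip(df['name'], df['is_valid']))`, where `name` is assumed to be whichever column holds the file names in your CSV.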

If you’re like me and got to this post by searching “tabular split by column”, then you probably want ColSplitter() from the data.transforms module.

For example, the code I needed was:

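from fastai.tabular.all import *

# ColSplitter('is_valid') returns a function; calling it on df gives
# (train_idx, valid_idx), with rows where 'is_valid' is True in the validation set
splits = ColSplitter('is_valid')(df)

# preprocessors, continuous_vars and dep_var are defined earlier in my code
tabular_object = TabularPandas(
    df,
    procs=preprocessors,
    cont_names=continuous_vars,
    y_names=dep_var,
    splits=splits,
)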
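
To make it clearer what comes back from calling the splitter on the dataframe: conceptually it is a pair of lists of row positions, training rows where the column is falsy and validation rows where it is truthy, which is the shape TabularPandas expects for its `splits` argument. The sketch below reproduces that computation in plain Python; `col_splitter` and the column-oriented `table` dict are hypothetical stand-ins for ColSplitter and a DataFrame, not fastai's source.

```python
def col_splitter(col):
    """Hypothetical stand-in for column-based splitting: rows whose `col`
    value is truthy become validation rows, the rest become training rows."""
    def _split(table):
        flags = [bool(v) for v in table[col]]
        train_idx = [i for i, f in enumerate(flags) if not f]
        valid_idx = [i for i, f in enumerate(flags) if f]
        return train_idx, valid_idx
    return _split

# Column-oriented stand-in for a DataFrame (hypothetical data).
table = {
    "age":      [39, 50, 38, 53, 28],
    "is_valid": [0, 0, 1, 0, 1],
}

splits = col_splitter("is_valid")(table)
print(splits)  # ([0, 1, 3], [2, 4])
```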

@MichaelScofield @muellerzr Hi, can you share how to add a test set? My dataset also has a column named is_valid.

If you have split your data into train, valid, and test sets, please let me know. My code only loads the training set, and the validation set ends up containing both the test and validation rows in a single valid DataBunch.
Thanks
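
One common pattern, sketched here under assumptions: if the data has (or you add) a column that tells test rows apart from validation rows, hold the test rows out as a separate table first, and compute the train/valid splits only over the remaining rows, so the indices refer to the table you actually pass on. In fastai v2 the held-out rows can then typically be wrapped with `dls.test_dl(test_df)`. The plain-Python part below only shows the partitioning; the `split` column and `three_way_split` helper are hypothetical.

```python
def three_way_split(rows, col="split"):
    """Hypothetical helper: separate test rows from the rest, then compute
    (train_idx, valid_idx) positions within the remaining rows."""
    test_rows = [r for r in rows if r[col] == "test"]
    trainval_rows = [r for r in rows if r[col] != "test"]
    train_idx = [i for i, r in enumerate(trainval_rows) if r[col] == "train"]
    valid_idx = [i for i, r in enumerate(trainval_rows) if r[col] == "valid"]
    return trainval_rows, (train_idx, valid_idx), test_rows

# Hypothetical data with a 'split' column marking each row's set.
rows = [
    {"x": 1.0, "split": "train"},
    {"x": 2.0, "split": "test"},
    {"x": 3.0, "split": "valid"},
    {"x": 4.0, "split": "train"},
    {"x": 5.0, "split": "test"},
]

trainval_rows, splits, test_rows = three_way_split(rows)
print(splits)                       # ([0, 2], [1])
print([r["x"] for r in test_rows])  # [2.0, 5.0]
```

Note that the indices are positions within `trainval_rows`, not within the original data; mixing the two up would silently put the wrong rows in each set.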