Split_from_df() in fastai2

MichaelScofield · November 22, 2019, 6:44am

Hi everyone,

I’m trying to reproduce some results from fastai v1 in fastai2. I have a dataframe with an 'is_valid' column, and in fastai v1 I used split_from_df(col='is_valid'). How should I do this in fastai2? Many thanks.

muellerzr · November 22, 2019, 7:15am

Hey @MichaelScofield, I can’t quite answer your question, as I don’t think this exists yet (as far as I know, but I could be wrong!). To get more visibility, perhaps move this thread to the v2 subforum? It’s more relevant there than in v1.

MichaelScofield · November 22, 2019, 7:19am

Hi @muellerzr, of course I would. How can I do that?

muellerzr · November 22, 2019, 7:20am

Done. (Eventually you can hit the little pen next to the thread title yourself, once you have a high enough trust ranking to do so.)

MichaelScofield · November 22, 2019, 7:20am

Thank you.

marii · November 22, 2019, 1:21pm

What type of data? Vision, text, tabular?

The from_df functionality seems to be in data-specific files, such as: https://github.com/fastai/fastai_dev/blob/master/dev/09a_vision_data.ipynb

muellerzr · November 22, 2019, 2:55pm

@sgugger pinging you in here as I looked too. Closest thing I can think of would be a FuncSplitter that looks at a column in the CSV?

sgugger · November 22, 2019, 3:30pm

Yes, for now. We’ll add the ColSplitter soon, but it’s not there yet.

MichaelScofield · November 22, 2019, 4:05pm

I thought about FuncSplitter too and tried it out, but I don’t quite understand how it works or what object the function is applied to. From what I can see, it applies the function to the image_name, not the dataframe. Oh, I must have messed things up.
I would really appreciate your advice.
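
A note on what the function receives: as far as I understand FuncSplitter, the function is called once per item (for image data, typically the file name or path), and items for which it returns True go to the validation set. So to drive the split from a dataframe column, the function would need to look the flag up by item. A minimal plain-Python sketch of that idea follows; `func_split`, `is_valid_by_name`, and the file names are hypothetical stand-ins for illustration, not fastai code:

```python
def func_split(items, func):
    """Stand-in for the FuncSplitter idea: func(item) is True for validation items."""
    valid_idxs = [i for i, item in enumerate(items) if func(item)]
    train_idxs = [i for i, item in enumerate(items) if not func(item)]
    return train_idxs, valid_idxs

# Hypothetical lookup built from the dataframe: file name -> is_valid flag.
is_valid_by_name = {"cat1.jpg": False, "cat2.jpg": True, "dog1.jpg": False}
items = list(is_valid_by_name)  # the items the splitter would see

# The function gets one item (a file name here), not the whole dataframe.
train_idxs, valid_idxs = func_split(items, lambda name: is_valid_by_name[name])
print(train_idxs, valid_idxs)
```

The point of the sketch is only that the split function sees individual items, which would explain why it appeared to be applied to image_name.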

mattmoehr · January 4, 2021, 10:13pm

If you’re like me and you got to this post by searching “tabular split by column”, then you probably want ColSplitter() from the data.transforms module.

For example, the code I needed was:

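from fastai.data.transforms import ColSplitter
from fastai.tabular.core import TabularPandas

# Calling ColSplitter('is_valid') on df returns the (train, valid) row indices
splits = ColSplitter('is_valid')(df)

tabular_object = TabularPandas(
    df,
    procs = preprocessors,
    cont_names = continuous_vars,
    y_names = dep_var,
    splits = splits
)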
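
For anyone wondering what ends up in the splits argument: as far as I understand, ColSplitter('is_valid')(df) gives back a pair of index lists, with the training rows first and the validation rows second. Here is a minimal plain-Python sketch of that shape, with no fastai or pandas involved; `split_by_flag` and the sample rows are hypothetical and only illustrate the idea, they are not the library's implementation:

```python
def split_by_flag(rows, col="is_valid"):
    """Return (train_idxs, valid_idxs) from a boolean column in each row."""
    train_idxs = [i for i, row in enumerate(rows) if not row[col]]
    valid_idxs = [i for i, row in enumerate(rows) if row[col]]
    return train_idxs, valid_idxs

# Hypothetical rows standing in for a dataframe with an is_valid column.
rows = [
    {"age": 31, "is_valid": False},
    {"age": 45, "is_valid": False},
    {"age": 27, "is_valid": True},
    {"age": 52, "is_valid": False},
]

splits = split_by_flag(rows)
print(splits)
```

Rows flagged True land in the second list, which is the one used for validation.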

Alikhattak · October 14, 2021, 11:47am

@MichaelScofield @muellerzr Hi, can you share how to add a test set? My dataset is set up the same way, with a column named is_valid.

If you have a split into train, valid, and test, please let me know. My code only loads train and valid, so test and valid both end up in a single valid databunch.
Thanks
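
One possible way to think about a three-way split, sketched in plain Python: hold out the test rows first, then split the remainder by is_valid, so that only the train and valid indices are passed as splits. This assumes a second, hypothetical `is_test` column that the posts above do not mention; `three_way_split` and the sample rows are illustrative names, not part of fastai:

```python
def three_way_split(rows, valid_col="is_valid", test_col="is_test"):
    """Return (train_idxs, valid_idxs, test_idxs); test rows are held out first."""
    train_idxs, valid_idxs, test_idxs = [], [], []
    for i, row in enumerate(rows):
        if row[test_col]:        # hypothetical column marking test rows
            test_idxs.append(i)
        elif row[valid_col]:
            valid_idxs.append(i)
        else:
            train_idxs.append(i)
    return train_idxs, valid_idxs, test_idxs

# Hypothetical rows standing in for the dataframe.
rows = [
    {"is_valid": False, "is_test": False},
    {"is_valid": True,  "is_test": False},
    {"is_valid": False, "is_test": True},
    {"is_valid": False, "is_test": False},
]

train_idxs, valid_idxs, test_idxs = three_way_split(rows)
print(train_idxs, valid_idxs, test_idxs)
```

With the test rows kept out of the two-way split, they would no longer be mixed into the validation set and could be loaded separately afterwards.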