Proc_df not in Fast.ai v1?

I'm trying to create a model from CSV data using the process that Jeremy uses in Intro to Machine Learning: Lesson 1. It appears that the newer version of Fast.ai doesn’t include the structured import, and therefore doesn’t have the proc_df method for processing a data frame to convert categorical items into numbers.

Is there a newer process to use?

Thanks!

I had this problem too, but figured it out. Copy and paste the structured.py code linked below from the original fastai into one of your notebook cells and run it. It has all the functions from the ML course that are no longer supported in the DL1 course, like proc_df, train_cats, apply_cats, etc.

Structured.py code: https://github.com/fastai/fastai/blob/master/old/fastai/structured.py

What are people using instead of proc_df in v1? It’s a critical function; is there no alternative/replacement?

@petulla In v1 we instead pass ‘Categorify’ as a proc in the fastai tabular modules. It converts the categorical columns to category codes, and the tabular model then learns an embedding matrix for each categorical variable.
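
For anyone wondering what that conversion looks like, here is a rough pandas-only sketch of the category-to-number step. It is not fastai's actual code; the toy data frame and column names are made up for illustration, and the `+ 1` offset (so that 0 means "missing") follows the convention the old proc_df used.

```python
import pandas as pd

# Toy data; the column names are hypothetical.
df = pd.DataFrame({
    "color": ["red", "blue", None, "red"],
    "size": [1.0, 2.5, 3.0, 0.5],
})

# Turn the string column into a pandas categorical (roughly what train_cats did).
df["color"] = df["color"].astype("category")

# pandas gives missing values the code -1, so adding 1 reserves 0 for "missing".
codes = df["color"].cat.codes + 1

print(list(df["color"].cat.categories))
print(codes.tolist())
```

The integer codes are what a tree model would consume directly, and what an embedding layer would use as row indices.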

Didn’t proc_df also handle NA values (and I’m not sure what else)? Is there a function to cover that processing?

FillMissing and Normalize are the other two, yes 🙂 I’d recommend lesson 4 of Practical Deep Learning for Coders for a refresher on what’s new in that regard in the library.
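
As a rough idea of what those two procs do, here is a pandas-only sketch, assuming median filling with an extra `_na` indicator column (which I believe is the default behaviour) and plain mean/std scaling. The data frame is made up, and this is an approximation rather than the library's implementation.

```python
import pandas as pd

# Toy continuous column with one missing value; names are hypothetical.
df = pd.DataFrame({"age": [20.0, None, 40.0, 60.0]})

# FillMissing-like step: record which rows were missing, then fill with the median.
df["age_na"] = df["age"].isna()
median = df["age"].median()
df["age"] = df["age"].fillna(median)

# Normalize-like step: subtract the mean and divide by the standard deviation.
# In practice these statistics should come from the training set only.
mean, std = df["age"].mean(), df["age"].std()
df["age_norm"] = (df["age"] - mean) / std

print(df)
```

The same median, mean and std would then be reused on the validation and test sets so that all splits are transformed consistently.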

It would be helpful to list the changes that broke the API between 0.7 and 1.0.6 in the release notes or changes doc. @jeremy

Sorry to reactivate this.
What if you want to use the preprocessing chain provided by fastai with another model like a random forest which does not take databunches as input? All examples I found online used proc_df and then random forest from sklearn.

Try changing the model; you will find them in sklearn. All in all, if you’re using fastai, you first have to process the data with fastai. After that, the model comes from sklearn anyway, so you can try any model.
Also, if I have interpreted your question correctly, proc_df exists only in fastai! You can use whatever model you wish, depending on the form of input it takes.
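
To make that concrete, here is a minimal sketch of doing the preprocessing by hand and then fitting an sklearn random forest. `simple_proc_df` is a hypothetical helper written for this post, not a fastai function, and it only covers category codes and median filling; the data is made up.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def simple_proc_df(df, y_col):
    """Hypothetical stand-in for proc_df: returns numeric features and the target."""
    df = df.copy()
    y = df.pop(y_col).values
    for name in list(df.columns):
        col = df[name]
        if pd.api.types.is_numeric_dtype(col):
            if col.isna().any():
                # Keep a flag for missing rows, then fill with the median.
                df[name + "_na"] = col.isna()
                df[name] = col.fillna(col.median())
        else:
            # Category codes shifted by 1 so that 0 means "missing".
            df[name] = col.astype("category").cat.codes + 1
    return df, y


raw = pd.DataFrame({
    "brand": ["a", "b", "a", "c", "b", "a"],
    "hours": [10.0, None, 30.0, 40.0, 50.0, 60.0],
    "price": [100.0, 80.0, 90.0, 60.0, 50.0, 40.0],
})

X, y = simple_proc_df(raw, "price")

model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X, y)
preds = model.predict(X)
print(X)
```

Because `X` is just a numeric data frame, any sklearn estimator that accepts array-like input could be swapped in for the random forest.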

Hi himanni, can you be more specific about which model or function can replace fastai.structured or proc_df? 😀