As I’m going through and attempting to figure out some of the memory drawbacks of
fastai2, I’ve also figured out a few tricks that make memory usage much lower. I’m going to use this thread as a post collecting some of the tricks I’ve found.
- Note: some of these issues will hopefully be addressed soon; we’re in the process of working on them.
You can reduce the memory used by numerical data (sometimes by 50%!) in your
DataFrame by converting float64 columns to
float32 where possible. This can be as simple as:
```python
import numpy as np

for col in train.columns:
    if train[col].dtype == 'float64':
        train[col] = train[col].astype(np.float32)
```
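To verify the savings yourself, you can compare `DataFrame.memory_usage` before and after the cast. A small self-contained sketch (the `train` frame here is stand-in data, not the one from my example):

```python
import numpy as np
import pandas as pd

# Stand-in data: one million float64 values (~8 MB)
train = pd.DataFrame({'sales': np.random.rand(1_000_000)})

before = train.memory_usage(deep=True).sum()
train['sales'] = train['sales'].astype(np.float32)
after = train.memory_usage(deep=True).sum()

# float32 uses 4 bytes per value instead of 8, so the column roughly halves
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```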
What you’ll then wind up seeing is that
TabularPandas shrinks in size as well. For example, my particular DataFrame has a footprint of ~1.2 GB. After processing with
float64 columns, TabularPandas adds another ~0.6 GB of memory. With the float32 preprocessing, the total is now 1.3 GB (so only an added ~0.1 GB!).
Preprocessing your categorical data by converting those columns to the Category dtype can also reduce memory. By how much?
Before this preprocessing, the added weight is ~1 GB of memory.
If I do the following:
```python
for name in cat_vars:
    train[name] = train[name].astype('category')
```
(this is on Rossmann), then before calling
TabularPandas the footprint stays the same (after performing the conversion).
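As a quick illustration of why the Category dtype helps (stand-in data, not the Rossmann frame): a low-cardinality string column stores every repeated string as a full Python object, while a category column stores each unique value once plus small integer codes.

```python
import pandas as pd

# Stand-in data: a low-cardinality string column, e.g. a store type
stores = pd.DataFrame({'store_type': ['a', 'b', 'c', 'd'] * 250_000})

before = stores['store_type'].memory_usage(deep=True)
stores['store_type'] = stores['store_type'].astype('category')
after = stores['store_type'].memory_usage(deep=True)

# 4 unique values -> 1-byte codes per row instead of full string objects
print(f"{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB")
```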
If you are on a recent version,
TabularPandas has an option to use
`inplace`. This can be helpful for large DataFrames, as
TabularPandas will work off of your DataFrame directly instead of a copy. To use this, first set `pd.options.mode.chained_assignment = None` (pandas requires this for the in-place modification). Then, when building your TabularPandas object, pass `inplace=True`.
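The inplace setup can be sketched as follows (hedged: the `TabularPandas` arguments and variable names below are assumptions based on a typical fastai2 Rossmann setup, not taken from this post):

```python
import pandas as pd

# pandas must allow chained assignment so fastai can modify the original
# frame in place without raising warnings
pd.options.mode.chained_assignment = None

# Sketch only -- assumes fastai2's TabularPandas signature and the variables
# (train, cat_vars, cont_vars, splits) from a typical Rossmann example:
# from fastai2.tabular.all import *
# to = TabularPandas(train, procs=[Categorify, FillMissing, Normalize],
#                    cat_names=cat_vars, cont_names=cont_vars,
#                    y_names='Sales', splits=splits, inplace=True)
```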
For now, I hope this helps a few people as we address some of these memory issues in fastai tabular. These tips (especially the numerical ones) are also great for pandas in general!