Memory Leak: pandas.read_json(.)

Borz · November 13, 2017, 6:54am

(Jupyter Notebook) Pandas doesn’t seem to free the memory it allocates when reading a .json file, either when the variable is reassigned or after running del my_variable.

Put another way, hitting ctrl-return repeatedly on

my_var = pd.read_json('data.json')

in Jupyter increases RAM usage by the size of that variable each time, instead of freeing the memory used by the old my_var and re-allocating it.
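A minimal sketch for checking this with numbers rather than by watching a system monitor. It assumes Linux, since it reads /proc/self/status; `rss_mib` and `rss_after_reloads` are hypothetical helper names made up for illustration, not part of pandas or Jupyter.

```python
import gc


def rss_mib():
    """Return this process's resident set size in MiB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:   123456 kB".
                return int(line.split()[1]) / 1024
    raise RuntimeError("VmRSS not found in /proc/self/status")


def rss_after_reloads(loader, path, repeats=5):
    """Call loader(path) repeatedly, dropping the result each time.

    Returns RSS readings in MiB: one before the first load, then one
    after each load/del/collect cycle.
    """
    readings = [rss_mib()]
    for _ in range(repeats):
        my_var = loader(path)
        del my_var      # drop the only reference to the loaded data
        gc.collect()    # also collect any reference cycles
        readings.append(rss_mib())
    return readings
```

In the notebook, something like `rss_after_reloads(pd.read_json, 'data.json')` would return the list of readings. Readings that keep climbing by about the size of the data on every cycle would match the behaviour described above; a plateau after the first load would not. RSS can stay above its starting value even without a leak, because the allocator does not always hand freed memory back to the OS.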

This issue doesn’t come up when reading the file with the json library. I haven’t tested other file types.
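The post doesn’t show the json-library version, so the following is only a guess at what it might look like. `load_with_json` is a hypothetical helper name; it uses nothing beyond the standard library.

```python
import json


def load_with_json(path):
    """Parse a JSON file with the standard library.

    Returns plain Python objects (dicts, lists, strings, numbers)
    rather than a DataFrame.
    """
    with open(path) as f:
        return json.load(f)
```

Re-running `my_var = load_with_json('data.json')` in a cell would be the counterpart of the pandas line above. If a DataFrame is needed afterwards, passing a list of records to `pd.DataFrame(...)` is one option, though whether that route avoids the memory growth is not tested here.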

This is running on Ubuntu Linux 16.04. Versions: pandas 0.20.3, Jupyter 1.0.0, ipykernel 4.6.1, IPython 6.1.0.

Maybe my packages need updating. In the meantime, I’m wondering if anyone can replicate this. It came up while working on the Kaggle iceberg dataset.