Memory Error on Python Pandas Merge For Large Databases

I'm running into a memory error when I try to outer merge two big databases using Python's pandas merge command. If anyone has suggestions for other commands or approaches to use, please do share.

Thanks

If your data is in a database, you should do the merge in SQL and use pandas' read_sql to read in only what's needed. Alternatively, you can look at Dask (https://dask.pydata.org/en/latest/). I don't have experience with it; I'm just pointing you to resources that might be useful.
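To illustrate the SQL route suggested above, here's a minimal sketch that pushes the join into the database so only the merged result is materialized in pandas. The table and column names are made up for the example, and it uses an in-memory SQLite database with toy data so it runs standalone:

```python
import sqlite3
import pandas as pd

# Toy stand-in for a real database: two small tables with a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key INTEGER, val_a TEXT);
    CREATE TABLE table_b (key INTEGER, val_b TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (2, 'y');
    INSERT INTO table_b VALUES (1, 'p'), (3, 'q');
""")

# The join happens inside the database engine; pandas only ever
# holds the joined result, not both full tables.
query = """
    SELECT a.key AS key, a.val_a, b.val_b
    FROM table_a AS a
    LEFT OUTER JOIN table_b AS b ON a.key = b.key
"""
df = pd.read_sql(query, conn)
conn.close()
print(df)
```

Note that SQLite historically only supports LEFT OUTER JOIN; on a server database like PostgreSQL you could write FULL OUTER JOIN to match pandas' `how="outer"` behavior.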

Thanks @ramesh. It wasn't a database as stated above, but a pandas DataFrame.

I actually discovered a few kernels after Jeremy suggested checking them out in another thread on memory issues.
I'm new to Python/pandas and the rest, so looking at kernels has been super helpful, both for the memory issues on this dataset and for my understanding/learning.

These two kernels were particularly helpful:
https://www.kaggle.com/jagangupta/memory-optimization-and-eda-on-entire-dataset
https://www.kaggle.com/kunalkotian/easily-load-train-csv-w-o-crash-save-feather-file
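The core trick in memory-optimization kernels like these is downcasting numeric columns to the smallest dtype that can hold their values before doing anything expensive like a merge. A rough sketch of that idea (the sample DataFrame is just illustrative):

```python
import numpy as np
import pandas as pd

def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Shrink numeric columns to the smallest dtype that safely holds them."""
    for col in df.select_dtypes(include=["integer"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="integer")
    for col in df.select_dtypes(include=["floating"]).columns:
        df[col] = pd.to_numeric(df[col], downcast="float")
    return df

# Toy DataFrame: int64/float64 by default, far wider than the data needs.
df = pd.DataFrame({
    "a": np.arange(100, dtype="int64"),      # fits in int8
    "b": np.linspace(0, 1, 100),
})
before = df.memory_usage(deep=True).sum()
df = downcast_numeric(df)
after = df.memory_usage(deep=True).sum()
print(f"memory: {before} -> {after} bytes")
```

Applied to both inputs before calling `merge`, this can cut the working-set size substantially, though float downcasting only happens when no precision would be lost.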

Thanks!
