After watching Jeremy’s first two machine learning videos on Random Forest, I wanted to try applying it to a different Kaggle dataset.
I found the Instacart Market Basket Analysis problem on Kaggle and want to solve it using Random Forest.
However, the dataset for this problem is split across multiple data files: orders, departments, products and aisles.
Do I need to first combine all of these files on a common key before training a Random Forest on them?
Also, the orders table has 3 million rows. Can anyone give me some insight into how well Python handles combining files that large?
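For reference, this is roughly what I mean by "combining on a common key". It is a minimal sketch assuming pandas, and the tiny tables and column names (`order_id`, `product_id`, `aisle_id`, `department_id`, `reordered`) are hypothetical stand-ins, not the real Kaggle files. It chains left joins so every order line keeps its row, then downcasts the integer ids to reduce memory, which matters more as the table grows to millions of rows.

```python
import pandas as pd

# Hypothetical miniature lookup tables; the real Instacart files may differ.
departments = pd.DataFrame({"department_id": [1, 2], "department": ["produce", "dairy eggs"]})
aisles = pd.DataFrame({"aisle_id": [10, 20], "aisle": ["fresh fruits", "yogurt"]})
products = pd.DataFrame({
    "product_id": [100, 101, 102],
    "product_name": ["Banana", "Strawberries", "Greek Yogurt"],
    "aisle_id": [10, 10, 20],
    "department_id": [1, 1, 2],
})
# Hypothetical order lines: one row per (order, product) pair.
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product_id": [100, 102, 101, 100],
    "reordered": [0, 1, 0, 1],
})


def build_flat_table(order_products, products, aisles, departments):
    """Join the order lines to products, aisles and departments on shared id columns."""
    # Left joins keep every order line even if a lookup id is missing.
    merged = (
        order_products
        .merge(products, on="product_id", how="left")
        .merge(aisles, on="aisle_id", how="left")
        .merge(departments, on="department_id", how="left")
    )
    # Downcast integer id columns to the smallest integer dtype that fits.
    for col in ["order_id", "product_id", "aisle_id", "department_id"]:
        merged[col] = pd.to_numeric(merged[col], downcast="integer")
    return merged


flat = build_flat_table(order_products, products, aisles, departments)
print(flat.shape)
```

With the real files, the same chain would start from `pd.read_csv(...)` on each table instead of these hand-built frames. String columns such as `product_name` would still need encoding before a Random Forest could use them.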