As my Ubuntu home directory is running out of space and I have a 2TB secondary drive, I was wondering what the most efficient way is to keep my training data (for Kaggle competitions, for example) on the secondary drive while running the IPython notebooks from my SSD?
Sorry, new-ish to Python and Ubuntu (6 months in and still learning!). I was going to post this in the ‘Part1 v2 Beginner category’ but realized that it doesn't really adhere to the topic/guidelines there.
It's absolutely possible to store data anywhere on your system (including secondary disks, etc.). There are a few options for how you can reference that remote data:
You can hard-code the remote path in your notebook and be done with it: PATH=/mnt/mybigdisk/planet-data
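As a minimal sketch of that first option (the mount point and file names here are hypothetical, substitute your own drive's path):

```python
from pathlib import Path

# Hypothetical path to data on the secondary drive
PATH = Path('/mnt/mybigdisk/planet-data')

# All file access in the notebook then goes through PATH
train_csv = PATH / 'train.csv'
print(train_csv)  # /mnt/mybigdisk/planet-data/train.csv
```

Using pathlib rather than string concatenation keeps the joins tidy, but a plain string works just as well.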
You can use something called a symlink (soft link). This creates a convenient alias to your remote folder within your working directory (the one containing the notebook).
E.g., above, at first glance data appears to be a regular directory sitting in the same folder as my notebook. However, if you look closely, you will see something like data -> /mnt/data/truck/
This says that data is an alias to the original /mnt/data/truck folder.
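A symlink like that can be created from the shell with ln -s, or from Python itself. Here's a hedged sketch using the paths from the example above (they're illustrative, point them at your own data folder):

```python
import os

# Hypothetical paths: the data actually lives on the secondary drive;
# the alias is created next to the notebook in the working directory.
# Shell equivalent: ln -s /mnt/data/truck data
target = '/mnt/data/truck'
link_name = 'data'

# Create the symlink only if nothing with that name exists yet
if not os.path.lexists(link_name):
    os.symlink(target, link_name)

print(os.readlink(link_name))  # /mnt/data/truck
```

After this, the notebook can just reference data/ as if the files were local, and the OS transparently resolves the link to the secondary drive.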
Now, there may be performance consequences to storing data on a mechanical drive versus an SSD. But for our purposes here, the difference will be negligible.