May I know if there is any way to create an ImageDataBunch using a DataFrame that contains category labels and paths to a Google Cloud Storage bucket (gs://…)?
I wanted to try the exercise (retraining ResNet34) on my own dataset (~1 million images, 70 GB) but could not figure out how to do this with the fastai library. I ended up transferring my dataset onto the VM/GCP instance and converting the paths to local paths.
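For anyone following along: once the images are on local disk, a minimal sketch of building the ImageDataBunch from a DataFrame might look like the following. The CSV name, the path and label column names, the data directory, and the size/batch-size values are my own placeholders, not from the original setup.

```python
from fastai.vision import *
import pandas as pd

# Hypothetical CSV with a 'path' column (image paths relative to the data
# directory) and a 'label' column of category labels
df = pd.read_csv('labels.csv')

data = ImageDataBunch.from_df(
    path='/home/jupyter/data',   # hypothetical local data directory
    df=df,
    fn_col='path',               # column holding the image file paths
    label_col='label',           # column holding the category labels
    valid_pct=0.2,               # hold out 20% for validation
    ds_tfms=get_transforms(),    # standard fastai augmentations
    size=224,
    bs=64,
).normalize(imagenet_stats)
```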
Also, may I ask for suggestions on what kind of hardware to choose for training on such a large dataset? I chose ‘n1-highmem-4 (4 vCPUs, 26 GB memory)’ with an NVIDIA Tesla P4, but when I ran the .fit_one_cycle() method on the CNN learner, it looked like each epoch would take at least 2 hours, which seems really long. I don’t mind using more of my credits if I can get faster results, but I am not sure which options to choose. Are there any suggestions from more experienced people?
Hope I didn’t confuse any of the terminology or duplicate an already existing question (I did a brief search and didn’t see any relevant topics). Would appreciate any guidance, thanks in advance!
May I ask how you transferred the dataset (I have ~10 GB of images) from your local PC onto the VM/GCP instance and converted the paths to local paths? I’ve been struggling with this.
I then faced the same problem wxng faced, so I had to work around it: I put the files from my local machine onto the VM and then moved them from my home directory into the jupyter directory. In order to move all 8,000+ files at once, I zipped them first, so I uploaded a single zip file.
So I pretty much followed these instructions:
Then, to move the uploaded zip file, I ran:
sudo mv /home/YOURDIRECTORY/testfile.zip /home/jupyter/
where the YOURDIRECTORY part appears in your SSH file-transfer window; it will be something like /home/XXXXXX.
Once I had moved the file, I opened JupyterLab.
I started a new Python file and executed the following:
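(The original cell isn’t shown above; the following is a minimal sketch using only the standard library, assuming the zip was moved to /home/jupyter/testfile.zip as in the previous step and that /home/jupyter/data is where you want the images.)

```python
import zipfile

zip_path = '/home/jupyter/testfile.zip'  # the file moved in the step above
extract_to = '/home/jupyter/data'        # hypothetical extraction target

# Extract all 8,000+ images in one go
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(extract_to)
```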
I found a much faster way to do the above. First, create a Cloud Storage bucket and name it. Put your zipped file there, and then, over an SSH connection to your VM, run this code:
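(The exact commands from the original post aren’t reproduced here; the following is a minimal sketch, assuming a bucket named YOUR_BUCKET_NAME holding the same testfile.zip.)

```bash
# Copy the zip from the bucket to the jupyter directory, then extract it.
# YOUR_BUCKET_NAME is a placeholder for your own bucket name.
gsutil cp gs://YOUR_BUCKET_NAME/testfile.zip /home/jupyter/
unzip /home/jupyter/testfile.zip -d /home/jupyter/data
```

This is much faster than uploading through the SSH window because the bucket-to-VM transfer happens inside Google’s network.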
I was having the same problem: I had all my images in a GCS bucket and didn’t want to copy everything onto my VM because of the size. I found this very useful:
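The link above isn’t reproduced here, but one common approach, offered as an assumption rather than as what that post necessarily describes, is to mount the bucket with gcsfuse so the files look like a local directory without being copied:

```bash
# Mount the bucket at a local path with gcsfuse (install it first; see the
# gcsfuse docs for your distro). YOUR_BUCKET_NAME is a placeholder.
mkdir -p /home/jupyter/gcs-data
gcsfuse --implicit-dirs YOUR_BUCKET_NAME /home/jupyter/gcs-data
```

Note that reads then go over the network on demand, so training throughput can be noticeably slower than from local disk.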