Efficient data transfer to the cloud

Fellow deep learning practitioners,

I am currently trying to get a “large” training dataset onto Google Drive so that I can access it from Google Colab. However, the process is awfully slow.

Have you found a way to efficiently (i.e. fast) transfer a large dataset to Google Drive, e.g. using something other than the Google Drive UI? I searched Google a little, but apart from FileZilla Pro I could not find a satisfying option. Plus, FileZilla works over FTP, which is apparently not a fast way of dealing with files.

If the solution requires moving to a more professional platform (AWS, Paperspace, Google Cloud, …), so be it. But if there is a free way to achieve this, I am all ears.

Thank you for your support and your time!

Make sure your training data is a zip file.
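
For instance, a minimal sketch of how you might build that zip locally before uploading, assuming the images sit in a folder called train_images (a hypothetical name) and using Python's standard library:

```python
import shutil

# Create "dataset.zip" in the current directory from the contents of
# the (hypothetical) "train_images" folder. shutil ships with Python,
# so nothing extra needs to be installed.
shutil.make_archive('dataset', 'zip', 'train_images')
```

You then upload the single dataset.zip through the Google Drive UI instead of thousands of individual image files.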


To add to @kushaj's suggestion: zip your files up, upload the archive to Google Drive, and then unzip it once you are in Colab using Python's zipfile library.
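
A minimal sketch of the Colab side, assuming the archive was uploaded to Drive as dataset.zip (a hypothetical name) and the standard Drive mount is used; the MyDrive path may differ for older accounts:

```python
import zipfile
from google.colab import drive

# Mount Google Drive inside the Colab VM.
drive.mount('/content/drive')

# Extract the archive onto Colab's local disk, which is much faster to
# read from during training than pulling files from Drive one by one.
with zipfile.ZipFile('/content/drive/MyDrive/dataset.zip', 'r') as archive:
    archive.extractall('/content/dataset')
```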


Simple, efficient. I like it. I did not think about that at all… Thanks for that!

Just to expand on why zipping works: when you upload individual images, Google indexes every single image, which increases the upload time by a lot. When you upload a zip, Google only has to index one file.


Hey, thanks for the technical explanation. Indeed, knowing the actual process helps.


Also related to why uploading a zip is faster: it is usually faster to upload one 1 GB file than 1,000 files of 1 MB each (or whatever the sizes). Each file transfer carries some per-file overhead.