I want to upload my data into my current notebook. I tried putting it on a file-hosting site and then downloading it via `wget https://dl.dropboxusercontent.com/content_link/GjAxYko7UsfBU2Iio68DZ15qAOsWzVb20sKDyUh1yFfGFvQ1tR3EJeXfb2zio7Ia/file?dl=1 && unzip blackswhites.zip -d data`.
This doesn’t work. How can I get the data into my current environment?
Ideally I’d like to transfer data directly from Kaggle to Colab, without going through my machine, as the uplink on my home internet is very slow and the files take a long time to get to Colab.
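(For what it’s worth, a likely reason the wget attempt fails: wget saves the download under the query-string name, something like file?dl=1, so unzip never finds blackswhites.zip. Quoting the URL and naming the output file explicitly should help; a sketch:)

```
!wget -O blackswhites.zip "https://dl.dropboxusercontent.com/content_link/GjAxYko7UsfBU2Iio68DZ15qAOsWzVb20sKDyUh1yFfGFvQ1tR3EJeXfb2zio7Ia/file?dl=1"
!unzip blackswhites.zip -d data
```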
Actually, I found a solution for direct Kaggle -> Colab transfer.
I used the CurlWget extension Jeremy describes here [https://youtu.be/9C06ZPF8Uuc?t=744] and then invoked !wget ... from Colab to download into my Google Drive. Extremely fast transfer, all within Google Cloud.
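For anyone who hasn’t used it: CurlWget watches a download your browser starts and emits a ready-made wget command, cookies and all, which you paste into a Colab cell. What it produces is roughly this shape (the header and URL below are placeholders, not a working command):

```
!wget --header="Cookie: <session cookies captured by CurlWget>" \
      "https://www.kaggle.com/c/<competition>/download/train.zip" -O train.zip
!unzip -q train.zip -d data
```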
There is also the Kaggle API, whose API key makes access straightforward, as if you were on Kaggle Kernels:
Install the Kaggle API: `!pip install kaggle`
API Credentials
To use the Kaggle API, go to the ‘Account’ tab of your user profile (https://www.kaggle.com/<username>/account) and select ‘Create API Token’. This will trigger the download of kaggle.json, a file containing your API credentials.
Place this file anywhere on your Google Drive.
The next snippet copies your credentials from Drive to Colab, after which you can start using the Kaggle API:
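A minimal sketch, assuming kaggle.json sits at the root of your Drive (adjust the path to your setup; the competition name is just an example):

```python
from google.colab import drive
drive.mount('/content/drive')  # authorize Colab to read your Drive

# Copy the credentials to where the Kaggle CLI expects them
!mkdir -p ~/.kaggle
!cp "/content/drive/My Drive/kaggle.json" ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

# Example: pull a competition's data straight into Colab
!kaggle competitions download -c dogs-vs-cats-redux-kernels-edition -p data/
```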
This article lists the steps for setting up the fast.ai course on Colab, but similar steps work for any ML project: http://bit.ly/2DyqMYd
It automates environment setup, plus dataset and code download, on Colab.
This actually solves the problem that the Colab environment is not persistent (i.e. every time you come back you must install the libraries and download the datasets again).
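In practice that means keeping a setup cell at the top of the notebook and re-running it on every new session. A sketch (the version pin and dataset below are just the course-era examples, not prescriptions):

```python
# Re-run this cell whenever the Colab VM has been recycled
!pip install -q fastai==0.7.0   # course-era fastai; pin whatever your project needs
!git clone -q https://github.com/fastai/fastai.git
!wget -q http://files.fast.ai/data/dogscats.zip && unzip -q dogscats.zip -d data
```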
I don’t think you need to do so much. I’ve done lessons 1-3 on Colab and, sure, many problems come up along the way, but once I figured things out for lesson 1 it wasn’t very hard. Besides, the troubleshooting helped me learn a lot.
It can be a bit intimidating for beginners and take some extra time as well, but it’s worth it.
Everyone on this thread is very helpful as well.
Did you try running the lesson1_rxt50.ipynb notebook in Google Colab? I’m getting the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/fastai/weights/resnext_50_32x4d.pth'
I know that we need to download the weights.tgz file from http://files.fast.ai/models/weights.tgz, but I don’t know exactly where to put it. On Paperspace/Crestle/AWS we work with the notebook under the cloned repository folder, but on Google Colab the notebook is loaded from my Google Drive, not from a local fastai git repo.
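Since the traceback points at the installed fastai package, one workaround is to untar the weights straight into that directory. A sketch, assuming the archive unpacks to a weights/ folder and fastai lives at the path from the error:

```python
# Download the pretrained weights and unpack them where fastai looks for them
!wget -q http://files.fast.ai/models/weights.tgz
!tar -xzf weights.tgz -C /usr/local/lib/python3.6/dist-packages/fastai/
```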
As you know, every few hours the kernel gets disconnected and local files are removed, so you need to automatically back up your models to Google Drive during training to save progress and be able to re-use them later. Here’s a script that helps with that if you are using Keras: https://github.com/Zahlii/colab-tf-utils
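If you don’t want to pull in the whole utility, the core idea is just a checkpoint callback pointed at a mounted Drive folder. A minimal sketch (the checkpoint directory is illustrative):

```python
import os
from google.colab import drive
from keras.callbacks import ModelCheckpoint

drive.mount('/content/drive')

# Checkpoints written to Drive survive the VM being recycled
ckpt_dir = '/content/drive/My Drive/checkpoints'
os.makedirs(ckpt_dir, exist_ok=True)

checkpoint = ModelCheckpoint(
    os.path.join(ckpt_dir, 'model-{epoch:02d}-{val_loss:.4f}.h5'),
    monitor='val_loss',
    save_best_only=True)

# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=10, callbacks=[checkpoint])
```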
Thank you for sharing. This is the first time I’ve heard about Clouderizer. Have you tried using it? Is there a guide somewhere for getting started with Clouderizer + Google Colab? I couldn’t find one in their docs or blog posts. Thank you in advance.