While my model for the Planet dataset is training, I have some time to write a short manual for loading the Planet data in Google Colab.
Many students have problems getting the Planet data onto the server.
I do it with the following steps. It may be long, but it works, at least for me.
1. Start the notebook and, at the top, do not forget to add the following:
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'
- Then continue as normal in the notebook and run this:
path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)
path
- Then download these 2 files from the Kaggle competition page to your local machine:
- train-jpg.tar.7z
- train_v2.csv.zip
- Then upload these 2 files to Google Colab. There are 2 options:
a) Directly, via the Upload button, to the folder that contains these 2 subfolders:
–gdrive
–sample_data
In my case this was an extremely slow option for the bigger file of about 600 MB; it took ages. If it is faster in your case, good for you. Before the next step, check that your train-jpg.tar.7z file is over 629 MB or so.
b) Upload these 2 files to your Google Drive and then move them to the main folder. It is a little tricky with the mouse, but you can do it.
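The mouse-dragging in option b) can probably be replaced by a cell. This is only a minimal sketch, assuming Drive is mounted at /content/gdrive as in step 1 and that both files sit at the top level of My Drive. The helper name copy_from_drive is my own, not part of Colab or fastai, and the file names are the ones the later mv commands expect.

```python
import shutil
from pathlib import Path

def copy_from_drive(names, src_dir, dst_dir):
    """Copy each named file from src_dir to dst_dir; return the names that were not found."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        src = src_dir / name
        if src.is_file():
            shutil.copy(src, dst_dir / name)
        else:
            missing.append(name)
    return missing

# Assumed locations; adjust if you uploaded to a subfolder of My Drive.
# missing = copy_from_drive(["train-jpg.tar.7z", "train_v2.csv.zip"],
#                           "/content/gdrive/My Drive", "/content")
# print("not found on Drive:", missing)
```

An empty list back means both files were copied into the main folder.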
In the end you should have the following:
–gdrive
–sample_data
- train-jpg.tar.7z (check the size: it has to be over 629 MB)
- train_v2.csv.zip
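The size check can also be done from a cell instead of by eye. This is a rough sketch: check_upload is a hypothetical helper, and the 600 MB threshold is only my loose reading of the "over 629 meg" figure above, so pick whatever limit matches your download.

```python
from pathlib import Path

def check_upload(path, min_mb):
    """Return (ok, size_mb); ok is True when the file exists and is at least min_mb megabytes."""
    p = Path(path)
    if not p.is_file():
        return False, 0.0
    size_mb = p.stat().st_size / 1e6
    return size_mb >= min_mb, size_mb

# Assumed file name and threshold; run from the folder that holds the upload.
# ok, size_mb = check_upload("train-jpg.tar.7z", 600)
# print("upload looks complete" if ok else f"too small or missing: {size_mb:.0f} MB")
```

A truncated upload usually shows up here as a file far below the expected size.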
- Run these commands
! mv train-jpg.tar.7z {path}
! mv train_v2.csv.zip {path}
- Then run this cell. As you can see, I commented out the two download commands.
#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p {path}
! unzip -q -n {path}/train_v2.csv.zip -d {path}
- Then run
! sudo apt install p7zip-full
- Then run standard cell:
! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path.as_posix()}
- If there are no errors, congratulations: you have loaded all the data you need.
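To be more sure than "no errors", you could compare the extracted images against the CSV. This is a sketch under assumptions: I assume the archive unpacks into a train-jpg folder under path, that the CSV has an image_name column, and that images end in .jpg. The function missing_images is a name I made up for illustration.

```python
import csv
from pathlib import Path

def missing_images(csv_path, image_dir, name_col="image_name", suffix=".jpg"):
    """Return the image names listed in the CSV that have no matching file in image_dir."""
    image_dir = Path(image_dir)
    with open(csv_path, newline="") as f:
        names = [row[name_col] for row in csv.DictReader(f)]
    return [n for n in names if not (image_dir / f"{n}{suffix}").is_file()]

# Assumed layout after the extraction cell; `path` is the notebook variable from above.
# missing = missing_images(path/'train_v2.csv', path/'train-jpg')
# print(len(missing), "images listed in the CSV were not found")
```

If the count is zero, every labelled image is on disk and the rest of the notebook should run.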
I hope this helps someone overcome the struggle with the Planet data loading process.