Lesson 3 - Planet Google Colab

While my model for the planet dataset is training, I have some time to write a short manual for the Google Colab Planet data loading process.
Many students have problems downloading the Planet data to the server.
I do it with the following steps. It may be long, but it works, at least for me.
1. Start the notebook and don't forget to add the following at the top:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'
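
When you run this cell, Colab asks you to authorize access to your Drive. To confirm the mount worked, you can list the Drive root (a quick check of my own, not part of the original recipe):

import os
os.listdir(root_dir)  # should show the contents of your Google Drive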

2. Then everything goes as normal in the notebook; run:

path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)
path

3. Go to Kaggle at https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data and download these two files to your local machine:

  • train-jpg.tar.7z
  • train_v2.csv.zip
4. Then upload these two files to Google Colab. There are two options:
    a) Directly, via the Upload button, to the folder that has two subfolders:
    –gdrive
    –sample data
    In my case this was an extremely slow option for the bigger file of roughly 600 MB; it took ages. If it is faster in your case, good for you. Before the next step, check that your file train-jpg.tar.7z is over 629 MB or so.
    b) Upload these two files to your Google Drive and then move the files to the main folder. It is a little bit tricky with the mouse, but you can do it.

In the end, you should have the following:
–gdrive
–sample data

  • train-jpg.tar.7z (check the size: it has to be over 629 MB)
  • train_v2.csv.zip
5. Run these commands:

! mv train-jpg.tar.7z {path}
! mv train_v2.csv.zip {path}
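
Note that these commands assume both files ended up in Colab's working directory (/content), which is where the Upload button from option (a) puts them. If you went with option (b) and the files are still in your Drive root, you can copy them over directly instead (a sketch, assuming you left them at the top level of your Drive):

! cp "/content/gdrive/My Drive/train-jpg.tar.7z" {path}
! cp "/content/gdrive/My Drive/train_v2.csv.zip" {path}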

6. Then run this cell. As you can see, I have commented out the two download commands:

#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p {path}
! unzip -q -n {path}/train_v2.csv.zip -d {path}
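
By the way, the two commented-out commands are the official route: they fetch the files straight from Kaggle with no manual upload at all, once the Kaggle API is configured and you have accepted the competition rules on the Kaggle website. A minimal setup sketch, assuming you have generated an API token (kaggle.json) from your Kaggle account page:

! pip install kaggle --upgrade
from google.colab import files
files.upload()  # pick your kaggle.json in the file dialog
! mkdir -p ~/.kaggle
! mv kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json  # keep the credentials private, as the Kaggle tool expects

After that you can uncomment the two kaggle competitions download lines above and skip the upload steps entirely.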

7. Then run this (it installs 7-Zip, which is needed to unpack the .7z archive in the next step):

! sudo apt install p7zip-full

8. Then run the standard cell:

! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

If you have made no mistakes, congratulations: you have loaded all the data you need.
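
As a final sanity check (my own addition), you can compare the number of extracted images with the number of rows in the CSV; for the full training set the two counts should match, at around 40,000:

import pandas as pd
df = pd.read_csv(path/'train_v2.csv')
n_imgs = len(list((path/'train-jpg').glob('*.jpg')))
print(len(df), n_imgs)  # the two counts should be equal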

Hope this helps someone overcome the struggle with the Planet data loading process.


Thanks for sharing this, it worked for me!

Thank you!

Interesting, but fast.ai has this dataset, or at least part of it:

planet = untar_data(URLs.PLANET_TINY)  # or URLs.PLANET_SAMPLE

planet_tms = get_transforms(flip_vert=True, max_lighting=0.1,
                            max_zoom=1.05, max_warp=0.)

data = (ImageList.from_csv(planet, 'labels.csv', folder='train', suffix='.jpg')
        .split_by_rand_pct()
        .label_from_df(label_delim=' ')
        .transform(planet_tms, size=128)
        .databunch()
        .normalize(imagenet_stats))

data.show_batch(rows=2,figsize=(10,10))

And we have the same dataset without the problems. Can someone check that it is the same dataset, at least partly?
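
For completeness, training on this databunch then looks roughly like the lesson notebook (a sketch, assuming the usual from fastai.vision import *; the 0.2 threshold is the one used in the course):

acc_02 = partial(accuracy_thresh, thresh=0.2)  # multi-label accuracy at a fixed threshold
f_score = partial(fbeta, thresh=0.2)           # F-beta score, the competition metric
learn = cnn_learner(data, models.resnet34, metrics=[acc_02, f_score])
learn.fit_one_cycle(5)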

This works as well, but it's a fraction of the dataset. I wanted to get a feel for working with the full dataset from Kaggle.

Aha, OK.
But it eliminates the pain of the dataset loading and of reinventing the wheel.
Anyway, at least we have options: the full set with the different techniques above, or just loading part of the dataset via URLs.PLANET_SAMPLE, in a single line.

Thank you for posting this; it confirms my suspicion that the images cannot be downloaded per the instructions provided, for some reason.
I did start to download the large .tar file to my local system, but when I saw it was 630 MB, I stopped.
Now I see that this may be the only way to get that file into the training system.
For anyone else who may see this post: I am using the Google Compute setup (not Google Colab).