Lesson 3 - Planet Google Colab

While my model for the planet dataset is training, I have some time to write a short manual for the Google Colab Planet data loading process.
Many students have problems downloading the Planet data to the server.
I do it with the following steps. It may be long, but it works, at least for me.
1. Start the notebook and don't forget to add the following at the top:

from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
base_dir = root_dir + 'fastai-v3/'
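
When you run this cell, Colab asks you to authorize access to your Drive. To confirm the mount worked, you can list the Drive root (a quick check of my own, not part of the original recipe):

import os
os.listdir(root_dir)  # should show the contents of your Google Drive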

2. Then everything goes as normal in the notebook; run:

path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)
path

3. Go to Kaggle at https://www.kaggle.com/c/planet-understanding-the-amazon-from-space/data and download these two files to your local machine:

  • train-jpg.tar.7z
  • train_v2.csv.zip
4. Then upload these two files to Google Colab. There are two options:
    a) Directly, via the Upload button, to the folder that has two subfolders:
    –gdrive
    –sample data
    In my case this was an extremely slow option for the bigger file of roughly 600 MB; it took ages. If it is faster in your case, good for you. Before the next step, check that your file train-jpg.tar.7z is over 629 MB or so.
    b) Upload these two files to your Google Drive and then move the files to the main folder. It is a little bit tricky with the mouse, but you can do it.

In the end, you should have the following:
–gdrive
–sample data

  • train-jpg.tar.7z (check the size: it has to be over 629 MB)
  • train_v2.csv.zip
5. Run these commands:

! mv train-jpg.tar.7z {path}
! mv train_v2.csv.zip {path}
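
Note that these commands assume both files ended up in Colab's working directory (/content), which is where the Upload button from option (a) puts them. If you went with option (b) and the files are still in your Drive root, you can copy them over directly instead (a sketch, assuming you left them at the top level of your Drive):

! cp "/content/gdrive/My Drive/train-jpg.tar.7z" {path}
! cp "/content/gdrive/My Drive/train_v2.csv.zip" {path}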

6. Then run this cell. As you can see, I have commented out the two download commands:

#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
#! kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p {path}
! unzip -q -n {path}/train_v2.csv.zip -d {path}
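
By the way, the two commented-out commands are the official route: they fetch the files straight from Kaggle with no manual upload at all, once the Kaggle API is configured and you have accepted the competition rules on the Kaggle website. A minimal setup sketch, assuming you have generated an API token (kaggle.json) from your Kaggle account page:

! pip install kaggle --upgrade
from google.colab import files
files.upload()  # pick your kaggle.json in the file dialog
! mkdir -p ~/.kaggle
! mv kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json  # keep the credentials private, as the Kaggle tool expects

After that you can uncomment the two kaggle competitions download lines above and skip the upload steps entirely.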

7. Then run this (it installs 7-Zip, which is needed to unpack the .7z archive in the next step):

! sudo apt install p7zip-full

8. Then run the standard cell:

! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

If you have made no mistakes, congratulations: you have loaded all the data you need.
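
As a final sanity check (my own addition), you can compare the number of extracted images with the number of rows in the CSV; for the full training set the two counts should match, at around 40,000:

import pandas as pd
df = pd.read_csv(path/'train_v2.csv')
n_imgs = len(list((path/'train-jpg').glob('*.jpg')))
print(len(df), n_imgs)  # the two counts should be equal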

Hope this helps someone overcome the struggle with the Planet data loading process.


Thanks for sharing this, it worked for me!

Thank you!

Interesting, but fast.ai has this dataset, or at least part of it:

planet = untar_data(URLs.PLANET_TINY)  # or URLs.PLANET_SAMPLE

planet_tms = get_transforms(flip_vert=True, max_lighting=0.1,
                            max_zoom=1.05, max_warp=0.)

data = (ImageList.from_csv(planet, 'labels.csv', folder='train', suffix='.jpg')
        .split_by_rand_pct()
        .label_from_df(label_delim=' ')
        .transform(planet_tms, size=128)
        .databunch()
        .normalize(imagenet_stats))

data.show_batch(rows=2,figsize=(10,10))

And we have the same dataset without the problems. Can someone check that it is the same dataset, at least partly?
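
For completeness, training on this databunch then looks roughly like the lesson notebook (a sketch, assuming the usual from fastai.vision import *; the 0.2 threshold is the one used in the course):

acc_02 = partial(accuracy_thresh, thresh=0.2)  # multi-label accuracy at a fixed threshold
f_score = partial(fbeta, thresh=0.2)           # F-beta score, the competition metric
learn = cnn_learner(data, models.resnet34, metrics=[acc_02, f_score])
learn.fit_one_cycle(5)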

This works as well, but it's a fraction of the dataset. I wanted to get a feel for working with the full dataset from Kaggle.

Aha, OK.
But it eliminates the pain of the dataset loading and of reinventing the wheel.
Anyway, at least we have options: the full set with the different techniques above, or just loading part of the dataset via URLs.PLANET_SAMPLE, in a single line.

Thank you for posting this; it confirms my suspicion that the images cannot be downloaded per the instructions provided, for some reason.
I did start to download the large .tar file to my local system, but when I saw it was 630 MB, I stopped.
Now I see that this may be the only way to get that file into the training system.
For anyone else who may see this post: I am using the Google Compute setup (not Google Colab).