Lesson3 - Getting the data (Linux - Azure DSVM) - Solved

I had some problems getting the Kaggle data onto my Azure “Data Science Virtual Machine” running Linux. In case it is useful to someone, this is how I solved it.

First, I uploaded my Kaggle credentials and accepted the competition rules. I’m not sure, but I think I couldn’t download the data while using my Google account for Kaggle authentication, so I switched to username / password.

Installing the Kaggle API seems to work:

import sys
! {sys.executable} -m pip install kaggle --upgrade

But the kaggle command does not seem to be on the PATH:

! kaggle --version
/bin/sh: kaggle: command not found

According to https://github.com/Kaggle/kaggle-api: “You can see where kaggle is installed by doing pip uninstall kaggle and seeing where the binary is”

! {sys.executable} -m pip uninstall kaggle
Uninstalling kaggle-1.5.6:
  Would remove:
    /data/anaconda/envs/fastai/bin/kaggle
    /data/anaconda/envs/fastai/lib/python3.6/site-packages/kaggle-1.5.6.dist-info/*
    /data/anaconda/envs/fastai/lib/python3.6/site-packages/kaggle/*
Proceed (y/n)? ^C
ERROR: Operation cancelled by user

So kaggle is installed in /data/anaconda/envs/fastai/bin/

! /data/anaconda/envs/fastai/bin/kaggle --version
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /home/fizcogar/.kaggle/kaggle.json'
Kaggle API 1.5.6
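
As an alternative to reading the path from the pip uninstall prompt, the location can probably be found from Python itself. This is only a minimal sketch: it assumes pip put the kaggle script in the same directory as the interpreter running the notebook (which matches the /data/anaconda/envs/fastai/bin/ layout above), and `find_console_script` is a made-up helper name, not part of the Kaggle API.

```python
import os
import shutil
import sys
from typing import Optional


def find_console_script(name: str) -> Optional[str]:
    """Return the full path of a pip-installed command-line script, or None."""
    # Assumption: pip places console scripts next to the interpreter it ran
    # under (the "bin" directory of a conda env or virtualenv on Linux).
    candidate = os.path.join(os.path.dirname(sys.executable), name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    # Otherwise fall back to a normal PATH lookup.
    return shutil.which(name)


kaggle_bin = find_console_script("kaggle")
print(kaggle_bin)  # None if the script could not be found
```

If this prints a path, the notebook commands can use `! {kaggle_bin} --version` instead of the hard-coded location.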

Following the suggestion in the warning:

! chmod 600 /home/fizcogar/.kaggle/kaggle.json
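
The same permission change can also be done from Python, which avoids typing the home directory by hand. A minimal sketch, assuming the key file is in the default ~/.kaggle/kaggle.json location; `restrict_to_owner` is just an illustrative helper name.

```python
import os
import stat
from pathlib import Path


def restrict_to_owner(file_path: Path) -> int:
    """Set owner-only read/write (mode 600) and return the resulting mode bits."""
    os.chmod(file_path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(file_path).st_mode)


# Assumption: the API key sits in the default location under the home directory.
key_file = Path.home() / ".kaggle" / "kaggle.json"
if key_file.exists():
    print(oct(restrict_to_owner(key_file)))
```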

Set the path for the data, as in the notebook:

path = Config.data_path()/'planet'
path.mkdir(parents=True, exist_ok=True)

And finally we can download the data, using the kaggle installation path:

! /data/anaconda/envs/fastai/bin/kaggle competitions download -c planet-understanding-the-amazon-from-space -f train-jpg.tar.7z -p {path}
! /data/anaconda/envs/fastai/bin/kaggle competitions download -c planet-understanding-the-amazon-from-space -f train_v2.csv -p {path}
! unzip -q -n {path}/train_v2.csv.zip -d {path}
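
If the unzip command is not available on the machine, the zip file could be extracted with Python's standard zipfile module instead. This is a rough sketch that tries to mimic `unzip -n` by skipping files that already exist; `unzip_skip_existing` is a hypothetical helper, not something from the notebook.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_skip_existing(archive: Path, dest: Path) -> List[str]:
    """Extract archive into dest, skipping members that already exist (like unzip -n)."""
    extracted = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            # Leave existing files untouched instead of overwriting them.
            if (dest / member).exists():
                continue
            zf.extract(member, dest)
            extracted.append(member)
    return extracted
```

In the notebook it would be called as `unzip_skip_existing(path/'train_v2.csv.zip', path)`, with `path` being the data path defined earlier.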

To install 7zip, only this worked for me:

! sudo apt install p7zip-full

And now you can extract the contents of the archive, as in the notebook:

! 7za -bd -y -so x {path}/train-jpg.tar.7z | tar xf - -C {path.as_posix()}

Good luck!

Thanks for this. It got me unstuck.