Kaggle Questions

This is a thread for posting problems and questions related to Kaggle.

I’m posting this in case it’s helpful for someone else: I had always signed into Kaggle using my linked Google account, so I got an error when I tried to use the Kaggle CLI, which requires a separate Kaggle login. Fortunately, Kaggle has a solution: if you select Forgot Password?, you’ll receive an email with a few different options, the third of which lets you set up your own Kaggle username and password and connects it to your original Google (or other social media) account :slight_smile:


I am on Windows and have had issues using kaggle-cli. Jeremy suggested, among other things, installing Ubuntu bash, which I just did. Using Ubuntu in Windows is pretty alien to me … This is the error message I get when I pip install kaggle-cli.

vshets@VERA:/mnt/c/Users/Shetty/Anaconda$ pip install kaggle-cli
Downloading/unpacking kaggle-cli
  Downloading kaggle-cli-0.4.3.tar.gz
  Running setup.py (path:/tmp/pip_build_vshets/kaggle-cli/setup.py) egg_info for package kaggle-cli

Downloading/unpacking cliff (from kaggle-cli)
  Downloading cliff-2.2.0-py2-none-any.whl (44kB): 44kB downloaded
Downloading/unpacking MechanicalSoup (from kaggle-cli)
  Downloading MechanicalSoup-0.6.0-py2.py3-none-any.whl
Downloading/unpacking lxml (from kaggle-cli)
  Downloading lxml-3.6.4.tar.gz (3.7MB): 3.7MB downloaded
  Running setup.py (path:/tmp/pip_build_vshets/lxml/setup.py) egg_info for package lxml
    Building lxml version 3.6.4.
    Building without Cython.
    ERROR: /bin/sh: 1: xslt-config: not found

    ** make sure the development packages of libxml2 and libxslt are installed **

    Using build configuration of libxslt

    warning: no previously-included files found matching '*.py'
Downloading/unpacking cssselect (from kaggle-cli)
  Downloading cssselect-1.0.0-py2.py3-none-any.whl
Downloading/unpacking configparser (from kaggle-cli)
  Downloading configparser-3.5.0.tar.gz
  Running setup.py (path:/tmp/pip_build_vshets/configparser/setup.py) egg_info for package configparser

Downloading/unpacking pyparsing>=2.0.1 (from cliff->kaggle-cli)
  Downloading pyparsing-2.1.10-py2.py3-none-any.whl (56kB): 56kB downloaded
Downloading/unpacking six>=1.9.0 (from cliff->kaggle-cli)
  Downloading six-1.10.0-py2.py3-none-any.whl
Downloading/unpacking stevedore>=1.16.0 (from cliff->kaggle-cli)
  Downloading stevedore-1.18.0-py2.py3-none-any.whl
Downloading/unpacking cmd2>=0.6.7 (from cliff->kaggle-cli)
  Downloading cmd2-0.6.9.tar.gz (367kB): 367kB downloaded
  Running setup.py (path:/tmp/pip_build_vshets/cmd2/setup.py) egg_info for package cmd2

Downloading/unpacking pbr>=1.6 (from cliff->kaggle-cli)
  Downloading pbr-1.10.0-py2.py3-none-any.whl (96kB): 96kB downloaded
Downloading/unpacking unicodecsv>=0.8.0 (from cliff->kaggle-cli)
  Downloading unicodecsv-0.14.1.tar.gz
  Running setup.py (path:/tmp/pip_build_vshets/unicodecsv/setup.py) egg_info for package unicodecsv

Requirement already satisfied (use --upgrade to upgrade): PyYAML>=3.1.0 in /usr/lib/python2.7/dist-packages (from cliff->kaggle-cli)
Requirement already satisfied (use --upgrade to upgrade): PrettyTable>=0.7,<0.8 in /usr/lib/python2.7/dist-packages (from cliff->kaggle-cli)
Downloading/unpacking beautifulsoup4 (from MechanicalSoup->kaggle-cli)
  Downloading beautifulsoup4-4.5.1-py2-none-any.whl (83kB): 83kB downloaded
Requirement already satisfied (use --upgrade to upgrade): requests>=2.0 in /usr/lib/python2.7/dist-packages (from MechanicalSoup->kaggle-cli)
Installing collected packages: kaggle-cli, cliff, MechanicalSoup, lxml, cssselect, configparser, pyparsing, six, stevedore, cmd2, pbr, unicodecsv, beautifulsoup4
  Running setup.py install for kaggle-cli

    error: could not create '/usr/local/lib/python2.7/dist-packages/kaggle_cli': Permission denied
    Complete output from command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_vshets/kaggle-cli/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-pKmcBh-record/install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-2.7
    creating build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/__init__.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/common.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/config.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/download.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/main.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    copying kaggle_cli/submit.py -> build/lib.linux-x86_64-2.7/kaggle_cli
    running egg_info
    writing requirements to kaggle_cli.egg-info/requires.txt
    writing kaggle_cli.egg-info/PKG-INFO
    writing namespace_packages to kaggle_cli.egg-info/namespace_packages.txt
    writing top-level names to kaggle_cli.egg-info/top_level.txt
    writing dependency_links to kaggle_cli.egg-info/dependency_links.txt
    writing entry points to kaggle_cli.egg-info/entry_points.txt
    warning: manifest_maker: standard file '-c' not found
    reading manifest file 'kaggle_cli.egg-info/SOURCES.txt'
    writing manifest file 'kaggle_cli.egg-info/SOURCES.txt'
    running install_lib
    creating /usr/local/lib/python2.7/dist-packages/kaggle_cli
    error: could not create '/usr/local/lib/python2.7/dist-packages/kaggle_cli': Permission denied

Cleaning up...
Command /usr/bin/python -c "import setuptools, tokenize;__file__='/tmp/pip_build_vshets/kaggle-cli/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-pKmcBh-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_vshets/kaggle-cli
Storing debug log for failure in /home/vshets/.pip/pip.log

Also, when I pip install anything from Ubuntu bash, I assume it does not install anything into the Windows directories?

Looking at Cygwin, is there any particular version of gcc I should install? Currently I have libgcc1 installed.

I managed to get rid of the above error by running:
sudo apt-get install python-lxml
sudo pip install kaggle-cli

When I run the kg download command, I get this:

Traceback (most recent call last):
  File "/usr/local/bin/kg", line 5, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2749, in <module>
    working_set = WorkingSet._build_master()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 444, in _build_master
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 725, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 628, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: unicodecsv>=0.8.0

Do I need to install an Anaconda-like distribution to avoid error messages like these? If so, can it be done in Ubuntu bash?

Edit: Now resolved by installing the Anaconda distribution in Ubuntu bash for Windows. I was hoping the Miniconda installation would have sufficed, but alas ;(

A note on downloading data using kaggle-cli: after you install kaggle-cli, make sure to configure the competition first before you download the data:
kg config -g -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition

kg download -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition

otherwise it will download incomplete .zip files.


@vshets I think using anaconda on ubuntu/windows would be a good idea.

Either way, this error can be fixed with ‘sudo pip install unicodecsv’.
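
For anyone who wants to see which dependencies are missing before running kg, here is a minimal sketch. It assumes Python 3.8 or newer (the traceback above is from Python 2.7, where `importlib.metadata` is not available), and the helper name `missing_distributions` is made up for illustration. It only checks that a distribution is installed, not that its version satisfies a requirement such as `>=0.8.0`.

```python
from importlib import metadata  # standard library from Python 3.8 (assumption)


def missing_distributions(names):
    """Return the names that are not installed for the current interpreter."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises if the distribution is not installed
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Dependency names taken from the pip log earlier in this thread.
print(missing_distributions(["unicodecsv", "cliff", "MechanicalSoup", "lxml"]))
```

Anything printed in the list can then be installed with pip, as suggested above for unicodecsv.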

Minor correction: you can set the config values with:
kg config -g -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition
If you do that, it’s sufficient to run:
kg download
(without re-entering the variables)

Otherwise, if you haven’t set your config values, then you would use:
kg download -u USERNAME -p PASSWORD -c dogs-vs-cats-redux-kernels-edition
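
If you would rather drive these two workflows from Python, a rough sketch could look like the following. The function names are hypothetical, and it only uses the flags already shown in this thread (-g, -u, -p, -c); it assumes kg is on your PATH.

```python
import subprocess


def kg_config_command(username, password, competition):
    """Build the `kg config` command that stores the login and competition."""
    return ["kg", "config", "-g", "-u", username, "-p", password, "-c", competition]


def kg_download_command(username=None, password=None, competition=None):
    """Build `kg download`; flags are added only when all three values are given."""
    cmd = ["kg", "download"]
    if username and password and competition:
        cmd += ["-u", username, "-p", password, "-c", competition]
    return cmd


def run(cmd):
    """Run a command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

With stored config you would call `run(kg_config_command("USERNAME", "PASSWORD", "dogs-vs-cats-redux-kernels-edition"))` once and then `run(kg_download_command())`; without it, pass the same three values to `kg_download_command` instead.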


That is good to know!

Ahh … I went over the top and installed the entire Anaconda dist just to run the kaggle-cli without issues.

I am getting an error unzipping the file when I use the commands above to set my configuration and download the files from kaggle. The error occurred when I tested on aws and on my local machine.
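
One way to tell whether a downloaded archive is incomplete before trying to unzip it is to check it with Python's standard `zipfile` module. This is only a sketch, and the helper name `zip_problem` is made up for illustration.

```python
import zipfile


def zip_problem(path):
    """Return None if the archive looks complete, else a short description."""
    if not zipfile.is_zipfile(path):
        return "not a valid zip file (possibly incomplete)"
    try:
        with zipfile.ZipFile(path) as archive:
            bad_member = archive.testzip()  # first member with a bad CRC, or None
    except zipfile.BadZipFile as error:
        return "unreadable archive: %s" % error
    if bad_member is not None:
        return "corrupt member: " + bad_member
    return None
```

For example, `print(zip_problem("train.zip"))` prints None when the file passes these checks and a short message otherwise.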


Is anyone having problems with a Kaggle error “list out of range”?
I set up a kaggle account manually and activated the account via the email link.
I installed the kaggle cli on the AWS M
I followed the config instructions and it seemed fine. When I tried to do kg download (with and without the config variables), I got a list out of range error after it attempted an HTTPS connection with Kaggle.

Figured it out. You have to manually go to the competition page and click on a dataset to accept an agreement before you can use kaggle-cli to download the dataset.


great! thanks so much!

For those looking to recreate Jeremy’s data directory structure as in data/dogscats, I have written a script that replicates this for the redux edition. It is available here. The formatting that describes the organization got lost after uploading. Sorry.
The easiest way to use it is to download the zip files into the data/cats-dogs-redux folder and the raw script above into the nbs folder.
Once you are in the root directory of the AWS ssh session, change directory to nbs and run this command: python create_folders.py
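
For readers who cannot reach the linked script, here is a rough sketch of the kind of reorganization involved. It is not the create_folders.py script itself. It assumes train.zip has already been extracted into a train folder whose files are named like cat.0.jpg and dog.0.jpg, and the function name `organize`, the 10% validation split, and the cats/dogs/valid folder names are all assumptions for illustration.

```python
import random
import shutil
from pathlib import Path


def organize(train_dir, valid_fraction=0.1, seed=0):
    """Sort cat.*.jpg / dog.*.jpg files into train/ and valid/ class folders."""
    train_dir = Path(train_dir)
    valid_dir = train_dir.parent / "valid"  # sibling of the train folder
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    for label, folder in (("cat", "cats"), ("dog", "dogs")):
        files = sorted(train_dir.glob(label + ".*.jpg"))
        rng.shuffle(files)
        n_valid = int(len(files) * valid_fraction)
        for index, path in enumerate(files):
            # The first n_valid shuffled files go to valid/, the rest stay in train/.
            target_root = valid_dir if index < n_valid else train_dir
            target = target_root / folder
            target.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target / path.name))
```

Calling `organize("data/cats-dogs-redux/train")` would leave train/cats, train/dogs, valid/cats and valid/dogs under data/cats-dogs-redux.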


got the zip files to download thanks!

so you download the dataset and then download the python script, and then run the create_folders.py script?

yep … that is correct.

I’m getting an error on the aws instance trying to get the file with wget

The link has been updated to now go to the raw file directly.

Hi vshets, I can see how the script would be useful, but I can’t get the script to run

and I think I have the initial file structure right. I used wget to put your script in the nbs folder and then ran it there with python. I’m not sure why the script fails to see the train.zip file when the path looks correct and the file is there.

I see what happened. The data folder should be outside of nbs - like this

Alternatively, edit this line in your create_folders.py to match your current structure, which might work for you:

def main():
    parent_path = 'data/cats-dogs-redux' # based on your structure

You can change it to wherever the train.zip and test.zip files are sitting.
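
A relative path such as 'data/cats-dogs-redux' is interpreted relative to the directory you run the script from (unless the script changes directory itself), which is an easy way to end up looking in the wrong place. A small check like the one below can show where the script would actually look. The helper name `find_missing_archives` is made up for illustration and is not part of create_folders.py.

```python
from pathlib import Path


def find_missing_archives(parent_path, names=("train.zip", "test.zip")):
    """Return (resolved_folder, missing_names) for the expected zip files."""
    folder = Path(parent_path).resolve()  # relative paths resolve against the cwd
    missing = [name for name in names if not (folder / name).is_file()]
    return folder, missing


folder, missing = find_missing_archives("data/cats-dogs-redux")
print("Looking in:", folder)
print("Missing:", missing)
```

If the printed folder is not where the zip files are, either move the data folder or adjust parent_path as described above.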