Dog Breed Identification challenge

HariSumanth9 · December 5, 2017, 1:27pm

I fired up an AWS instance and tried to download the data set from Kaggle using kaggle-cli, but I am unable to download the data onto my instance. If anyone has managed to download the data, please help me.

Using the same command, I was able to download the data on my local machine.

Thanks.

bushaev · December 5, 2017, 1:44pm

Try pip install -U kaggle-cli to upgrade it.

sermakarevich · December 5, 2017, 1:48pm

Or try specifying a filename.

reshama · December 5, 2017, 2:48pm

@HariSumanth9
check out these instructions:

github.com

reshamas/fastai_deeplearn_part1/blob/master/tools/download_data_kaggle_cli.md

# Kaggle CLI
(**CLI** = **C**ommand **L**ine **I**nterface)  

## Resource
[Kaggle CLI Wiki](http://wiki.fast.ai/index.php/Kaggle_CLI)

## Installation
Check to see if `kaggle-cli` is installed:  
<kbd> kaggle-cli --version </kbd>  

Install `kaggle-cli`:  
<kbd> pip install kaggle-cli </kbd>  
or <kbd> pip3 install kaggle-cli </kbd> 

May need to **update package** if you run into errors:  
<kbd> pip install kaggle-cli --upgrade </kbd>  
or <kbd> pip3 install kaggle-cli --upgrade </kbd>  


---


ecdrid · December 5, 2017, 3:11pm

Why kaggle-cli, when we can use wget with the CurlWget Chrome extension?

HariSumanth9 · December 5, 2017, 3:36pm

After I run the jupyter notebook command in my terminal window (which is connected to the AWS instance via ssh), how do I get another terminal window connected to the same AWS instance IP?
Thanks.

reshama · December 5, 2017, 3:50pm

@HariSumanth9
Use tmux for additional windows. Instructions here:
tmux on aws

reshama · December 5, 2017, 5:14pm

@ecdrid
There are instructions here for both ways of downloading data;
there's more than one way to do it, depending on preferences, setup, etc.
download data using Chrome wget
download data using Kaggle CLI

naveenmanwani · December 5, 2017, 8:36pm

Many people in this thread are saying that they got better accuracy and lower loss with ensembling. As I understand it, in simple terms, it is nothing but combining two models and using them together to get better results.
So, how can I learn about ensembling and use it with the fastai library to further reduce my model's loss?
Any guidance would be appreciated.
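
For readers with the same question, here is a minimal sketch of the simplest form of ensembling described above: averaging the class probabilities predicted by two models. The probability tables are made-up stand-ins for real model outputs, and average_probs is a hypothetical helper written in plain Python, not a fastai API.

```python
def average_probs(*model_probs):
    """Average per-class probabilities across models, image by image."""
    n_models = len(model_probs)
    return [
        [sum(class_probs) / n_models for class_probs in zip(*rows)]
        for rows in zip(*model_probs)
    ]

# Made-up probabilities from two hypothetical models:
# 3 images (rows) x 4 breeds (columns), each row sums to 1.
probs_model_a = [[0.70, 0.10, 0.10, 0.10],
                 [0.20, 0.50, 0.20, 0.10],
                 [0.30, 0.30, 0.35, 0.05]]
probs_model_b = [[0.60, 0.20, 0.10, 0.10],
                 [0.10, 0.30, 0.50, 0.10],
                 [0.10, 0.60, 0.25, 0.05]]

# Simple ensemble: unweighted mean of the two models' probabilities.
ensemble_probs = average_probs(probs_model_a, probs_model_b)

# The predicted class per image is the index of the highest averaged probability.
ensemble_pred = [row.index(max(row)) for row in ensemble_probs]
print(ensemble_pred)
```

The two models disagree on the second and third images, and the averaged probabilities settle each disagreement in favour of the more confident model. Whether this actually lowers the loss depends on how different the two models' errors are.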

HariSumanth9 · December 6, 2017, 6:16am

I downloaded the data, but I am getting a strange error. Please help me resolve it.

Above is the ipynb notebook.
Thanks

HariSumanth9 · December 6, 2017, 6:21am

Thanks

sermakarevich · December 6, 2017, 6:21am

You probably missed the suffix parameter in the from_csv method. Try adding suffix='.jpg' if your images are JPGs.

HariSumanth9 · December 6, 2017, 6:29am

@sermakarevich I did miss it.
Thanks.
It's working now.
Is this normal? This is quite different from the training step in lesson1.ipynb.

sermakarevich · December 6, 2017, 6:33am

Restart the notebook and run everything again, and it should be normal. Once you get an error during model training, you are going to see this “detailed” status bar.

ravimahar · December 8, 2017, 3:57am

Hi,

In the dog breed code below, how is the file name (id) in the CSV mapped to the image name in the folders, i.e. train and test?

I mean in the code below.

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test', num_workers=4,
                                    val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)

I dug into the Python script, and it looks like the code below does this, but I am still not sure how the file name in the CSV is mapped to the image.

Signature: csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False)
Source:
def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False):
    fnames,csv_labels,all_labels,label2idx = parse_csv_labels(csv_file, skip_header)
    full_names = [os.path.join(folder,fn+suffix) for fn in fnames]
    if continuous:
        label_arr = np.array([csv_labels[i] for i in fnames]).astype(np.float32)
    else:
        label_arr = nhot_labels(label2idx, csv_labels, fnames, len(all_labels))
        is_single = np.all(label_arr.sum(axis=1)==1)
        if is_single: label_arr = np.argmax(label_arr, axis=1)
    return full_names, label_arr, all_labels
File: ~/courses/fastai2/courses/dl1/fastai/dataset.py
Type: function

creviera · December 9, 2017, 12:53am

Hey Ravindra,

labels.csv has two columns:

id,breed
000bec180eb18c7604dcecc8fe0dba07,boston_bull
001513dfcb2ffafc82cccf4d8bbaba97,dingo
001cdf01b096e06d78e9e5112d419397,pekinese
...

As you pointed out, from_csv calls csv_source, which gets the paths to the images by doing the following:

1. Call parse_csv_labels to extract the file names, e.g. 000bec180eb18c7604dcecc8fe0dba07, from the CSV file. The file names are returned in an array called fnames.
2. Join folder (e.g. train) with each item in fnames (e.g. 000bec180eb18c7604dcecc8fe0dba07) and suffix (e.g. .jpg). This gives us full_names, the array of relative paths to the images.

After calling csv_source, from_csv does the following:

1. Combine path (whatever you set PATH to before you passed it into the function) with test_name (e.g. test), get all the files in that path, and store them in test_fnames (see how read_dir works).
2. Pass path, the relative paths to our training images (the two pieces of information we need to be able to access the images), test_fnames, etc. to get_ds. This gives us the datasets for our training images and test images.

You can see how our Dataset for training can simply join the path with the fname to get the full path, e.g. https://github.com/fastai/fastai/blob/d9f9fab4b3fbeab8a207700160a782ae48eacc7a/fastai/dataset.py#L143
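
The id-to-path mapping described above can be sketched in a few lines of standalone Python. This is only an illustration of the join that csv_source performs, not the library code itself: build_image_paths is a hypothetical helper, and the PATH value is a made-up data root.

```python
import os

def build_image_paths(folder, ids, suffix=""):
    """Mirror the list comprehension in csv_source: folder + id + suffix."""
    return [os.path.join(folder, image_id + suffix) for image_id in ids]

# First two ids from the labels.csv sample above.
ids = ["000bec180eb18c7604dcecc8fe0dba07", "001513dfcb2ffafc82cccf4d8bbaba97"]

# Relative paths, as returned in full_names.
relative_paths = build_image_paths("train", ids, suffix=".jpg")

# PATH is a hypothetical data root; the dataset joins it with each relative path.
PATH = "data/dogbreed/"
full_paths = [os.path.join(PATH, p) for p in relative_paths]

# Without a suffix the names have no extension, so they would not match the .jpg files.
no_suffix = build_image_paths("train", ids)

print(relative_paths[0])
print(no_suffix[0])
```

This also shows why leaving out suffix leads to errors: the generated names then lack the .jpg extension and do not point at any file in the train folder.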

HariSumanth9 · December 14, 2017, 5:04pm

How do I train on the full training data? Passing val_idxs = None gives errors.
Also, log_preds_tta,y = learn.TTA(is_test = True) takes a long time to run; I eventually interrupted it. What is wrong with it?

sermakarevich · December 14, 2017, 6:49pm

You can pass val_idxs = [0]. The computing time of learn.TTA depends on n_aug x the number of images in the test set.
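
As a rough back-of-the-envelope sketch of that scaling: every number below is a hypothetical value chosen for illustration, and the estimate ignores any extra un-augmented pass the library may also run.

```python
def tta_forward_passes(n_test_images, n_aug):
    """Rough count of per-image forward passes for test-time augmentation."""
    return n_test_images * n_aug

# Hypothetical values for illustration only.
n_test_images = 10_000
n_aug = 4
seconds_per_image = 0.02  # assumed average inference time per image

passes = tta_forward_passes(n_test_images, n_aug)
estimated_minutes = passes * seconds_per_image / 60
print(passes, round(estimated_minutes, 1))
```

So a long run is not necessarily a sign that something is wrong; the cost grows linearly with both the test set size and the number of augmentations.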

HariSumanth9 · December 15, 2017, 7:24am

Passing val_idxs = [0] gives assertion errors.

sermakarevich · December 15, 2017, 7:35am

Try removing the tmp folder.
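
One way to do that from Python, as a minimal sketch. It assumes the tmp folder sits directly under the data directory you pass as PATH, which is not confirmed in this thread, and remove_tmp_folder is a hypothetical helper, not part of fastai. The demonstration runs on a throwaway directory so nothing real is deleted.

```python
import os
import shutil
import tempfile

def remove_tmp_folder(data_path):
    """Delete <data_path>/tmp if it exists; return True if something was removed."""
    tmp_dir = os.path.join(data_path, "tmp")
    if os.path.isdir(tmp_dir):
        shutil.rmtree(tmp_dir)
        return True
    return False

# Demonstration on a throwaway directory rather than a real dataset folder.
demo_path = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_path, "tmp"))
removed_first = remove_tmp_folder(demo_path)   # tmp existed, so it is deleted
removed_again = remove_tmp_folder(demo_path)   # nothing left to delete
shutil.rmtree(demo_path)                       # clean up the demo directory
print(removed_first, removed_again)
```

With a real dataset you would call remove_tmp_folder(PATH) and then rebuild the data object and learner, since anything stored in that folder will have to be regenerated.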