Dog Breed Identification challenge

(Nandamuri Hari Naga Sumanth) #347

I fired up an AWS instance and tried to download the data set from Kaggle using kaggle-cli, but I am unable to download the data onto my instance. Could those who were able to download the data please help me?

Using the same command, I was able to download the data on my local machine.


(Vitaly Bushaev) #348

Try pip install -U kaggle-cli to upgrade it.

(sergii makarevych) #349

Or try specifying a file name.

(Reshama Shaikh) #350

check out these instructions:

(Aditya) #351

Why kaggle-cli,
when we can use wget via the CurlWget Chrome extension…

(Nandamuri Hari Naga Sumanth) #352

After I run the jupyter notebook command in my terminal window (which is connected to the AWS instance via ssh), how do I get another terminal window connected to the same AWS instance IP?

(Reshama Shaikh) #353

Use tmux for additional windows. Instructions here:
tmux on aws

(Reshama Shaikh) #354

There's more than one way to download the data, depending on your preferences, setup, etc.
There are instructions here for both ways:
download data using Chrome wget
download data using Kaggle CLI

(naveen manwani) #355

Many people in this thread say that ensembling gave them better accuracy and lower loss. As I understand it, in simple terms, it is just combining two models and using them together to get better results.
So how can I learn ensembling and use it with the fastai library to further reduce my model's loss?
Any guidance would be appreciated.

(Nandamuri Hari Naga Sumanth) #356

I downloaded the data, but I am getting a strange error. Please help me resolve it.

The ipynb notebook is above.

(Nandamuri Hari Naga Sumanth) #357


(sergii makarevych) #358

You probably missed the suffix parameter in the from_csv method. Try adding suffix='.jpg' if your images are jpgs.

(Nandamuri Hari Naga Sumanth) #359

@sermakarevich I did miss it.
It's working now.
Is this normal? It is quite different from the training step in lesson1.ipynb.

(sergii makarevych) #360

Restart the notebook and run everything again, and it should be back to normal. Once you hit an error during model training, you get this “detailed” status bar :slight_smile:

(Ravindra Mahar) #361


In the dog breed code below, how is the file name (id) in the csv mapped to the image name in the train and test folders?

I mean in the code below.

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test', num_workers=4,
                                    val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)

I dug into the .py script, and it looks like the code below does this, but I am still not sure how the file name in the csv is mapped to the image.

Signature: csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False)
def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False):
    fnames,csv_labels,all_labels,label2idx = parse_csv_labels(csv_file, skip_header)
    full_names = [os.path.join(folder,fn+suffix) for fn in fnames]
    if continuous:
        label_arr = np.array([csv_labels[i] for i in fnames]).astype(np.float32)
    else:
        label_arr = nhot_labels(label2idx, csv_labels, fnames, len(all_labels))
        is_single = np.all(label_arr.sum(axis=1)==1)
        if is_single: label_arr = np.argmax(label_arr, axis=1)
    return full_names, label_arr, all_labels
File: ~/courses/fastai2/courses/dl1/fastai/
Type: function

(Allie Crevier) #362

Hey Ravindra,

labels.csv has two columns: id (the image file name without its extension) and breed (the label).


As you pointed out, from_csv calls csv_source, which gets the paths to the images by doing the following:

  1. Call parse_csv_labels to extract file names, e.g. 000bec180eb18c7604dcecc8fe0dba07, from the csv file. The file names are returned in an array called fnames.
  2. Join folder (e.g. train) with each item in fnames (e.g. 000bec180eb18c7604dcecc8fe0dba07) and suffix (e.g. .jpg). This gives us full_names, the array of the relative paths to the images.

After calling csv_source, from_csv does the following:

  1. Combine path (whatever you set PATH to before you passed it into the function) with test_name (e.g. test), get all the files in that directory, and store them in test_fnames (see how read_dir works).
  2. Pass path and the relative paths to our training images (the two pieces of information we need to be able to access the images), test_fnames, etc. to get_ds. This gives us our datasets for our training images and test images.

You can see how our Dataset for training can simply join the path with the fname to get the full path, e.g.
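The steps described in this post can be sketched in plain Python. This is a minimal illustration, not the fastai source: parse_labels and image_paths are hypothetical helper names, the data/dogbreed/ path is an assumed value for PATH, and the breed value in the sample row is a placeholder rather than real data.

```python
import csv
import io
import os

def parse_labels(csv_text, skip_header=True):
    """Return the ids from the first column and an id -> label dict."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if skip_header:
        rows = rows[1:]
    fnames = [row[0] for row in rows]
    labels = {row[0]: row[1] for row in rows}
    return fnames, labels

def image_paths(path, folder, fnames, suffix=""):
    """Build relative paths (folder + id + suffix), then join each with path."""
    relative = [os.path.join(folder, fn + suffix) for fn in fnames]
    full = [os.path.join(path, rel) for rel in relative]
    return relative, full

# One-row stand-in for labels.csv; "placeholder_breed" is not a real label.
sample_csv = "id,breed\n000bec180eb18c7604dcecc8fe0dba07,placeholder_breed\n"

fnames, labels = parse_labels(sample_csv)
relative, full = image_paths("data/dogbreed/", "train", fnames, suffix=".jpg")
print(relative[0])
print(full[0])
```

Note that the suffix has to include the dot, because it is concatenated directly onto the id before the join.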

(Nandamuri Hari Naga Sumanth) #363

How do I train on the full training data? Passing val_idxs = None gives errors.
Also, log_preds_tta,y = learn.TTA(is_test = True) takes a long time to run; I eventually interrupted it. What is wrong with it?

(sergii makarevych) #364

You can pass val_idxs = [0]. The learn.TTA computing time depends on n_aug x the number of images in test.
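As a rough way to reason about that cost, here is a back-of-the-envelope sketch. It is plain arithmetic, not fastai code: the function names are hypothetical, the image count, n_aug value, and throughput are made-up placeholders, and counting one extra unaugmented pass per image is an assumption.

```python
def tta_forward_passes(n_test_images, n_aug, include_original=True):
    """Count forward passes if each test image is predicted once per augmentation."""
    # Assumption: one extra pass for the unaugmented image when include_original is True.
    copies = n_aug + (1 if include_original else 0)
    return n_test_images * copies

def estimated_seconds(n_passes, images_per_second):
    """Convert a number of forward passes into seconds at a given throughput."""
    return n_passes / images_per_second

# Placeholder numbers, chosen only to show the scaling.
passes = tta_forward_passes(n_test_images=10_000, n_aug=4)
print(passes)
print(estimated_seconds(passes, images_per_second=100.0))
```

Doubling either the number of augmentations or the number of test images roughly doubles the run time, so a long-running TTA call on a large test set is not necessarily a sign that something is broken.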

(Nandamuri Hari Naga Sumanth) #365

Passing val_idxs = [0] gives assertion errors.

(sergii makarevych) #366

Try removing the tmp folder.