Dog Breed Identification challenge

I fired up an AWS instance and tried to download the dataset from Kaggle using kaggle-cli, but I am unable to download the data onto the instance. Could anyone who managed to download the data help me?

Using the same command, I was able to download the data on my local machine.

Thanks.

Try pip install -U kaggle-cli to upgrade it.

Or try specifying a filename.

@HariSumanth9
check out these instructions:

Why kaggle-cli, when we can use wget via the CurlWget Chrome extension…

After I run the jupyter notebook command in my terminal window (which is connected to the AWS instance via ssh), how do I get another terminal window connected to the same AWS instance IP?
Thanks.

@HariSumanth9
Use tmux for additional windows. Instructions here:
tmux on aws

@ecdrid
There is more than one way to download the data, depending on preferences, setup, etc. Instructions for both ways are here:
download data using Chrome wget
download data using Kaggle CLI

Many people in this thread say that ensembling gave them better accuracy and lower loss. As I understand it, in simple terms, it is just combining two models and using them together to get better results.
So how can I learn about ensembling and use it with the fastai library to further reduce my model's loss?
Any guidance would be appreciated.
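
One common form of ensembling is simply averaging the class probabilities predicted by several models for each image. The sketch below illustrates that idea in plain Python; `average_probs`, `log_loss`, and the numbers are made up for illustration and are not fastai functions. In practice the inputs would be each model's predicted probabilities on the same images in the same order.

```python
import math


def average_probs(*model_probs):
    """Average per-class probabilities from several models, image by image."""
    n_models = len(model_probs)
    return [
        [sum(p) / n_models for p in zip(*rows)]
        for rows in zip(*model_probs)
    ]


def log_loss(probs, targets):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


# Made-up predictions for 2 images and 3 classes from two hypothetical models.
model_a = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]  # index of the true class for each image

ensemble = average_probs(model_a, model_b)
print(log_loss(model_a, targets), log_loss(model_b, targets), log_loss(ensemble, targets))
```

Because the negative log is convex, the log loss of the averaged prediction is never worse than the mean of the individual models' losses, although it can still be worse than the single best model.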

I downloaded the data, but I am getting a strange error. Please help me resolve it.

Above is the ipynb notebook.
Thanks

You probably missed the suffix parameter in the from_csv method. Try adding suffix='.jpg' if your images are JPGs.
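
A quick way to check the suffix is to confirm that every id in labels.csv resolves to a file on disk. This is a minimal sketch using only the standard library; `missing_images` is a hypothetical helper, not part of fastai, and it assumes the csv has a header row with the id in the first column.

```python
import csv
import os


def missing_images(csv_file, folder, suffix=''):
    """Return the ids in csv_file that have no matching file in folder."""
    with open(csv_file, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row (id,breed)
        ids = [row[0] for row in reader if row]
    # An id counts as missing when folder/id+suffix does not exist on disk.
    return [i for i in ids if not os.path.exists(os.path.join(folder, i + suffix))]
```

With no suffix, or with a suffix that lacks the leading dot, every id comes back as missing; with the right suffix the returned list should be empty.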

@sermakarevich I did miss it.
Thanks.
It's working now.
Is this

normal? It is quite different from the training step in lesson1.ipynb.

Restart the notebook and run everything again; it should be back to normal. Once you hit an error during model training, you get this “detailed” status bar :)

Hi,

In the dog breeds code below, how is the file name (id) in the csv mapped to the image names in the train and test folders?

I mean the code below.

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test', num_workers=4,
                                    val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)

I dug into the Python source, and it looks like the code below does this, but I am still not sure how the file name in the csv is mapped to the image.

Signature: csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False)
Source:
def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False):
    fnames,csv_labels,all_labels,label2idx = parse_csv_labels(csv_file, skip_header)
    full_names = [os.path.join(folder,fn+suffix) for fn in fnames]
    if continuous:
        label_arr = np.array([csv_labels[i] for i in fnames]).astype(np.float32)
    else:
        label_arr = nhot_labels(label2idx, csv_labels, fnames, len(all_labels))
        is_single = np.all(label_arr.sum(axis=1)==1)
        if is_single: label_arr = np.argmax(label_arr, axis=1)
    return full_names, label_arr, all_labels
File: ~/courses/fastai2/courses/dl1/fastai/dataset.py
Type: function

Hey Ravindra,

labels.csv has two columns:

id,breed
000bec180eb18c7604dcecc8fe0dba07,boston_bull
001513dfcb2ffafc82cccf4d8bbaba97,dingo
001cdf01b096e06d78e9e5112d419397,pekinese
...

As you pointed out, from_csv calls csv_source, which gets the paths to the images by doing the following:

  1. Call parse_csv_labels to extract file names, e.g. 000bec180eb18c7604dcecc8fe0dba07, from the csv file. The file names are returned in an array called fnames.
  2. Join folder (e.g. train) with each item in fnames (e.g. 000bec180eb18c7604dcecc8fe0dba07) and suffix (e.g. .jpg). This gives us full_names, the array of the relative paths to the images.

After calling csv_source, from_csv does the following:

  1. Combine path (whatever you set PATH to before you passed it into the function) with test_name (e.g. test), get all the files in that path, and store them in test_fnames (see how read_dir works).
  2. Pass path and the relative paths to our training images (the two pieces of information we need to be able to access the images), test_fnames, etc. to get_ds. This gives us our datasets for our training images and test images.

You can see how our Dataset for training can simply join the path with the fname to get the full path, e.g. https://github.com/fastai/fastai/blob/d9f9fab4b3fbeab8a207700160a782ae48eacc7a/fastai/dataset.py#L143
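
The path-building steps described above can be reproduced in a few lines of standard-library Python. This is only an illustration of the idea, not the fastai implementation: `image_paths` is a hypothetical helper, and `data/dogbreed/` is a made-up value for PATH.

```python
import csv
import io
import os

# Two rows copied from the labels.csv excerpt above.
LABELS_CSV = """id,breed
000bec180eb18c7604dcecc8fe0dba07,boston_bull
001513dfcb2ffafc82cccf4d8bbaba97,dingo
"""


def image_paths(csv_text, folder, suffix):
    """Mimic the path-building step: folder + id + suffix for every csv row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    fnames = [row[0] for row in reader if row]
    return [os.path.join(folder, fn + suffix) for fn in fnames]


# Relative paths, as csv_source would build them from the id column.
full_names = image_paths(LABELS_CSV, 'train', '.jpg')

# The dataset later joins PATH with each relative path to open the image.
PATH = 'data/dogbreed/'  # hypothetical value, for illustration only
full_paths = [os.path.join(PATH, name) for name in full_names]
print(full_paths[0])
```

So the only link between a csv row and an image is the string id: it must match the file name in the folder exactly once the suffix is appended.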

How do I train on the full training data? Passing val_idxs = None gives errors.
Also, log_preds_tta,y = learn.TTA(is_test = True) takes a long time to run; I eventually interrupted it. What is wrong with it?

You can pass val_idxs = [0]. learn.TTA computing time scales with n_aug × the number of images in the test set.
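
To make both points concrete, here is a rough sketch in plain Python. The two helpers are hypothetical and not part of fastai, and whether TTA also runs the un-augmented image once is an assumption exposed as a parameter.

```python
def split_indices(n, val_idxs):
    """Split range(n) into (training, validation) index lists."""
    val = set(val_idxs)
    trn = [i for i in range(n) if i not in val]
    return trn, sorted(val)


def tta_forward_passes(n_images, n_aug, include_original=True):
    """Images pushed through the model: each image once per augmented copy."""
    # Assumption: the original image may be predicted once in addition to the copies.
    return n_images * (n_aug + (1 if include_original else 0))


# With a single validation index, all but one image is used for training.
trn, val = split_indices(5, [0])
print(trn, val)

# Made-up sizes: 100 test images with 4 augmentations each.
print(tta_forward_passes(100, 4))
```

Holding out one image keeps the validation set non-empty while training on everything else, and the TTA cost grows linearly in both the number of augmentations and the number of test images.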

Passing val_idxs = [0] gives assertion errors.

Try removing the tmp folder.