Dog Breed Identification challenge

I’m running the various fit commands (at both the 224 and 299 image sizes), and they’re taking a very long time.

learn.fit(1e-2, 5, cycle_len=1) takes 5-10+ minutes. I’ve been watching it run for the past 5 minutes, and it’s still on epoch 1. Is there any way to check that I’m properly utilizing my GPU? I’m using AWS.

I ran the command below. It seems like some of the GPU is being used by Python, but not more than about 20%. Any tips?

(fastai) ubuntu@ip-172-31-37-237:~$ nvidia-smi
Fri Mar  2 00:54:57 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   81C    P0   134W / 149W |   2447MiB / 11439MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1541      C   ...tu/src/anaconda3/envs/fastai/bin/python  2434MiB |
+-----------------------------------------------------------------------------+
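One way to read the output above: the "about 20%" figure matches the Memory-Usage column, while the GPU-Util column is a separate number. Here is a minimal sketch that pulls both values out of the status row. `parse_gpu_status` is a hypothetical helper written for this post, not part of fastai or of any NVIDIA tool, and the regular expressions assume the row layout shown in the paste.

```python
import re

# Status row copied from the nvidia-smi output above.
status_line = "| N/A   81C    P0   134W / 149W |   2447MiB / 11439MiB |     95%      Default |"

def parse_gpu_status(line):
    """Hypothetical helper: return (memory_fraction, gpu_util_percent) for one status row."""
    mem = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)   # used / total memory
    util = re.search(r"(\d+)%", line)                    # GPU-Util column
    used, total = int(mem.group(1)), int(mem.group(2))
    return used / total, int(util.group(1))

mem_fraction, gpu_util = parse_gpu_status(status_line)
print(f"memory in use: {mem_fraction:.0%}, GPU utilization: {gpu_util}%")
```

For this paste, memory is roughly a fifth full while GPU-Util reads 95%, so the card looks busy even though most of its memory is free.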

What batch size did you use?

1. Increasing the batch size will take up more memory.
2. Run ‘learn.unfreeze()’ and you’ll notice a hike in the memory usage.

Is the notebook that he generated for lesson 2 available? I don’t see it in what was provided.

The notebook for dog breeds is not available. You are supposed to create it yourself :wink:

I’m getting the same thing. I couldn’t figure out how to fix it, did you?

Did you take np.mean() along axis=0 to get the final predictions?

It took some hacking on the test arrays to get this to work.

Here’s what I did that worked, but I’d like to improve upon it as it feels slow. I would guess there’s a way to manipulate the dataframe directly that’s much more efficient, instead of pulling things out, manipulating the list, then making a new dataframe.

%time test_probs = np.mean(np.exp(test_log_preds),0) # average across the 5 TTA's
test_probs.shape # (10357, 120)
len(data.test_dl.dataset.fnames) # 10357

# Remove file paths and jpg extension from ids
idnames = []
for filepath in data.test_dl.dataset.fnames:
    idnames.append( filepath.split('/')[-1].split('.jpg')[0])

headerrow = (['id'] + data.classes) # 1x121 list, header of the csv columns

# This part in particular feels wrong and is a bit slow to run. Any tips to avoid converting NumPy arrays to lists, or to do it all within a dataframe?
datarows = []
for idname, testproblist in zip(idnames, test_probs):
    datarows.append([idname] + list(testproblist))

df_submit_tta = pd.DataFrame.from_records(datarows, columns=headerrow)
df_submit_tta.head()
df_submit_tta.to_csv('test_df.csv', index=False)
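On the question of skipping the list conversion: a minimal sketch, assuming `test_probs` is a 2-D NumPy array with one row per test file and one column per class, as in the post above. pandas can build the frame from the array directly, and the id column can be inserted afterwards. The small arrays below are made-up stand-ins for `test_probs`, `data.classes` and `data.test_dl.dataset.fnames`, and `df_submit` is just an illustrative name.

```python
import numpy as np
import pandas as pd
from pathlib import Path

# Made-up stand-ins for test_probs, data.classes and data.test_dl.dataset.fnames.
rng = np.random.default_rng(0)
classes = ['affenpinscher', 'afghan_hound', 'airedale']
fnames = ['test/abc123.jpg', 'test/def456.jpg']
test_probs = rng.random((len(fnames), len(classes)))

# Build the frame straight from the array; no per-row Python lists needed.
df_submit = pd.DataFrame(test_probs, columns=classes)

# Path(...).stem drops both the directory and the file extension.
df_submit.insert(0, 'id', [Path(f).stem for f in fnames])

print(df_submit.head())
```

The resulting frame has the same layout as `df_submit_tta` above, so the same `to_csv(..., index=False)` call should work on it.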

Great stuff. After this lesson, I have two questions.

  1. Is there a function in the fastai library that lets me view the multiclass log loss per epoch, instead of accuracy? Since the final score is based on it, it makes sense to monitor it the whole time. I’d like to compare directly to the Kaggle leaderboard, but I currently can’t because the metrics are different. I thought I’d check whether something already exists before going down the DIY route.
  2. Jeremy mentioned doing away with the validation set and training on 100% of the training data before submitting, once the model is set up. What are the steps for doing this? And how does it train without knowing what its validation set is for doing SGD? Do you retrain and ensemble a handful of models on different training/validation splits of the same training data? Or is there a simpler way?
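On question 1, if you do end up going the DIY route, here is a minimal NumPy sketch of multiclass log loss. It assumes `probs` holds predicted probabilities (one row per image) and `labels` holds integer class indices. `multiclass_log_loss` and the `eps` clipping value are my own choices for illustration, not a fastai function. If your model is trained with a cross-entropy loss, the validation loss printed during fitting may already be this quantity, but that depends on your setup.

```python
import numpy as np

def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    probs = np.clip(probs, eps, 1 - eps)               # avoid log(0)
    probs = probs / probs.sum(axis=1, keepdims=True)    # renormalise each row
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

# Tiny made-up example: two images, three classes.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = multiclass_log_loss(probs, labels)
print(loss)
```

Unlike accuracy, this penalises a confident wrong prediction heavily, so two models with the same accuracy can have very different log loss.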

Hello Friends,
I need some help. I’m running into the error shown in the attachment.


Really appreciate your help.
The problem occurs during the accuracy calculation. It’s asking for a different data type than the one I’ve given. I was following the lectures, where it was done the same way, i.e. by passing numpy arrays, but it’s expecting a torch data type. Please correct me if I’m doing something wrong.

These two previous posts address the issue.


Hi @yinterian, the link appears to be broken now. Can you please redirect me to its updated location?
Thanks

When submitting my predictions using the notebook provided, I’m getting a score that’s close to 13 even though my accuracy during the training steps is in the low 90s. Is there an error somewhere that I should be looking out for?

Edit: Managed to fix my problem! In case anybody else is interested, I wrote to a .csv file instead of a .gz file.

What exactly is the metric used in the Dog Breed challenge? Why isn’t the accuracy metric used there?