Dog Breed Identification challenge

(Daniel Hunter) #387

I’m running the various fit commands (both on 224 and 299), and they’re taking quite a long time. A fit call (…, 5, cycle_len=1) takes 5-10+ minutes. I’ve been watching it run for the past 5 minutes, and it’s still on epoch 1. Is there any way to check that I’m properly utilizing my GPU? I’m using AWS.

I ran the command below. It seems like some of the GPU is being used by Python, but not more than about 20%. Any tips?

(fastai) ubuntu@ip-172-31-37-237:~$ nvidia-smi
Fri Mar  2 00:54:57 2018
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   81C    P0   134W / 149W |   2447MiB / 11439MiB |     95%      Default |

| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1541      C   ...tu/src/anaconda3/envs/fastai/bin/python  2434MiB |
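
A note on reading that output: the 95% in the GPU-Util column is compute utilization, while 2447MiB / 11439MiB is roughly 21% of GPU memory, which may be where the 20% figure comes from. Here is a minimal sketch for pulling the same numbers from Python. It assumes nvidia-smi is on the PATH; the helper names parse_gpu_csv and query_gpus are made up for illustration and are not part of fastai.

```python
import subprocess

# Fields requested from nvidia-smi: GPU utilization (%), memory used and total (MiB)
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(line):
    """Parse one 'util, mem_used, mem_total' CSV row into a dict."""
    util, used, total = (float(x) for x in line.split(","))
    return {
        "util_pct": util,
        "mem_used_mib": used,
        "mem_total_mib": total,
        "mem_pct": 100.0 * used / total,
    }


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_csv(row) for row in out.strip().splitlines()]


# Values copied by hand from the nvidia-smi table above
sample = parse_gpu_csv("95, 2447, 11439")
print(sample)
```

Calling query_gpus() in a loop while training runs gives a rough picture of whether the GPU stays busy or sits idle waiting on data loading.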

(Sharwon Pius) #388

What batch size did you use?

1. Increasing the batch size will take up more memory.
2. Run ‘learn.unfreeze()’ and you’ll notice a hike in the memory usage.

(Kevin) #389

Is the notebook that he generated for lesson 2 available? I do not see it in what was provided.

(pradla) #391

The notebook for dog breeds is not available. You are supposed to create it yourself :wink:


I’m getting the same thing. I couldn’t figure out how to fix it, did you?

(RT) #393

Did you take np.mean() along axis=0 to get the final predictions?
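
For anyone unsure why axis 0: assuming the TTA output is a stack of log-probabilities shaped (n_augmentations, n_images, n_classes), averaging over axis 0 collapses the augmentations and leaves one row of class probabilities per image. A small sketch with made-up numbers (the array below is a hypothetical stand-in, not real model output):

```python
import numpy as np

# Hypothetical TTA output: 2 augmentations, 3 images, 2 classes (log-probabilities)
log_preds = np.log(np.array([
    [[0.8, 0.2], [0.4, 0.6], [0.5, 0.5]],
    [[0.6, 0.4], [0.2, 0.8], [0.5, 0.5]],
]))

# exp() turns log-probabilities back into probabilities;
# mean over axis 0 averages the augmentations for each image
probs = np.mean(np.exp(log_preds), axis=0)

print(log_preds.shape)  # (2, 3, 2)
print(probs.shape)      # (3, 2)
```

If the shape still has three dimensions at submission time, the averaging step was probably skipped.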

(Maxwell McKinnon) #395

It took some hacking on the test arrays to get this to work.

Here’s what I did that worked, but I’d like to improve upon it as it feels slow. I would guess there’s a way to manipulate the dataframe directly that’s much more efficient, instead of pulling things out, manipulating the list, then making a new dataframe.

%time test_probs = np.mean(np.exp(test_log_preds),0) # average across the 5 TTA's
test_probs.shape # (10357, 120)
len(data.test_dl.dataset.fnames) # 10357

# Remove file paths and jpg extension from ids
idnames = []
for filepath in data.test_dl.dataset.fnames:
    idnames.append( filepath.split('/')[-1].split('.jpg')[0])

headerrow = (['id'] + data.classes) # 1x121 list, header of the csv columns

# This part in particular feels wrong and is a bit slow to run. Any tips to avoid converting nparrays to lists, or to do it all in the context of a dataframe?
datarows = []
for idname, testproblist in zip(idnames, test_probs):
    datarows.append([idname] + list(testproblist))

df_submit_tta = pd.DataFrame.from_records(datarows, columns=headerrow)
df_submit_tta.to_csv('test_df.csv', index=False)
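
One possible way to skip the list round-trip is to hand the probability array to pandas directly and then insert the id column. This is a sketch, assuming test_probs is a 2-D NumPy array with one column per class in the same order as data.classes. The file names, class names, and probabilities below are hypothetical stand-ins so the snippet runs on its own.

```python
import os

import numpy as np
import pandas as pd

# Hypothetical stand-ins for data.test_dl.dataset.fnames, data.classes and test_probs
fnames = ["test/abc123.jpg", "test/def456.jpg"]
classes = ["affenpinscher", "beagle", "collie"]
test_probs = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.1, 0.8]])

# Strip the directory and the extension from each file path
ids = [os.path.splitext(os.path.basename(f))[0] for f in fnames]

# Build the frame straight from the 2-D array, then put 'id' in front
df = pd.DataFrame(test_probs, columns=classes)
df.insert(0, "id", ids)

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file name to to_csv, as in the original code, writes the same content to disk.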

(Maxwell McKinnon) #396

Great stuff. After this lesson, I have two questions.

  1. Is there a function in fastai where I can view the multiclass log loss per epoch, instead of accuracy? Since this is what the end score is based on, it makes sense to monitor it the whole time. I’d like to compare directly to the Kaggle leaderboard, but I currently can’t as the metrics are different. Thought I’d check if something already exists before going down the DIY route.
  2. Jeremy mentioned doing away with the validation set and training on 100% of the training data then submitting, after the model is set up. What are the steps to doing this? And how does it train without knowing what its validation set is for doing SGD? Do you retrain and ensemble a handful of models on different training/validation groupings of the same training data? Or is there a simpler way?
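
On the first question, a DIY version is short. Below is a minimal NumPy sketch of multiclass log loss, assuming probs holds one row of class probabilities per image and targets holds integer class indices. The function name and the eps clipping value are my own choices, not something taken from fastai or confirmed against Kaggle's scorer.

```python
import numpy as np


def multiclass_log_loss(probs, targets, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    probs = np.clip(probs, eps, 1 - eps)               # avoid log(0)
    probs = probs / probs.sum(axis=1, keepdims=True)   # rows sum to 1 again
    true_class_probs = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(true_class_probs)))


# Made-up example: two images, two classes
probs = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
targets = np.array([0, 1])
loss = multiclass_log_loss(probs, targets)
print(loss)
```

If the model is trained with a negative log-likelihood loss, the validation loss printed during fitting should be this same quantity, so it may already be on screen next to the accuracy.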

(ashis) #397

Hello Friends,
I need some help. I’m coming across the error shown in the attachment.

Really appreciate your help.
The problem I’m facing is during the accuracy calculation. It’s asking for a different data type than the one I’ve given. I was following the lectures, where it was done the same way, i.e. by passing numpy arrays, but it’s expecting a torch data type. Please correct me if I’m doing anything wrong.
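
For reference, one way around a type mismatch like this is to compute accuracy on the NumPy side rather than passing arrays to a function that expects torch tensors. This is a sketch only: accuracy_from_numpy is a hypothetical helper, and it assumes the predictions are a 2-D array of per-class scores and the targets are integer class indices.

```python
import numpy as np


def accuracy_from_numpy(preds, targets):
    """Fraction of rows whose highest-scoring class matches the target."""
    predicted_classes = np.argmax(preds, axis=1)
    return float((predicted_classes == targets).mean())


# Made-up example: three images, two classes
preds = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4]])
targets = np.array([0, 1, 1])
acc = accuracy_from_numpy(preds, targets)
print(acc)
```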

(Phil Weslow) #398

These two previous posts address the issue.