Lesson 4 In-Class Discussion

(Sanyam Bhutani) #351

I tried setting the locale, rebooting. Still no luck

This is what locale command returns

locale: Cannot set LC_ALL to default locale: No such file or directory

I think I’ll have to wait for @guthl 's tutorial :slight_smile:

(Ramesh Sampath) #352

Can you also upload the notebook where you see the error? You only have the pre-processing notebook in the gist, but that’s not where the error occurs.

(Sanyam Bhutani) #353

Apologies! :sweat_smile:
I just uploaded the other file as well.

(Sabelo Mhlambi) #354


In this screenshot for lesson 4, Layer (1) takes 1024 activations and halves them to 512. Is this MaxPool being applied as the last “hidden layer” of layer 1?

Is Layer 0 (BatchNorm1d) the layer that takes the pretrained model’s final output (a vector)
and outputs a layer of 1024 activations?

When applying Dropout with a percentage of 0.5, are we essentially halving the number of activations
(therefore halving the layer, similar to maxpool), or are we ignoring
half of the activations (still keeping the layer at its same size), or neither?

(Ramesh Sampath) #355

There’s no MaxPool here. It’s just that Layer 1 takes 1024 input nodes (features) and has 512 nodes as output. You typically wouldn’t have MaxPooling unless you’re working with image features, where it’s OK to take the max of surrounding pixels to compress the H x W.
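To make the point concrete, the “halving” from 1024 to 512 is just the shape of a fully connected layer’s weight matrix. A shape-only sketch in plain Python (dummy zero weights, not the actual lesson code):

```python
# Layer 1 as a fully connected layer: 1024 input features -> 512 output nodes.
n_in, n_out = 1024, 512
W = [[0.0] * n_in for _ in range(n_out)]  # 512 x 1024 weight matrix (dummy values)
b = [0.0] * n_out                         # one bias per output node

def linear(x):
    # y_j = sum_i W[j][i] * x[i] + b[j]
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + bj
            for row, bj in zip(W, b)]

y = linear([1.0] * n_in)
print(len(y))  # 512: the "halving" is just the layer's output width, no pooling involved
```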

Without knowing which notebook this is, it’s hard to tell if it’s using the pre-trained network output, but the BatchNorm layer doesn’t change the dimensions. It only re-centers the data to mean 0, although that can be changed by the backprop process.

It’s the latter. We just set the activations to zero, but the network architecture and number of inputs / outputs do not change.
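A minimal sketch of that behaviour in plain Python (this is “inverted” dropout, where survivors are rescaled by 1/(1-p) at training time, as PyTorch does; the function name and values are illustrative):

```python
import random

def dropout(activations, p=0.5):
    """Zero each activation with probability p and scale the survivors
    by 1/(1-p). The layer size never changes."""
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]

acts = [1.0] * 1024
out = dropout(acts, p=0.5)
print(len(out))  # still 1024: dropout zeroes values, it does not shrink the layer
```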

(Devan Govender) #356

What is the default activation function for the fully connected layers in the ColumnarModelData.from_data_frame model?

The model summary output indicates these are just linear layers. Is this correct?

(Sanyam Bhutani) #357

For now, I found a workaround by manually converting the .txt file into an ASCII encoded .txt file.
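That workaround can be sketched in a few lines of Python (the file names below are illustrative, not from the original post):

```python
import os
import tempfile

def to_ascii(src, dst):
    """Re-write a text file as ASCII, silently dropping any characters
    that cannot be encoded."""
    with open(src, encoding='utf-8', errors='ignore') as f:
        text = f.read()
    with open(dst, 'w', encoding='ascii', errors='ignore') as f:
        f.write(text)

# Quick demo with a temp file containing a non-ASCII character.
d = tempfile.mkdtemp()
src, dst = os.path.join(d, 'in.txt'), os.path.join(d, 'out.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('café')
to_ascii(src, dst)
print(open(dst, encoding='ascii').read())  # 'caf' (the é is dropped)
```

Note that `errors='ignore'` loses information; it sidesteps the locale error rather than fixing it, which is why generating the locale properly (as below) is the cleaner solution.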

(Louis Guthmann) #358

I wanted to check different configurations before submitting but as we say in French: “better is the enemy of good” :slight_smile:

In my setting, this works:

apt-get -qq update && apt-get -qqy install locales
sed -i -e 's/# ru_RU.UTF-8 UTF-8/ru_RU.UTF-8 UTF-8/' /etc/locale.gen &&
sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen &&
locale-gen &&
update-locale LANG=ru_RU.UTF-8 &&
echo "LANGUAGE=ru_RU.UTF-8" >> /etc/default/locale &&
echo "LC_ALL=ru_RU.UTF-8" >> /etc/default/locale

(Ibrahim El-Fayoumi) #359

lesson4 IMDB
Hello, in the learner.fit call
learner.fit(3e-3, 1, wds=1e-6, cycle_len=20, cycle_save_name='adam3_20')
I am getting the following error:
A Jupyter Widget
0%| | 0/4603 [00:00<?, ?it/s]

AttributeError Traceback (most recent call last)
in ()
----> 1 learner.fit(3e-3, 1, wds=1e-6, cycle_len=20, cycle_save_name='adam3_20')

~/workspace/fastai/courses/dl1/fastai/learner.py in fit(self, lrs, n_cycle, wds, **kwargs)
190 self.sched = None
191 layer_opt = self.get_layer_opt(lrs, wds)
--> 192 self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
194 def lr_find(self, start_lr=1e-5, end_lr=10, wds=None):

~/workspace/fastai/courses/dl1/fastai/learner.py in fit_gen(self, model, data, layer_opt, n_cycle, cycle_len, cycle_mult, cycle_save_name, metrics, callbacks, use_wd_sched, **kwargs)
137 n_epoch = sum_geom(cycle_len if cycle_len else 1, cycle_mult, n_cycle)
138 fit(model, data, n_epoch, layer_opt.opt, self.crit,
--> 139 metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
141 def get_layer_groups(self): return self.models.get_layer_groups()

~/workspace/fastai/courses/dl1/fastai/model.py in fit(model, data, epochs, opt, crit, metrics, callbacks, **kwargs)
82 for (*x,y) in t:
83 batch_num += 1
--> 84 for cb in callbacks: cb.on_batch_begin()
85 loss = stepper.step(V(x),V(y))
86 avg_loss = avg_loss * avg_mom + loss * (1-avg_mom)

AttributeError: 'CosAnneal' object has no attribute 'on_batch_begin'

(Jeremy Howard) #360

That’s odd. Can you git pull, restart jupyter, and try again?

(Vikrant Behal) #361

Since lecture 4 I’ve struggled with a never-ending training of IMDB notebook. Fortunately, I got some pre-trained weights (thanks to @Moody and @wgpubs) thus I thought of creating this post to allow fellow students to explore the notebook since many of us have skipped it because of training time.

You can access the post at: Running IMDB notebook under 10 minutes

(Anand Saha) #362

@Elfayoumi on_batch_begin() is part of the new code. You might have done a git pull while your notebook was still loaded in memory. The notebook executes fine at my end. As Jeremy mentioned, git pull and restart the notebook.

(Vikrant Behal) #363

I’ve updated the IMDB file.

The updated file has information about object types, model structure, calculation/logic (for those less familiar with PyTorch and/or numpy), splits, etc.


How is the data split into train, valid and test sets for IMDB?

torch by default gives 2 splits (of 25k items each) for the IMDB dataset. My assumption is that one is the train set and the other is the test set? If so, can I say the validation set is part of the train set itself? If so, what’s the ratio?

Test set: the 25k used here?

We aren’t explicitly specifying validation items anywhere.

(Vikrant Behal) #364

Looks like there is no validation in this split:
@yinterian, would you be able to share insight on why PyTorch doesn’t include content for validation?
Source - https://github.com/pytorch/text/blob/master/torchtext/datasets/imdb.py

(yinterian) #365

You can use part of your training for validation or use cross-validation.
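That suggestion can be sketched as a simple hold-out split. The 80/20 ratio, the function name, and the seed below are arbitrary illustrative choices, not from the lesson:

```python
import random

def train_valid_split(examples, valid_frac=0.2, seed=42):
    """Hold out a fraction of the training data as a validation set."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    n_valid = int(len(examples) * valid_frac)
    valid = [examples[i] for i in idx[:n_valid]]
    train = [examples[i] for i in idx[n_valid:]]
    return train, valid

data = list(range(25000))  # stand-in for the 25k IMDB training reviews
train, valid = train_valid_split(data, valid_frac=0.2)
print(len(train), len(valid))  # 20000 5000
```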

(Vikrant Behal) #366

Without validation, our model gave results better than state of the art per lecture 4. It’ll be interesting to try and see if validation improves the performance.

(yinterian) #367

The way you would do this is by using part of the train set as validation. You can potentially at the end join train and validation for your final model. Validation sets are used to find the best hyper-parameters without looking at test score. In this case we are using the test set as a validation set.

(Vikrant Behal) #368

Wouldn’t that imply our model has already seen the test records? If so, our predictions on that same test data may not represent (in accuracy/score/etc.) what we’ll get if we test on unseen data?

I may be reading the code wrong, but the split below suggests there is no validation set.

(Aditya) #369

We can pass in the Validation as a parameter?

(Vikrant Behal) #370

Yup! @ecdrid, how do you interpret Yinterian’s last reply on this thread? Also, the default splits seem not to be using validation at all.

I guess the question has evolved from “how to add” to “why not”?