Lesson 7 further discussion ✅

(WG) #8

Datablock API questinon …

Is there a recommended approach to using a merging different ItemLists into a single ItemList?

For example, I have a raw dataset that includes both tabular and text data from which I want to create a single ItemList that uses both the tabular processing goodness for handling categorical and continuous variables as well as the text classification goodness that handles tokenization, numericalization, vocab building, etc… for the text bits … and then for shuffling/sorting tell it to use the strategy implemented in the TextClasDataBunch.create().


You should probably define your custom ItemList for that (see the corresponding tutorial).

(Ignacio Lopez-Francos) #10

Sorry if this is not the right forum for this question, but I’m somehow getting weird predictions with the GAN Learner when running the lesson7-superres-gan.ipynb notebook. I haven’t changed anything from the original code.

Before getting into the building the GAN Learner section, everything seems to work as expected. But when training the GAN Learner these are the results:

data_crit = get_crit_data(['crappy', 'images'], bs=bs, size=size)
learn_crit = create_critic_learner(data_crit, metrics=None).load('critic-pre2')
learn_gen = create_gen_learner().load('gen-pre2')

switcher = partial(AdaptiveGANSwitcher, critic_thresh=0.65)
learn = GANLearner.from_learners(learn_gen, learn_crit, weights_gen=(1.,50.), show_img=True, switcher=switcher,
                                 opt_func=partial(optim.Adam, betas=(0.,0.99)), wd=wd)
learn.callback_fns.append(partial(GANDiscriminativeLR, mult_lr=5.))

lr = 1e-4

While train loss is declining, both generator and discriminator losses have not changed for a few epochs. On the 15th epoch it drops for both gen and disc. Then again on the 19th. Coincidentally, the predicted image printed is a different animal (see below the results)
Any thought on why is this happening?

(Ilia) #11

That would be great! This information would be really helpful for anyone who tries to train on the datasets that cannot fit into memory or require distributed computations. It seems to be a not too straightforward topic and rarely covered in tutorials.

(Scott H Hawley) #12

Hi @jeremy, I noticed that in the lessons, you’re using ReLU + Batch Normalization. It’s my understanding that the authors of ELU and SELU (https://arxiv.org/abs/1706.02515 ) have shown that these activations accomplish the work of Batch Norm, but without its slow performance. My own experiments with image classification have agreed with this, i.e. that (in my case) one can replace a lot of the ReLU+BN blocks in a model with ELUs and get comparable results only in much less (wall clock) time.

My question: Given that you often try to teach “the latest”/“cutting edge” methods: Why are you using ReLU+BN? Is it that you haven’t seen similar results, or is it that since the models you were teaching on use this construction, you were teaching that, or… some other reason?


GREAT course btw. Excellent instruction. I’m very thankful to have been given the opportunity to participate.

(Michael) #13

Here is a thread on Self-Normalizing Neural Networks but unfortunately the results seem not so applicable. Did you played around with it and got interesting results?

(Scott H Hawley) #14

Yea, similar to that thread, I did not find swapping ReLU+BN for SeLU to work for me – perhaps there are other aspects of SNN that I failed to implement.

But swapping ReLU+BN --> ELU… Actually, I should revisit those tests and report back. It’s been a couple years since I made that switch.

I can definitely assert that doing BN before ELU makes no significant change in accuracy compared to ELU alone (without BN at all), and furthermore that using ELU alone does work really well and works really fast… but these are not remotely the same thing as my earlier claim that “ELU works as well as doing BN after ReLU”!

(And anyway, as discussed in this Reddit thread, one should do BN after activation, not before).

I’ll re-do some tests and edit this post later.

EDIT: These experiments were done using Keras, not Fast.AI or PyTorch. I suspect that Francois Chollet or someone else has tweaked the Keras BN in the past year or so, because I remember it used to be “slow”. Anyway, my experiments today do confirm that ELU alone works “just as well” (in multiple senses: in the sense of accuracy & loss on the validation set, in the same number of epochs, with essentially indistinguishable ROC curves) as ReLU+BN, but also that the two approaches (now) take essentially the same execution time. (The ELU-only version is maybe 0.5% faster if that’s significant.)

As an additional bit of trivia, at least for my code (https://github.com/drscotthawley/panotti/blob/master/panotti/models.py), it’s better to keep the first block to be BN before ReLU, regardless of what one does with the other layers.

So, I retract my earlier claim: ReLU+BN has not been shown to have been superseded.

One more update: Actually, in some newer audio autoencoder experiments I’m doing using PyTorch, ELU seems to be faster and nearly as accurate (lower loss in Training & Validation sets) as ReLU+BN, but the latter has slightly lower loss.

(Jeremy Howard (Admin)) #15

SELU is super-fiddly. Everything has to be “just so” to ensure it’s prerequisites are met. I tried to do that for a while when it came out, but in the end the complexity just wasn’t worth it, for me.

However, recent advances in understanding of normalization are leading to other directions that may have the benefits of batchnorm without the downsides. We’ll look at them in some detail in part 2.

(Nick Switanek) #16

In the lecture and resnet-mnist notebook, Jeremy’s basic CNN example uses Conv2D, BatchNorm, ReLU sequences. Then the refactor uses fastai’s conv_layer, which wraps a Conv2D, ReLU, BatchNorm sequence.

Is the order of the ReLU and BatchNorm operations not consequential?

I would have thought the composition of the sequence matters, especially the location of the ReLU. Further, I’d have thought that BatchNorm on the raw Conv2D weights rather than on the ReLU-truncated weights would make more sense, in accordance with the basic CNN example, rather than with the conv_layer function.

(Jeremy Howard (Admin)) #17

It is somewhat consequential - experiments show that BatchNorm after ReLU is generally a bit better.

(Kofi Asiedu Brempong) #18

When implementing the unet paper, I noticed that in the case of the example in the paper, the features from the encoder are 64x64 and the upsampled features are 56x56.
My question is how do concatenate them, do you pad the upsampled features to 64x64 or do you crop the features from the encoder to 56x56.

(Francisco Ingham) #19

You crop the features from the encoder. Quoting the paper:

Every step in the expansive path consists of an upsampling of the
feature map followed by a 2x2 convolution (“up-convolution”) that halves the
number of feature channels, a concatenation with the correspondingly cropped
feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in
every convolution.

The ‘contracting path’ is the encoder which ‘contracts’/‘encodes’ the input into a representation of smaller dimensions.

(Kofi Asiedu Brempong) #20


(Pushkar) #21

Hi All,

I am getting an error while doing learn.fit(40,lr) in lesson7-superres-gan notebook.

NameError Traceback (most recent call last)
----> 1 learn.fit(40,lr)

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(self, epochs, lr, wd, callbacks)
176 callbacks = [cb(self) for cb in self.callback_fns] + listify(callbacks)
177 fit(epochs, self.model, self.loss_func, opt=self.opt, data=self.data, metrics=self.metrics,
–> 178 callbacks=self.callbacks+callbacks)
180 def create_opt(self, lr:Floats, wd:Floats=0.)->None:

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/utils/mem.py in wrapper(*args, **kwargs)
102 try:
–> 103 return func(*args, **kwargs)
104 except Exception as e:
105 if (“CUDA out of memory” in str(e) or

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/basic_train.py in fit(epochs, model, loss_func, opt, data, callbacks, metrics)
78 cb_handler = CallbackHandler(callbacks, metrics)
79 pbar = master_bar(range(epochs))
—> 80 cb_handler.on_train_begin(epochs, pbar=pbar, metrics=metrics)
82 exception=False

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/callback.py in on_train_begin(self, epochs, pbar, metrics)
213 self.state_dict[‘n_epochs’],self.state_dict[‘pbar’],self.state_dict[‘metrics’] = epochs,pbar,metrics
214 names = [(met.name if hasattr(met, ‘name’) else camel2snake(met.class.name)) for met in self.metrics]
–> 215 self(‘train_begin’, metrics_names=names)
217 def on_epoch_begin(self)->None:

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/callback.py in call(self, cb_name, call_mets, **kwargs)
199 “Call through to all of the CallbakHandler functions.”
200 if call_mets: [getattr(met, f’on_{cb_name}’)(**self.state_dict, **kwargs) for met in self.metrics]
–> 201 return [getattr(cb, f’on_{cb_name}’)(**self.state_dict, **kwargs) for cb in self.callbacks]
203 def set_dl(self, dl:DataLoader):

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/callback.py in (.0)
199 “Call through to all of the CallbakHandler functions.”
200 if call_mets: [getattr(met, f’on_{cb_name}’)(**self.state_dict, **kwargs) for met in self.metrics]
–> 201 return [getattr(cb, f’on_{cb_name}’)(**self.state_dict, **kwargs) for cb in self.callbacks]
203 def set_dl(self, dl:DataLoader):

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/vision/gan.py in on_train_begin(self, **kwargs)
91 “Create the optimizers for the generator and critic if necessary, initialize smootheners.”
92 if not getattr(self,‘opt_gen’,None):
—> 93 self.opt_gen = self.opt.new([nn.Sequential(*flatten_model(self.generator))])
94 else: self.opt_gen.lr,self.opt_gen.wd = self.opt.lr,self.opt.wd
95 if not getattr(self,‘opt_critic’,None):

~/anaconda3/envs/fastai/lib/python3.7/site-packages/fastai/callback.py in new(self, layer_groups)
28 “Create a new OptimWrapper from self with another layer_groups but the same hyper-parameters.”
29 opt_func = getattr(self, ‘opt_func’, self.opt.class)
—> 30 split_groups = split_bn_bias(layer_groups)
31 opt = opt_func([{‘params’: trainable_params(l), ‘lr’:0} for l in split_groups])
32 return self.create(opt_func, self.lr, layer_groups, wd=self.wd, true_wd=self.true_wd, bn_wd=self.bn_wd)

NameError: name ‘split_bn_bias’ is not defined

I looked at the fastai source code, and it does not seem to be defined anywhere.
However, it is defined in the old fastai source code.

Anyone else face the same issue?

Developer chat
(Pierre Guillou) #22

Hi @pushkarneo, I’ve got the same error with fastai 1.0.43. Did you solve it?

(Pushkar) #23

Hi @pierreguillou, I did not.
I couldn’t spend much time on it.
But since the code for that function is present in old source code.
Try copying and pasting it in the new source code and running.

If you do try the above, lemme know how it goes.

(Pierre Guillou) #24

Hi @pushkarneo, @sgugger made the correction (cf post).


Hi, I would like to recheck with you the Model0 from the Human Numbers resource.

The code for Model0 on video differs from the code in Jupiter notebook with a tiny bit:

  • if x.shape[0]>1: (in video)
  • if x.shape[1]>1: (in Jupiter notebook)

As it turns out, I may set if True:, or I may completely remove the if branches and the result will be the same.

I created the counter to check how many times we enter the branch like this:

class Model0(nn.Module):
    def __init__(self):
        self.i_h = nn.Embedding(nv,nh)  # green arrow
        self.h_h = nn.Linear(nh,nh)     # brown arrow
        self.h_o = nn.Linear(nh,nv)     # blue arrow
        self.bn = nn.BatchNorm1d(nh)
        self.counter0 =0
        self.counter1 =0
        self.counter2 =0
    def forward(self, x):
        self.counter0 +=1
        h = self.bn(F.relu(self.i_h(x[:,0])))
        if x.shape[0]>1:            
            h = h + self.i_h(x[:,1])            
            h = self.bn(F.relu(self.h_h(h)))
        if x.shape[0]>2:
            h = h + self.i_h(x[:,2])
            h = self.bn(F.relu(self.h_h(h)))
        return self.h_o(h)

As it turns out these lines:


Will return:

torch.Size([64, 3])

Any feedback?

(Pierre Guillou) #26

Hi @jeremy. In the lesson 7, you show in the lesson7-wgan.ipynb notebook how to generate fake images of bathroom by training a WGAN.

The training set you use has 303125 images and you train your GAN within 30 epochs with a lr of 2e-4.

I did try to use the exact same code with mango images from ImageNet dataset that has only 1305 images (and about 500 after cleaning).

However, even after 100 epochs, the result is bad. I guess my issue is the size of my training dataset?
With you experience, what would be the minimum size for the training dataset of a WGAN? And how to choose the right lr? Thank you.

After 100 epochs (lr = 2e-4)

After 100 epochs (lr = 2e-3)



Any feedback on this tiny little issue?