Lesson 9 Discussion & Wiki (2019)

I think it might be a good idea to integrate the fastai callback framework with pub/sub semantics, because it would enable use of the library in systems built around pub/sub. It might not matter as much for training, but for an inference pipeline it could enable deployment into applications that use the paradigm. Robotics, complex event processing and many other domains use pub/sub extensively. The Robot Operating System is basically put together using topic-based pub/sub. That fact alone might warrant investing in the integration.

Thanks. I think reduce brings the result down to a single number though. I tried using your code snippet but I’m getting the same error. I will continue trying to fix it later. Though I think the for loop would be okay in this instance.

@jeremy I’m happy you ended up doing that :slight_smile:


I would like to move forward with that suggestion, but my time is committed elsewhere at the moment. My approach would be to create unit tests for the classes and change things as I went along. That would take some time, I feel, but doing it would help me understand better what is going on. So in short I can’t do it now, and judging from some of the replies I had, I may not be on the right track. As I am still in learning mode, and as stated this is a year’s worth of development (PyTorch) taught in 5 weeks, I will postpone it until later. Thanks for the opportunity.

Can someone explain what the use of creating a property is in the code below? I went through the definition of property and understood that we need it, but I can’t figure out its use in this code:

#export
class DataBunch():
    def __init__(self, train_dl, valid_dl, c=None):
        self.train_dl,self.valid_dl,self.c = train_dl,valid_dl,c

    @property
    def train_ds(self): return self.train_dl.dataset

    @property
    def valid_ds(self): return self.valid_dl.dataset

If I understand your question correctly…

It’s a way of not exposing the details of a data loader implementation to users of a DataBunch. Code that accesses data.train_ds and data.valid_ds will continue to work if the dataloader or DataBunch implementations change in the future.

In Lesson 9, Jeremy is showing how to build the fastai library from the ground up. Defining such a simple property might make less sense without knowing that context. However, in the actual fastai library, train_ds and valid_ds already do more than pass through a public instance variable.
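For example, here is a toy sketch of my own (a hypothetical future change, not the fastai implementation) showing how the internals could change while data.train_ds keeps working for callers:

class DataBunch():
    def __init__(self, train_dl, valid_dl, c=None):
        # hypothetical change: store the loaders in a dict internally
        self._dls = {'train': train_dl, 'valid': valid_dl}
        self.c = c

    @property
    def train_ds(self): return self._dls['train'].dataset

    @property
    def valid_ds(self): return self._dls['valid'].dataset

# Callers still write data.train_ds and data.valid_ds, unchanged.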

I have a question about notebook 03_minibatch_training but it’s much more related to training in general.

It seems like we always set the bias to be 0 initially. But then the update process subtracts the gradient of the bias (which was set to 0) times the learning rate. Why does this work? Isn’t the gradient of 0 just 0? I feel like I’m missing something incredibly basic here.

Never mind, it looks like I made a bad assumption. Looking at nn.Linear, it doesn’t set the bias to zero anymore, so I must have just missed where that happened in the code.

if self.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / math.sqrt(fan_in)
    init.uniform_(self.bias, -bound, bound)

Hi Will. It looks like you’ve still got some residual confusion. Subtracting the gradient with respect to the bias means that when the bias is adjusted a little in that direction, the loss goes down. The current value of the bias itself does not matter.

It’s important to distinguish between weights (multiplied) and biases (added). There is no such thing as a “bias weight”. The derivative of the output with respect to a bias b is 1, whatever the value of b. Maybe you are confusing weight with bias, or thinking of a zero gradient backpropagating through a zero weight upstream.
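A quick sketch in plain PyTorch (my own toy example, not from the notebook) showing that a bias initialized to zero still receives a nonzero gradient, because the gradient depends on the error at the output, not on the bias’s current value:

import torch

x = torch.tensor([[1.0, 2.0]])
w = torch.randn(2, 1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)   # bias starts at exactly 0

pred = x @ w + b
loss = (pred - 3.0).pow(2).mean()
loss.backward()

print(b.grad)  # nonzero, so b moves away from 0 on the first opt.step()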

This video series helped my understanding, even after already studying the basics. Highly recommended.
https://www.youtube.com/watch?v=aircAruvnKk


Hello! I’m curious whether anyone has tried better ways of initialization for RNNs. If so, does it help them converge faster or get better results?

Friends, a quick one: I’m trying to extract the weights at each epoch and store them in an array. The model is training fine, but for some reason the weights I extract with get_weights below always have the very same numbers, no changes, even though the net trains fine and the loss goes down. What am I missing here? Thanks a lot.

def get_weights(net): return [p.data for p in net.parameters()]

def fit(epochs, model, loss_func, opt, train_dl, valid_dl):
  for epoch in range(epochs):  

    model.train()

    weightsW.append(get_weights(model))
    
    ### The two lines below test a fragment of the parameters; everything stays static in the weightsW array even though the net trains fine. It's as if I cannot access the changing weights, even though they are indeed changing (the loss goes down each epoch).
    test=list(model.parameters())[1]
    print(test)  
    
    
    for xb,yb in train_dl:
      pred=model(xb)
      loss=loss_func(pred,yb)  
      loss.backward()
      opt.step()
      opt.zero_grad()
      
      
    model.eval()
    with torch.no_grad():
      tot_loss, tot_acc=0.,0.
      for xb,yb in valid_dl:
        pred=model(xb)
        tot_loss+=loss_func(pred,yb)
        tot_acc+=accuracy(pred,yb)
    nv=len(valid_dl)
    print(epoch,tot_loss/nv,tot_acc/nv)
  return tot_loss/nv, tot_acc/nv

Aha, it looks like the solution may have to do with using clone():
def get_weights(net): return [p.data.clone() for p in net.parameters()]

When using clone(), the stored values do come back updated, so maybe that’s the key.
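For context: opt.step() updates the parameter tensors in place, so appending p.data stores references to the same live tensors every epoch, while clone() takes a snapshot of the current values. A minimal toy sketch (my own example, not the training code) of the difference:

import torch

p = torch.zeros(3)           # stands in for one parameter tensor
history_ref   = [p]          # stores a reference to the live tensor
history_clone = [p.clone()]  # stores a snapshot of its current values

p.add_(1.0)                  # in-place update, like opt.step() does

print(history_ref[0])    # tensor([1., 1., 1.]) -- changed along with p
print(history_clone[0])  # tensor([0., 0., 0.]) -- the snapshot is unchanged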

04_callbacks notebook is missing in the repository. Where can I find it?

Sorry, back now.

Thanks

Hello!
How does self('begin_batch') work? (notebook: 04_callbacks.ipynb)

class Runner():
    def __init__(self, cbs=None, cb_funcs=None):
        cbs = listify(cbs)
        for cbf in listify(cb_funcs):
            cb = cbf()
            setattr(self, cb.name, cb)
            cbs.append(cb)
        self.stop,self.cbs = False,[TrainEvalCallback()]+cbs

    def one_batch(self, xb, yb):
        self.xb,self.yb = xb,yb
        if self('begin_batch'): return
        self.pred = self.model(self.xb)
        if self('after_pred'): return
        self.loss = self.loss_func(self.pred, self.yb)
        if self('after_loss') or not self.in_train: return
        self.loss.backward()
        if self('after_backward'): return
        self.opt.step()
        if self('after_step'): return
        self.opt.zero_grad()

Still confused about the definitions of begin_batch, after_step, etc. How do their bodies get created after calling the Runner() class:

stats = [TestCallback(), AvgStatsCallback([accuracy])]
run = Runner(cbs=stats)
run.fit(2, learn)

Previously, these definitions were present inside the CallbackHandler class:

class CallbackHandler():
    def __init__(self,cbs=None):
        self.cbs = cbs if cbs else []
    def begin_fit(self, learn):
        self.learn,self.in_train = learn,True
        self.learn.stop = False
        res = True
        for cb in self.cbs: res = res and cb.begin_fit(learn)
        return res

Can anyone explain how the _inner function works in combine_scheds? I have debugged it, but I’m not getting much insight!
(notebook: 05_anneal.ipynb)

def combine_scheds(pcts, scheds):
    assert sum(pcts) == 1.
    pcts = tensor([0] + listify(pcts))
    assert torch.all(pcts >= 0)
    pcts = torch.cumsum(pcts, 0)
    def _inner(pos):
        idx = (pos >= pcts).nonzero().max()
        actual_pos = (pos-pcts[idx]) / (pcts[idx+1]-pcts[idx])
        return scheds[idx](actual_pos)
    return _inner
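A worked example may help (my own illustration, assuming the annealer-decorated sched_cos from 05_anneal.ipynb): with pcts=[0.3, 0.7], the cumulative sums are [0.0, 0.3, 1.0]. For pos=0.5, (pos >= pcts) is [True, True, False], so idx=1; we are in the second phase, and its local progress is actual_pos = (0.5 - 0.3) / (1.0 - 0.3) ≈ 0.29, which is then passed to scheds[1].

sched = combine_scheds([0.3, 0.7], [sched_cos(0.3, 0.6), sched_cos(0.6, 0.2)])
sched(0.5)  # evaluates the second cosine schedule about 29% of the way through it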

The key function in the Runner class is this:

    def __call__(self, cb_name):
        for cb in sorted(self.cbs, key=lambda x: x._order):
            f = getattr(cb, cb_name, None)
            if f and f(): return True
        return False

So self('begin_batch') in this case is the equivalent of self.__call__(cb_name='begin_batch'), and it loops through all the callbacks; if any of them has a method called begin_batch, it calls it.

Hope that helps!


Thanks for the reply!
But where and how does its body get generated? (I’m looking for the stuff we were previously doing inside the CallbackHandler class):

def begin_fit(self, learn):
        self.learn,self.in_train = learn,True
        self.learn.stop = False
        res = True

As Jeremy mentioned in the lecture, in __call__ ,

def __call__(self, cb_name):
        for cb in sorted(self.cbs, key=lambda x: x._order):
            f = getattr(cb, cb_name, None)
            if f and f(): return True
        return False

it first tries to find an attribute named cb_name (e.g. 'begin_fit') on the callback cb (e.g. TestCallback()), and stores it in f if the search is successful; otherwise f is set to None.
Then, in this statement,
if f and f(): return True
the function f (i.e. begin_fit) is called.

I’m still confused about where we are setting the other attributes, like learn.stop=False etc., that used to be inside the function f (i.e. begin_fit)?
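As far as I understand, those bodies now live in the Callback subclasses themselves (e.g. TrainEvalCallback defines begin_fit in 04_callbacks.ipynb), and the Runner only looks them up by name. A minimal toy sketch (my own example, not the notebook code) of that lookup:

class Callback():
    _order = 0

class TestCallback(Callback):
    def begin_fit(self):
        print('begin_fit body lives here, in the callback class')
        return False   # False means "carry on, don't stop"

class MiniRunner():
    def __init__(self, cbs): self.cbs = cbs
    def __call__(self, cb_name):
        for cb in sorted(self.cbs, key=lambda x: x._order):
            f = getattr(cb, cb_name, None)   # look up the method by name
            if f and f(): return True
        return False

run = MiniRunner([TestCallback()])
run('begin_fit')   # prints the message; the body comes from TestCallback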

Running the notebook “09b_learner.ipynb”, I get an error

AttributeError: 'Learner' object has no attribute 'epoch'

every time I try to

learn.fit(1)

I can’t seem to figure out what’s wrong, nor find any posts related to the issue.

Any thoughts?

I get the same error:

AttributeError                            Traceback (most recent call last)
<ipython-input> in <module>

<ipython-input> in fit(self, epochs, cbs, reset_opt)
     62         self.do_begin_fit(epochs)
     63         for epoch in range(epochs):
---> 64             if not self('begin_epoch'): self.all_batches()
     65
     66             with torch.no_grad():

<ipython-input> in __call__(self, cb_name)
     82         res = False
     83         assert cb_name in self.ALL_CBS
---> 84         for cb in sorted(self.cbs, key=lambda x: x._order): res = cb(cb_name) and res
     85         return res

~/fastai/course-v3/nbs/dl2/exp/nb_05b.py in __call__(self, cb_name)
     19     def __call__(self, cb_name):
     20         f = getattr(self, cb_name, None)
---> 21         if f and f(): return True
     22         return False
     23

~/fastai/course-v3/nbs/dl2/exp/nb_05b.py in begin_epoch(self)
     33
     34     def begin_epoch(self):
---> 35         self.run.n_epochs=self.epoch
     36         self.model.train()
     37         self.run.in_train=True

~/fastai/course-v3/nbs/dl2/exp/nb_05b.py in __getattr__(self, k)
     10     _order=0
     11     def set_runner(self, run): self.run=run
---> 12     def __getattr__(self, k): return getattr(self.run, k)
     13
     14     @property

AttributeError: 'Learner' object has no attribute 'epoch'