Basic Seq2Seq Teacher Forcing Troubles

Hi @muellerzr

Referring to your post here, does the fastai Learner expect a certain signature for the model? The model you integrated takes no parameters when it's initialized. Does fastai require this (no parameters), or is there a way to pass some parameters into the model?

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

As the article mentioned, fastai expects an object, i.e. the class instance, so you need to make an instance of your model beforehand. As a result, it can take any number of parameters, since all fastai really “uses” during training is the forward function.
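
For example, with the parameter-free Net above (assuming a DataLoaders object dls has already been built), that just looks like:

model = Net()                # instantiate the model first
learn = Learner(dls, model)  # pass the instance, not the class, to the Learner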

To phrase it a bit differently, fastai runs on PyTorch, so it's inherently compatible with any PyTorch model. That means I can do something like:

class Net(nn.Module):
    def __init__(self, sizea, sizeb, sizec):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, sizea)
        self.fc2 = nn.Linear(sizea, sizeb)
        self.fc3 = nn.Linear(sizeb, sizec)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

Followed by:

model = Net(120, 84, 10)
learn = Learner(dls, model)

Does that help @goralpl? :slight_smile:

@muellerzr Yes! I overlooked the part about passing in the instantiated model. :sweat_smile:

Thank You. I will give this a try.

@muellerzr

One more question: does the fastai Learner expect the forward() function to have only one input parameter? I'm getting an error that appears to point to the missing second argument.

TypeError: forward() missing 1 required positional argument: 'trg'

I am trying to utilize a seq2seq encoder/decoder model.

class Seq2Seq(nn.Module):
    def __init__(self,
                 encoder: nn.Module,
                 decoder: nn.Module):
        super().__init__()

        self.encoder = encoder
        self.decoder = decoder

    def forward(self, src: Tensor, trg: Tensor, teacher_forcing_ratio: float = 0.5) -> Tensor:

        batch_size = src.shape[1]

fastai sends only the inputs into the model, so teacher forcing requires a bit more work to actually get working (and it's something I actually want to look at).
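
Roughly speaking, each training batch boils down to something like this (a simplified sketch, not the actual fastai internals):

pred = model(*xb)            # only the input tuple xb reaches forward(), hence the missing 'trg'
loss = loss_func(pred, *yb)  # the targets in yb only ever reach the loss function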

@goralpl do you have a good example tutorial that works with this in raw Pytorch I could look at? :slight_smile:

Hmmm found one I can look at here: https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb

So here is what I did, thanks to the magical power of Callbacks.

Since teacher forcing expects the y's to be passed to the model as well, we do the following:

# post building `DataLoaders`
learn.dls.train.n_inp = 2
class TeacherForcingCallback(Callback):
    """
    Callback that sends the y's to the model too
    """
    def before_batch(self):
        x, y = self.x                      # with n_inp=2, self.x is the (src, trg) tuple
        self.learn.yb = (y.unsqueeze(0),)  # yb must be a tuple for the loss call

learn = Learner(dls, model, loss_func=criterion, cbs=[TeacherForcingCallback()])

You can also get even fancier: set your teacher_forcing_ratio and override the batch (self.learn.xb) to include it:

class TeacherForcingCallback(Callback):
    "Callback that sends the y's to the model too"
    def __init__(self, teacher_forcing_ratio=0.5):
        self.teacher_forcing_ratio = teacher_forcing_ratio
    def before_batch(self):
        x, y = self.x
        self.learn.xb = (x, y, self.teacher_forcing_ratio)  # writes have to go through self.learn
        self.learn.yb = (y.unsqueeze(0),)

Callbacks are a whole different ballgame, so sorry if that confuses you a bit @goralpl :slight_smile:

@muellerzr

Thanks for the information on callbacks. I'll have to read up on them. I'm trying to implement what you suggested, but it looks like I'm not able to set that attribute.

I created an end-to-end example of the seq2seq use case with a toy dataset. Can you please take a look? I’m sure other folks will benefit from this discussion and I’m happy to share these notebooks with anyone who cares.

dls.train.n_inp=2

class TeacherForcingCallback(Callback):
    """
    Callback that sends the y's to the model too
    """
    def before_batch(self):
        x, y = self.x
        self.learn.yb = (y.unsqueeze(0),)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-b9d418192d2f> in <module>
      2 # learn.dls.train.n_inp = 2
      3 
----> 4 dls.train.n_inp=2
      5 
      6 class TeacherForcingCallback(Callback):

AttributeError: can't set attribute

There are a few issues with the approach here. Technically, given what we're doing, our Callback can be simplified further:

class TeacherForcingCallback(Callback):
    """
    Callback that sends the y's to the model too
    """
    def before_batch(self):
        # No n_inp tweaking needed: self.x/self.y are the usual input/target,
        # and we override the learner's xb so the model receives the targets too
        x, y = self.x, self.y
        self.learn.xb = (x, y)

However, you forgot to collect and return the outputs of your model, so the end of forward needs to be:

            input = trg[t] if teacher_force else top1
        
        return outputs
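
For reference, here is a sketch of what the full forward could look like, following the bentrevett tutorial linked above (the encoder returning (hidden, cell), the decoder's output_dim attribute, and the device handling are assumptions taken from that tutorial, not from your notebook; it also needs import random and import torch):

    # inside the Seq2Seq class above
    def forward(self, src: Tensor, trg: Tensor, teacher_forcing_ratio: float = 0.5) -> Tensor:
        batch_size = src.shape[1]
        trg_len = trg.shape[0]
        trg_vocab_size = self.decoder.output_dim

        # tensor that collects the decoder prediction for every time step
        outputs = torch.zeros(trg_len, batch_size, trg_vocab_size, device=src.device)

        hidden, cell = self.encoder(src)
        input = trg[0]  # first decoder input is the <sos> token
        for t in range(1, trg_len):
            output, hidden, cell = self.decoder(input, hidden, cell)
            outputs[t] = output
            teacher_force = random.random() < teacher_forcing_ratio
            top1 = output.argmax(1)
            input = trg[t] if teacher_force else top1
        return outputs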

Another issue is that CrossEntropyLoss won't work here as-is; we can see that just by running:

x,y = dls.one_batch()
with torch.no_grad():
    out = model(x,y)
criterion(out,y)

(I don’t particularly know enough here to recommend what to do)

Edit:

Okay @goralpl, if we use CrossEntropyLossFlat() as our loss function instead, it'll completely work :smiley: (we need to flatten the outputs, which is why CrossEntropyLossFlat is needed rather than plain nn.CrossEntropyLoss; otherwise we'd need to do some preprocessing to get the outputs working with the loss function) :slight_smile:
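
For completeness, assuming the same dls, model, and TeacherForcingCallback from above, the working Learner construction would look roughly like this:

from fastai.losses import CrossEntropyLossFlat

learn = Learner(dls, model,
                loss_func=CrossEntropyLossFlat(),
                cbs=[TeacherForcingCallback()])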

Thanks @muellerzr! I posted an updated Jupyter notebook with your solution. I'm looking forward to your livestream on 02/19/2021!
