Fastai v2 text

I haven’t revisited this since my last post. Let us know if you get it working!

Thanks for letting me know.
I guess I'd better rethink my plan to migrate, so I'm probably not going to work on it at the moment.
I figured v2 would be more future-proof, with the latest and greatest functionality, but it may be that there are other things I'd have to reimplement just to keep my current functionality, so it might make more sense to keep running what I have for now and commit my efforts elsewhere. For some reason I thought v2 would have all of v1's functionality by now, but it looks like there isn't much activity here any more.

In the end I couldn't help myself and tried to get it working. The last code you had clearly produced nonsense output, but when I put the grad back in, the results seemed reasonable. This is the code I used:

    # Assumes the usual fastai imports are in scope (torch, tuplify, etc.)
    # plus the _eval_dropouts helper from the earlier code, which puts the
    # dropout layers in eval mode while the rest of the model stays in train mode.
    def intrinsic_attention(self, learn, text, class_id=None):
        """
        Calculate the intrinsic attention of the input w.r.t. an output `class_id`,
        or the classification given by the model if `None`.
        """
        learn.model.train()
        _eval_dropouts(learn.model)
        learn.model.zero_grad()
        learn.model.reset()
        # Build a one-item batch from the raw text.
        dl = learn.dls.test_dl([text])
        batch = dl.one_batch()[0]
        # Embed the tokens and track gradients w.r.t. the embeddings.
        emb = learn.model[0].module.encoder(batch).detach().requires_grad_(True)
        lstm = learn.model[0].module(emb, True)
        learn.model.eval()
        cl = learn.model[1]((lstm, torch.zeros_like(batch).bool(),))[0].softmax(dim=-1)
        if class_id is None: class_id = cl.argmax()
        # Backprop the chosen class probability to get gradients on the embeddings.
        cl[0][class_id].backward()
        attn = emb.grad.squeeze().abs().sum(dim=-1)
        attn /= attn.max()
        tok, _ = learn.dls.decode_batch((*tuplify(batch), *tuplify(cl)))[0]
        return tok, attn
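
For reference, this is roughly how I call it (just a quick sketch with names I use locally; since `self` isn't used in the body, I simply pass `None` for it):

    # `learn` is assumed to be a trained AWD-LSTM text classifier learner.
    tok, attn = intrinsic_attention(None, learn, "I really enjoyed this film")

    # decode_batch may hand back a single decoded string rather than a token
    # list, depending on your dls, so split it if needed before pairing up.
    toks = tok.split() if isinstance(tok, str) else tok
    for t, a in zip(toks, attn):
        print(f"{t:>12}  {a.item():.2f}")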

I’m not sure what caused your issue with grad, but for me it doesn't seem to be a problem…

I had this issue a while ago in the fastinference library.

I’ve added the following line to fix the issue:

    emb = learn.model[0].module.encoder(batch).detach().requires_grad_(True)
    emb.retain_grad()  # <---- added line
    lstm = learn.model[0].module(emb, True)

Without retain_grad() the results were the same for every class.
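
In case it helps anyone else: by default PyTorch only keeps `.grad` on leaf tensors, so an intermediate (non-leaf) tensor needs `retain_grad()` before `backward()` for its gradient to survive. A minimal standalone sketch of just that behaviour (nothing fastai-specific):

    import torch

    x = torch.randn(4, requires_grad=True)  # leaf tensor: .grad is kept automatically
    emb = x * 2                             # non-leaf tensor (result of an op)
    emb.retain_grad()                       # without this, emb.grad would be None
    emb.sum().backward()

    print(x.grad)    # populated either way
    print(emb.grad)  # populated only because of retain_grad()

In the fastai snippet, `emb` is created with `detach().requires_grad_(True)`, which makes it a leaf again, so that may be why some people see the same results with or without the extra line.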


Hmmm…I tried adding this line and the results are exactly the same as before - seems fine to me either way.