AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

I am running the “Stable Diffusion Deep Dive” notebook from Jonathan Whitaker on Google Colab. This is part of the Lesson 9, Part II 2022/23 class notebooks.

When I run the following cell:

def get_output_embeds(input_embeddings):
    # CLIP's text model uses causal mask, so we prepare it here:
    bsz, seq_len = input_embeddings.shape[:2]
    causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)

    # Getting the output embeddings involves calling the encoder with output_hidden_states=True
    # so that it doesn't just return the pooled final predictions:
    encoder_outputs = text_encoder.text_model.encoder(
        inputs_embeds=input_embeddings,
        attention_mask=None, # We aren't using an attention mask so that can be None
        causal_attention_mask=causal_attention_mask.to(torch_device),
        output_attentions=None,
        output_hidden_states=True, # We want the output embs not the final output
        return_dict=None,
    )

    # We're interested in the output hidden state only
    output = encoder_outputs[0]

    # There is a final layer norm we need to pass these through
    output = text_encoder.text_model.final_layer_norm(output)

    # And now they're ready!
    return output

out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
print(out_embs_test.shape) # Check the output shape
out_embs_test # Inspect the output

I got the following error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-70-dbb74b7ec9b4> in <cell line: 26>()
     24     return output
     25 
---> 26 out_embs_test = get_output_embeds(input_embeddings) # Feed through the model with our new function
     27 print(out_embs_test.shape) # Check the output shape
     28 out_embs_test # Inspect the output

1 frames
<ipython-input-70-dbb74b7ec9b4> in get_output_embeds(input_embeddings)
      2     # CLIP's text model uses causal mask, so we prepare it here:
      3     bsz, seq_len = input_embeddings.shape[:2]
----> 4     causal_attention_mask = text_encoder.text_model._build_causal_attention_mask(bsz, seq_len, dtype=input_embeddings.dtype)
      5 
      6     # Getting the output embeddings involves calling the model with passing output_hidden_states=True

/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1693             if name in modules:
   1694                 return modules[name]
-> 1695         raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
   1696 
   1697     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask'

I googled the `_build_causal_attention_mask()` method but didn't find anything useful for solving the error.

It seems the method is no longer part of CLIPTextTransformer. Is there a replacement method so that this cell can run properly?

Oh, this was issue #37: Deep Dive NB: Quick Fix for AttributeError: 'CLIPTextTransformer' object has no attribute '_build_causal_attention_mask' · Issue #37 · fastai/diffusion-nbs · GitHub. The solution is there.
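For anyone landing here later: the removed helper just built an additive causal mask, so one workaround is to re-implement it yourself. This is my own sketch based on what the old transformers source did (the function name `build_causal_attention_mask` is mine); double-check it against the fix in the linked issue before relying on it:

```python
import torch

def build_causal_attention_mask(bsz, seq_len, dtype):
    # Additive causal mask: -inf above the diagonal so each token can only
    # attend to itself and earlier positions; 0 elsewhere.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype)
    mask.triu_(1)  # zero out the diagonal and everything below it
    # Shape (bsz, 1, seq_len, seq_len), as the old helper returned
    return mask.unsqueeze(0).unsqueeze(1).expand(bsz, 1, seq_len, seq_len)
```

With that in place, the failing line in `get_output_embeds` can be swapped for `causal_attention_mask = build_causal_attention_mask(bsz, seq_len, input_embeddings.dtype)` and the rest of the cell should run unchanged.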


Thanks for linking the solution here!