pipe(prompt).images[0] gives an error

When I run pipe(prompt).images[0] in the notebook I get the error "RuntimeError: expected scalar type Half but found Float". I am aware of the Lesson 9 official topic - #66 and have therefore installed version 0.4.2 of diffusers, but that does not make a difference. I haven't changed anything else in the stable_diffusion.ipynb notebook. I hope someone can point me to a solution.
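
For reference, the pipeline is created earlier in the notebook with half-precision weights, roughly like this (I'm reproducing the cell from memory, so the exact model id and arguments may differ slightly):

import torch
from diffusers import StableDiffusionPipeline

# Half-precision (float16) weights, as used in the lesson notebook
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a photograph of an astronaut riding a horse"
pipe(prompt).images[0]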

Full error stack:


RuntimeError                              Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 pipe(prompt).images[1]

File /usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /usr/local/lib/python3.9/dist-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py:219, in StableDiffusionPipeline.__call__(self, prompt, height, width, num_inference_steps, guidance_scale, negative_prompt, num_images_per_prompt, eta, generator, latents, output_type, return_dict, callback, callback_steps, **kwargs)
    214     logger.warning(
    215         "The following part of your input was truncated because CLIP can only handle sequences up to"
    216         f" {self.tokenizer.model_max_length} tokens: {removed_text}"
    217     )
    218 text_input_ids = text_input_ids[:, : self.tokenizer.model_max_length]
--> 219 text_embeddings = self.text_encoder(text_input_ids.to(self.device))[0]
    221 # duplicate text embeddings for each generation per prompt
    222 text_embeddings = text_embeddings.repeat_interleave(num_images_per_prompt, dim=0)

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py:726, in CLIPTextModel.forward(self, input_ids, attention_mask, position_ids, output_attentions, output_hidden_states, return_dict)
    698 @add_start_docstrings_to_model_forward(CLIP_TEXT_INPUTS_DOCSTRING)
    699 @replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class=CLIPTextConfig)
    700 def forward(
   (...)
    707     return_dict: Optional[bool] = None,
    708 ) -> Union[Tuple, BaseModelOutputWithPooling]:
    709     r"""
    710     Returns:
    711
   (...)
    724     >>> pooled_output = outputs.pooler_output  # pooled (EOS token) states
    725     ```"""
--> 726     return self.text_model(
    727         input_ids=input_ids,
    728         attention_mask=attention_mask,
    729         position_ids=position_ids,
    730         output_attentions=output_attentions,
    731         output_hidden_states=output_hidden_states,
    732         return_dict=return_dict,
    733     )

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py:647, in CLIPTextTransformer.forward(self, input_ids, attention_mask, position_ids, output_attentions, output_hidden_states, return_dict)
    643 if attention_mask is not None:
    644     # [bsz, seq_len] -> [bsz, 1, tgt_seq_len, src_seq_len]
    645     attention_mask = _expand_mask(attention_mask, hidden_states.dtype)
--> 647 encoder_outputs = self.encoder(
    648     inputs_embeds=hidden_states,
    649     attention_mask=attention_mask,
    650     causal_attention_mask=causal_attention_mask,
    651     output_attentions=output_attentions,
    652     output_hidden_states=output_hidden_states,
    653     return_dict=return_dict,
    654 )
    656 last_hidden_state = encoder_outputs[0]
    657 last_hidden_state = self.final_layer_norm(last_hidden_state)

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py:578, in CLIPEncoder.forward(self, inputs_embeds, attention_mask, causal_attention_mask, output_attentions, output_hidden_states, return_dict)
    571     layer_outputs = torch.utils.checkpoint.checkpoint(
    572         create_custom_forward(encoder_layer),
    573         hidden_states,
    574         attention_mask,
    575         causal_attention_mask,
    576     )
    577 else:
--> 578     layer_outputs = encoder_layer(
    579         hidden_states,
    580         attention_mask,
    581         causal_attention_mask,
    582         output_attentions=output_attentions,
    583     )
    585 hidden_states = layer_outputs[0]
    587 if output_attentions:

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py:321, in CLIPEncoderLayer.forward(self, hidden_states, attention_mask, causal_attention_mask, output_attentions)
    318 residual = hidden_states
    320 hidden_states = self.layer_norm1(hidden_states)
--> 321 hidden_states, attn_weights = self.self_attn(
    322     hidden_states=hidden_states,
    323     attention_mask=attention_mask,
    324     causal_attention_mask=causal_attention_mask,
    325     output_attentions=output_attentions,
    326 )
    327 hidden_states = residual + hidden_states
    329 residual = hidden_states

File /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.9/dist-packages/transformers/models/clip/modeling_clip.py:260, in CLIPAttention.forward(self, hidden_states, attention_mask, causal_attention_mask, output_attentions)
    256     attn_weights_reshaped = None
    258 attn_probs = nn.functional.dropout(attn_weights, p=self.dropout, training=self.training)
--> 260 attn_output = torch.bmm(attn_probs, value_states)
    262 if attn_output.size() != (bsz * self.num_heads, tgt_len, self.head_dim):
    263     raise ValueError(
    264         f"attn_output should be of size {(bsz, self.num_heads, tgt_len, self.head_dim)}, but is"
    265         f" {attn_output.size()}"
    266     )

RuntimeError: expected scalar type Half but found Float

Hi @jesper.moeslund,

Try updating the transformers library as well as diffusers with pip install -U .... This resolved the issue for me.
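For example, in a notebook cell (the package names are just the two libraries mentioned above; restart the kernel afterwards so the new versions are picked up):

!pip install -U diffusers transformers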

Another workaround, although I'm not sure it's advised, is to change torch_dtype from float16 to float32 when you download the Stable Diffusion pipeline, e.g. torch_dtype=torch.float32.
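
A minimal sketch of what I mean, assuming the pipeline is loaded the same way as in the lesson notebook (the model id here is my assumption):

import torch
from diffusers import StableDiffusionPipeline

# Load full-precision (float32) weights so the text encoder and its inputs share a dtype.
# Model id assumed from the lesson notebook; adjust to whatever you are using.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float32,
).to("cuda")

Note that float32 weights take roughly twice the GPU memory of float16, so this may not fit on smaller cards.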

Hope this helps!


Thanks a lot! Your second solution seems to work! Great!