Share your work here ✅ (Part 2 2022)

Wow this works really well :+1:

cc @poppingtonic

I gave this a go.
Yes, it changed the result a bit; I’m not sure whether it’s better or worse.

Get the full notebook from here.

With different seeds

With multiple prompts

3 Likes

This reminds me a bit of the DiffEdit paper, where they used two contrasting prompts (horse vs. zebra) as a clever way to find the inverse of the noise for the target class. I wonder if the reversed prompt is creating a similar effect, except that it looks at essentially the same features in a slightly different way (basically a “horizontal flip” of the text), adding new details to the final result.

This also makes me think that maybe a series of “augmented prompts” could be used to create something like a TTA (test-time augmentation) effect, but for diffusion models?? Would be interesting to try!
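Just to make the idea concrete, here is one possible way it could be wired up: embed several rewordings of the same prompt with the CLIP text encoder Stable Diffusion uses and average the embeddings before handing them to the UNet. The prompt variants and the averaging are purely my assumptions about what “TTA for diffusion” might mean, not anything tested here.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Text encoder used by Stable Diffusion v1.x
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Hypothetical set of "augmented" rewordings of the same prompt
prompts = [
    "a watercolor painting of an astronaut at a beach",
    "an astronaut at a beach, painted in watercolor",
    "watercolor artwork of an astronaut standing on a beach",
]

with torch.no_grad():
    tokens = tokenizer(prompts, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    embeds = text_encoder(tokens.input_ids).last_hidden_state  # (3, 77, 768)

# One possible "TTA-style" conditioning: average the per-prompt embeddings and
# feed the result to the UNet in place of a single prompt's embedding
avg_embeds = embeds.mean(dim=0, keepdim=True)                  # (1, 77, 768)
```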

3 Likes

I was curious about this and did some further testing … Do note that your code doesn’t simply reverse the prompt — it reverses it and also puts spaces between the characters :slight_smile: See screenshot below for the difference in how it works (at least as per the screenshots you shared …)
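For reference, here is a minimal sketch of the two variants I compared (assuming the original code used a space-join, which would explain the extra spaces in the screenshots):

```python
prompt = "an watercolor painting of an astronaut at a beach"

# Plain reversal: same characters in reverse order, no extra spaces
reversed_plain = "".join(reversed(prompt))
# -> "hcaeb a ta tuanortsa na fo gnitniap rolocretaw na"

# Reversal with a space inserted between every character (what the shared
# code appears to do, judging by the screenshots)
reversed_spaced = " ".join(reversed(prompt))
# -> "h c a e b   a   t a   ..."
```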

So, I decided to try both versions of the negative prompt and see what happened with each … Here are the three resulting images …



I can’t tell which one is more detailed, but my wife, who has a better eye than I do, says that image 2 (the one with spaces between each letter) is the most detailed … I just have no idea why :slight_smile:

4 Likes

Hi there,

I built a little tool, based on the “hacky stablediffusion code for generating videos” repo on GitHub, to generate moving paintings from some of my favorite artists. Some of the results are nice.

I particularly like this one, based on Chagall. It almost generated a storyline :wink:

Basquiat:

Dali:

Bosch:

Kandinsky:

Miro:

Matisse:

The full YouTube playlist is here.

9 Likes

Yeah, I noticed that too. I’m trying to see if it’s the attention mask for the second prompt that is causing this to happen: instead of attending to a few tokens, it is now attending to a lot more tokens.
Even if you remove the spaces ("".join(reversed(str(prompt)))), it still performs better; again, here I see a lot more tokens and hence a lot more attention-mask values set to 1.

1 Like

Yep, just reversing the string (joined with the empty string, no spaces) doubles the token count (at least for that particular prompt), and if you take the version with the spaces, the token count roughly doubles again (prompt: 12, no spaces: 24, spaced: 50). But I’m unable to intuit how that makes the output more detailed …
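If anyone wants to check the counts themselves, a small sketch with the CLIP tokenizer that Stable Diffusion uses is below. The exact numbers depend on the prompt; the one here is just the astronaut example from later in the thread, not the prompt I measured above.

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = "an watercolor painting of an astronaut at a beach"
variants = {
    "prompt": prompt,
    "reversed, no spaces": "".join(reversed(prompt)),
    "reversed, spaced": " ".join(reversed(prompt)),
}

for name, text in variants.items():
    enc = tokenizer(text, padding="max_length",
                    max_length=tokenizer.model_max_length, truncation=True)
    # Number of positions the text actually occupies (attention mask is 1 there)
    print(f"{name}: {sum(enc.attention_mask)} tokens attended to")
```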

But now that I’ve started digging into this, I can’t stop :stuck_out_tongue:

1 Like

From my understanding of this, the reversed prompt, being garbled nonsense, corresponds to some essentially random image, and using it as the negative prompt therefore pushes the generation further toward the given prompt.

Example

prompt: an watercolor painting of an astronaut at a beach
reversed prompt: hcaeb a ta tuanortsa na fo gnitniap rolocretaw na

Image generated using just the prompt

Image generated using just the reversed prompt

Image generated using the prompt and the reversed prompt as negative prompt

Here is the notebook link if anyone’s interested in trying it out.
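If you don’t want to run the full notebook, the core call looks roughly like this with the diffusers pipeline (the model id, scheduler defaults, and guidance scale here are just the usual defaults and may not match the notebook exactly):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "an watercolor painting of an astronaut at a beach"
negative_prompt = "".join(reversed(prompt))  # "hcaeb a ta tuanortsa na ..."

# Use the reversed prompt in place of the usual empty negative prompt
image = pipe(prompt, negative_prompt=negative_prompt,
             guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("astronaut_reversed_negative.png")
```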

9 Likes

Negative Prompting with Strikethrough text

After seeing the discussion on the forum, I got the idea of doing negative prompting by using strikethrough text for the negative words, and feeding the sentence as a normal prompt.
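I generated the struck-through words with an online tool (linked further down), but programmatically you could produce equivalent text with the Unicode combining long stroke overlay, something like this sketch:

```python
def strike(word: str) -> str:
    # Append the Unicode combining long stroke overlay (U+0336) to each character
    return "".join(ch + "\u0336" for ch in word)

prompt = "a dog wearing a blue hat, painting by grant wood"
neg_prompt = prompt.replace("blue", strike("blue"))
print(neg_prompt)  # a dog wearing a b̶l̶u̶e̶ hat, painting by grant wood
```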

The results are surprisingly good! The overall structure of the image from the original prompt is largely preserved in most cases.

Examples:

prompt = ["a dog wearing a blue hat, painting by grant wood"]
neg_prompt = ["a dog wearing a b̶l̶u̶e̶  hat, painting by grant wood "]


prompt = ["a photograph of a golden retriever puppy wearing a red collar "]
neg_prompt = ["a photograph of a golden retriever puppy wearing a r̶e̶d̶ collar "]


prompt = ["a photograph of a yellow butterfly with green spots on its wings"]
neg_prompt = ["a photograph of a yellow butterfly with g̶r̶e̶e̶n̶ spots on its wings"]


prompt = ["a photograph of a persian cat playing with a blue ball"]
neg_prompt = ["a photograph of a persian cat playing with a b̶l̶u̶e̶ ball"]


prompt = ["a photograph of a dry leaf"]
neg_prompt = ["a photograph of a d̶r̶y̶  leaf"]


10 Likes

Very interesting! I would be curious to see how strikethrough text prompts behave if you use them as the normal prompt, or how the strikethrough negative-prompt outputs differ from negative-prompt outputs without the word struck out.

2 Likes

Thanks :slight_smile: , the sentences containing the strikethrough words are fed in directly, as a normal prompt.

Got it. This was really clever, thanks for sharing! :slight_smile:
I wonder if making something bold, underlining it, or using ( parentheses ), “quotations”, or exclamations!! with a word will weigh that word higher and impact the visual. CLIP may have caught on to that detail if it got encoded in its text input. Going to experiment with that! :scientist:

1 Like

I tried reproducing your example, but it doesn’t seem to work the same way for me.

I experimented with this example:

normal prompt: a photograph of a dry leaf
strikethrough prompt: a photograph of a d̶r̶y̶ leaf

Image generated using normal prompt as input prompt

Image generated using strikethrough prompt as input prompt

The output image is very similar to the output of the normal prompt.

Image generated using normal prompt as input prompt and strikethrough prompt as negative prompt

The output image doesn’t look like a dry leaf. But this is to be expected: the prompt and negative-prompt outputs were very similar, so they cancelled each other out, resulting in this.
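To spell out why similar prompt and negative-prompt predictions wash each other out, here is a toy version of the classifier-free guidance step (in the standard diffusers pipeline the negative prompt takes the place of the empty/unconditional prompt; the tensors below are dummies, not real UNet outputs):

```python
import torch

guidance_scale = 7.5

# Dummy noise predictions for the negative prompt and the prompt. If the two
# texts embed almost identically, the two predictions nearly coincide.
noise_pred_neg = torch.randn(1, 4, 64, 64)
noise_pred_text = noise_pred_neg + 0.01 * torch.randn(1, 4, 64, 64)

# Guidance update: push away from the negative prompt toward the prompt
noise_pred = noise_pred_neg + guidance_scale * (noise_pred_text - noise_pred_neg)

# The guidance term is tiny, so there is barely any push toward the prompt
print((noise_pred - noise_pred_neg).abs().mean())
```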

Is there anything I’m overlooking here?

Here is the notebook link for the example above.

1 Like

Oh that’s strange, it’s giving me consistent results in my experiments. I’ll share my notebook in a while.

Very cool!
Regarding creating videos with multiple prompts that move along a story, I love the Phenaki examples:
https://phenaki.video/

Of course, I think they use a mixed dataset, part of it videos and part image-text pairs (or at least that’s the approach of the next evolution of Imagen for video), but the paper may give some clues that could be applied outside of those kinds of datasets as well.

1 Like

For this base case, are you using classifier-free guidance? If so, what guidance scale are you using? Perhaps just tuning the guidance scale will improve things?

I looked into your notebook, and I guess the way you are striking through the text is different from how I have done it. I use this link to get the strikethrough words. Let me know if this makes a difference in your experiments.

my notebook: link

I was using the default guidance scale of 7.5

I experimented with a few different guidance scales, here are the results:

Guidance Scale: 5

Guidance Scale: 7.5 (Default)

Guidance Scale: 10

Guidance Scale: 14

Adjusting the guidance scale doesn’t seem to improve the images significantly.
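For reference, a sketch of how such a sweep can be run with the standard diffusers pipeline is below; the model id and fixed seed are assumptions for illustration, my notebook has the exact code.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "a photograph of a dry leaf"
negative_prompt = "a photograph of a d̶r̶y̶ leaf"

for scale in [5, 7.5, 10, 14]:
    # Fix the seed so only the guidance scale changes between runs
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, negative_prompt=negative_prompt,
                 guidance_scale=scale, generator=generator).images[0]
    image.save(f"dry_leaf_gs{scale}.png")
```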

I tried using the strikethrough words from the link, but the results are the same.
Strangely, when I tried my prompt in your notebook it seems to work nicely. Maybe it is because I’m using the pipeline directly; I’ll have to look into the source code for this.

Following up on the strikethrough experiment, I tried something with an interesting result.

Prompt: a photograph of a golden retriever puppy wearing a red collar

Now using a prompt with a strikethrough
Prompt: a photograph of a golden retriever puppy wearing a r̶e̶d̶ collar

Now removing the strikethrough word entirely
Prompt: a photograph of a golden retriever puppy wearing a collar

The image generated is very similar to the image generated using strikethrough.
It seems the strikethrough word doesn’t really affect the image generation process.

1 Like