Concepts like textual inversion or dreambooth to me are possibly the most exciting and powerful extensions to stable diffusion. Being able to inject custom representations (with only 3-5 images!) into the text-to-image model and optimizing towards seemingly any novel concept provides incredible control over content generation.
This subject is really interesting, given the vast amount of links here and within links to further papers/ pages perhaps a score process could be defined to indicate the ones that explain the subject best in the simplest terms, for me so far I give a vote for ‘Lilian Weng’, the ‘Yang Song’ intrigues me by the title only so far so next on my reading list.
After reading Yang Song paper I give that a vote also, note the first link in Jeremy’s original post is from that paper. In this paper Yang Song writes a commentary on the connections to different models in this sphere of knowledge.
I must comment that I am new to this topic and my suggestions are only that. Each link above reveals the number of it’s access clicks but does not say how useful the experience has been to the reader.
I found this twitter thread particularly useful at giving me a good high level view of SD, and particularly around the “latent diffusion” idea, together with this other which is linked in the original one.
It gave me a good idea of the pieces involved and a basis for what the endgame is
After seeing this tweet by Tanishq I knew something had to be done…
Using the HF SD Dreambooth training collab I added Jeremy as a new concept to SD. It took just 5 images of Jeremy that I found online, < 5 mins of total training time on a V100, and < 10s per image to generate these.