This document analyzes text-to-image AI generators, focusing primarily on DALL-E 2 and comparing it with earlier models. The analysis uses three metrics: aesthetic quality, comprehension and interpretation, and creativity, revealing that DALL-E 2 outperforms earlier models but has limitations in spelling and understanding complex prompts. The conclusion suggests improvements for future AI art generators based on these findings.
Analysis of Text-to-Image AI Generators
Ziyu Huang (Cheryl)
IPHS300 AI for the Humanities (Spring 2022) Prof Elkins and Chun, Kenyon College
Abstract

This project is an analysis of text-to-image artificial intelligence generators. The comparison focuses mostly on the newly released DALL-E 2, but also includes two AI art generators from earlier generations. Each generator is fed the same text prompts, and the images it produces are analyzed with three metrics: aesthetic, comprehension and interpretation, and creativity. The project ends with a conclusion and a recommendation for the improvement of future AI art generators, based on a comparison of the performance of several AI art generators and different text prompts.

Introduction

With the spring 2022 release of DALL-E 2, there is heightened interest in the debate over AI-generated art. Compared with existing AI art generators that convert text to images, DALL-E 2 is an AI system that can generate more realistic and accurate images from the text input. Furthermore, DALL-E 2 can make complex artworks from relatively brief text inputs. It is also capable of visually integrating distinct and unrelated objects. While earlier AI generators could only produce crude, low-quality images, DALL-E 2 has reached the state of the art (SOTA), since its products satisfy practically all artistic requirements.

Compared with models based on Generative Adversarial Networks (GANs), DALL-E 2 is a newer model that supplants and even surpasses them. Unlike simpler models that rely mostly on GANs, DALL-E 2 benefits from Contrastive Language-Image Pre-training (CLIP) and diffusion models. CLIP is trained on texts and images in parallel and functions like the encoder, while the diffusion models learn to generate images by noising and denoising the training set and function like the decoder. DALL-E 2's architecture first trains the CLIP model and then uses it to train the diffusion models. Finally, the diffusion models use CLIP to construct text embeddings and generate images corresponding to the text. The most notable benefit of this design is that it does not require a massive amount of hand-labeled text-image paired data for training. In other words, it can be described as an unsupervised or "self-supervised" model. A self-supervised system can save a substantial amount of human labor. At the same time, the unsupervised construct maximizes creativity and novelty, as the AI may discover surprising outcomes that have never been observed by humans.

Material and Methodology

Material:
1. Twitter posts of DALL-E 2:
As I currently have no access to DALL-E 2, the only sources of DALL-E 2 output available to me are on Twitter. This project therefore collects artwork created by DALL-E 2 from Twitter posts. The selection is limited to artworks and excludes photographs. In addition, I feed the identical text prompts to the other two AI art generators and evaluate the performance of the different generators by comparing their outputs.
2. Hotpot AI Art Maker & 3. Starryai AI Art Generator (Orion):
These are open-source AI art generators, featuring fast generation (1-2 min) and higher visual quality than other open-source AI art generators.

Methodology:
There is no access to the code underlying these models, so all evaluation is based on text input and output images. Every text prompt includes an indication of a certain art style and at least one of a subject identification or an activity description.
The three metrics developed for this project are aesthetic, comprehension and interpretation (C&I), and creativity. The aesthetic metric is a formal analysis of the generated images from the perspective of a human art historian. Composition, color palette, and lines and shapes are the primary factors in the formal analysis. The comprehension and interpretation metric assesses how accurately the generator comprehends and interprets the text prompt in terms of artistic style, subject matter, and iconography. The creativity metric examines the originality with which the formal components of the particular art style are combined with the narrative and iconography.

Results

Comparison with AI Art Generators from Earlier Generation:

- post-modern style: “Remembrance of nostalgia, surrealist painting by Dalí.”
Hotpot: Aesthetic 6/10, C&I 3/10, Creativity 5/10
Starryai: Aesthetic 8/10, C&I 6/10, Creativity 6/10
DALL-E 2: Aesthetic 9/10, C&I 9/10 (closest to Dalí), Creativity 7/10

- pre-modern style: “a hot dog in the style of a renaissance painting.”
Hotpot: Aesthetic 2/10, C&I 2/10, Creativity 3/10
Starryai: Aesthetic 6/10, C&I 6/10 (), Creativity 5/10
DALL-E 2: Aesthetic 9/10, C&I 7/10 (more Baroque), Creativity 6/10

Comparison with Different Text Prompts Using DALL-E 2:

- in the style of Vermeer:
- text prompts from left to right:
“Ai generated 'Robot girl with a pearl earring' by Johannes Vermeer”: Aesthetic 8/10, C&I 8/10, Creativity 9/10
"Mother, by Vermeer": Aesthetic 9/10, C&I 7/10, Creativity 9/10
"Good morning, in the style of Vermeer": Aesthetic 8/10, C&I 9/10, Creativity 7/10

- DALL-E 2 generates art by combining the most distinctive and recognizable features of the subject and the style. These "features" may include facial characteristics, costumes, hairstyles, makeup, accessories, color palettes, brushstrokes, modeling of light and shadow, compositions, lines and shapes, etc. This raises the question of how DALL-E 2 chooses which feature(s) to combine. When text prompts include the name of the style (or the artist's last name if the style is named after the artist), DALL-E 2 is more likely to select the formal stylistic features. In the case above, when "Vermeer" appears as a style, DALL-E 2 generates work with Vermeer's distinctive sketchy brushstrokes and bluish, cold-toned color palette. The first image, by contrast, does not incorporate Vermeer's painting style.

Conclusion and Recommendation

Comparing the performance of the AI art generators, DALL-E 2 outperforms earlier generations of AI art generators on all three metrics. It can generate images with a high level of aesthetic quality, an accurate interpretation of the text prompt, and some creativity in blending information with style. Nonetheless, the results show that DALL-E 2 has several limitations. First, its spelling ability is relatively poor. When it is asked to generate graphics with text on them, typographical errors are quite probable. Several DALL-E 2 users are also aware of this shortcoming. Second, it comprehends different art styles to different degrees. It has a better understanding of postmodern and contemporary art styles, especially digital art and some cartoon styles linked to popular animations. According to one user report, DALL-E 2 has trouble assigning specific attributes to particular characters. This occurs when the text prompt involves two or more figures and indicates distinct characteristics for each figure. In addition to fundamental characteristics such as gender, DALL-E 2 can easily mix up age, hairstyle or hair color, and clothing. Even though DALL-E 2 is strong at analyzing and comprehending subjects, it cannot create satisfactory results when the text prompt contains a novel subject, as stated in the same user report.

The majority of these constraints can be overcome by modifying the parameters of the DALL-E 2 model. For example, the disparity between the amounts of accessible digital data for works of art from different eras is the primary cause of the different degrees of comprehension of art styles. The majority of premodern artworks are easel paintings or sculptures. Their reliance on artistic expertise and lengthy production time restricts their quantity, and many of them have been damaged or destroyed. Postmodern artworks, in this case the digital arts, require less painting or sculpting expertise and less time to execute. Therefore, there is a disparity in the number of artworks created in different time periods, and it persists in the DALL-E 2 training data. This bias in the training data results in different levels of art style comprehension. However, this could be improved by altering the training parameters so that pre-modern works receive more iterations than post-modern works.

Currently, there are numerous critiques of the ethical issues posed by Deepfakes created by AI art generators. However, as several users have pointed out, DALL-E 2 appears to have deliberate flaws in its ability to generate photorealistic human faces. Some say that this flaw is one of DALL-E 2's defects. However, DALL-E 2 is capable of producing photorealistic images of objects and non-human animals. Therefore, it is more plausible that the flaw is an intentional attempt to prevent the creation of Deepfakes. Another worry regarding DALL-E 2 is that AI art generators may lead to the unemployment of artists, particularly digital artists. DALL-E 2's exceptional creativity can occasionally surpass human intelligence, as it can produce combinations of style and content that have never been observed by humans. However, rather than eliminating jobs, AI art generators are more likely to change them. For instance, AI art generators like DALL-E 2 require domain expertise to improve their performance.

Acknowledgement

Dickson, Ben. “Dall-e 2, the Future of AI Research, and OpenAI's Business Model.” TechTalks, April 11, 2022. https://bdtechtalks.com/2022/04/11/openai-dall-e-2/.
O'Connor, Ryan. “How Dall-e 2 Actually Works.” AssemblyAI Blog, April 22, 2022. https://www.assemblyai.com/blog/how-dall-e-2-actually-works/.
Ramesh, Aditya, Prafulla Dhariwal, Alex Nichol, Casey Chu and Mark Chen. “Hierarchical Text-Conditional Image Generation with CLIP Latents.” ArXiv abs/2204.06125 (2022): n. pag.
Swimmer963. “What Dall-e 2 Can and Cannot Do.” LessWrong, May 1, 2022. https://www.lesswrong.com/posts/uKp6tBFStnsvrot5t/what-dall-e-2-can-and-cannot-do.
Wang, Zihao, Wei Liu, Qian He, Xin-ru Wu and Zili Yi. “CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP.” ArXiv abs/2203.00386 (2022): n. pag.
https://twitter.com/Merzmensch/status/1522277446980091904
https://twitter.com/bakztfuture/status/1517373091034378241
https://twitter.com/Merzmensch/status/1523302450047893506
https://twitter.com/Dalle2Pics/status/1521217219488894977/photo/1
https://twitter.com/Merzmensch/status/1523550836281937921/photo/1
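
As an illustration of how the scores reported in the Results could be tabulated, the following is a minimal Python sketch. The scores are transcribed from the poster. The unweighted average of the three metrics and the "style gap" are assumptions introduced here for illustration; the poster itself reports only the individual scores. The names `SCORES`, `mean_score`, `rank_generators`, and `style_gap` are hypothetical.

```python
# Scores transcribed from the Results section.
# Each tuple is (aesthetic, C&I, creativity), each out of 10.
SCORES = {
    "post-modern": {  # "Remembrance of nostalgia, surrealist painting by Dalí."
        "Hotpot": (6, 3, 5),
        "Starryai": (8, 6, 6),
        "DALL-E 2": (9, 9, 7),
    },
    "pre-modern": {  # "a hot dog in the style of a renaissance painting."
        "Hotpot": (2, 2, 3),
        "Starryai": (6, 6, 5),
        "DALL-E 2": (9, 7, 6),
    },
}


def mean_score(triple):
    """Unweighted mean of the three metrics (an assumption, not the poster's method)."""
    return sum(triple) / len(triple)


def rank_generators(style):
    """Return generator names for one prompt style, best mean score first."""
    rows = SCORES[style]
    return sorted(rows, key=lambda name: mean_score(rows[name]), reverse=True)


def style_gap(generator):
    """Mean post-modern score minus mean pre-modern score for one generator."""
    post = SCORES["post-modern"][generator]
    pre = SCORES["pre-modern"][generator]
    return (sum(post) - sum(pre)) / len(post)


if __name__ == "__main__":
    for style in SCORES:
        print(style, rank_generators(style))
    for generator in SCORES["post-modern"]:
        print(generator, "style gap:", round(style_gap(generator), 2))
```

Under this averaging assumption, every generator scores higher on the post-modern prompt than on the pre-modern one, which is consistent with the conclusion that style comprehension differs by era.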