Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Chen, Qi; Deng, Chaorui; Huang, Zixiong; Zhang, Bowen; Tan, Mingkui; Wu, Qi

Computer Science > Computer Vision and Pattern Recognition

arXiv:2308.08525 (cs)

[Submitted on 16 Aug 2023]

Title:Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Authors:Qi Chen, Chaorui Deng, Zixiong Huang, Bowen Zhang, Mingkui Tan, Qi Wu

View PDF

Abstract:Text-to-image synthesis has made encouraging progress and attracted lots of public attention recently. However, popular evaluation metrics in this area, like the Inception Score and Fr'echet Inception Distance, incur several issues. First of all, they cannot explicitly assess the perceptual quality of generated images and poorly reflect the semantic alignment of each text-image pair. Also, they are inefficient and need to sample thousands of images to stabilise their evaluation results. In this paper, we propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images using a pre-trained likelihood-based text-to-image generative model, i.e., a higher likelihood indicates better perceptual quality and better text-image alignment. To prevent the likelihood of being dominated by the non-crucial part of the generated image, we propose several new designs to develop a credit assignment strategy based on the semantic and perceptual significance of the image patches. In the experiments, we evaluate the proposed metric on multiple popular text-to-image generation models and datasets in accessing both the perceptual quality and the text-image alignment. Moreover, it can successfully assess the generation ability of these models with as few as a hundred samples, making it very efficient in practice.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.08525 [cs.CV]
	(or arXiv:2308.08525v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.08525

Submission history

From: Chaorui Deng [view email]
[v1] Wed, 16 Aug 2023 17:26:47 UTC (1,578 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators