Image Composition Assessment With Saliency-Augmented Multi-Pattern Pooling
Abstract
Image composition assessment, which aims to assess the overall composition quality of a given image, is a crucial ingredient of aesthetic assessment. However, to the best of our knowledge, there is neither a dataset nor a method specifically designed for this task. In this paper, we contribute the first composition assessment dataset, CADB, with composition scores for each image provided by multiple professional raters. Besides, we propose a composition assessment network, SAMP-Net, with a novel Saliency-Augmented Multi-pattern Pooling (SAMP) module, which analyzes visual layout from the perspectives of multiple composition patterns. We also leverage composition-relevant attributes to further boost the performance, and extend the Earth Mover's Distance (EMD) loss to a weighted EMD loss to eliminate content bias. Experimental results show that our SAMP-Net performs more favorably than previous aesthetic assessment approaches.
1 Introduction
Image aesthetic assessment aims to judge aesthetic quality automatically in a qualitative or quantitative way, and can be widely used in many downstream applications such as assisted photo editing, intelligent photo album management, image cropping, and smartphone photography [5, 7, 11, 39, 40, 41, 43, 51]. Among the factors related to image aesthetics, image composition, which mainly concerns the arrangement of the visual elements inside the frame [38], is critical in estimating image aesthetics [28, 36, 44], because composition directs the viewer's attention and has a significant impact on aesthetic perception [12, 34, 38].

Despite the importance of image composition, there is no dataset readily available for image composition assessment. Some existing aesthetic datasets contain annotations related to image composition [3, 19, 22, 35]. However, they only provide composition-relevant attributes without an overall composition score, with the exception of the PCCD dataset [3], which presents a single reviewer's composition rating for each image; since this reviewer is an anonymous website visitor who may be unprofessional, the ratings might be biased and inaccurate, falling far short of the requirement for scientific evaluation. To this end, we contribute
a new image Composition Assessment DataBase (CADB) on the basis of the Aesthetics and Attributes DataBase (AADB) [22]. Our CADB dataset contains 9,497 images, each rated for overall composition quality by 5 individual raters who specialize in fine art. The details of our CADB dataset are introduced in Section 3.
Figure 1: Evaluating composition quality from the perspectives of different composition patterns. The first (resp., second) row shows a good example and a bad example considering symmetrical (resp., radial) balance.

To the best of our knowledge, there is no method specifically designed for image composition assessment. However, some previous aesthetic assessment methods also take composition into consideration. We divide the existing composition-relevant approaches into two groups. 1) The composition-preserving methods [4, 32] maintain image composition during both training and testing. However, these approaches fail to extract composition-relevant features for the composition assessment task. 2) The composition-aware approaches [28, 31, 52] extract composition-relevant features by modeling the mutual dependencies between all pairs of objects or regions in the image. However, redundant and noisy information is likely to be introduced during this procedure, which may adversely affect the performance of composition assessment. Moreover, some previous methods [1, 10, 29, 49, 54, 55] are designed to model well-established photographic rules (e.g., rule of thirds and golden ratio [20]), which humans use in evaluating image composition quality. However, these rule-based methods have two major limitations: 1) hand-crafted feature extraction is tedious and laborious compared with deep learning features [27]; 2) each rule is valid only for specific scenes, and these methods do not consider which rules are applicable to a given scene [47].
Interestingly, composition pattern, as an important aspect of composition assessment, is not explicitly considered by the above methods. As shown in Figure 1, each composition pattern divides the holistic image into multiple non-overlapping partitions, which can model human perception of composition quality. In particular, by analyzing the visual layout (e.g., positions and sizes of visual elements) according to a composition pattern, i.e., comparing the visual elements in various partitions, we can quantify the aesthetics of visual layout in terms of visual balance (e.g., symmetrical balance and radial balance) [18, 23, 30], composition rules (e.g., rule of thirds, diagonals and triangles) [24, 50], and so on. Different composition patterns offer different perspectives to evaluate composition quality. For example, the composition pattern in the top (resp., bottom) row in Figure 1 can help judge the composition quality in terms of symmetrical (resp., radial) balance.
Figure 2: The overall pipeline of our SAMP-Net for composition assessment. We use ResNet18 [14] as backbone. The detailed structures of our Saliency-Augmented Multi-pattern Pooling (SAMP) module and Attentional Attribute Feature Fusion (AAFF) module are illustrated in Figure 3 and Figure 4 respectively.

To dissect visual layout based on different composition patterns, we propose a novel multi-pattern pooling module at the end of the backbone to integrate the information extracted from multiple patterns, in which each pattern provides a perspective to evaluate the composition quality. Considering that the sizes and locations of salient objects are representative of visual layout and fundamental to image composition [30], we further integrate visual saliency [17] into our multi-pattern pooling module to encode the spatial and geometric information of salient objects, leading to our Saliency-Augmented Multi-pattern Pooling (SAMP) module. Additionally, since some composition patterns may play more important roles than others, we design weighted multi-pattern aggregation to fuse multi-pattern features, which can adaptively assign different weights to different patterns.
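To make the mechanism concrete, below is a minimal PyTorch sketch of saliency-augmented multi-pattern pooling. The pattern definitions, the per-partition saliency statistic, and the aggregation layer are illustrative assumptions rather than the exact design (which is given in Figure 3): each pattern is encoded as an integer mask over the feature grid, features and saliency are average-pooled within each partition, and the pattern features are fused with learned softmax weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAMP(nn.Module):
    def __init__(self, patterns, in_dim=512, out_dim=256, max_parts=9):
        super().__init__()
        # patterns: list of (H, W) LongTensors; each value indexes a partition
        self.patterns = patterns
        self.max_parts = max_parts
        # project concatenated per-partition features to a pattern feature
        self.proj = nn.Linear(max_parts * (in_dim + 1), out_dim)
        # learnable weights for weighted multi-pattern aggregation
        self.pattern_logits = nn.Parameter(torch.zeros(len(patterns)))

    def forward(self, feat, sal):
        # feat: (B, C, H, W) backbone feature map; sal: (B, 1, Hs, Ws) saliency map
        B, C, H, W = feat.shape
        sal = F.adaptive_avg_pool2d(sal, (H, W)).squeeze(1)      # align to (B, H, W)
        pattern_feats = []
        for pat in self.patterns:
            parts = []
            for p in range(self.max_parts):
                mask = (pat == p).to(feat.dtype)                 # (H, W) partition mask
                area = mask.sum().clamp(min=1.0)                 # empty parts -> zeros
                pooled = (feat * mask).sum(dim=(2, 3)) / area    # mean feature per part
                sal_mean = (sal * mask).sum(dim=(1, 2)) / area   # mean saliency per part
                parts.append(torch.cat([pooled, sal_mean[:, None]], dim=1))
            pattern_feats.append(self.proj(torch.cat(parts, dim=1)))
        stacked = torch.stack(pattern_feats, dim=1)              # (B, P, out_dim)
        w = torch.softmax(self.pattern_logits, dim=0)            # adaptive pattern weights
        return (stacked * w[None, :, None]).sum(dim=1)          # aggregated pattern feature

# illustrative patterns over the 7x7 grid (assumptions): left/right halves
# (symmetry), a 3x3 rule-of-thirds grid, and a diagonal split
ys, xs = torch.meshgrid(torch.arange(7), torch.arange(7), indexing="ij")
patterns = [(xs > 3).long(), (ys * 3 // 7) * 3 + (xs * 3 // 7), (xs >= ys).long()]
samp = SAMP(patterns)
```

Patterns with fewer partitions than max_parts simply contribute zero vectors for their empty slots, so patterns of different granularities can share one projection layer.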
Moreover, because our dataset is built upon the AADB dataset [22], which has composition-relevant attributes, we further leverage these attributes to boost the performance of composition assessment. Specifically, we propose an Attentional Attribute Feature Fusion (AAFF) module to fuse the composition feature and the attribute feature. Finally, having noticed the content bias in our dataset, i.e., that the composition score distribution is severely influenced by object category, we extend the Earth Mover's Distance (EMD) loss of [15] to a weighted EMD loss to eliminate this bias.
The main contributions of this paper can be summarized as follows: 1) We contribute the first image composition assessment dataset, CADB, in which each image has composition scores annotated by five professional raters. 2) We propose a novel composition assessment method with a Saliency-Augmented Multi-pattern Pooling (SAMP) module. 3) We investigate the effectiveness of auxiliary attributes and the weighted EMD loss for composition assessment. 4) Our model outperforms previous aesthetic assessment methods on our dataset.
2 Related Work
2.1 Aesthetic Assessment Dataset
Many large-scale aesthetic assessment datasets have been collected in recent years, like the Aesthetic Visual Analysis database (AVA) [35], AADB [22], the Photo Critique Captioning Dataset (PCCD) [3], AVA-Comments [60], AVA-Reviews [53], FLICKR-AES [42], and DPC-Captions [19]. However, these datasets either only have composition-relevant attributes without an overall composition score, or only have one inaccurate composition score per image, which falls far short of the requirement for composition assessment research. Unlike the existing aesthetic datasets, our CADB dataset contains composition ratings assigned to each image by multiple professional raters. Besides, we guarantee the reliability of our dataset through sanity checks and consistency analysis (see Section 3).
2.2 Aesthetic Assessment Method

Many deep learning based methods have been proposed for the aesthetic assessment task; they can be divided into two groups. The composition-preserving approaches [4, 32], which do not explicitly learn composition representations, produce inferior results on the composition evaluation task. The composition-aware approaches [28, 31, 52] consider the relationships between all pairs of objects or regions in the image to model image composition, which is likely to introduce redundant and noisy information. Moreover, the above methods do not explicitly consider composition patterns. In contrast, we design a novel Saliency-Augmented Multi-pattern Pooling (SAMP) module, which provides an insightful and effective perspective for evaluating composition quality.
We split our CADB dataset into training and test images, in which the test set is made less biased for better evaluation (see Supplementary).
4 Methodology
To accomplish the composition assessment task, we propose a novel network SAMP-Net,
which is named after Saliency-Augmented Multi-pattern Pooling (SAMP) module. The
overall pipeline of our method is illustrated in Figure 2, where we first extract the global
feature map from input image by backbone (e.g., ResNet18 [14]) and then yield aggregated
pattern feature through our SAMP module, which is followed by Attentional Attribute Fea-
ture Fusion (AAFF) module to fuse the composition feature and attribute feature. After that,
we predict composition score distribution based on the fused feature and predict the attribute
score based on the attribute feature, which are supervised by weighted EMD loss and Mean
Squared Error (MSE) loss respectively.
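This pipeline can be summarized in a short sketch. The number of score bins, the attribute branch, and the head widths are assumptions for illustration; for simplicity the two features are fused here by plain concatenation, with the attentional fusion (AAFF) sketched separately alongside the ablation discussion in Section 5.

```python
import torch
import torch.nn as nn
import torchvision

class SAMPNet(nn.Module):
    def __init__(self, samp, num_bins=5, num_attrs=5, feat_dim=256):
        super().__init__()
        resnet = torchvision.models.resnet18(pretrained=True)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # (B, 512, 7, 7)
        self.samp = samp                                  # SAMP module sketched earlier
        self.attr_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                         nn.Linear(512, feat_dim))    # attribute feature
        self.attr_head = nn.Linear(feat_dim, num_attrs)   # attribute score prediction
        self.score_head = nn.Linear(2 * feat_dim, num_bins)

    def forward(self, img, sal):
        fmap = self.backbone(img)                         # global feature map
        comp = self.samp(fmap, sal)                       # aggregated pattern feature
        attr = self.attr_branch(fmap)                     # attribute feature
        fused = torch.cat([comp, attr], dim=1)            # plain concatenation here
        dist = torch.softmax(self.score_head(fused), dim=1)  # score distribution
        return dist, self.attr_head(attr)                 # -> weighted EMD / MSE losses
```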
Table 1: Ablation studies of different components in our model. † means Spatial Pyramid Pooling (SPP) [13]. ‡ means Multi-scale Pyramid Pooling (MPP) [56]. WE means weighted EMD loss. MP means multi-pattern pooling. PW means pattern weights. SA means saliency-augmented. AF indicates attribute feature and AA indicates attentional attribute feature fusion.
We design a weighted EMD loss (see Supplementary), which assigns smaller weights to biased samples when calculating the EMD loss. Finally, our SAMP-Net can be trained in an end-to-end manner with the attribute prediction loss L_atts and the weighted EMD loss L_wEMD:

L = L_wEMD + λ L_atts,

where λ is a trade-off hyper-parameter.
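Below is a minimal sketch of this objective, assuming the r = 2 EMD of [15] computed on cumulative score distributions and a precomputed per-sample weight (the weighting scheme itself is defined in the Supplementary, so the weight is simply an input here); the value of λ is an arbitrary placeholder.

```python
import torch
import torch.nn.functional as F

def weighted_emd_loss(pred, target, weight, r=2):
    # pred, target: (B, K) score distributions; weight: (B,) per-sample weights
    cdf_diff = torch.cumsum(pred, dim=1) - torch.cumsum(target, dim=1)
    emd = cdf_diff.abs().pow(r).mean(dim=1).pow(1.0 / r)   # per-sample EMD
    return (weight * emd).mean()                           # biased samples count less

def total_loss(pred_dist, gt_dist, pred_attr, gt_attr, weight, lam=0.1):
    # weighted EMD on the score distribution plus MSE on the attribute scores
    return (weighted_emd_loss(pred_dist, gt_dist, weight)
            + lam * F.mse_loss(pred_attr, gt_attr))
```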
5 Experiments
5.1 Implementation Details and Evaluation Metric
We use ResNet18 [14] pretrained on ImageNet [8] as the backbone of our SAMP-Net. Unless otherwise specified, all input images are resized to 224 × 224 for both training and testing, following [21, 26, 45], leading to a global feature map of H × W = 7 × 7, and the saliency map is downsampled to H_sal × W_sal = 56 × 56 before being passed to the SAMP module. More details can be found in the Supplementary. All experiments are conducted on our CADB dataset.
To evaluate the composition score distribution and the composition mean score predicted by different models, it is natural to adopt EMD and MSE as the evaluation metrics. EMD measures the closeness between the predicted and ground-truth composition score distributions as in [15]. MSE is computed between the predicted and ground-truth composition mean scores. Moreover, following existing aesthetic assessment approaches [4, 22, 48], we also report the ranking correlation measured by Spearman's Rank Correlation Coefficient (SRCC) and the linear association measured by the Linear Correlation Coefficient (LCC) between the predicted and ground-truth composition mean scores.
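For concreteness, the four metrics can be computed as in the following sketch, assuming K discrete score bins with values bin_values (e.g., 1 to 5) and mean scores taken as the expectations of the respective distributions.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(pred_dists, gt_dists, bin_values, r=2):
    # pred_dists, gt_dists: (N, K) arrays; bin_values: (K,) score of each bin
    cdf_diff = np.cumsum(pred_dists, axis=1) - np.cumsum(gt_dists, axis=1)
    emd = np.mean(np.mean(np.abs(cdf_diff) ** r, axis=1) ** (1.0 / r))  # as in [15]
    pred_mean = pred_dists @ bin_values                 # predicted mean scores
    gt_mean = gt_dists @ bin_values                     # ground-truth mean scores
    mse = np.mean((pred_mean - gt_mean) ** 2)
    srcc = spearmanr(pred_mean, gt_mean).correlation    # ranking correlation
    lcc = pearsonr(pred_mean, gt_mean)[0]               # linear correlation
    return emd, mse, srcc, lcc
```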
Table 2: Comparison of different methods on the composition assessment task. All models
are trained and evaluated on the proposed CADB dataset.
5.2 Ablation Study

Weighted EMD Loss: We start from the basic ResNet18 [14] and report the results using the EMD loss and the weighted EMD loss in Table 1. Training with the weighted EMD loss (row 2) performs better than training with the standard EMD loss (row 1), with a clear gap in test EMD between these two models, which is attributed to the advantage of the weighted EMD loss in eliminating content bias.
Saliency-Augmented Multi-pattern Pooling (SAMP): Based on ResNet18 with the weighted EMD loss (row 2), we add our SAMP module and also explore its ablated versions. We first investigate vanilla multi-pattern pooling without saliency or pattern weights (row 3), in which the saliency vector is excluded from the partition feature and the pattern features of multiple patterns are simply averaged. Then, we learn pattern weights to aggregate multiple pattern features (row 4). Comparing row 3 and row 4 shows that it is beneficial to adaptively assign different weights to different pattern features. We further incorporate the saliency map into the SAMP module (row 5). The comparison between row 4 and row 5 shows that it is useful to emphasize the layout information of salient objects. Considering the architectural similarity between Spatial Pyramid Pooling (SPP) [13] and our multi-pattern pooling, we replace our multi-pattern pooling with SPP using scales {1 × 1, 2 × 2, 3 × 3}, following [4] (row 6). In addition, we also show the results of using Multi-scale Pyramid Pooling (MPP) [56] in row 7, in which we build an image pyramid containing three scaled images. The comparisons (row 5 vs. row 6, row 5 vs. row 7) show that the model using multi-pattern pooling outperforms both SPP and MPP, because our multi-pattern pooling is specifically designed and well-tailored for the composition assessment task.
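For reference, the SPP baseline in row 6 pools the feature map over regular grids; a minimal sketch (assuming average pooling) highlights how these fixed pyramidal grids differ from composition-specific patterns:

```python
import torch
import torch.nn.functional as F

def spp(feat, scales=(1, 2, 3)):
    # feat: (B, C, H, W) -> (B, C * (1 + 4 + 9)) by pooling over regular grids
    pooled = [F.adaptive_avg_pool2d(feat, (s, s)).flatten(1) for s in scales]
    return torch.cat(pooled, dim=1)
```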
Attentional Attribute Feature Fusion (AAFF): Building on row 2 (resp., row 5) in Table 1, we additionally learn the attribute feature and directly concatenate it with the composition feature, leading to row 8 (resp., row 9). The results demonstrate that composition-relevant attributes can help boost the performance of composition evaluation, suggesting that composition-relevant attribute prediction and composition evaluation are two related and reciprocal tasks. Finally, we complete our attentional attribute feature fusion module by learning weights for weighted concatenation (row 10). From row 9 and row 10, we observe that the model using weighted concatenation is better than that using plain concatenation, which validates the superiority of the attentional fusion mechanism.
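A minimal sketch of the weighted concatenation in row 10, assuming the two weights are predicted from the concatenated features themselves; the exact gating design is given in Figure 4 and is not reproduced here.

```python
import torch
import torch.nn as nn

class AAFF(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # predict one weight per feature from their concatenation
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=1))

    def forward(self, comp_feat, attr_feat):
        # comp_feat, attr_feat: (B, dim) composition / attribute features
        w = self.gate(torch.cat([comp_feat, attr_feat], dim=1))   # (B, 2)
        # weighted concatenation in place of plain concatenation
        return torch.cat([w[:, :1] * comp_feat, w[:, 1:] * attr_feat], dim=1)
```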
Figure 5: Analysis of the correlation between an image and its dominant pattern, i.e., the pattern with the largest weight. We show the estimated pattern weights, with the largest weight colored green. We also show the ground-truth/predicted composition mean score in blue/red.
5.3 Comparison with Existing Methods

We compare our SAMP-Net with previous aesthetic assessment methods [4, 28, 31, 32, 52] that explicitly take composition into consideration. Since most of these methods do not yield a score distribution, we make a slight modification to their prediction layers to make them compatible with the EMD loss [15]. For a fair comparison, all methods are trained and tested on our CADB dataset with ResNet18 pretrained on ImageNet [8] as the backbone.
In Table 2, we compare our method with different composition-relevant aesthetic assessment methods. The baseline model (ResNet18) only consists of the pretrained ResNet18 and a prediction head, which is the same as row 1 in Table 1. Among these baselines, A-Lamp is the most competitive one, probably because A-Lamp introduces additional saliency information to learn the pairwise spatial relationships between objects. Our SAMP-Net clearly outperforms all the composition-relevant baselines, which demonstrates that our method is more adept at image composition assessment.
Figure 6: Some failure cases in the test set, which have the highest absolute errors between the predicted composition mean scores (outside brackets) and the ground-truth composition mean scores (in brackets).
symmetrical axis under pattern 2, so the low score implies that maintaining horizontal symmetry may enhance the composition quality. In the left figure of the third row, the low score under pattern 5 suggests moving the dog to the center. In summary, our SAMP module can facilitate composition assessment by integrating the information from multiple patterns, and it provides constructive suggestions for improving the composition quality.
5.6 Limitations
While our method can generally achieve accurate and reliable composition assessment, it
still has some failure cases. We show several failure cases in Figure 6, which have the
highest absolute errors between the predicted and ground-truth composition mean scores.
We can observe that our model tends to predict relatively low scores for these images with
high composition mean scores, which is probably due to the distracting backgrounds and
complicated composition patterns. In addition, there is a clear gap between our method and
human raters on ranking the composition quality of different images (see Supplementary),
which needs to be enhanced in the future work.
6 Conclusion
In this paper, we have contributed the first composition assessment dataset, CADB, with five composition scores for each image. We have also proposed a novel method, SAMP-Net, with saliency-augmented multi-pattern pooling. Equipped with the SAMP module, the AAFF module, and the weighted EMD loss, our method achieves the best performance for composition assessment.
Acknowledgement
This work was sponsored by the National Natural Science Foundation of China (Grant No. 61902247) and the Shanghai Sailing Program (19YF1424400).
References
[1] S. Bhattacharya, R. Sukthankar, and M. Shah. A framework for photo-quality assess-
ment and enhancement based on visual aesthetics. In ACM-Multimedia, 2010.
[2] A. Brachmann and C. Redies. Computational and experimental approaches to visual aesthetics. Frontiers in Computational Neuroscience, 11(1):102–119, 2017.
[3] K. Chang, K.H. Lu, and C.S. Chen. Aesthetic critiques generation for photos. In ICCV, 2017.
[4] Q. Chen, W. Zhang, N. Zhou, P. Lei, Y. Xu, Y. Zheng, and J. Fan. Adaptive fractional
dilated convolution network for image aesthetics assessment. In CVPR, 2020.
[5] Y. Chen, J. Klopp, M. Sun, S. Chien, and K. Ma. Learning to compose with professional
photographs on the web. In ACM-Multimedia, 2017.
[6] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara. Predicting human eye fixations via
an LSTM-based saliency attentive model. IEEE Transactions on Image Processing, 27
(10):5142–5154, 2018.
[7] R. Datta, D. Joshi, J. Li, and J. Wang. Studying aesthetics in photographic images using
a computational approach. In ECCV, 2006.
[8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[9] Y. Deng, C.C. Loy, and X. Tang. Image aesthetic assessment: An experimental survey.
IEEE Signal Processing Magazine, 34(4):80–106, 2017.
[10] S. Dhar, V. Ordonez, and T. Berg. High level describable attributes for predicting
aesthetics and interestingness. In CVPR, 2011.
[11] Yuming Fang, Hanwei Zhu, Yan Zeng, Kede Ma, and Zhou Wang. Perceptual quality
assessment of smartphone photography. In CVPR, 2020.
[12] M. Freeman. The photographer’s eye: Composition and design for better digital pho-
tos. CRC Press, 2007.
[13] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional
networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 37(9):1904–1916, 2015.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In
CVPR, 2016.
[15] L. Hou, C.P. Yu, and D. Samaras. Squared earth mover’s distance-based loss for train-
ing deep neural networks. ArXiv, abs/1611.05916, 2016.
[16] Q. Hou, M.M. Cheng, X. Hu, A. Borji, Z. Tu, and P. Torr. Deeply supervised salient object detection with short connections. In CVPR, 2017.
[17] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. In CVPR,
2007.
[18] A. Jahanian, S. Vishwanathan, and J. Allebach. Learning visual balance from large-
scale datasets of aesthetically highly rated images. In Human Vision and Electronic
Imaging XX, 2015.
[19] X. Jin, L. Wu, G. Zhao, X. Li, X. Zhang, S. Ge, D. Zou, B. Zhou, and X. Zhou.
Aesthetic attributes assessment of images. In ACM-Multimedia, 2019.
[20] Dhiraj Joshi, Ritendra Datta, Elena Fedorovskaya, Quang-Tuan Luong, James Z Wang,
Jia Li, and Jiebo Luo. Aesthetics and emotions in images. IEEE Signal Processing
Magazine, 28(5):94–115, 2011.
[21] Keunsoo Ko, Jun-Tae Lee, and Chang-Su Kim. PAC-Net: Pairwise aesthetic compari-
son network for image aesthetic assessment. In ICIP, 2018.
[22] S. Kong, X. Shen, Z. Lin, R. Mech, and C. Fowlkes. Photo aesthetics ranking network
with attributes and content adaptation. In ECCV, 2016.
[23] J.T. Lee, H. Kim, C. Lee, and C. Kim. Semantic line detection and its applications. In
ICCV, 2017.
[24] J.T. Lee, H. Kim, C. Lee, and C. Kim. Photographic composition classification and
dominant geometric element detection for outdoor scenes. Journal of Visual Commu-
nication and Image Representation, 55(1):91–105, 2018.
[25] C. Li, A. Gallagher, A. Loui, and T. Chen. Aesthetic quality assessment of consumer
photos with faces. In ICIP, 2010.
[26] Leida Li, Hancheng Zhu, Sicheng Zhao, Guiguang Ding, and Weisi Lin. Personality-
assisted multi-task learning for generic and personalized image aesthetics assessment.
IEEE Transactions on Image Processing, 29(1):3898–3910, 2020.
[27] Xuewei Li, Xueming Li, Gang Zhang, and Xianlin Zhang. A novel feature fusion
method for computing image aesthetic quality. IEEE access, 8:63043–63054, 2020.
[28] D. Liu, R. Puri, N. Kamath, and S. Bhattacharya. Composition-aware image aesthetics
assessment. In WACV, 2020.
[29] Ligang Liu, Renjie Chen, Lior Wolf, and Daniel Cohen-Or. Optimizing photo compo-
sition. In Computer Graphics Forum, 2010.
[30] S. Lok, S. Feiner, and G. Ngai. Evaluation of visual balance for automated layout. In
Proceedings of the 9th international conference on Intelligent user interfaces, 2004.
[31] S. Ma, J. Liu, and C. Chen. A-Lamp: Adaptive layout-aware multi-patch deep convo-
lutional neural network for photo aesthetic assessment. In CVPR, 2017.
[32] L. Mai, H. Jin, and F. Liu. Composition-preserving deep photo aesthetics assessment.
In CVPR, 2016.
[33] L. Marchesotti, F. Perronnin, D. Larlus, and G. Csurka. Assessing the aesthetic quality
of photographs using generic image descriptors. In ICCV, 2011.
[34] B. Martinez and J. Block. Visual forces: an introduction to design. Pearson College
Division, 1995.
[35] N. Murray, L. Marchesotti, and F. Perronnin. AVA: A large-scale database for aesthetic
visual analysis. In CVPR, 2012.
[37] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categoriza-
tion. In CVPR, 2007.
[39] Yogesh Singh Rawat and Mohan S Kankanhalli. Context-aware photography learning
for smart mobile devices. ACM Transactions on Multimedia Computing, Communica-
tions, and Applications, 12(1):1–24, 2015.
[40] Yogesh Singh Rawat and Mohan S Kankanhalli. Clicksmart: A context-aware view-
point recommendation system for mobile photography. IEEE Transactions on Circuits
and Systems for Video Technology, 27(1):149–158, 2016.
[41] Yogesh Singh Rawat, Mingli Song, and Mohan S Kankanhalli. A spring-electric graph
model for socialized group photography. IEEE Transactions on Multimedia, 20(3):
754–766, 2017.
[42] J. Ren, X. Shen, Z. Lin, R. Mech, and D. Foran. Personalized image aesthetics. In
ICCV, 2017.
[44] A. Savakis, S. Etz, and A. Loui. Evaluation of image appeal in consumer photography.
In Human vision and electronic imaging V, 2000.
[45] Katharina Schwarz, Patrick Wieschollek, and Hendrik PA Lensch. Will people like
your image? learning the aesthetic space. In WACV, 2018.
[46] H. Su, T. Chen, C. Kao, W. Hsu, and S. Chien. Scenic photo quality assessment with
bag of aesthetics-preserving features. In ACM-Multimedia, 2011.
[47] Yu-Chuan Su, Raviteja Vemulapalli, Ben Weiss, Chun-Te Chu, Philip Andrew Mans-
field, Lior Shapira, and Colvin Pitts. Camera view adjustment prediction for improving
image composition. arXiv preprint arXiv:2104.07608, 2021.
[48] H. Talebi and P. Milanfar. NIMA: Neural image assessment. IEEE Transactions on
Image Processing, 27(8):3998–4011, 2018.
[49] X. Tang, W. Luo, and X. Wang. Content-based photo quality assessment. IEEE Transactions on Multimedia, 15(8):1930–1943, 2013.
[50] K. Thömmes and R. Hübner. Instagram likes for architectural photos can be predicted by quantitative balance measures and curvature. Frontiers in Psychology, 9(1):1050–1067, 2018.
[51] Yi Tu, Li Niu, Weijie Zhao, Dawei Cheng, and Liqing Zhang. Image cropping with
composition and saliency aware aesthetic score map. In AAAI, 2020.
[52] W. Wang and R. Deng. Modeling human perception for image aesthetic assessment. In
ICIP, 2019.
[53] W. Wang, S. Yang, W. Zhang, and J. Zhang. Neural aesthetic image reviewer. IET
Computer Vision, 13(8):749–758, 2019.
[54] Min-Tzu Wu, Tse-Yu Pan, Wan-Lun Tsai, Hsu-Chan Kuo, and Min-Chun Hu. High-
level semantic photographic composition analysis and understanding with deep neural
networks. In ICMEW, 2017.
[55] Yaowen Wu, Christian Bauckhage, and Christian Thurau. The good, the bad, and the
ugly: Predicting aesthetic image labels. In ICPR, 2010.
[56] Donggeun Yoo, Sunggyun Park, Joon-Young Lee, and In So Kweon. Multi-scale pyra-
mid pooling for deep convolutional representation. In CVPRW, 2015.
[57] N. Yu, X. Shen, L. Lin, R. Mech, and C. Barnes. Learning to detect multiple photo-
graphic defects. In WACV, 2018.
[58] L. Zhang, Y. Gao, R. Zimmermann, Q. Tian, and X. Li. Fusion of multichannel local
and global structural cues for photo aesthetics evaluation. IEEE Transactions on Image
Processing, 23(3):1419–1429, 2014.
[59] T. Zhao and X. Wu. Pyramid feature attention network for saliency detection. In CVPR,
2019.
[60] Y. Zhou, X. Lu, J. Zhang, and J.Z. Wang. Joint image and text representation for
aesthetics analysis. In ACM-Multimedia, 2016.
[61] Z. Zhou, S. He, J. Li, and J.Z. Wang. Modeling perspective effects in photographic
composition. In ACM-Multimedia, 2015.