International Journal on Artificial Intelligence Tools


Vol. 32, No. 2 (2023) 2340010 (48 pages)
© World Scientific Publishing Company
DOI: 10.1142/S0218213023400109

Vision and Audio-based Methods for First Impression Recognition Using Machine Learning Algorithms: A Review

Sumiya Mushtaq∗, Neerendra Kumar†, Yashwant Singh‡


Department of Computer Science and IT
Central University of Jammu
Jammu, Jammu-Kashmir 181143, India
[email protected]
[email protected]
[email protected]

Pradeep Kumar Singh§


Narsee Monjee Institute of Management Studies
School of Technology Management & Engineering
Chandigarh 160014, India
pradeep [email protected]

Received 28 July 2021
Accepted 7 November 2021
Published 5 April 2023

Personality is a psychological construct that embodies the unique characteristics of an individual. Automatic personality computing enables the assessment of personality elements with the help of machines. Over the last few decades, many researchers have focused on computing aspects of personality, emotions, and behavior with the help of machine learning. Efficient personality recognition using machine learning can provide inroads into almost every field of human advancement. However, we found that there are not enough good surveys to consolidate the progress of this growing field. In this effort, we have taken up a sub-field of personality computing known as apparent personality perception, or First Impressions. In addition to providing an exhaustive compilation of available First Impression research, we have classified the existing literature according to the data modalities and machine learning techniques utilized. Our work also lists various limitations and gaps within the existing literature, with possible measures to address them. The paper concludes with our comments on future work in the field of first impressions.

Keywords: First impressions; human-computer interaction; personality computing; personality perception; social signals processing.

§ Corresponding author


1. Introduction
Personality is the latent concept, present in all of us, that explains distinctive patterns of personal thoughts, emotions, and behaviors, as well as the cognitive processes that lie underneath these patterns, whether covert or not.1 It comprises dispositions, mentalities, and sentiments, and is primarily expressed in connection with others. It encompasses both innate and acquired behavioral attributes that distinguish individuals from each other; individuals' relations with their surroundings and social circles can also be viewed in this way. The psychology of personality endeavours to clarify the propensities that underlie differences in conduct. The fundamental objective of personality psychology is therefore to recognize individuals' inward qualities from observable behaviors, as well as to investigate the relevant connections between the two.2
Personality computing is a field that uses advanced computational procedures to analyze personality from a variety of sources such as text, audiovisual data, and social networks.3 It may be thought of as an extension or supplement of affective computing, with the former concentrating on personality traits and the latter on affective states. However, it is no longer restricted to personality and affect, but applies to any computing field related to the interpretation, prediction, and synthesis of human conduct. Despite being distinct and numerous in terms of information, technology, and practices, all computing domains involved with personality share the same three thrust areas4 (Fig. 1): the recognition of individuals' genuine personality (Automatic Personality Recognition), the prediction of the personality that others attribute to a person (Automatic Personality Perception), and the development of synthetic identities using artificial agents (Automatic Personality Synthesis).

1.1. Key areas


Automatic Personality Recognition (APR) is the task of evaluating target individuals' true personalities based on their digital footprints. It aims at inferring characteristics resulting from self-evaluations, which are generally viewed as genuine attributes of an individual.

Fig. 1. Key areas of personality computing.


Fig. 2. Automatic personality perception.
Automatic Personality Perception (APP) is the task of inferring the personality attributed to a target individual by an observer, primarily acquired from a few observable behaviors. It is a component of social cognition, which investigates how humans think, act, and process information from our social world (Fig. 2). Recently, many works have used the terms apparent personality impressions or first impressions to describe personality perception.5 A "First Impression" is formed when a person meets someone for the first time and forms a visual impression of them.6 First impressions are affected by various elements like facial appearance, vocal inflections, body language, emotional states, etc. First impressions affect a person's life and depend upon the circumstance in which their appearance is being appraised. As stated in psychology studies, a first perception is created even if the exposure time to an unfamiliar face is less than 100 ms.7 First impressions are very important because people generally tend to stay attached to their preliminary impressions of others and find it difficult to alter their opinion, even if supplied with masses of evidence to the contrary.
Automatic Personality Synthesis (APS) is the task of developing synthetic personalities in avatars and virtual agents. It is the task of automatically generating digital footprints aimed at evoking the attribution of desired personality traits. The digital footprints are not generated by humans but by any system able to display human-like behaviors; the process of attribution, however, involves human observers who attribute personality traits to the system.


1.2. Personality models


To compute the personality of individuals, different personality models are utilized. However, the models that most successfully predict quantifiable aspects of individuals' lives are those based on traits.8 Personality traits are personality factors that generally remain steady over long periods, differ between people, and are moderately consistent across situations. Trait-based models are based on people's perceptions of semantic similarity and connections among the descriptive words that individuals use to portray themselves as well as one another.9 Different trait-based models have been proposed and comprehensively studied, including the Big-Five10 and the Big-Two.11 Among these, the Big-Five is the most influential model in psychology and personality computing.12 The Big-Five is so named because the model suggests that human personality can be measured along five significant dimensions, each of which is independent of the others. The Big-Five describes individuals' traits on a spectrum and is accordingly a considerably more substantial, evidence-based method for understanding personality. The five dimensions of personality in the Big-Five model are as follows (i)–(v):
(i) Openness quantifies the degree to which an individual is inventive and innovative, instead of sensible and conventional. Openness includes traits like being perceptive, imaginative, curious, ingenious, innovative, etc.
(ii) Conscientiousness is the degree to which a person is purposeful, coordinated, non-indulgent, and organized. It includes traits like being organized, logical, thorough, authentic, etc.
(iii) Extraversion is a measure of how amiable, communicative, and vivacious an individual is. Extraversion includes traits like being energetic, confident, chatty, outgoing, etc.
(iv) Agreeableness is a measure of a person's inclinations concerning social amiability. Agreeableness traits include being benevolent, loving, supportive, trusting, sympathetic, etc.
(v) Neuroticism gauges the manner in which people respond to stress; its inverse is often referred to as emotional stability. It includes traits like being tense, moody, over-sensitive, unstable, nervous, etc.
To assess the dimensions of the Big-Five model, various questionnaires are used. The most popular method for accomplishing this task is to use questionnaires in which individuals evaluate their own behavior on a Likert scale.13 The NEO-Personality-Inventory-Revised (NEO-PI-R, 240 entries),14 the NEO Five-Factor Inventory (NEO-FFI, 60 entries),15 and the Big-Five Inventory (BFI, 44 entries) are among the most common questionnaires used for this task. Short questionnaires (5–10 items) were created by keeping only those entries that correspond best with the full instruments, making them much quicker to complete,16,17 such as the BFI-10, a simplified version of the Big-Five Inventory. Each entry contributes to trait scoring via a Likert scale (ranging from "Strongly disagree" to "Strongly agree"). In the case of APR, first-person questionnaires are utilized; these result in self-assessments and thus yield the actual personality of individuals. In the case of APP, third-person questionnaires are utilized; these yield a personality profile based on other people's impressions of a person in a particular circumstance.
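
As a concrete illustration of Likert-based questionnaire scoring, the minimal Python sketch below computes per-trait scores from BFI-style responses. The item-to-trait key and the reverse-scored items shown here are hypothetical placeholders, not the published BFI-10 key.

    LIKERT = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
              "Agree": 4, "Strongly agree": 5}

    # Hypothetical key: (item_id, reverse_scored) pairs per Big-Five trait.
    KEY = {
        "Openness":          [(1, False), (6, True)],
        "Conscientiousness": [(2, False), (7, True)],
        "Extraversion":      [(3, False), (8, True)],
        "Agreeableness":     [(4, True),  (9, False)],
        "Neuroticism":       [(5, False), (10, True)],
    }

    def score(responses):
        """responses: dict mapping item_id to a Likert label."""
        traits = {}
        for trait, items in KEY.items():
            values = []
            for item_id, reverse in items:
                v = LIKERT[responses[item_id]]
                values.append(6 - v if reverse else v)  # reverse-key if needed
            traits[trait] = sum(values) / len(values)   # mean item score
        return traits

The same scale serves both settings: in APR the respondent rates themselves (first-person items), while in APP an observer fills in the same items about the target.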

1.3. Applications
There are numerous applications of personality computing in the current-day scenario. As research advances in this field, better personality models with greater precision and reliability will be discovered. In particular, computing technologies permit the processing of massive quantities of behavioral records that may be hard to examine with procedures traditionally applied in psychology. In this regard, personality computing might assist in establishing connections between attributes and behavior more effectively than has been possible so far. Some of the application areas can be enumerated as (i)–(viii):

(i) Recommendation Systems: Personality is a crucial component that impacts individuals' behavior and hobbies. Individuals with specific personality profiles can have comparable passions and values, so the services and products suggested to a particular person should have been favorably evaluated by other customers of comparable personality types. Incorporating user traits into recommender frameworks upgrades recommendation quality and user involvement.18
(ii) Intelligent Personal Agents: Automated personal assistants like Siri, Google Assistant, Cortana, Mycroft, and so on can be made to routinely learn about clients and, based on their characteristics, make informed guesses about their requirements and wishes. Thus, virtual agents can be customized to display diverse identities according to the personality of the client, for achieving better client gratification.19–21
(iii) Biomedical Engineering/Mental Health: Automatic personality detection can address mental wellbeing-related issues. Based on the personality of an individual, a counselor can provide a better recommendation, or suitable automatic guidance can be given.22,23 It has the potential to be utilized as a suicide prevention tool. Also, electronic health data can be used to analyze a person's personality and, as a result, personalized health care can be provided.
(iv) Recruitment Automation: Personality influences all components of a person's life, including achievements, failures, and reactions to circumstances at the workplace. So, it is imperative to recognize individual qualities and match candidates to the roles and responsibilities that suit them best. Personality evaluation shows the employer which open position a candidate is most appropriate for, or whether they should be hired at all.24–30
(v) Advertising Campaigns: Automatic personality computing can serve as a guide for making advertising campaigns more successful.31–33 It may be beneficial in social promotion, where psychological targeting could diminish the expense of a publicity drive. Likewise, it can be used in political forecasting, suggesting to policymakers how to make potential campaigns more practical and focused.34–36
(vi) Automated Deception Detection: In the present era of the Internet-of-Things,37–39 criminal activities are a big threat to society. Automatic personality trait detection can assist in figuring out fake statements.40–43 It can also assist forensics in identifying criminals. The authorities would be able to narrow the suspect pool if they were familiar with the personality characteristics of those who were in the vicinity at the time of an incident.
(vii) Maintaining Personal Connections: Personal interactions are vital to one's physical and mental well-being. In today's world, more people prefer to interact digitally rather than in person. Social media has a significant role to play in bringing users together and fostering relationships.44 On social media platforms, people converse with strangers and create new acquaintances. Personality computing can help people sustain interpersonal relationships on social media platforms. Numerous studies have been conducted on detecting personality from social media content such as tweets,45,46 texts,46–48 and profile photos.49,50
(viii) Designing Specified Applications: Different individuals utilize distinctive applications depending on their requirements and inclinations.51,52 Personality is thus connected with the utilization of computing technologies. Automatic personality computing can emerge as an essential device for developing new applications or introducing new functions to existing frameworks.

1.4. Motivation for the survey


As mentioned in Sec. 1.3, personality computing has a wide range of applications, which makes it a prominent and emerging topic. This field's advancement can help humans and machines communicate better. A lot of research has already been done on personality computing, particularly in the area of automatic personality perception, where researchers have developed new and advanced strategies. One of the most challenging and time-consuming tasks for researchers dealing with apparent personality analysis is understanding the principles of personality computing and then learning its advanced approaches; this consumes a significant portion of their research time before they can adopt an appropriate model suited to their particular challenge. This drew our attention to conducting a survey on automatic personality perception and collecting the most crucial techniques and material for apparent personality analysis, so that future researchers working in the field of apparent personality analysis who go through this paper will be able to solve their problems in a short amount of time without squandering time studying everything from the fundamentals to advanced approaches.

1.5. Objectives of the survey


The main objectives of the study are as follows (i)–(iii):
(i) To study the field of personality computing thoroughly.
(ii) To dissect all the existing approaches of automatic personality perception.
(iii) To identify research challenges and gaps and recommend viable solutions.

1.6. Contribution of the survey


New strategies in the domain of apparent personality analysis are being developed at a rapid pace. So, a comprehensive survey is required to assist future researchers in solving their problems without squandering a lot of time learning and comprehending the fundamental concepts from various internet sources; this paper will assist them in comprehending all of the needed strategies, from the most fundamental to the most advanced. This paper's goal is to provide a comprehensive overview of all personality perception approaches, from basic to advanced. Our paper is distinctive in that no other paper, to our knowledge, has sought to pave the way for researchers studying personality perception in this manner. The first survey on personality computing was presented by Vinciarelli and Mohammadi,4 who concentrated on automatic personality recognition, perception, and synthesis from a broad perspective. Then, in 2018, Junior and co-authors53 presented a survey on first impression recognition using vision-based predictive analytics. More recently, Phan and Rauthmann54 presented a survey on personality computing from a general point of view, including the benefits, drawbacks, hazards, and ramifications of personality computing. This study contributes to the research field in the following ways (i)–(v):
(i) An up-to-date review of the literature on automatic personality perception from both vision- and audio-based perspectives is presented in this paper.
(ii) A taxonomy (Fig. 5) for categorizing works based on the data sources they utilize, i.e. audio, image sequences, still images, and multimodal data, is suggested; the challenges are discussed category-wise and possible solutions are prescribed. Moreover, the suggested taxonomy will assist researchers in discovering common databases, properties, and categorizations.
(iii) A comparative analysis of many studies is presented so that future researchers will understand the benefits and drawbacks of various approaches and which methodologies they should compare against or derive ideas from.


(iv) Commonly used datasets and performance evaluation criteria are discussed
in order to advance research in this area and to assist future researchers in
choosing data that matches their goals.
(v) Open research challenges, gaps, and opportunities for further research in this
subject are identified.

1.7. Organization of the paper


This paper presents an exhaustive review of the field of personality computing, with the main focus on automatic personality perception.
The rest of the paper is organized as follows: Section 2 describes the survey
methodology. Section 3 presents the state-of-the-art of APP based on vision and
audio-based techniques with current issues and possible solutions. Section 4 provides
comparative analysis of the studies present in the research area. Section 5 discusses
challenges, gaps, and future directions. Section 6 covers the general steps for first
impression recognition. Finally, the paper ends with a conclusion and future scope
in Sec. 7.

2. Methodology of the Paper


This section discusses the survey methodology. This study is conducted using the
methodology presented in Fig. 3.

2.1. Protocol for the survey


This study searched a variety of electronic sources for relevant papers before filtering
the results using relevant inclusion and exclusion criteria. Then depending on the
survey’s objectives, appropriate works were selected and an analysis was conducted
and presented.

2.2. Data sources


To find relevant research papers, the study used the IEEE Xplore, Springer, ACM
Digital Library, Google Scholar, Science Direct, and the arXiv database.

Fig. 3. Steps of the survey methodology.


Fig. 4. Flowchart for relevant article inclusion and exclusion criteria.

2.3. Criteria for inclusion and exclusion


The next step was to find the publications most relevant to this survey using inclusion and exclusion criteria (Fig. 4). To find the most relevant research articles, a comprehensive keyword-based search (as in Table 1) was carried out utilizing multiple search queries. In this study, we used inclusion and exclusion criteria to decide whether a study should be included or not. In our introductory Section 1, we include articles on personality (personality traits, models, and so on) and personality computing (its key areas and applications). However, in our literature study, we only included articles (from 2011 to 2021) that were related to first impressions and automatic personality perception, and we classified them according to the data modalities used.


Table 1. Keywords for finding relevant articles.

- Personality + Definition/Psychological Concepts
- Personality Models
- Personality Computing + Key Areas + Its Applications
- Personality Computing + Text/Speech/Social Media/Images/Video
- Personality + Emotions
- Apparent Personality Analysis
- Automatic Personality Perception (APP)
- First Impression Recognition
- First Impressions/APP from Images/Audio/Social Media/Video/Multimodal Data
- First Impressions/APP + Machine/Deep Learning

Inclusion Criteria: For inclusion, we first read the title of an article to check relevance, then the abstract, then the keywords, and finally the entire article; if it proves relevant, we include it. We also include articles that present novel methodologies for first impression recognition/automatic personality perception or that focus on both APP and APR.
Exclusion Criteria: We exclude all articles that do not meet the above-mentioned inclusion criteria.

3. Related Works
The study has divided the literature review into sections using a taxonomy that groups works according to the data modalities they utilized, namely still images, image sequences, audio, and multimodal data (Fig. 5).

Fig. 5. Taxonomy for classifying works as per data modalities.


3.1. Automatic personality perception using images


This section covers the works using images for automatic personality perception. Images may be either still images or image sequences.55 Still images primarily capture facial appearance, while image sequences additionally describe temporal details and dynamics (discarding the audio modality here). The human face is an essential mechanism of human conversation and an outstanding source of facts used to infer a range of attributes. When people glance at someone's face, they often develop an implicit, unconscious perception of them, which is extremely beneficial data in everyday interactions. Investigating the connections between visual characteristics and personality is an exciting and difficult problem in computer vision. A few works have focused on analyzing the connection between visual features and personality traits. Some of them have taken still images as the input, while others use image sequences to incorporate temporal information.

3.1.1. Still images based personality perception


In 2011, Mario et al.,56 conducted their experiments over the Karolinska dataset and over a synthetic dataset to predict judgements of dominance based on facial information. Two descriptors were used, the Probabilistic Appearance Descriptor (PA) and the Histogram of Oriented Gradients (HOG), and three classification techniques (Gentle Boost, SVM with a radial kernel, KNN) were utilized to assess the efficiency of the descriptors in the task of dominance judgement prediction. The results showed that machine learning approaches can predict dominance judgements with a high degree of accuracy (up to 90%) and that the HOG descriptor can properly represent the details needed for such a purpose.
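
For orientation, a minimal Python sketch of this kind of descriptor-plus-classifier pipeline is shown below, using scikit-image's HOG and an RBF-kernel SVM; the face crops and dominance labels are random placeholders rather than the Karolinska data.

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def hog_features(images):
        # One HOG descriptor per aligned grayscale face crop.
        return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                             cells_per_block=(2, 2)) for img in images])

    faces = np.random.rand(200, 64, 64)        # placeholder 64x64 face crops
    dominant = np.random.randint(0, 2, 200)    # placeholder binary judgements

    X = hog_features(faces)
    svm = SVC(kernel="rbf", gamma="scale")     # SVM with radial kernel
    print(cross_val_score(svm, X, dominant, cv=5).mean())
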
In 2014, Al Moubayed et al.,57 performed experiments over frontal color images of the Color FERET database to map facial appearance onto the Big-Five personality traits. For this purpose, they combined eigenface features with an SVM, and the analysis revealed that an individual depicted in a picture can be labelled above or below the median with about 65% accuracy, depending on the specific trait.
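
The eigenfaces idea amounts to PCA over flattened face images followed by a classifier; a rough sketch under assumed data shapes (not the Color FERET pipeline itself) follows.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    faces = np.random.rand(300, 64 * 64)           # flattened frontal faces
    above_median = np.random.randint(0, 2, 300)    # per-trait binary label

    # PCA components over face images are the classical "eigenfaces".
    model = make_pipeline(PCA(n_components=50, whiten=True), SVC(kernel="rbf"))
    model.fit(faces, above_median)
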
In 2015, Guntuku et al.,58 employed low-level visual features of selfie photographs and provided a unique strategy for building mid-level cue detectors for detecting both genuine and apparent personality traits; LibSVM with an RBF kernel was used to train the mid-level detectors. They built their dataset by downloading pictures from 612 Sina Weibo users (China's version of Twitter). Even though the dataset was small (123 users), it was discovered that mid-level cue detectors surpassed most visual features in predicting the personality traits of selfie users. The correlation between various mid-level cues and personality factors was also analyzed to find the significance of the mid-level cues. In the same year, Joo et al.,59 introduced a novel method for automatically predicting an individual's personality traits based on their facial features and then used it to analyze and predict the results of political elections. They performed their experiments over a unique dataset of facial pictures of politicians in the United States. For this purpose, they proposed a hierarchical model. The social dimensions, which describe the relative ranking orders among the samples, were predicted using Ranking SVMs (Rank SVMs). The results showed that the model correctly predicts election outcomes with an accuracy of 67.5% for Governors and 65.5% for Senators, and it also properly classifies politicians' political party affiliations, such as Democrats versus Republicans, with 62.6% accuracy for males and 60.1% accuracy for females.
In 2016, Yan et al.,60 performed experiments to verify the connections between facial appearance and a personality impression, namely trustworthiness. They conducted their experiments over a subset of the LFW dataset containing 2010 portraits annotated by 250 volunteers. They used local face descriptors to extract low-level characteristics from various face areas. After that, they used the k-means algorithm to cluster the local descriptors into a set of mid-level features to mitigate the semantic gap between high-level and low-level features. Then, they utilized an SVM to figure out the connections between facial features and personality impressions. The results showed that the proposed framework outperforms state-of-the-art efforts. In the same year, Lewenberg et al.,61 presented an advanced technique for estimating both objective and subjective traits using a landmark-augmented convolutional neural network (LACNN). They performed their experiments over a new Face Attributes Dataset (a subset of the PubFig dataset). The results showed that the LACNN brings consistent improvements in classification performance for both objective and subjective traits and generates responses with more detailed information compared to a non-augmented baseline. In the same year, Dhall and Hoey,62 suggested a multivariate method for inferring users' personality impressions from Twitter profile photos. The relationship between Big-Five personality factors derived through analysis of users' tweets and the suggested photo-based profile framework was evaluated. In their work, they considered the face as well as the background. Both hand-crafted (Face-HOG, Face-LPQ, Scene-CENTRIST) and deep learning (Face-VGG, Scene-ImageNet) features were computed from the face region as well as the background of users' profile photos. A kernel partial least squares (KPLS) method was used for inferring the Big-Five traits. The results showed that scene descriptors had a high correlation with the Big-Five trait features derived from users' tweets (particularly openness). Also, when compared to hand-crafted features, deep learning features combined with KPLS regression performed best.
In 2017, McCurrie et al.,63 performed experiments over a unique ground-truth-free dataset that included 6300 grayscale facial pictures from the AFLW dataset, labeled with four attributes (dominance, trustworthiness, age, and IQ), to develop efficient automatic predictors for social attribute assignment. They used an online psychophysics testing platform (TestMyBrain.org), which allowed them to collect data from a larger number of raters from a potentially more reliable and geographically variable source. To train predictive models, a CNN regression framework was used, and five deep architectures (modified VGGNet19, modified VGGNet16, MOON, and two shallower customized architectures with few convolutional layers and different regularization levels) were compared. Of these, the MOON (mixed objective optimization network) architecture performed slightly better and so was used as a base for the final models. The results demonstrated that the models have a significant correlation with human crowd ratings and indicate that deeper architectures contribute little to no improvement. In the same year, Zhang et al.,64 analyzed the correlation between personality and visual characteristics (color, expressions) using statistical approaches. They performed their experiments on 2000 images collected from Google, Bing, and Baidu, rated by 54 people in terms of Big-Five traits. For the color features, the HSV color space model was used, so they calculated average saturation, average brightness, and hue in HSV space. They used the Microsoft Emotion API for the automatic extraction of three types of expressions (positive, negative, and neutral) from a person's image. The results showed that different features have varying degrees of correlation with personality and that certain features like happiness, neutrality, and saturation influence personality directly.
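
As an illustration of the color features mentioned above, the following OpenCV sketch computes mean hue, saturation, and brightness in HSV space from an 8-bit RGB image; it is a plausible reading of the feature set, not the authors' exact code.

    import cv2
    import numpy as np

    def hsv_stats(rgb_image):
        # Expects an 8-bit RGB array; OpenCV stores hue in [0, 179].
        hsv = cv2.cvtColor(rgb_image, cv2.COLOR_RGB2HSV)
        h, s, v = cv2.split(hsv)
        return {"mean_hue": float(np.mean(h)),
                "mean_saturation": float(np.mean(s)),
                "mean_brightness": float(np.mean(v))}

    image = np.random.randint(0, 256, (128, 128, 3), dtype=np.uint8)
    print(hsv_stats(image))
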
Current Issues: The majority of works that use still photos as input data are primarily concerned with facial characteristics.56–58 Analyzing the relationship between facial features and personality traits is fraught with difficulties: there are numerous distinct facial characteristics, many of which are difficult to identify, and discrete facial traits have weak effects that become statistically significant only when large samples are used. Works using realistic images for personality prediction may face further problems because the basic features of an image depend on the subject's choices, which are in turn personality-dependent: one of the primary reasons people take and share images is to communicate their personality to others. Isolating each variable's influence from the plethora of other factors appears to be a challenging task. Context is usually ignored in the case of still images. As stated in Ref. 65, context plays a crucial role in emotion perception using images, which is perfectly consistent with research on APP. Although some studies suggest that background information such as haircuts or apparel should be ignored,66,67 people can use such information to actively influence how others perceive them. Also, the unpredictability of human raters' evaluations of personality traits is an additional hurdle for studies attempting to uncover face-personality relationships. So, to generate credible estimations of personality qualities for each image, a large number of human raters are required.
Possible Solutions: When the attention is on the face as well as its surrounding regions, the issues associated with using still images as input data for personality prediction can be overcome to some extent. The work in Ref. 62 concentrated on image regions outside the facial area, but the topic needs to be explored further. Analyzing body language,68 a hot topic in computer vision, may enhance apparent personality characteristic analysis. Including a variety of non-verbal signs like hand position, smile type, gaze direction, head movement, body motion, and morphological characteristics of the face can improve the efficiency of works using still images, because these are crucial indicators of a person's emotional and cognitive inner state. Furthermore, research demonstrates that data labeling annotators are influenced by context; thus, including context can advance research using still images. Also, current studies have attempted to take a more holistic approach to the subject, looking into people's subjective perceptions of personality based on integral facial images.69

3.1.2. Personality perception using image sequences


In 2012, Biel et al.,70 performed experiments over a subset of the YouTube vlog dataset to detect emotional facial expressions frame by frame, and for Big-Five trait prediction they used Support Vector Regressors (SVR). The approach showed good results, especially for predicting the extraversion trait. In 2013, Aran and Gatica-Perez,71 explored the utilization of content from social media to transfer knowledge from conversational videos to a small-group setting in order to predict the extraversion trait. They used the YouTube vlog dataset as the source domain and data from small group meetings (102 individuals) as the target domain. Considering the special characteristics of the extraversion trait, they extracted statistics from weighted motion energy images and used ridge regression and SVM classifiers, as well as their combination, for classification. The results showed that an accuracy of 70% is achieved when combining ridge regression and SVM, indicating that the data source is suitable for building personality impression models that can be used in realistic situations like meetings.
In 2014, Çeliktutan and Gunes,72 proposed a novel approach to the difficult task of predicting perceived traits and social dimensions continuously in time and space. They performed their work over the SEMAINE corpus, and annotators were required to provide continuous ratings of video clips along multiple dimensions (the Big-Five plus three more dimensions). They used low-level visual features combined with linear regression to solve the inference problem. The results proved the feasibility of the proposed approach, and the annotations obtained continuously in space and time showed that certain dimensions appear to be more static and stable than others, while others appear to be more dynamic.
Later, in 2015, Mosquera et al.,73 performed experiments to investigate whether there are links between emotion-based facial expressions and Big-Five personality traits that could be automatically assessed, as an extension of Ref. 70. To extract facial expressions, CERT (Computer Expression Recognition Toolbox) was employed, and four different cue extraction and fusion strategies were utilized to characterize face statistics and dynamics compactly. Co-occurrence evaluation was additionally exploited to characterize the vlogs' cue content. Automatic personality perception was then addressed using an SVM regressor, and the results showed that many Big-Five attributes have a significant connection with various facial expression cues; however, only the extraversion trait could be accurately predicted.
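
Several of the works above share a common pattern: per-frame expression scores are pooled into clip-level statistics that are then regressed onto a trait score. A minimal sketch of that pattern (with random stand-ins for the CERT outputs and trait annotations) is given below.

    import numpy as np
    from sklearn.svm import SVR

    def clip_statistics(frame_scores):
        # frame_scores: (n_frames, n_expressions) per-frame cue outputs.
        return np.concatenate([frame_scores.mean(axis=0),
                               frame_scores.std(axis=0),
                               frame_scores.max(axis=0)])

    clips = [np.random.rand(np.random.randint(50, 150), 7) for _ in range(100)]
    X = np.array([clip_statistics(c) for c in clips])
    y = np.random.rand(100)              # placeholder extraversion scores

    regressor = SVR(kernel="rbf").fit(X, y)
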


In 2016, Gürpinar and team,74 utilized perceptual cues (facial expressions and scene) for apparent personality analysis on the ChaLearn First Impression dataset. A Kernel Extreme Learning Machine regressor was then utilized, which accepts a combination of scene and facial expression features as input. The proposed framework achieved an accuracy of 90.94% in the ChaLearn First Impression ECCV Challenge.
In 2017, Ventura et al.,75 carried out experiments over the ChaLearn LAP 2016 dataset to investigate why CNN models are so adept at recognizing first impressions automatically. The results revealed that the majority of the discriminative information required to infer personality traits is found in the face, and that the CNN's internal representation specifically analyzes important face regions such as the mouth, nose, and eyes. Finally, they analyzed the role of Action Units (AUs) in inferring personality traits and demonstrated how certain AUs influence facial trait judgements.
In the same year, Bekhouche et al.,76 introduced a unique strategy for Big-Five trait estimation, as well as for assessing job candidates, from videos of their faces. They carried out their experiments over the ChaLearn LAP 2016 database. In their work, they extracted texture features from face regions using the LPQ and BSIF descriptors and their fusion strategies; the Pyramid Multi-Level (PML) representation was then used to aggregate the outputs of these two descriptors. To estimate the Big-Five trait scores, these PML features were fed to five support vector regressors (SVR), one SVR per personality trait. The output of these SVRs was then fed into Gaussian Process Regression (GPR), and the experiments achieved good results as claimed by the authors, showing that although deep learning-based algorithms can yield excellent performance, temporal facial texture methodologies are still useful.
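
The two-stage regression just described can be sketched as follows: one SVR per trait produces an initial score, and a Gaussian process regressor refines the stacked outputs. Feature extraction (LPQ/BSIF/PML) is omitted, all arrays are placeholders, and in practice the GPR stage should be fit on out-of-fold SVR predictions.

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.gaussian_process import GaussianProcessRegressor

    X = np.random.rand(200, 128)         # placeholder texture features
    y = np.random.rand(200, 5)           # Big-Five scores in [0, 1]

    # Stage 1: one SVR per trait.
    svr_out = np.column_stack(
        [SVR().fit(X, y[:, t]).predict(X) for t in range(5)])

    # Stage 2: a GPR per trait refines the stacked SVR scores.
    gprs = [GaussianProcessRegressor().fit(svr_out, y[:, t]) for t in range(5)]
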
In 2020, Armendáriz et al.,77 performed experiments on the New Portrait Personality Dataset (a modified ChaLearn First Impression dataset) to predict the personality of individuals from their portraits. Five different variations of the dataset were utilized. Several CNN models were utilized to extract features from portraits that are indicators of personality traits. To assess the dataset's performance, two pilot models were utilized. Both regression and classification tasks were performed, using two regressor models and two classification models. To boost the performance of the proposed models, transfer learning and feature encoding were investigated. An autoencoder was used to enhance feature extraction; it used images from the Selfie dataset (which contains 46 836 selfie photos tagged with 36 attributes grouped into categories such as hair, shape, and gender) and allows the network to learn various patterns observed in selfies and represent them in a consolidated manner. To transfer face detection expertise to the proposed model, two FaceNet models were utilized. The outcomes revealed that the proposed framework requires only a single portrait to assess personality traits, that feature extraction is automatic, that both feature extraction and personality trait classification are performed by a single model, and that, compared to human judgement, greater performance and accuracy is achieved in predicting four of the five Big-Five traits.


In the same year, Suen et al.,78 presented AVI-AI, an asynchronous video interview (AVI) framework that uses artificial intelligence (a TensorFlow-based CNN model) for decision making, to automatically predict job applicants' communicative abilities and personality attributes. For this purpose, they recruited 114 individuals (57 evaluators, 57 respondents), and their findings revealed that the proposed model accurately predicts an applicant's communicative abilities as well as their Big-Five personality traits (except extraversion and conscientiousness), as evaluated by the study's experienced human evaluators.
In 2021, Song et al.,79 conducted automatic personality profiling tests on real (VHQ) and perceived personality (first impression) datasets. They used a semi-supervised learning approach, introducing a new rank loss and a domain adaptation approach, and then placed a convolution layer in every skip layer of the trained domain facial neural network (DFNN), which is trained uniquely for each individual. As a result, the learned weights adjust to the associated person's facial behavior, and they suggested that these weights be used as a person-specific descriptor, with encouraging outcomes. They also demonstrated that the actions performed by the person in the video are important, that blending multiple tasks achieves the maximum level of accuracy, and that the dynamics of multiple scales are more illuminating than those of a single one.
Current Issues: With the advent of the video modality, it is now possible to add temporal information to the extensively utilized spatial parameters. When compared to still photos, spatio-temporal information took first impression analysis to a new depth, opening up a wider range of possibilities. But predicting personality from spatio-temporal data turned out to be a study area that has not been fully explored. This could be because of the difficult and time-consuming task of creating accurate labels for a large amount of data over time, especially when considering deep learning-based methodologies, which have not yet been used in this area to our knowledge. Continuous predictions of first impressions were explored by Çeliktutan and Gunes.72 But their continuous prediction could be viewed as a frame-based regressor, with each frame considered separately; as a result, the scene's dynamics are not fully explored. The works in Refs. 74 and 76 use statistics generated for the frame sequence to represent short video clips globally. Although such a technique does not analyze the frames individually, it nevertheless ignores the data's temporal progression. The work in Ref. 78 is a good step towards developing automatic interview agents for the recruitment process, but the results are not as good as they could be because of the small dataset and the limited number of features used.
The work in Ref. 49 describes facial expression output statistics as a dynamic signal observed over short periods of time. It might be seen as an advanced step in terms of analyzing first impressions on a temporal scale dynamically. Using relatively simple motion pattern representations,71,78 temporal information has also been exploited to improve results. But motion-based techniques are still in their infancy.
Only a few works, included in Sec. 3.3, suggested modeling spatio-temporal information by employing more complex and dynamic methodologies like LSTM80 or 3D-CNN.80 As a result, there are still unanswered questions about the topic's influence (in data labeling or prediction), its advantages, and effective temporal information modeling techniques. Also, in most works personality is predicted from short video clips, which is a challenging task. The literature revealed that characteristics assessed at the start of each video clip are better predictors of viewers' impressions, validating the hypothesis that first impressions are formed through brief encounters.7 However, more research is required to corroborate this hypothesis, to check whether the same effect occurs when diverse nonverbal sources are utilized, and whether each data type's ideal slice duration and position are the same.
Possible Solutions: According to this work, the number of techniques developed for image sequences is significantly smaller than for the multimodal category (audiovisual, Sec. 3.3). This could be due to the advantages of adding more data or to the numerous disciplines being examined. Only a few of the works discussed in the above section, like Refs. 72 and 74, have been expanded by including both audio and visual data,81,82 which highlights the advantages of incorporating acoustic characteristics into a framework. Despite this, temporal data is not fully exploited by the majority of works. Future research should include feature representations that keep track of the data's temporal evolution, such as Dynamic Image Networks,83 as used in action recognition. Additional work in this subject is needed to overcome the issues of image sequences by fully using spatio-temporal data. A framework's efficiency can be improved by including audiovisual data as well as advanced architectures.

3.2. Automatic personality perception using speech


This section reviews the works using speech for automatic personality perception. Psychological research has shown that we spontaneously and unconsciously attribute personality characteristics to unfamiliar people whenever we hear their voices. Many studies have focused only on the speaker's non-verbal cues, because speaker personality is assessed by listening to short speech clips in which verbal context cannot be properly exploited; moreover, if verbal content were considered, the same attributes might be assigned to all speakers uttering the same content.
In 2011, Mohammadi et al.,84 performed experiments to confirm whether automatic audio processing and human perception yield two different feature vectors. They conducted their experiments over the SSPNET corpus. Pitch, the first two formants, energy, and voiced and unvoiced segment lengths were extracted to distinguish between professional and non-professional speakers. Three experiments were performed for this task. The first experiment included personality assessments only and achieved an accuracy of 75.5%; the second included prosodic features only and achieved an accuracy of 87.2%; and the third included a combination of both personality assessments and prosodic features, which achieved an accuracy of 90%. This showed that human assessments and automatically extracted audio features are two diverse feature sets, and that personality assessments, when combined with automatically extracted prosodic features, lead to statistically significant performance improvements. In 2012, the same researchers85 proposed a prosody-based method for the automatic interpretation of personality. For this purpose, statistics (such as min, max, and entropy) of the pitch, the first two formants, energy, and voiced and unvoiced segment lengths were used. Logistic regression and support vector machine classifiers were used to predict whether a speaker was above or below the median with respect to the Big-Five traits. Accuracies of 60% to 72% were achieved depending on the particular trait, with higher accuracy for the extraversion and conscientiousness traits than for the others. In the same year, they further performed experiments86 to rank people according to the Big-Five traits. Intonation and voice quality features were extracted for this work, ordinal regression techniques were used to predict the mutual position of two individuals, and an accuracy of 80% was accomplished. In the same year, Valente et al.,87 conducted experiments on 32 meetings extracted from the AMI Corpus, comprising 128 subjects. Speech activity, prosodic features, word N-grams, dialogue-act, and interaction features were extracted to determine whether a subject's Big-Five traits are above or below the median; multi-class boosting (implemented using BoosTexter) was used for classification, and accuracies of 50% to 74.5% were achieved depending on the particular trait. In 2013, Lui et al.,88 proposed an artificial neural network-based methodology for the task of automatic personality perception. They performed their experiments over the Berlin Database of Emotional Speech, containing 535 speech clips recorded by 10 actors and annotated by 3 assessors. They extracted acoustic-prosodic features from speech utilizing the OpenSMILE toolkit and fed these features to an ANN, which generated BFI scores. Then, using the BFI scores obtained from the ANN, personality traits were predicted. They utilized both BFI versions (full and short) to assess overall performance. The results showed that the proposed framework accomplished a prediction accuracy of 70%.
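
The prosodic-feature pipelines above can be approximated in a few lines: extract pitch and energy statistics per clip, then classify above/below-median trait labels. The sketch below uses librosa for feature extraction; the synthetic clips and labels are placeholders, not SSPNET data.

    import numpy as np
    import librosa
    from sklearn.linear_model import LogisticRegression

    def prosodic_stats(y, sr):
        f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)  # pitch track
        energy = librosa.feature.rms(y=y)[0]
        f0 = f0[~np.isnan(f0)]                     # drop unvoiced frames
        if f0.size == 0:
            f0 = np.zeros(1)
        return np.array([f0.mean(), f0.std(), f0.min(), f0.max(),
                         energy.mean(), energy.std(), float(voiced.mean())])

    sr = 16000
    clips = [np.sin(2 * np.pi * 120 * np.arange(sr) / sr) * np.random.rand()
             for _ in range(10)]                   # synthetic stand-in "speech"
    X = np.array([prosodic_stats(c, sr) for c in clips])
    labels = np.random.randint(0, 2, len(clips))   # above/below-median labels
    LogisticRegression().fit(X, labels)
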
In 2017, Jothilakshmi et al.,89 performed experiments over the SSPNET personality corpus and used spectral features to analyze the connections between speech and personality. Frequency Domain Linear Prediction (FDLP) and Mel Frequency Cepstral Coefficient (MFCC) features were extracted for predicting personality, and instance-based KNN and SVM classifiers were used for classification. The results confirmed that KNN outperforms the SVM classifier, and classification accuracy between 90% and 100% was achieved. This is an extension of Ref. 90, where only the frequency domain linear prediction feature technique was used and an accuracy of 90% to 99% was achieved in predicting speaker personality traits. In the same year, to cope with the small size of children's speech corpora for evaluating children's personality, Solera-Ureña et al.,91 suggested a semi-supervised learning method. In their work, they set up an experiment that included age and language mismatches and utilized two training datasets: a small labeled SSPNET corpus of French adult speech, tagged with the Big-Five traits, which was used to train binary classifiers (SVM) for each Big-Five trait, and a large unlabelled dataset, the CNG corpus of European Portuguese children's speech, which was used to iteratively refine the initial models in semi-supervised learning. As a test set, the Game-of-Nines dataset was used, to concentrate on determining the Big-Five personality traits of Portuguese children using personality models derived from a speech corpus of French adults and to evaluate the overall efficiency of the proposed semi-supervised learning strategy. They used various feature sets, including baseline features from the sub-challenge,92 eGeMAPS features, and knowledge-based features. Despite the substantial mismatch caused by language and age differences, the results showed that it is feasible to achieve improvements utilizing a semi-supervised approach and knowledge-based features across age and language. This is an extension of the work performed in Ref. 93, where experiments were performed over the SSPNET (training data) and Game-of-Nines (testing data) corpora to analyze the utilization of a heterogeneous speech corpus for automatic personality prediction using the Big-Five model; the results revealed that there exists a consistent collection of acoustic-prosodic features for the Extraversion and Agreeableness traits in both adult and child speech, providing information on how to detect personality traits in spontaneous speech across age and language. In the same year, Su et al.,94 performed experiments over SSPNET speakers to automatically predict personality from speech signals utilizing a wavelet-based multi-resolution technique and convolutional neural networks. In their work, they first utilized the wavelet transform to break speech signals down into signals at different levels of resolution. Then, at each resolution, acoustic features were extracted using the OpenSMILE toolkit; these features were fed to a CNN, which generates BFI-10 profiles. Finally, the perceived traits are determined by feeding these BFI-10 profiles to five artificial neural networks (ANN), one for each of the Big-Five traits. The results showed that an average accuracy of 71.97% was achieved using this method, outperforming the baseline method of the INTERSPEECH 2012 speaker trait sub-challenge and a traditional ANN-based method.
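
The multi-resolution front end of this approach can be illustrated with PyWavelets: a discrete wavelet decomposition splits the signal into one coarse approximation band and several detail bands, each of which would then feed a feature extractor. This is a sketch of the decomposition step only; the wavelet family, level, and signal are assumptions.

    import numpy as np
    import pywt

    signal = np.random.randn(16000)      # stand-in for 1 s of 16 kHz speech
    coeffs = pywt.wavedec(signal, wavelet="db4", level=4)

    # coeffs[0] is the coarsest approximation; coeffs[1:] are detail bands.
    for i, band in enumerate(coeffs):
        print(f"band {i}: {len(band)} coefficients")
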
In 2018, Gilpin et al.,95 performed experiments to predict whether a speaker's Big-Five traits are in the low or high range, utilizing a moderately limited amount of data and features, with SVM and HMM classifiers. First, they constructed SVM and HMM classifiers for automatic personality prediction using the SSPNET speech corpus. Then they determined the correlation between features and speaker subgroups, and finally they assessed the performance of the SVM on a User Study Dataset containing 15 speech clips recorded by three unique speakers and annotated by 12 assessors. The results confirmed that 3 out of 5 traits are assessed with high accuracy using an SVM classifier, even when trained on only certain features and on quite a small dataset. In the same year, Zhu et al.,96 presented a novel skip-frame LSTM method for predicting speakers' personalities from speech in Mandarin. They performed experiments on the Mandarin subset of the BIT multi-language corpus. In their work, they categorized each of the Big-Five traits into six sub-traits, resulting in a total of 30 traits, and as a benchmark they used a conventional SVM method with standard prosodic feature extraction. The results revealed that the Extraversion trait is the easiest to predict, as in the French corpus, yet the high accuracy for Agreeableness contrasts with the French corpus, which may be due to different cultural backgrounds. The relationship between traits and sub-traits was also investigated, and the results showed that Big-Five traits are easier to predict than their sub-traits, because the sub-traits are depictions at a higher resolution. Finally, the skip-frame LSTM outperforms the traditional SVM framework even on a small corpus, because it can use low-level features for direct inference of personality traits.
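
A hedged PyTorch sketch of the skip-frame idea follows: low-level feature frames are subsampled with a fixed stride before being passed through an LSTM whose final hidden state is mapped to trait scores. All sizes and the stride are illustrative, not the authors' configuration.

    import torch
    import torch.nn as nn

    class SkipFrameLSTM(nn.Module):
        def __init__(self, feat_dim=40, hidden=64, n_traits=5, skip=3):
            super().__init__()
            self.skip = skip
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_traits)

        def forward(self, frames):              # frames: (batch, time, feat)
            frames = frames[:, ::self.skip, :]  # keep every `skip`-th frame
            _, (h, _) = self.lstm(frames)
            return self.head(h[-1])             # one score per trait

    scores = SkipFrameLSTM()(torch.randn(8, 300, 40))  # (8, 5) trait scores
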
In 2020, Koutsombogera et al.,97 performed experiments over the MULTISIMO corpus for automatic recognition and perception of individuals' personality from their dialogue. They analyzed participants' speech and transcripts to extract audio and linguistic characteristics, and used both self- and informant-assessed reports. They utilized different models, namely AdaBoost, Naive Bayes, Logistic Regression, and Random Forest, and analyzed their performance using both kinds of reports. The results showed that context is less important than acoustics, because models based solely on acoustic features, or models combining acoustic and linguistic features, outperform models based solely on linguistic features; moreover, there is no single preferred model or feature set for either personality recognition or perception, as different models outperform each other on different traits. In the same year, Liu et al.,98 presented an approach that integrates a log-likelihood-based Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) method for grouping annotations with a technique for extracting crucial audio cues (YAAFE), to improve accuracy while reducing the complexity of conventional speaker prediction algorithms. They evaluated their approach on the SSPNET dataset, and the findings revealed that it outperformed baseline methods.
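A minimal sketch of the annotation-grouping step follows, assuming each annotator's Big-Five ratings for a clip form a 5-dimensional vector (synthetic data here; the log-likelihood weighting and YAAFE audio features of Ref. 98 are omitted):

```python
# Grouping personality annotations with BIRCH, loosely after Liu et al. (Ref. 98).
# The 5-D vectors are random stand-ins for per-annotator Big-Five ratings.
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(2)
ratings = rng.uniform(1, 5, size=(120, 5))     # 120 annotations x 5 traits

birch = Birch(threshold=0.8, n_clusters=3)     # 3 annotation groups (assumed)
groups = birch.fit_predict(ratings)

# A consensus label per group, e.g., the mean rating vector of its members.
for g in np.unique(groups):
    print(g, ratings[groups == g].mean(axis=0).round(2))
```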
Most recently, in 2021, Zaferani et al.,99 presented an advanced approach for automatically extracting features to characterize the well-known Big-Five dimensions of personality. They conducted their research on the SSPNET corpus and used data augmentation techniques to increase the size of the dataset; then, to boost classification results, an asymmetric auto-encoder was used to extract important features automatically. The results revealed that the proposed system brings significant classification improvements compared with a traditional stacked auto-encoder, a CNN, and other published works.
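The core idea, an encoder and decoder with deliberately unequal capacities, can be sketched as follows; the layer sizes are illustrative assumptions, and the exact architecture, augmentation, and training setup of Ref. 99 are not reproduced.

```python
# Asymmetric auto-encoder sketch for unsupervised acoustic feature learning:
# a deep encoder paired with a shallow decoder. All sizes are assumptions.
import torch
import torch.nn as nn

class AsymmetricAE(nn.Module):
    def __init__(self, n_in=384, code=32):
        super().__init__()
        self.encoder = nn.Sequential(             # deeper side
            nn.Linear(n_in, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, code))
        self.decoder = nn.Linear(code, n_in)      # shallow side

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.randn(16, 384)                          # batch of feature vectors
model = AsymmetricAE()
recon, codes = model(x)
loss = nn.functional.mse_loss(recon, x)           # reconstruction objective
loss.backward()                                   # codes later feed a classifier
print(codes.shape, float(loss))
```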

3.2.1. Current issues


As is evident from the literature, a speaker's personality traits may be automatically assessed from their speech.84,86,100 According to a study on first impressions, people form first impressions from a glance lasting only 100 ms.101 However, this logic does not apply to the audio modality, particularly for very brief audio samples. The majority of the works presented in Sec. 3.2 are based on the SSPNET dataset,84,94,95,100 which does not leave much room for improvement. The datasets employed in audio-modality research are small, making it difficult to fully exploit modern techniques. The majority of works are based on classical machine learning approaches,84–86 in which feature extraction is still a manual, time-consuming process. In Ref. 96, deep learning algorithms are applied despite the small dataset; however, the results are not consistently improved, and training the LSTM network is difficult.

3.2.2. Possible solution


The problems of the audio modality can be addressed by building new, large datasets and applying deep learning to audio. Small datasets can be handled with strategies like data augmentation, and classification accuracy can be improved with approaches such as an asymmetric auto-encoder.99 The majority of the studies in Sec. 3.2 evaluate only the nonverbal content of speech, because judging personality from the verbal content of short audio snippets is impractical. However, as demonstrated in Sec. 3.3, performance can be increased by combining verbal and nonverbal content. As a result, employing advanced deep learning algorithms together with verbal and nonverbal cues to analyze a speaker's personality can enhance performance and advance research in this field.

3.3. Automatic personality perception using multimodal data


This section analyzes works using multimodal data for automatic personality perception. Multimodal data may combine audio and visual data, as well as text cues and other information.
In 2011, Staiano et al.,102 performed experiments over 4 meetings extracted from the Mission Survival corpus to determine whether a subject's Big-Five traits are below or above the median. Prosodic, speaking-activity, and social-attention features were used, and Naive Bayes, HMM, and SVM classifiers were utilized to check the efficiency of the extracted features in capturing the dynamics of personality states; accuracies in the range of 60% to 75% were achieved depending on the trait.
In 2013, Biel et al.,103 performed experiments over the YouTube vlog dataset and explored the significance of verbal content in personality prediction using LIWC and N-gram analysis, treating inference as a regression task performed with SVM (linear, RBF kernel) and Random Forest. The focus of their experiment was what bloggers say, whereas several previous works had combined verbal and non-verbal cues. They also explored the feasibility of using automatic speech recognition (ASR) for building fully automatic frameworks. The analysis revealed that error-free verbal content can detect four of the Big-Five traits, three of them better than non-verbal features, and that the ASR system's errors reduce overall performance.
In 2014, Sarkar et al.,104 conducted experiments over the YouTube vlog dataset and addressed the problem of apparent personality trait classification by combining audiovisual, bag-of-words, demographic, and sentiment features, performing classification with a logistic regression model. The results showed that the significance of the feature sets varies across traits. In the same year, Alam and Riccardi,105 reached similar conclusions when addressing personality impressions on the same dataset and won the WCPR challenge. In addition to the audiovisual features provided with the dataset, they also analyzed linguistic, psycholinguistic, and emotional features and generated classification models using Sequential Minimal Optimization (SMO) for Support Vector Machines (SVM) for each feature set. In the same year, Farnadi et al.,106 analyzed the use of multivariate regression techniques to address the problem of personality perception using audio, video, and text features.
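A minimal sketch of this kind of feature-set fusion, with random placeholders standing in for the audiovisual, text, and demographic blocks, is:

```python
# Early fusion of multimodal feature sets with logistic regression,
# in the spirit of Refs. 104-106. All feature blocks are random placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 404                                        # number of vlogs (illustrative)
audio_visual = rng.standard_normal((n, 29))    # e.g., prosody + motion stats
bag_of_words = rng.standard_normal((n, 100))   # e.g., LIWC / n-gram counts
demographic = rng.integers(0, 2, (n, 1))       # e.g., gender flag

X = np.hstack([audio_visual, bag_of_words, demographic])   # early fusion
y = rng.integers(0, 2, n)                      # high/low label for one trait

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())
```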
In 2015, Chávez-Martínez et al.,107 analyzed the problem of mood and personality impressions over the YouTube vlog dataset using verbal and non-verbal features. They also used the idea of compound facial expressions to incorporate high-level facial features. The inference problem was solved using a multi-level classifier. The results showed that combining mood and trait labels enhances performance compared to a single-label approach.
In 2016, deep learning methods began to be used for automatic personality analysis, particularly when dealing with multimodal data. The goal of the ChaLearn Looking at People 2016 First Impression Challenge was to predict continuous trait scores from short video clips.5 The challenge was based on multimodal data including audio and video modalities; 7 of the 9 finalist teams, including the top three, took full advantage of the audio modality, and 4 teams, among them the best three,80,108,109 (ECCV challenge) used deep learning methods for first impression recognition. Later, Gürpinar et al.,81 extended their previous work74 by using audio, scene, and facial features. The results revealed that the proposed framework attained an accuracy of 0.913 in the ChaLearn ICPR challenge on first impression recognition. In the same year, Çeliktutan and Gunes82 extended their work72 and explored how judgements about personality vary over time and across situational settings. They extracted audio-visual features and used a Bidirectional LSTM network to map these features onto continuous annotations. Lastly, the outputs of the audio and visual regression models were combined using decision-level fusion. The study revealed that training regression models from visual cues with audio-visual annotations, as well as from audio-visual cues with visual annotations, combined with audio cues via decision-level fusion, yields the best prediction results. In the same year, Joshi et al.,110 carried out experiments over the SEMAINE corpus and analyzed diverse factors related to the estimation of personality in a human-machine interaction environment evaluated by external analysts and assessors. In their work, they analyzed the relationship between various traits and the influence of context on trait perception. Finally, they estimated the credibility of raters to address the issues introduced by subjective bias, and proposed a weighted model to deal with observer bias, which brought major improvements to the automatic estimation of perceivable cognitive and physiological attributes of personality.
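Decision-level fusion of the kind used above can be sketched in a few lines: a separate regressor per modality, with their predictions combined by a weighted average. The models, features, and fusion weights below are arbitrary placeholders, not the settings of any cited work.

```python
# Decision-level (late) fusion of audio and visual trait regressors.
# Features, targets, and fusion weights are illustrative placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(5)
X_audio = rng.standard_normal((300, 20))
X_visual = rng.standard_normal((300, 50))
y = rng.uniform(0, 1, 300)                    # continuous trait score

audio_model = Ridge().fit(X_audio[:250], y[:250])
visual_model = Ridge().fit(X_visual[:250], y[:250])

p_audio = audio_model.predict(X_audio[250:])
p_visual = visual_model.predict(X_visual[250:])
fused = 0.4 * p_audio + 0.6 * p_visual        # weighted late fusion
print(np.abs(fused - y[250:]).mean())         # mean absolute error
```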
In 2019, for regressing apparent personality, Principi et al.,111 explored various sources of bias influencing personality interpretation outcomes and introduced a deep multimodal network that blends raw audio and visual data with attribute-specific model predictions. The results improved the state of the art and illustrated how complementary information can be used to enhance performance. In the same year, Aslan et al.,112 proposed a deep multimodal system exploiting a range of modalities, including audio, face, environment, and transcription features, for apparent personality analysis, and obtained the best overall mean accuracy.
In 2020, Li et al.,113 proposed a deep Classification-Regression Network (CR-NET) in a first impression framework for assessing the Big-Five personality traits as well as supporting the hiring process. They performed their experiments on the ChaLearn First Impression v2 dataset; the proposed framework first performs classification, using a ResNet-34 network as the backbone, and then regression using an extra-trees regressor, so the classification features serve as guidelines for the regression. A Bell loss function is used to alleviate the regression-to-the-mean problem. The results showed that the proposed technique achieves high accuracy for the prediction of personality traits and job interview recommendations. In the same year, Giritlioğlu et al.,114 used the Self-presentation and Induced Behavior Archive for Personality Analysis (SIAP) and the first impression dataset to perform a thorough evaluation of personality traits. They methodically evaluated the consistency of several data modalities and their fusion techniques, and examined the qualities and discriminative potential of induced behavior for personality evaluation, finding that it carries personality-related information.
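The classify-then-regress pattern behind CR-NET can be sketched with off-the-shelf components; here a first-stage classifier's probability outputs serve as extra features for an extra-trees regressor. This is a deliberate simplification: the Bell loss, the ResNet-34 backbone, and the real video features of Ref. 113 are not reproduced.

```python
# Classification-guided regression, a simplified analogue of CR-NET (Ref. 113).
# Deep visual features are replaced by random vectors; the class probabilities
# of a first-stage classifier are appended as features for the regressor.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
X = rng.standard_normal((500, 64))            # placeholder deep features
y = rng.uniform(0, 1, 500)                    # continuous trait score
y_cls = np.digitize(y, [0.33, 0.66])          # low / medium / high classes

clf = LogisticRegression(max_iter=1000).fit(X, y_cls)
X_aug = np.hstack([X, clf.predict_proba(X)])  # stage-1 output guides stage 2

reg = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_aug, y)
print(np.abs(reg.predict(X_aug) - y).mean())
```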
In 2021, Aslan et al.,115 proposed a unique multimodal framework for apparent personality analysis. They used modality-specific sub-networks, which include state-of-the-art deep networks as backbones and are supplemented with additional LSTM layers to take advantage of temporal data. They employed two-stage modeling to reduce the computational difficulties of multimodal optimization, and achieved good results.

3.3.1. Current issues


To take full advantage of deep learning-based methodologies, which are influencing several areas of research in computer vision, multimodal personality analysis requires huge datasets. Another concern is that datasets used in multimodal personality analysis, such as ELEA,116 are recorded in very particular and regulated contexts. Multimodal personality analysis can be effectively applied in real-world applications such as job recommendation, but because of the sensitive nature of the interviews or data protection laws,26 the databases utilized in Refs. 24 and 27 are not publicly accessible. According to a recent study,82 the works presented in the literature employ different evaluation criteria, so they are not directly comparable. External observers typically form perceptions based on thin slices taken from video samples.5 As a result, deciding which segment of the entire video sequence should be analyzed is a typical issue that must be addressed, and it remains unclear whether thin-slice impressions generalize across the entire video for the prediction task. Questions such as "Can the slice of a video clip that best reveals first impressions be selected automatically? Will it remain consistent throughout the video?" are topics for future research in the field.

3.3.2. Possible solutions


Combining verbal and nonverbal cues103 could be a promising way of advancing the field of APP. Multimodal personality analysis is already applied in job recommendation systems, which could be made clearer and more inclusive by combining differences among job categories with personality trait analysis. We noticed that there is still a lot to learn about the recently proposed CNN-based models for automatic personality perception.75,80,108 Despite presenting different solutions, the three leading teams80,108,109 in the ChaLearn First Impression Challenge5 achieved extremely similar overall results (0.912063, 0.912963, 0.910933), demonstrating that the proposed frameworks may capture complementary traits111 that could be exploited to increase overall accuracy. As noticed in the literature, deep learning approaches are viable solutions for tackling issues arising in the multimodal analysis of first impression recognition,80,81,108 but there is no universally accepted approach for recognizing first impressions. Thus, designing standard evaluation methodologies, challenges, datasets, and enhanced deep architectures, as well as incorporating factors such as gender, age, and ethnicity, can make it easier to advance multimodal research on first impression recognition.

4. Comparative Analysis
We have reviewed work on personality perception from the past ten years, and in this section we compare that work to identify open research challenges and limitations (discussed in the next section). We discuss the datasets, features, techniques used, and performance achieved in the state of the art.

4.1. Available datasets


A limited number of datasets are publicly available for personality computing, which is the main problem in this field. In Table 2 we describe some of the datasets that are publicly available for personality computing and commonly used in the state of the art.

Table 2. Description of publicly available datasets used in personality computing.

FERET (Ref. 117). Years: 1993–1996. Data type: facial images. Description: 14 051 static images, 1199 subjects, 365 duplicate images, color = 8-bit grey, resolution = 256 × 384. Task: face recognition. Pros: first standard facial database; allows algorithm comparison; advances the face recognition state of the art. Cons: contains only static images, so provides no temporal dynamics; focuses only on facial features.

AMI Corpus (Ref. 118). Years: 2005–2006. Data type: audio-visual, textual. Description: 196 meetings, 213 subjects, 100 hours of meetings, both scenario and non-scenario, overview and close-up cameras; audio from close-talking and far-field microphones. Task: dialogue acts, group activity recognition. Pros: attempts to include computer vision features; contains rich annotations and a cultural mix (metadata); contains both induced and naturally occurring meetings. Cons: despite its impressive size, the corpus is substantially "specialist", contextually and compositionally specific; focuses mainly on linguistic annotations.

Mission Survival Corpus II (Ref. 119). Year: 2007. Data type: audio-visual. Description: 13 meetings, 52 subjects, recording lengths between 28′14″ and 34′09″, average length = 31′10″, total length = 6 h 47′ 8″. Task: behavior classification. Pros: addresses weaknesses of the original Mission Survival Corpus (short meeting durations, lack of personality assessments and meeting quality evaluations); content is unscripted. Cons: limited size; features are extracted from one specific discourse context, which limits its usefulness in general corpus-linguistic research.

SSPNET (Refs. 92, 120). Years: 2010–2012. Data type: audio. Description: 96 news bulletins, 640 speech clips of 10 sec or less, 1 speaker per clip, 322 unique individuals, 307 professional & 309 non-professional speakers. Task: personality trait prediction from speech. Pros: largest dataset for speech-based personality perception; provides a rich collection of real-world data; personality score metadata is associated with each clip. Cons: focuses only on non-verbal features; ratings are provided by human judges, and low agreement between them can be the main source of error.

ELEA (Ref. 116). Year: 2011. Data type: audio-visual. Description: 40 meetings of 15 min each, including 3 or 4 members; 148 participants; 27 meetings having both audio & video; 8 cameras (6 fixed & 2 movable). Task: emergent leadership recognition. Pros: includes both self-reported and perceived personality traits; contains metadata, so offers both personality and engagement labels. Cons: limited size; not all meetings provide both audio-visual modalities; limited to a specific context.

YouTube vlog dataset (Refs. 103, 121, 122). Year: 2011. Data type: audio-visual. Description: 442 vlogs, each 1 min long, 208 males & 234 females, each vlog covering only 1 participant. Task: apparent personality trait analysis. Pros: contains rich real-life interactions and metadata along with personality impressions; not limited to a specific context. Cons: limited number of samples; no interactions; metadata not properly described (only gender information).

SEMAINE (Ref. 123). Year: 2012. Data type: audio-visual. Description: 959 conversations, 150 subjects, clip length 5 min, size = 780 × 580, contains RGB and grey images and both frontal and profile views. Task: face-to-face interaction of humans with virtual agents of different emotional styles. Pros: rich in emotion and annotation; contains metadata; recording quality is high; quantity is significant; clips are long enough to discern patterns over time. Cons: the total number of utterances is small; lack of multi-party conversation.

ChaLearn First Impression (Ref. 5). Year: 2016. Data type: audio-visual. Description: 10 000 15-sec videos selected from 3000 YouTube channels, RGB images, 30 fps, size = 1280 × 720. Task: apparent personality recognition. Pros: largest public labeled dataset for personality analysis; not limited to specific content; pairwise comparisons to reduce bias problems. Cons: labeled with only Big-Five traits; magnifies bias; no interaction exists, only a single individual conversing with a camera.

ChaLearn First Impression v2 (Ref. 124). Year: 2017. Data type: audio-visual, hirability impressions, audio transcripts. Description: extension of the ChaLearn First Impression dataset. Task: hirability impressions and apparent personality recognition. Pros: along with the Big-Five traits, transcriptions, interview annotations, gender, and ethnicity are present. Cons: imbalanced in ethnicity; existence of different perception biases; videos are not from a specific recruitment setting, so behavior can differ slightly from a job interview; raters have no experience in recruitment.

MULTISIMO Corpus (Ref. 125). Years: 2016–2018. Data type: audio-visual, textual. Description: 23 collaborative sessions, 49 participants (46 players and 3 facilitators), average duration = 10 minutes, total duration = 4 hrs. Task: modeling speaker behavior from multimodal data in multiparty social interactions. Pros: rich in modalities; contains both self and perceived annotations along with metadata. Cons: limited in size; most utterances are dyadic; context-specific.
Table 3. Description of different performance metrics.

Accuracy. Description: the number of correct predictions made by the model out of all predictions (a classification metric (CM)). Pros: simple to compute and comprehend; a single value sums up the model's abilities; can be used for comparing models. Cons: inappropriate for unbalanced data and for models providing probabilistic values; does not consider incorrect predictions.

Confusion matrix. Description: verifies the classification model's predictions on a test data set against predefined actual values (CM). Pros: shows where the model is confused while making predictions; indicates model errors as well as error types; considers incorrect predictions; works with imbalanced data. Cons: class probabilities are not provided; model comparison is not possible, and it is difficult to check under- or over-fitting conditions, as no single measure is provided.

Precision. Description: the fraction of positive-category predictions that actually belong to the positive category (CM). Pros: measures the model's efficiency; describes the relevance of the model's classifications; determines model performance on the minority class in an imbalanced dataset. Cons: focuses only on the positive class; suitable only when the goal is limiting false positives and accurate identification of the negative class is not required; not used for model comparison.

Recall. Description: the fraction of positive examples in the dataset that receive positive-category predictions (CM). Pros: measures the model's efficiency; returns all relevant results classified accurately by the model; assesses minority-class coverage in an imbalanced dataset. Cons: focuses only on the positive class; suitable only when the goal is limiting false negatives and accurate identification of the negative class is not required; not used for model comparison.

Unweighted Average Recall (UAR). Description: the unweighted average of the per-category recalls achieved by the model (CM). Pros: determines the quality of the most reliable model; used for comparing models; focuses on both positive and negative classes; suited to extremely imbalanced datasets. Cons: a more complex metric; gives equal weight to each class; not suitable when certain classes take precedence over others.

F-score/F-measure. Description: the harmonic mean of the model's precision and recall (CM). Pros: combines the advantages of both precision and recall; used as a single metric for comparing models; used with class-imbalance problems. Cons: focuses only on the positive class; does not deal with true negatives; gives equal weight to precision and recall; the measure is asymmetrical.

Area under curve (AUC). Description: the capability of the classifier to distinguish between categories; a summary of the ROC curve, which plots the performance of a classification model across classification thresholds (CM). Pros: assesses the model's ability to differentiate between classes regardless of the threshold used; determines the validity of the prediction ranking; can be used for model comparison. Cons: does not consider predicted probabilities; positive and negative classes are equally weighted; does not perform well with extreme dataset imbalance.

Mean Absolute Error (MAE). Description: the absolute differences between actual and predicted values are averaged (a regression metric (RM)). Pros: determines the dataset's average error magnitude; easy to understand; has the same unit as the output; most resilient to outliers. Cons: not a differentiable function; mathematical evaluation and numerical optimization are difficult.

Mean Squared Error (MSE). Description: the squared differences between actual and predicted values are averaged (RM). Pros: determines the error variance of a dataset; a differentiable function; easy to analyze mathematically and to optimize numerically. Cons: not resilient to outliers; has the squared unit of the output.

Root mean squared error (RMSE). Description: an accuracy metric calculated by taking the square root of the mean of all squared errors (RM). Pros: determines the standard deviation of errors in a dataset; quick to calculate; has the same unit as the output; a differentiable function. Cons: prevalent but difficult to comprehend; sensitive to large errors; less resilient to outliers.

Coefficient of determination (R2). Description: explains how much variability can be attributed to the relationship between one factor and another related factor (RM). Pros: determines the proportion of outcome variation explained by the regression model; provides a baseline for model comparison. Cons: always increases as additional features are introduced into the dataset; does not give an error estimate; does not determine whether a model is good or bad.
Table 4. State-of-the-art studies using still images.

Ref. 56. Dataset: Karolinska dataset, synthetic dataset. Personality approach: dominance trait. Features used: facial features. Method: Gentle Boost classifier, SVM with RBF kernel, KNN. Performance: accuracies up to 90% are achieved.

Ref. 57. Dataset: FERET Corpus. Personality approach: Big-Five. Features used: facial features. Method: SVM (polynomial, RBF kernel). Performance: accuracies up to 70% are achieved depending upon the particular trait.

Ref. 58. Dataset: selfie image dataset. Personality approach: Big-Five. Features used: visual features. Method: LIBSVM with RBF kernel. Performance: RMSE up to 0.7 (in APR) and up to 0.5 (in APP) achieved depending upon the particular trait and features used.

Ref. 59. Dataset: a new dataset containing U.S. politicians' face images. Personality approach: 14 dimensions (attractive, masculine, confident, dominant, energetic, well-groomed, rich, perceived old, intelligent, competent, honest, babyface, generous, trustworthy). Features used: facial features. Method: Ranking SVM (Rank SVMs). Performance: accuracy = 67.5% for governors, 65.5% for senators; the accuracy for political party affiliation (democrats versus republicans) is 62.6% for males and 60.10% for females.

Ref. 60. Dataset: portrait dataset. Personality approach: trustworthy impression. Features used: facial features. Method: K-means clustering, SVM with RBF kernel (for classification). Performance: Precision = 64.96%, Recall = 91.28%, F1 = 74.94%.

Ref. 61. Dataset: new face attributes dataset. Personality approach: objective traits (gender, ethnicity, hair color, makeup, and age) and subjective traits (emotion, attractive, humorous, and chubby). Features used: facial features. Method: CNN (baseline (AlexNet)), Landmark Augmented CNN (LACNN). Performance: prediction accuracy of 60 to 98% is achieved.

Ref. 62. Dataset: Twitter profile pictures dataset. Personality approach: Big-Five. Features used: face, holistic-level scene descriptors. Method: Kernel Partial Least Squares Regression (KPLS). Performance: RMSE up to 0.37 (avg) achieved depending upon the descriptors used.

Ref. 63. Dataset: AFLW Face Dataset. Personality approach: trustworthiness, dominance, age, and IQ. Features used: visual facial appearances. Method: VGGNet19, VGGNet16, two shallow networks, MOON. Performance: R2 value up to 0.72 achieved depending upon traits.

Ref. 64. Dataset: image dataset. Personality approach: Big-Five. Features used: visual features, facial expressions. Method: statistical analysis method. Performance: negative samples as low as 0 and positive samples as high as 1184 depending upon the particular trait, range, and expression used.

4.2. Performance metrics


The main step of every piece of research is to assess the proposed system's efficiency, and for that purpose different metrics are used, which are discussed in Table 3.
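As a small illustration, several of the classification metrics in Table 3 (including UAR, which is simply the macro-averaged recall) can be computed with scikit-learn as follows; the labels are toy values used only to make the sketch runnable.

```python
# Computing the classification metrics of Table 3 on toy labels.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
y_prob = np.array([.9, .2, .8, .4, .1, .7, .95, .3, .85, .45])  # P(class 1)

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("F1       :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_prob))
print("confusion:\n", confusion_matrix(y_true, y_pred))

# UAR = unweighted mean of per-class recalls (macro-averaged recall).
print("UAR      :", recall_score(y_true, y_pred, average="macro"))
```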

4.3. Comparative performance evaluation


In this section we evaluate the performance of existing approaches per data modality. Tables 4–7 summarize the dataset used, personality approach, features used, techniques used, and performance achieved in the state-of-the-art experiments using still images, image sequences, the speech modality, and multimodal data, respectively.

Table 5. State-of-the-art studies using image-sequences.

Refs. 70, 73. Dataset: YouTube vlog dataset. Personality approach: Big-Five. Features used: facial expressions of emotion (sadness, fear, joy, contempt, anger, surprise, and neutral expressions). Method: SVM regressor. Performance: in Ref. 70, R2 up to 0.22 achieved, and in Ref. 73, up to 0.19, depending upon the particular trait and feature set used.

Ref. 71. Dataset: YouTube vlog dataset, small group meetings (ELEA). Personality approach: Big-Five. Features used: visual non-verbal features. Method: ridge regression, SVM classifier. Performance: mean accuracy of 70.4% achieved by combining SVM with ridge regression.

Ref. 72. Dataset: SEMAINE Corpus. Personality approach: Big-Five + three more traits (facial attractiveness, likeability, engagement). Features used: visual features (continuous prediction in space, continuous spatial and temporal prediction). Method: lasso, ridge regression. Performance: Pearson's correlation coefficient (COR/R) between actual and predicted values up to 0.85 and MSE up to 0.03 achieved depending upon the framework and technique used; R2 up to 0.43.

Ref. 74. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: face, scene. Method: Kernel Partial Least Squares Regression (KPLS). Performance: Mean Average Accuracy (MAA) = 0.9094.

Ref. 75. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: facial features. Method: CNN (modified; class activation maps are used). Performance: MAA using image = 0.909, face = 0.912.

Ref. 76. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: texture-based face features. Method: support vector regressors, Gaussian process regression. Performance: for interview, MAA = 0.915746; for the Big-Five traits, accuracy up to 0.913775 is achieved.

Ref. 77. Dataset: new portrait dataset, selfie dataset. Personality approach: Big-Five. Features used: facial features. Method: CNN models (2 pilot models, 2 CNN models for classification, an autoencoder, and 2 FaceNet models for transfer learning and feature encoding). Performance: mean accuracy = 68.6%.

Ref. 78. Dataset: interview dataset. Personality approach: communication skills, Big-Five. Features used: facial expressions and movements. Method: CNN. Performance: accuracy up to 99%, MSE up to 0.031, R2 up to 0.970, R up to 0.978.

Ref. 79. Dataset: ChaLearn first impression dataset, VHQ. Personality approach: Big-Five. Features used: facial dynamics. Method: DFNN. Performance: accuracy avg up to 0.9168, RMSE avg up to 0.12, R avg up to 0.45.

Table 6. State-of-the-art studies using speech-modality.

Ref. 84. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: prosody (energy, segment lengths (voiced & unvoiced), and statistics for each). Method: logistic regression. Performance: accuracy using personal assessments = 75.5%, prosodic features = 87.2%, combination = 90.0%.

Ref. 85. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: prosody. Method: logistic regression, SVM. Performance: accuracy = 60–72%.

Ref. 86. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: intonation, voice quality. Method: ordinal regression (linear probabilistic approach). Performance: accuracy up to 80% is achieved.

Ref. 87. Dataset: AMI Corpus. Personality approach: Big-Five. Features used: speech activity, prosody, N-grams, dialog-act, interaction features. Method: boosting one-level decision trees (BoosTexter). Performance: accuracy = 50–74.5% depending upon the particular trait.

Ref. 88. Dataset: Berlin database of emotional speech. Personality approach: Big-Five. Features used: acoustic prosodic features. Method: ANN-based BFI detector. Performance: accuracy of about 70–80% is achieved.

Ref. 89. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: MFCC, FDLP. Method: instance-based KNN, SVM. Performance: accuracy between 90–100% is achieved.

Ref. 91. Dataset: SSPNET, Game-of-Nines, CNG Corpus. Personality approach: Big-Five. Features used: baseline features (a collection of 6125 features from the Interspeech 2012 speaker personality sub-challenge, and 88 eGeMAPS features), knowledge-based features (41). Method: SVM, semi-supervised learning. Performance: UAR values and accuracies above 60% are achieved for most of the traits.

Ref. 94. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: acoustic features. Method: wavelet transform, CNN, ANN. Performance: average accuracy up to 71.97% is achieved depending upon the wavelet setting.

Ref. 95. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: MFCC & energy, pitch. Method: SVM, HMM. Performance: accuracy up to 90% is achieved.

Ref. 96. Dataset: BIT multi-language corpus. Personality approach: Big-Five + 6 sub-traits for each trait. Features used: MFCC. Method: LSTM. Performance: mean accuracy = 69.3% (for traits), 61.9% for sub-traits.

Ref. 97. Dataset: MULTISIMO Corpus. Personality approach: Big-Five. Features used: linguistic features (LIWC); audio features (eGeMAPS). Method: AdaBoost, Naive Bayes, Logistic Regression, Random Forest. Performance: for APR, accuracy between 82–94%; for APP, accuracy between 77–100% depending on different feature sets and classifiers.

Ref. 98. Dataset: SSPNET. Personality approach: Big-Five. Features used: audio (YAAFE). Method: BIRCH, KNN, LR, SVM. Performance: avg accuracy = 71.76.

Ref. 99. Dataset: SSPNET Corpus. Personality approach: Big-Five. Features used: acoustic features. Method: asymmetric auto-encoder, Deep Neural Network (DNN), SVM. Performance: UAR up to 81.2%, accuracy up to 76.9% depending upon the particular trait.
Table 7. State-of-the-art studies using multimodal-data.

Ref. 102. Dataset: Mission Survival Corpus II. Personality approach: Big-Five. Features used: prosody, speaking activity, social attention. Method: Naive Bayes, HMM, SVM (linear, RBF kernel). Performance: classification accuracy of 60–75% achieved depending upon the particular trait and classifier used.

Ref. 103. Dataset: YouTube vlog dataset. Personality approach: Big-Five. Features used: LIWC, N-gram analysis, audiovisual nonverbal cues, facial expressions. Method: SVM (linear, RBF kernel), Random Forest. Performance: R2 up to 0.31 achieved depending upon the particular trait and features used.

Ref. 104. Dataset: YouTube vlog dataset. Personality approach: Big-Five. Features used: audio-visual, bag-of-words, demographic and sentiment features. Method: logistic regression with a ridge estimator. Performance: average F1-score = 0.57.

Ref. 105. Dataset: YouTube vlog dataset. Personality approach: Big-Five. Features used: audio-visual, linguistic, psycholinguistic, emotional features. Method: SVM with different kernels (generated using Sequential Minimal Optimization). Performance: average F1-score = 0.76.

Ref. 106. Dataset: YouTube vlog dataset. Personality approach: Big-Five. Features used: audio, video, textual features (emotion, linguistic). Method: various multivariate regression techniques (single target, multi-target stacking, multi-target stacking controlling, ensemble of regressor chains, ensemble of regressor chains corrected, multiple-objective random forest). Performance: RMSE up to 0.64, R2 up to 37%.

Ref. 107. Dataset: YouTube vlog dataset. Personality approach: Big-Five, mood. Features used: verbal and non-verbal audiovisual features, proposed facial features. Method: multi-level classifier. Performance: macro-average up to 65.9%, exact accuracy = 27.3% depending upon the feature set used.

Ref. 108. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: audio-visual features. Method: for the visual modality, VGG-face model (for pretraining), DAN, DAN + ResNet; for the audio modality, a linear regressor. Performance: MAA = 0.912968.

Ref. 80. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: audio-visual features. Method: 3D CNN, LSTM. Performance: MAA = 0.912063 (test phase).

Ref. 109. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: audio-visual features. Method: Deep Residual Networks (ResNet + FC). Performance: MAA = 0.910933.

Ref. 81. Dataset: ChaLearn first impression dataset. Personality approach: Big-Five. Features used: audio, face, scene. Method: deep CNN (for face: VGGNet, FER 2013; for scene: VGG-19), Kernel ELM. Performance: MAA = 0.913.

Ref. 82. Dataset: SEMAINE Corpus. Personality approach: Big-Five + attractiveness + likeability. Features used: audio-visual cues. Method: Bidirectional LSTM. Performance: R2 avg up to 0.22, MSE avg up to 0.38.

Ref. 110. Dataset: SEMAINE Corpus. Personality approach: Big-Five + 4 new traits (likeability, facial attractiveness, vocal attractiveness, engagement). Features used: audiovisual, visual only. Method: linear Support Vector Regression. Performance: average error rate (RMSE) of the weighted model = 0.662.

Ref. 111. Dataset: ChaLearn first impression dataset, IMDB-WIKI, UTKFace, SCUT-FBP5500, ensemble. Personality approach: Big-Five. Features used: audio-visual + complementary information (emotion, age, gender, ethnicity, attractiveness). Method: ResNet50 + FC (128), Wide ResNet, VGGFace, VGGNet, AlexNet, modified ResNet18. Performance: average mean accuracy = 0.9167.

Ref. 112. Dataset: ChaLearn First Impressions dataset V2 (CVPR'17). Personality approach: Big-Five. Features used: face, ambient, audio, transcriptions. Method: CNN (ResNet and VGGish networks), LSTM. Performance: MAA = 0.9188.

Ref. 113. Dataset: ChaLearn first impression dataset V2. Personality approach: Big-Five. Features used: visual (face, scene), audio, text. Method: CR-NET (ResNet34), Bell loss, ETR regressor. Performance: MAA = 0.9188.

Ref. 114. Dataset: ChaLearn first impression dataset V2, SIAP. Personality approach: Big-Five. Features used: audiovisual, speech transcripts. Method: ResNet, CNN-GRU, LSTNet. Performance: MAE avg up to 0.153 (for SIAP) & 0.085 (for the first impression dataset).

Ref. 115. Dataset: ChaLearn first impression dataset V2. Personality approach: Big-Five. Features used: visual (face, scene), audio, text. Method: VGGish, ResNet, ELMO, LSTM. Performance: mean accuracy = 91.8%.

5. Open Research Challenges and Research Gaps


Although there have been many improvements in the field of apparent personality analysis, the task remains challenging. This section addresses the challenges and concerns that arise in this task. The challenges are as follows (i)–(vii):

(i) Publicly available Datasets: There are a limited number of datasets which are publicly available for the task of personality computing, and those datasets do not include a wide variety of challenging factors like ethnic background, cultural factors, etc. It is noticeable from the literature that most of the works either constructed their own dataset58,59 or adapted datasets that were already developed for some other purpose to their requirements.60,62 The main obstacle in the case of APP is that various annotators need to rank or label each and every subject present in the dataset to render a logical and reliable evaluation and to deal with the methodological issues linked with the perception process. Building a personality corpus therefore needs a lot of manpower, time, and resources, and this generally restricts the size of APP corpora. Another issue is that the labels associated with the available datasets are generally not made public, and recreating their outcomes can be a big task.
Future direction: To accelerate advancement in the field of personality computing, the primary need is to develop new, large, and public datasets including a wide range of diverse characteristics. Datasets that provide a combined examination of genuine and perceived personalities, that include psychological states or contextual settings more realistically, and that support continuous predictions can benefit the field of personality computing by enabling innovations and new directions. Also, if the datasets come from the same environment as the one in which they will be utilized (for example, data samples for evaluating job interviews should come from the recruitment context), then it will be possible to make better predictions, which will improve performance.
(ii) Subjective Bias: In APP, multiple annotators need to annotate all subjects present in the dataset, and when different raters provide different ratings for the personality of the same individual, it creates uncertainty in the APP process. Also, there are no reliable insights into the trustworthiness of ratings, so it is not apparent how to handle the issue when the ratings are crowd-sourced. Hence, low agreement between annotators is the key problem in APP that directly influences the performance of the task.85 Recent investigations have, to some extent, proposed a compelling method to manage rater bias by performing pairwise comparisons instead of assessing each subject individually.5,59 Pairwise comparison brings several benefits, including removing the need for annotators to set up absolute benchmarks or scales for labelling subjects present in the dataset, recognizing the intensity of each sample in terms of its relational gap from other instances, resulting in a more accurate ranking, and preventing formerly labelled samples from biasing future ratings.
Future direction: To address the issue of subjective bias, characteristics such as age, gender, and ethnicity111 should be included during personality analysis, and new methods for dealing with this issue should be devised. Although paired annotations minimize bias, it is extremely challenging to totally eliminate bias in subjective tasks. Bias can come from the pairs directly, so emerging procedures must give special consideration to how pairs are formed and presented to annotators, and the methodology used to transform paired annotations into continuous values may amplify biases if no specific consideration is given.126 Furthermore, judgements are influenced by the onlooker, so evaluating and comparing the qualities of observers and of the persons being observed may help to clarify how certain biases arise. Also, the machine learning community has paid little attention to bias; thus, while building new datasets for subjective tasks and utilizing them for real-world purposes, the community should take bias into account. This will aid in enhancing the performance of diverse tasks.
(iii) Feature Selection: Each trait's output fluctuates from one feature set to another, and there is no single appropriate solution for trait prediction.97,104–106 Different data modalities focus on different features: for example, photos mainly capture appearance features, videos capture temporal statistics, and the audio modality has been demonstrated to convey information strongly associated with personality. Automatic personality perception therefore requires a careful selection of feature sets and data modalities.
Future direction: Identifying the most important features is one of the most difficult steps in improving the accuracy of results. The goal of future studies should be to find the optimal feature selection technique for identifying the most significant features from a set of conventional features. When there are many features to apply but the accuracy of using all of them is low, a thorough analysis of multiple instances is required to identify the key features and feature categories that can be used to accurately predict personality. It has also been observed in previous works that performance usually improves when multimodal data are used, but using different modalities also brings complexity to the perception task; for example, in the case of the video modality, the issues of characterizing slice duration and location, and the effect of situational context on the perception process, need to be addressed. However, in some works, accuracy does not improve considerably when multimodal data are included. The reason could be that most personality computing datasets are acquired in specialized and regulated situations, resulting in a limited set of attributes that do not allow for considerable gains in accuracy. When it comes to boosting the performance of personality computing systems, these concerns should be taken into account.
(iv) Methodological Issues: There is no benchmark method for the task of automatic personality perception, so we cannot depend on a single model, as different models turn out best for different traits.5,92,97 Also, personality computing approaches, both APR and APP, generally apply classification or regression techniques to map features onto personality traits (the most common methodology is to treat evaluations above or below the median as two classes). When the Big-Five personality model is used, which is a dimensional representation of personality, these methodologies convert it into a categorical one; however, such categories serve no purpose in psychology, which mainly focuses on the comparisons among people that we make every day and that have realistic applications, so it is better to rank individuals according to their personality traits.86 Additionally, the results of APR and APP approaches may be acknowledged only when they fulfil proper certainty measures.
Machine and deep learning techniques: Machine or deep learning approaches are used to map the characteristics of a subject onto personality traits. The majority of the works introduced in the literature focus on single-task scenarios and have used handcrafted features and machine learning approaches for personality prediction,56,57,60,92 whereas a few recent works focus on utilizing deep learning approaches5 to predict apparent personality from multimodal data. These techniques bring several advantages: they extract features automatically, they can analyze the entire scene and situational context rather than just a small collection of predefined features, and they support superior spatio-temporal modeling.
Future direction: Future research should focus on technological advancements in order to strengthen the link between features and personality traits using various methodologies, and should aim for greater integration of computational and humanities disciplines, resulting in psychological and behavioral methods for personality identification and externalization. Progress in this topic could also be aided by the introduction of unique and advanced deep techniques for personality prediction.
(v) Personality Models: Most of the existing works concentrate only on the Big-Five personality model, with little emphasis on other models. Despite being one of the most effective models for personality identification, the Big-Five model has limitations such as being theoretical, overly descriptive, and lacking references to personality development over time.
Future direction: Future studies should not be limited to the Big-Five approach; other models must be included in the datasets. In addition, enhanced feature sets should be used in personality-related models to provide a better representation of personality factors across a variety of data modalities.
(vi) Cultural Dependence: Annotators from different cultural backgrounds attribute different traits to the personality of the same individual, which means APP is dependent on cultural background.86,96 In personality computing, however, such an impact has rarely been taken into consideration; automatic systems in general either disregard the issue by not considering the culture of both subjects and annotators, or restrict the research to one culture only to avoid multicultural effects.
Future direction: Personality computing should not be limited to a single culture; considering inter-cultural influences can make it a more intriguing and demanding topic. Future research should look at the influence of different cultures on personality computing, making it a more advanced field.
(vii) Utilization of Personality Assessments: The goal of personality theory is to assess the personality of individuals using personality assessments, but this has been largely, if not entirely, ignored by the computing community.
Future direction: In psychology, personality assessments are used to measure, identify, and quantify the personality variations that people exhibit over time and in different situations, and personality is assessed via observations, paper-and-pencil tests, questionnaires, and projective approaches. However, computer models eliminate the need for personality questionnaires, allowing assessments to be carried out using real-life behavioral data instead. This widens the gap between the psychology and computer vision communities, and it is unclear which method of personality prediction is more accurate. To close the gap with psychology, the computer vision community should take into account both questionnaires and real-life behavioral data when predicting personality, allowing it to be used in more realistic applications and advancing the discipline.

6. Research Methodology
The general research methodology steps for recognizing first impressions are discussed in this section. To address any problem, a comprehensive strategy must be followed efficiently. Figure 6 depicts the general research methodology used in first impression recognition. Every study begins with a thorough examination of existing approaches; the flaws of those approaches are then analyzed so that they can be solved and taken into account when proposing a new approach. The following steps (i)–(vii) explain the research methodology; a minimal code sketch of the resulting pipeline follows the list:

(i) The first step is to choose a dataset that is appropriate for achieving research
goals.
(ii) After that, a pre-processing step is used to remove any redundant or undesired values from the dataset.
(iii) The third step is to choose data modalities.
(iv) Then from the selected data modalities, features will be extracted.
(v) Then, to map features to personality traits, regression or classification will be
performed using machine/deep learning approaches.
(vi) If a multimodal personality analysis is employed, the results of all modalities
will be combined.
(vii) The final step is to assign predicted personality values to the subjects present
in the dataset.
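The skeleton below sketches steps (i)–(vii) end to end. It is a minimal illustration: the random features stand in for real audio and visual descriptors, and the per-modality ridge regressors with equal-weight fusion stand in for whatever models and fusion scheme a concrete study would choose.

```python
# End-to-end skeleton of the first-impression pipeline in Fig. 6.
# Random features replace real audio/visual descriptors; a ridge
# regressor per modality plus late fusion stands in for steps (v)-(vii).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)

# (i)-(ii) dataset selection and pre-processing (placeholder samples)
n_clips = 200
# (iii)-(iv) modality selection and feature extraction
feats = {"audio": rng.standard_normal((n_clips, 30)),
         "visual": rng.standard_normal((n_clips, 60))}
y = rng.uniform(0, 1, (n_clips, 5))            # Big-Five targets

# (v) one regressor per modality, with one output per trait
models = {m: Ridge().fit(X, y) for m, X in feats.items()}

# (vi) fuse modality-wise predictions (equal weights assumed)
preds = np.mean([models[m].predict(feats[m]) for m in feats], axis=0)

# (vii) assign predicted trait values to subjects
print("subject 0 Big-Five estimates:", preds[0].round(2))
```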

Fig. 6. Methodology for first impression recognition.

7. Conclusion and Future Work


As already mentioned in the paper, personality computing has enormous applications, such as human-computer interaction, human-robot interaction, and interview processes, and it will be a hot and emerging research field in the next few years. A lot of work has already been done on predicting personality from various data modalities, but more work is needed to increase the overall performance of existing personality prediction systems. This paper presents a thorough survey of existing research on automatic personality perception. We organized our literature review into categories based on the data modalities utilized, discussed the numerous drawbacks of present approaches, and suggested solutions to address them. We also discussed open challenges and proposed directions for future research. We concluded that, to accelerate advancements in the field of personality computing, the primary need is to build new, large, and publicly available datasets covering a wide range of diverse characteristics. For data labelling, almost all the emerging approaches for building personality prediction datasets depend on manual annotation done by AMT workers via crowd-sourcing, so to examine the raters' credibility, a proper and logical way of annotating personality traits should be analyzed. Also, technological advancement is needed to identify meaningful connections between features and personality traits appropriately, and to provide tight integration between the human sciences and personality computing approaches.
A few recent works have started utilizing deep learning approaches to predict personality from multimodal data; these methods have proven to be efficient and have begun to produce reliable predictions. Deep learning brings several advantages: it works well with large datasets and extracts features automatically, so it does not need any feature engineering. Still, more work needs to be done using deep architectures. In the future, to increase the performance of personality prediction systems, we will propose an approach using multimodal fusion of the best-suited features along with bias reduction.

References
1. D. C. Funder, Personality, Annual Review of Psychology 52 (2001) 197–221.
2. G. Matthews, I. J. Deary and M. C. Whiteman, Personality Traits, 3rd edn. (Cam-
bridge University Press, 2009).
3. Wikipedia, Personality computing — Wikipedia, the free encyclopedia http://en.
wikipedia.org/w/index.php?title=Personality%20computing&oldid=984709168,
(2022), [Online; accessed 25-September-2022].
4. A. Vinciarelli and G. Mohammadi, A survey of personality computing, IEEE Trans-
actions on Affective Computing 5(3) (2014) 273–291.
5. V. Ponce-López, B. Chen, M. Oliu, C. Corneanu, A. Clapés, I. Guyon, X. Baró,
H. J. Escalante and S. Escalera, ChaLearn LAP 2016: First round challenge on
first impressions — Dataset and results, in Computer Vision — ECCV 2016 Work-
shops, eds. G. Hua and H. Jégou (Springer International Publishing, Cham, 2016),
pp. 400–418.
6. Wikipedia, First impression (psychology) — Wikipedia, the free encyclopedia
http://en.wikipedia.org/w/index.php?title=First%20impression%20(psychology)&
oldid=1109623588, (2022), [Online; accessed 25 September 2022].
7. J. Willis and A. Todorov, First impressions: Making up your mind after a 100-ms
exposure to a face, Psychological Science 17(7) (2006) 592–598, PMID: 16866745.
8. I. J. Deary, The trait approach to personality, in The Cambridge Handbook of Per-
sonality Psychology, eds. P. J. Corr and G. Matthews, Cambridge Handbooks in
Psychology, Vol. 1. (Cambridge University Press, 2009), pp. 89–109.
9. L. R. Goldberg, The structure of phenotypic personality traits, American Psycho-
logist 48 (1993) 26–34.
10. A. K. Woodend, Continuity of cardiovascular care, Can. J. Cardiovasc. Nurs. 15(4)
(2005) 3–4.
11. A. E. Abele and B. Wojciszke, Agency and communion from the perspective of self
versus others, J. Pers. Soc. Psychol. 93 (2007) 751–763.
12. R. R. McCrae, The Five-factor model of personality traits: Consensus and contro-
versy, in The Cambridge Handbook of Personality Psychology, eds. P. J. Corr and
G. Matthews (Cambridge University Press, 2012) pp. 148–161.
13. G. J. Boyle and E. Helmes, Methods of personality assessment, in The Cambridge
Handbook of Personality Psychology, eds. P. J. Corr and G. Matthews (Cambridge
University Press, 2012) pp. 110–126.

14. P. T. Costa, Jr. and R. R. McCrae, Domains and facets: Hierarchical personality
assessment using the revised NEO personality inventory, J. Pers. Assess. 64 (1995)
21–50.
15. R. R. McCrae and P. T. Costa, Jr., A contemplated revision of the NEO Five-Factor Inventory, Personality and Individual Differences 36(3) (2004) 587–596.
16. B. Rammstedt and O. P. John, Measuring personality in one minute or less: A
10-item short version of the big five inventory in English and German, Journal of
Research in Personality 41(1) (2007) 203–212.
17. S. D. Gosling, P. J. Rentfrow and W. B. Swann, A very brief measure of the big-five
personality domains, Journal of Research in Personality 37(6) (2003) 504–528.
18. S. Dhelim, N. Aung, M. A. Bouras, H. Ning and E. Cambria, A survey on personality-
aware recommendation systems, Artificial Intelligence Review 55 (2022) 2409–2454.
19. J. Li, M. X. Zhou, H. Yang and G. Mark, Confiding in and listening to virtual
agents: The effect of personality, in Proc. of the 22nd Int. Conf. on Intelligent User
Interfaces (IUI ’17 ) (Association for Computing Machinery, New York, NY, USA,
2017), pp. 275–286.
20. S. T. Völkel, R. Schödel, D. Buschek, C. Stachl, V. Winterhalter, M. Bühner and
H. Hussmann, Developing a personality model for speech-based conversational agents
using the psycholexical approach, in Proc. of the 2020 CHI Conf. on Human Factors
in Computing Systems (CHI ’20) (Association for Computing Machinery, New York,
NY, USA, 2020), pp. 1–14.
21. A. Cichocki, A. P. Kuleshov and J. Dauwels, Future trends for human-AI collabora-
tion: A comprehensive taxonomy of AI/AGI using multiple intelligences and learning
styles, Intell. Neuroscience 2021 (2021).
22. M. A. Salehinezhad, Personality and mental health, in Essential Notes in Psychiatry,
ed. V. Olisah (IntechOpen, Rijeka, 2012)
23. M. Gulhane and T. Sajana, Human behavior prediction and analysis using machine
learning — A review, Turkish Journal of Computer and Mathematics Education
12(5) (2021) 870–876.
24. I. Naim, M. I. Tanveer, D. Gildea and M. E. Hoque, Automated analysis and predic-
tion of job interview performance, IEEE Transactions on Affective Computing 9(2)
(2018) 191–204.
25. M. A. Coleman, Profile analyst: Advanced job candidate matching via automatic
skills linking (2016).
26. L. S. Nguyen and D. Gatica-Perez, Hirability in the wild: Analysis of online conver-
sational video resumes, IEEE Transactions on Multimedia 18(7) (2016) 1422–1437.
27. A. N. Finnerty, S. Muralidhar, L. S. Nguyen, F. Pianesi and D. Gatica-Perez, Stress-
ful first impressions in job interviews, in Proc. of the 18th ACM Int. Conf. on
Multimodal Interaction (ICMI ’16) (Association for Computing Machinery, New
York, NY, USA, 2016), pp. 325–332.
28. R. Mishra, R. Rodriguez and V. Portillo, An AI based talent acquisition and bench-
marking for job (2020).
29. H. Malik, H. Dhillon, R. Goecke and R. Subramanian, Characterizing hirability via
personality and behavior (2020).
30. Y.-S. Su, H.-Y. Suen and K.-E. Hung, Predicting behavioral competencies auto-
matically from facial expressions in real-time video-recorded interviews, Journal of
Real-Time Image Processing 18 (2021) 1011–1021.
31. J. A. Recio-Garcia, G. Jimenez-Diaz, A. A. Sanchez-Ruiz and B. Diaz-Agudo, Per-
sonality aware recommendations to groups, in Proc. of the Third ACM Conf. on
Recommender Systems (RecSys ’09) (Association for Computing Machinery, New
York, NY, USA, 2009), pp. 325–328.
32. D. Müllensiefen, C. Hennig and H. Howells, Using clustering of rankings to ex-
plain brand preferences with personality and socio-demographic variables, Journal of
Applied Statistics 45(6) (2018) 1009–1029.
33. A. Farseev, Q. Yang, A. Filchenkov, K. Lepikhin, Y.-Y. Chu-Farseeva and D.-B.
Loo, SoMin.ai, in Proc. of the 14th ACM Int. Conf. on Web Search and Data Mining
(ACM, 2021).
34. M. Ali, P. Sapiezynski, A. Korolova, A. Mislove and A. Rieke, Ad delivery algorithms:
The hidden arbiters of political messaging, in Proc. of the 14th ACM Int. Conf. on
Web Search and Data Mining (WSDM ’21) (Association for Computing Machinery,
New York, NY, USA, 2021), pp. 13–21.
35. L. Edelson, S. Sakhuja, R. Dey and D. McCoy, An analysis of United States online
political advertising transparency (2019), arXiv preprint arXiv:1902.04385.
36. M. Ö. Demir, B. Simonetti, M. A. Başaran and S. Irmak, Voter classification
based on susceptibility to persuasive strategies: A machine learning approach, Social
Indicators Research 155 (2021) 355–370.
37. A. Sayeed, C. Verma, N. Kumar, N. Koul and Z. Illés, Approaches and challenges in
internet of robotic things, Future Internet 14(9) (2022).
38. N. Kumar and P. Jamwal, Analysis of modern communication protocols for IoT
applications, Karbala International Journal of Modern Science 7(4) (2021) 390–404.
39. A. Kumar, Y. Singh and N. Kumar, Secure unmanned aerial vehicle (UAV) commu-
nication using blockchain technology, in Recent Innovations in Computing, Lecture
Notes in Electrical Engineering, Vol. 832 (Springer Singapore, Singapore, 2022),
pp. 201–211.
40. M. Jaiswal, S. Tabibu and R. Bajpai, The truth and nothing but the truth: Multi-
modal analysis for deception detection, in 2016 IEEE 16th Int. Conf. on Data Mining
Workshops (ICDMW ) (IEEE, 2016), pp. 938–943.
41. G. Krishnamurthy, N. Majumder, S. Poria and E. Cambria, A deep learning approach
for multimodal deception detection (2018).
42. V. Lai, H. Liu and C. Tan, “Why is ‘Chicago’ deceptive?” Towards building model-
driven tutorials for humans, in Proc. of the 2020 CHI Conf. on Human Factors in
Computing Systems (CHI ’20), (Association for Computing Machinery, New York,
NY, USA, 2020), pp. 1–13.
43. L. Mathur and M. J. Matarić, Unsupervised audio-visual subspace alignment for
high-stakes deception detection, in 2021 IEEE Int. Conf. on Acoustics, Speech and
Signal Processing (ICASSP ) (IEEE, 2021), pp. 2255–2259.
44. M. Choi, C. Budak, D. M. Romero and D. Jurgens, More than meets the tie:
Examining the role of interpersonal relationships in social networks (2021).
45. J. Golbeck, C. Robles, M. Edmondson and K. Turner, Predicting personality from
Twitter, in 2011 IEEE Third Int. Conf. on Privacy, Security, Risk and Trust and
2011 IEEE Third Int. Conf. on Social Computing (IEEE, 2011), pp. 149–156.
46. H. Christian, D. Suhartono, A. Chowanda and K. Z. Zamli, Text based personality
prediction from multiple social media data sources using pre-trained language model
and model averaging, Journal of Big Data 8 (2021) 68.
47. M. M. Tadesse, H. Lin, B. Xu and L. Yang, Personality predictions based on user
behavior on the Facebook social media platform, IEEE Access 6 (2018) 61959–61969.
48. S. Mushtaq and N. Kumar, Text-based automatic personality recognition:
Recent developments, in Proc. of Third Int. Conf. on Computing, Communications,
and Cyber-Security (IC4S-2021 ), Lecture Notes in Networks and Systems, Vol. 421
(Springer Nature Singapore, 2022), pp. 537–549.
49. C. Segalin, F. Celli, L. Polonio, M. Kosinski, D. Stillwell, N. Sebe, M. Cristani and
B. Lepri, What your Facebook profile picture reveals about your personality, in Proc.
of the 25th ACM Int. Conf. on Multimedia (MM ’17 ) (Association for Computing
Machinery, New York, NY, USA, 2017), pp. 460–468.
50. D. White, C. A. M. Sutherland and A. L. Burton, Choosing face: The curse of self in
profile image selection, Cognitive Research: Principles and Implications 2 (2017) 23.
51. C. Srisawatsakul, G. Quirchmayr and B. Papasratorn, A pilot study on the effects of
personality traits on the usage of mobile applications: A case study on office workers
and tertiary students in the Bangkok area, in Recent Advances in Information and
Communication Technology, eds. S. Boonkrong, H. Unger and P. Meesad (Springer
International Publishing, Cham, 2014), pp. 145–155.
52. M. Kosinski, Y. Bachrach, P. Kohli, D. Stillwell and T. Graepel, Manifestations of
user personality in website choice and behavior on online social networks, Machine
Learning 95 (2014) 357–380.
53. J. C. S. J. Junior, Y. Güçlütürk, M. Pérez, U. Güçlü, C. Andujar, X. Baró, H. J.
Escalante, I. Guyon, M. A. J. van Gerven, R. van Lier and S. Escalera, First
impressions: A survey on vision-based apparent personality trait analysis, IEEE
Transactions on Affective Computing 13 (2022) 75–95.
54. L. V. Phan and J. F. Rauthmann, Personality computing: New frontiers in person-
ality assessment, Social and Personality Psychology Compass 15(7) (2021) e12624.
55. Z. B. Mushtaq, S. M. Nasti, C. Verma, M. S. Raboca, N. Kumar and S. J. Nasti,
Super resolution for noisy images using convolutional neural networks, Mathematics
10(5) (2022).
56. M. Rojas, D. Masip and J. Vitrià, Predicting dominance judgements automati-
cally: A machine learning approach, in 2011 IEEE Int. Conf. on Automatic Face &
Gesture Recognition (FG) (IEEE, 2011), pp. 939–944.
57. N. Al Moubayed, Y. Vazquez-Alvarez, A. McKay and A. Vinciarelli, Face-based
automatic personality perception, in Proc. of the 22nd ACM Int. Conf. on Multi-
media (MM ’14 ) (Association for Computing Machinery, New York, NY, USA, 2014),
pp. 1153–1156.
58. S. C. Guntuku, L. Qiu, S. Roy, W. Lin and V. Jakhetiya, Do others perceive you
as you want them to? Modeling personality based on selfies, in Proc. of the 1st
Int. Workshop on Affect & Sentiment in Multimedia (ASM ’15 ) (Association for
Computing Machinery, New York, NY, USA, 2015), pp. 21–26.
59. J. Joo, F. F. Steen and S.-C. Zhu, Automated facial trait judgment and election
outcome prediction: Social dimensions of face, in 2015 IEEE Int. Conf. on Computer
Vision (ICCV ) (IEEE, 2015), pp. 3712–3720.
60. Y. Yan, J. Nie, L. Huang, Z. Li, Q. Cao and Z. Wei, Exploring relationship be-
tween face and trustworthy impression using mid-level facial features, in MultiMedia
Modeling, eds. Q. Tian, N. Sebe, G.-J. Qi, B. Huet, R. Hong and X. Liu (Springer
International Publishing, Cham, 2016), pp. 540–549.
61. Y. Lewenberg, Y. Bachrach, S. Shankar and A. Criminisi, Predicting personal traits
from facial images using convolutional neural networks augmented with facial land-
mark information (2016).
62. A. Dhall and J. Hoey, First impressions — predicting user personality from Twitter
profile images, in Human Behavior Understanding, eds. M. Chetouani, J. Cohn and
A. A. Salah (Springer International Publishing, Cham, 2016), pp. 148–158.
63. M. McCurrie, S. Anthony and W. J. Scheirer, Predicting first impressions with deep
learning, in 2017 12th IEEE Int. Conf. on Automatic Face & Gesture Recognition
(FG 2017 ) (IEEE, 2017), pp. 518–525.
64. J. Zhang and Z. Wei, Statistical analysis of the relationship between visual features
and personality, in 2017 Second Int. Conf. on Mechanical, Control and Computer
Engineering (ICMCCE ) (2017), pp. 232–235.
65. R. Kosti, J. M. Alvarez, A. Recasens and A. Lapedriza, Emotion recognition in
context, in 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
(IEEE, 2017), pp. 1960–1968.
66. C. Y. Olivola and A. Todorov, Elected in 100 milliseconds: Appearance-based trait
inferences and voting, Journal of Nonverbal Behavior 34 (2010) 83–110.
67. M. Walker, F. Jiang, T. Vetter and S. Sczesny, Universals and cultural differences in
forming personality trait judgments from faces, Social Psychological and Personality
Science 2(6) (2011) 609–617.
68. F. Noroozi, C. A. Corneanu, D. Kamińska, T. Sapiński, S. Escalera and G.
Anbarjafari, Survey on emotional body gesture recognition, IEEE Transactions on
Affective Computing 12(2) (2021) 505–523.
69. A. Kachur, E. Osin, D. Davydov, K. Shutilov and A. Novokshonov, Assessing the
big five personality traits using real-life static facial images, Scientific Reports 10
(2020) 8487.
70. J.-I. Biel, L. Teijeiro-Mosquera and D. Gatica-Perez, FaceTube: Predicting person-
ality from facial expressions of emotion in online conversational video, in Proc. of
the 14th ACM Int. Conf. on Multimodal Interaction (ICMI ’12 ) (Association for
Computing Machinery, New York, NY, USA, 2012), pp. 53–56.
71. O. Aran and D. Gatica-Perez, Cross-domain personality prediction: From video blogs
to small group meetings (2013).
72. O. Çeliktutan and H. Gunes, Continuous prediction of perceived traits and social
dimensions in space and time, in 2014 IEEE Int. Conf. on Image Processing (ICIP )
(IEEE, 2014), pp. 4196–4200.
73. L. Teijeiro-Mosquera, J. Biel, J. Alba-Castro and D. Gatica-Perez, What your face
vlogs about: Expressions of emotion and big-five traits impressions in YouTube, IEEE
Transactions on Affective Computing 6 (2015) 193–205.
74. F. Gürpınar, H. Kaya and A. Salah, Combining deep facial and ambient features for
first impression estimation, in Computer Vision — ECCV 2016 Workshops (2016),
pp. 1–14.
75. C. Ventura, D. Masip and A. Lapedriza, Interpreting CNN models for apparent
personality trait regression, in 2017 IEEE Conf. on Computer Vision and Pattern
Recognition Workshops (CVPRW ) (IEEE, 2017), pp. 1705–1713.
76. S. E. Bekhouche, F. Dornaika, A. Ouafi and A. Taleb-Ahmed, Personality traits and
job candidate screening via analyzing facial videos, in 2017 IEEE Conf. on Computer
Vision and Pattern Recognition Workshops (CVPRW ) (IEEE, 2017), pp. 1660–1663.
77. M. A. Moreno-Armendáriz, C. A. D. Martínez, H. Calvo and M. Moreno-Sotelo,
Estimation of personality traits from portrait pictures using the five-factor model,
IEEE Access 8 (2020) 201649–201665.
78. H.-Y. Suen, K.-E. Hung and C.-L. Lin, Intelligent video interview agent used to pre-
dict communication skill and perceived personality traits, Human-centric Computing
and Information Sciences 10 (2020) 3.
79. S. Song, S. Jaiswal, E. Sanchez, G. Tzimiropoulos, L. Shen and M. Valstar, Self-
supervised learning of person-specific facial dynamics for automatic personality
recognition, IEEE Transactions on Affective Computing (2021).
80. A. Subramaniam, V. Patel, A. Mishra, P. Balasubramanian and A. Mittal, Bi-modal
first impressions recognition using temporally ordered deep audio and stochastic
visual features, in Computer Vision — ECCV 2016 Workshops, eds. G. Hua and
H. Jégou (Springer International Publishing, Cham, 2016), pp. 337–348.
81. F. Gürpınar, H. Kaya and A. A. Salah, Multimodal fusion of audio, scene, and
face features for first impression estimation, in 2016 23rd Int. Conf. on Pattern
Recognition (ICPR) (2016), pp. 43–48.
82. O. Çeliktutan and H. Gunes, Automatic prediction of impressions in time and across
varying context: Personality, attractiveness and likeability, IEEE Transactions on
Affective Computing 8(1) (2017) 29–42.
83. H. Bilen, B. Fernando, E. Gavves and A. Vedaldi, Action recognition with dynamic
image networks, IEEE Transactions on Pattern Analysis and Machine Intelligence
40(12) (2018) 2799–2813.
84. G. Mohammadi and A. Vinciarelli, Humans as feature extractors: Combining
prosody and personality perception for improved speaking style recognition, in 2011
IEEE Int. Conf. on Systems, Man, and Cybernetics (IEEE, 2011), pp. 363–366.
85. G. Mohammadi and A. Vinciarelli, Automatic personality perception: Prediction of
trait attribution based on prosodic features: Extended abstract, in 2015 Int. Conf. on
Affective Computing and Intelligent Interaction (ACII ) (2015), pp. 484–490.
86. G. Mohammadi, A. Origlia, M. Filippone and A. Vinciarelli, From speech to per-
sonality: Mapping voice quality and intonation into personality differences, in Proc.
of the 20th ACM Int. Conf. on Multimedia (MM ’12) (Association for Computing
Machinery, New York, NY, USA, 2012), pp. 789–792.
87. F. Valente, S. Kim and P. Motlicek, Annotation and recognition of personality traits
in spoken conversations from the AMI meetings corpus, in Proc. of INTERSPEECH
2012 (2012).
88. C.-J. Liu, C.-H. Wu and Y.-H. Chiu, BFI-based speaker personality perception using
acoustic-prosodic features, in 2013 Asia-Pacific Signal and Information Processing
Association Annual Summit and Conf. (APSIPA, 2013), pp. 1–6.
89. S. Jothilakshmi, J. Sangeetha and R. Brindha, Speech based automatic personality
perception using spectral features, International Journal of Speech Technology 20
(2017) 43–50.
90. S. Jothilakshmi and R. Brindha, Speaker trait prediction for automatic personality
perception using frequency domain linear prediction features, in 2016 Int. Conf. on
Wireless Communications, Signal Processing and Networking (WiSPNET ) (2016),
pp. 2129–2132.
91. R. Solera-Ureña, H. Moniz, F. Batista, V. Cabarrão, A. Pompili, R. Astudillo, J.
Campos, A. Paiva and I. Trancoso, A semi-supervised learning approach for acoustic-
prosodic personality perception in under-resourced domains, in Proc. of INTER-
SPEECH 2017 (Stockholm, Sweden, 2017), pp. 929–933.
92. B. Schuller, S. Steidl, A. Batliner, E. Noeth, A. Vinciarelli, F. Burkhardt, R. van
Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi and B. Weiss, The INTER-
SPEECH 2012 speaker trait challenge, in Proc. of the 13th Annual Conf. of the
International Speech Communication Association (INTERSPEECH 2012) (2012).
93. R. Solera-Ureña, H. Moniz, F. Batista, R. F. Astudillo, J. Campos, A. Paiva and
I. Trancoso, Acoustic-prosodic automatic personality trait assessment for adults and
children, in Advances in Speech and Language Technologies for Iberian Languages,
eds. A. Abad, A. Ortega, A. Teixeira, C. García Mateo, C. D. Martínez Hinarejos,
F. Perdigão, F. Batista and N. Mamede (Springer International Publishing, Cham,
2016), pp. 192–201.
94. M.-H. Su, C.-H. Wu, K.-Y. Huang, Q.-B. Hong and H.-M. Wang, Personality trait
perception from speech signals using multiresolution analysis and convolutional neu-
ral networks, in 2017 Asia-Pacific Signal and Information Processing Association
Annual Summit and Conference (APSIPA ASC ) (2017), pp. 1532–1536.
95. L. H. Gilpin, D. M. Olson and T. Alrashed, Perception of speaker personality traits
using speech signals, in Extended Abstracts of the 2018 CHI Conf. on Human Factors
in Computing Systems (CHI EA ’18 ) (Association for Computing Machinery, New
York, NY, USA, 2018), pp. 1–6.
96. M. Zhu, X. Xie, L. Zhang and J. Wang, Automatic personality perception from
speech in Mandarin, in 2018 11th Int. Symp. on Chinese Spoken Language Processing
(ISCSLP ) (2018), pp. 309–313.
97. M. Koutsombogera, P. Sarthy and C. Vogel, Acoustic features in dialogue dominate
accurate personality trait classification, in 2020 IEEE Int. Conf. on Human-Machine
Systems (ICHMS ) (IEEE, 2020), pp. 1–3.
98. Z.-T. Liu, A. Rehman, M. Wu, W.-H. Cao and M. Hao, Speech personality recogni-
tion based on annotation classification using log-likelihood distance and extraction
of essential audio features, IEEE Transactions on Multimedia 23 (2021) 3414–3426.
99. E. J. Zaferani, M. Teshnehlab and M. Vali, Automatic personality traits perception
using asymmetric auto-encoder, IEEE Access 9 (2021) 68595–68608.
100. G. Mohammadi and A. Vinciarelli, Automatic attribution of personality traits based
on prosodic features, in Proc. of the 19th ACM Int. Conf. on Multimedia (MM '11)
(Association for Computing Machinery, New York, NY, USA, 2011), pp. 1–4.
101. J. Willis and A. Todorov, First impressions: Making up your mind after a 100-ms
exposure to a face, Psychological Science 17(7) (2006) 592–598.
102. J. Staiano, B. Lepri, R. Subramanian, N. Sebe and F. Pianesi, Automatic modeling of
personality states in small group interactions, in Proc. of the 19th ACM Int. Conf.
on Multimedia (MM ’11) (Association for Computing Machinery, New York, NY,
USA, 2011), pp. 989–992.
103. J.-I. Biel, V. Tsiminaki, J. Dines and D. Gatica-Perez, Hi YouTube! Personality im-
pressions and verbal content in social video, in Proc. of the 2013 ACM Int. Conf.
on Multimodal Interaction (ICMI 2013) (ACM, 2013), pp. 119–126.
104. C. Sarkar, S. Bhatia, A. Agarwal and J. Li, Feature analysis for computational
personality recognition using YouTube personality data set, in Proc. of the 2014 ACM
Multi Media on Workshop on Computational Personality Recognition (WCPR ’14 )
(Association for Computing Machinery, New York, NY, USA, 2014), pp. 11–14.
105. F. Alam and G. Riccardi, Predicting personality traits using multimodal information,
in Proc. of the 2014 ACM Multi Media on Workshop on Computational Personality
Recognition (WCPR ’14 ) (Association for Computing Machinery, New York, NY,
USA, 2014), pp. 15–18.
106. G. Farnadi, S. Sushmita, G. Sitaraman, N. Ton, M. De Cock and S. Davalos, A
multivariate regression approach to personality impression recognition of vloggers,
in Proc. of the 2014 ACM Multi Media on Workshop on Computational Personality
Recognition (WCPR ’14) (Association for Computing Machinery, New York, NY,
USA, 2014), pp. 1–6.
107. G. Chávez-Martínez, S. Ruiz-Correa and D. Gatica-Perez, Happy and agreeable?
Multi-label classification of impressions in social video, in Proc. of the 14th Int.
Conf. on Mobile and Ubiquitous Multimedia (MUM ’15) (Association for Computing
Machinery, New York, NY, USA, 2015), pp. 109–120.
108. C.-L. Zhang, H. Zhang, X.-S. Wei and J. Wu, Deep bimodal regression for apparent
personality analysis, in Computer Vision — ECCV 2016 Workshops, eds. G. Hua
and H. Jégou (Springer International Publishing, Cham, 2016), pp. 311–324.
109. Y. Güçlütürk, U. Güçlü, M. A. J. van Gerven and R. van Lier, Deep impres-
sion: Audiovisual deep residual networks for multimodal apparent personality trait
recognition, Lecture Notes in Computer Science, Vol. 9915 (Springer International
Publishing, 2016), pp. 349–358.
110. J. Joshi, H. Gunes and R. Goecke, Automatic prediction of perceived traits using
visual cues under varied situational context, in 2014 22nd Int. Conf. on Pattern
Recognition (ICPR) (2014), pp. 2855–2860.
111. R. D. P. Principi, C. Palmero, J. C. S. J. Junior and S. Escalera, On the effect of
observed subject biases in apparent personality analysis from audio-visual signals
(2019).
112. S. Aslan and U. Güdükbay, Multimodal video-based apparent personality recognition
using long short-term memory and convolutional neural networks (2019).
113. Y. Li, J. Wan, Q. Miao, S. Escalera, H. Fang, H. Chen, X. Qi and G. Guo, CR-Net: A
deep classification-regression network for multimodal apparent personality analysis,
International Journal of Computer Vision 128 (2020) 2763–2780.
114. D. Giritlioğlu, B. Mandira, S. F. Yilmaz, C. U. Ertenli, B. F. Akgür, M. Kınıklıoğlu,
A. G. Kurt, E. Mutlu, Ş. C. Gürel and H. Dibeklioğlu, Multimodal analysis
of personality traits on videos of self-presentation and induced behavior, Journal
on Multimodal User Interfaces 15 (2021) 337–358.
115. S. Aslan, U. Güdükbay and H. Dibeklioğlu, Multimodal assessment of apparent
personality using feature attention and error consistency constraint, Image and Vi-
sion Computing 110 (2021) 104163.
116. D. Sanchez-Cortes, O. Aran, M. S. Mast and D. Gatica-Perez, Identifying emer-
gent leadership in small groups using nonverbal communicative cues, in Int. Conf.
on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal
Interaction (ICMI-MLMI ’10 ) (Association for Computing Machinery, New York,
NY, USA, 2010).
117. P. Phillips, H. Wechsler, J. Huang and P. J. Rauss, The FERET database and
evaluation procedure for face-recognition algorithms, Image and Vision Computing
16(5) (1998) 295–306.
118. J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec,
V. Karaiskos, W. Kraaij, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska,
I. McCowan, W. Post, D. Reidsma and P. Wellner, The AMI meeting corpus: A pre-
announcement, in Machine Learning for Multimodal Interaction, eds. S. Renals and
S. Bengio (Springer Berlin Heidelberg, Berlin, Heidelberg, 2006), pp. 28–39.
119. N. Mana, B. Lepri, P. Chippendale, A. Cappelletti, F. Pianesi, P. Svaizer and
M. Zancanaro, Multimodal corpus of multi-party meetings for automatic social be-
havior analysis and personality traits detection, in Proc. of the 2007 Workshop on
Tagging, Mining and Retrieval of Human Related Activity Information (TMR ’07 )
(Association for Computing Machinery, New York, NY, USA, 2007), pp. 9–14.
120. G. Mohammadi, A. Vinciarelli and M. Mortillaro, The voice of personality: Mapping
nonverbal vocal behavior into trait attributions, in Proc. of the 2nd Int. Workshop on
Social Signal Processing (SSPW ’10 ) (Association for Computing Machinery, New
York, NY, USA, 2010), pp. 17–20.
121. J.-I. Biel and D. Gatica-Perez, The YouTube lens: Crowdsourced personality im-
pressions and audiovisual analysis of vlogs, IEEE Transactions on Multimedia 15(1)
(2013) 41–55.
122. J.-I. Biel and D. Gatica-Perez, Vlogcast yourself: Nonverbal behavior and attention
in social media, in Int. Conf. on Multimodal Interfaces and the Workshop on Machine
Learning for Multimodal Interaction (ICMI-MLMI ’10) (Association for Computing
Machinery, New York, NY, USA, 2010).
123. G. McKeown, M. Valstar, R. Cowie, M. Pantic and M. Schroder, The SEMAINE
database: Annotated multimodal records of emotionally colored conversations be-
tween a person and a limited agent, IEEE Transactions on Affective Computing
3(1) (2012) 5–17.
124. H. J. Escalante, I. Guyon, S. Escalera, J. Jacques, M. Madadi, X. Baró, S. Ayache,
E. Viegas, Y. Güçlütürk, U. Güçlü, M. A. J. van Gerven and R. van Lier, Design
of an explainable machine learning challenge for video interviews, in 2017 Int. Joint
Conf. on Neural Networks (IJCNN ) (2017), pp. 3688–3695.
125. M. Koutsombogera and C. Vogel, Modeling collaborative multimodal behavior in
group dialogues: The MULTISIMO corpus, in Proc. of the Eleventh Int. Conf. on
Language Resources and Evaluation (LREC 2018 ) (European Language Resources
Association (ELRA), Miyazaki, Japan, May 2018).
126. J. C. S. J. Junior, A. Lapedriza, C. Palmero, X. Baró and S. Escalera, Person per-
ception biases exposed: Revisiting the first impressions dataset (2020).