Vision and Audio-Based Methods For First Impression Recognition Using Machine Learning Algorithms: A Review
1. Introduction
Personality is the latent construct, present in all of us, that explains distinctive patterns of personal thoughts, emotions, and behaviors, as well as the cognitive processes, covert or overt, that underlie these patterns.1 It encompasses dispositions, attitudes, and sentiments and is expressed primarily in connection with others. It covers both innate and acquired behavioral attributes that distinguish individuals from one another; individuals' relations with their surroundings and social circle can also be viewed in this way. The psychology of personality endeavours to explain the tendencies that underlie differences in conduct. The fundamental objective of personality psychology is therefore to infer individuals' inward qualities from observable behavior and to investigate the relevant connections between the two.2
Personality computing is a field that uses advanced computational procedures to analyze personality from a variety of sources such as text, audiovisual material, and social networks.3 It may be thought of as an extension or complement of affective computing, with the former concentrating on personality traits and the latter on affective states. However, it is no longer restricted to personality and affect but applies to any computing task related to the understanding, prediction, and synthesis of human conduct. Despite being distinct and diverse in terms of data, technology, and practices, all computing domains concerned with personality address the same three thrust areas4 (Fig. 1): the recognition of an individual's genuine personality (Automatic Personality Recognition), the prediction of the personality that others attribute to a person (Automatic Personality Perception), and the development of synthetic identities using artificial agents (Automatic Personality Synthesis).
like the BFI-10, which is a simplified, concise version of the Big-Five Inventory. Every entry contributes to trait scoring by being rated on a Likert scale (ranging from “Strongly disagree” to “Strongly agree”). In the case of APR, first-person questionnaires are utilized; these yield self-assessments and hence the actual personality of individuals. In the case of APP, third-person questionnaires are utilized; these yield a personality based on other people's impressions of a person in a particular circumstance.
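To make the scoring procedure concrete, the sketch below computes Big-Five scores from BFI-10 responses. It is a minimal illustration: the item-to-trait assignment and reverse-keying follow the published BFI-10 key (Ref. 16) to the best of our knowledge and should be verified against the inventory before use.

```python
# Sketch: scoring a BFI-10 questionnaire (cf. Ref. 16). Responses are on a
# 5-point Likert scale: 1 = "Strongly disagree" ... 5 = "Strongly agree".
# The item-trait mapping below is an assumption to be checked against Ref. 16.

TRAIT_ITEMS = {            # trait: (reverse-keyed item, regular item), 1-based
    "Extraversion":      (1, 6),
    "Agreeableness":     (7, 2),
    "Conscientiousness": (3, 8),
    "Neuroticism":       (4, 9),
    "Openness":          (5, 10),
}

def score_bfi10(responses):
    """responses: list of 10 Likert ratings (1-5), in questionnaire order."""
    scores = {}
    for trait, (rev_item, reg_item) in TRAIT_ITEMS.items():
        reversed_value = 6 - responses[rev_item - 1]   # reverse-key: 1<->5, 2<->4
        scores[trait] = (reversed_value + responses[reg_item - 1]) / 2.0
    return scores

print(score_bfi10([3, 4, 2, 5, 1, 4, 2, 5, 3, 4]))
```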
1.3. Applications
There are numerous applications of personality computing in the present-day scenario. As research in this field advances, personality models with greater precision and reliability will be discovered. In particular, computing technologies permit the processing of massive quantities of behavioral records that would be hard to examine with the procedures traditionally applied in psychology. In this regard, personality computing may help establish connections between attributes and behavior more effectively than has been possible so far. Some of the application areas can be enumerated as (i)–(viii):
match candidates to the roles and responsibilities that suit them best. Personality evaluation shows the employer which open position the candidate is most appropriate for, or whether they should be hired at all.24–30
(v) Advertising Campaigns: Automatic personality computing can be utilized by advertising campaigns to make them more successful.31–33 It may be beneficial in social marketing, where psychological targeting could reduce the expense of a publicity drive. Likewise, it can be used in political prognostication, suggesting to policymakers how to make potential campaigns more practical and focused.34–36
(vi) Automated Deception Detection: In the present era of the Internet-of-Things,37–39 criminal activities are major threats to society. Automatic personality trait detection can assist in figuring out fake statements.40–43 It can also assist forensics in identifying criminals: the authorities will be able to narrow the suspect pool if they are familiar with the personality characteristics of those who were in the vicinity at the time of the incident.
(vii) Maintaining Personal Connections: Personal interactions are vital to one's physical and mental well-being. In today's world, more people prefer to interact digitally rather than in person. Social media has a significant role to play in bringing users together and fostering relationships.44 On social media platforms, people converse with strangers and create new acquaintances. Personality computing can help people sustain interpersonal relationships on these platforms. Numerous studies have been conducted on detecting personality from social media contexts such as tweets,45,46 texts,46–48 and profile photos.49,50
(viii) Designing Specified Applications: Different individuals utilize distinctive applications depending on their requirements and inclinations,51,52 so personality is connected with the utilization of computing technologies. Automatic personality computing can emerge as an essential tool for developing new applications or introducing new functions into existing frameworks.
perception and to collect the most crucial techniques and material for apparent personality analysis, so that future researchers working in the field of apparent personality analysis who go through this paper can solve their problems in a short amount of time without spending it studying everything from the fundamentals to advanced approaches.
(iv) Commonly used datasets and performance evaluation criteria are discussed
in order to advance research in this area and to assist future researchers in
choosing data that matches their goals.
(v) Open research challenges, gaps, and opportunities for further research in this
subject are identified.
Inclusion Criteria: For inclusion, we screened each article in stages: first the title, then the abstract, then the keywords, and finally the full text; an article was included only if it remained relevant at every stage. We also included articles that presented novel methodologies for first impression recognition/automatic personality perception or that focused on both APP and APR.
Exclusion Criteria: All articles that did not meet the above-mentioned inclusion criteria were excluded.
3. Related Works
This study divides the literature review into sections using a taxonomy that groups works according to the data modality they utilize: static images, image sequences, audio, and multi-modal data (Fig. 5).
a hierarchical model. The social dimensions, which describe the relative ranking orders among the samples, were predicted using Ranking SVMs (Rank SVMs). The results showed that the model correctly predicted election outcomes with an accuracy of 67.5% for Governors and 65.5% for Senators, and that it also properly classified politicians' political party affiliations, such as Democrats versus Republicans, with 62.6% accuracy for males and 60.1% accuracy for females.
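As a concrete illustration of the ranking approach mentioned above, the sketch below implements the classic pairwise-transform reduction of RankSVM: ordered pairs become difference vectors for a linear SVM, whose weight vector then scores and ranks new samples. It is a simplified stand-in with toy data, not the cited implementation.

```python
# Sketch: ranking with a pairwise-transform linear SVM (the classic RankSVM
# reduction; a simplified stand-in for the Rank SVMs described above).
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def pairwise_transform(X, y):
    """Turn ranked samples (higher y = higher rank) into difference vectors."""
    X_diff, y_diff = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue                      # ties carry no ordering information
        X_diff.append(X[i] - X[j])
        y_diff.append(1 if y[i] > y[j] else -1)
    return np.array(X_diff), np.array(y_diff)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))              # toy features (e.g., facial descriptors)
y = X @ np.array([1.0, 0.5, 0.0, -0.5, -1.0])  # toy "social dimension" scores

Xd, yd = pairwise_transform(X, y)
ranker = LinearSVC(C=1.0).fit(Xd, yd)
scores = X @ ranker.coef_.ravel()         # w.x gives a rankable score per sample
print(np.argsort(-scores)[:5])            # top-5 ranked samples
```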
In 2016, Yan et al.,60 performed experiments to verify the connections between facial appearance and a personality impression, namely trustworthiness. They conducted their experiments over a subset of the LFW dataset containing 2010 portraits annotated by 250 volunteers. They used local face descriptors to extract low-level characteristics from various face areas. After that, they used the k-means algorithm to cluster the local descriptors into a set of mid-level features to mitigate the semantic gap between high-level and low-level features. Then, they utilized SVM to figure out the connections between facial features and personality impressions. The results showed that the proposed framework outperforms state-of-the-art efforts. In the same year, Lewenberg et al.,61 presented an advanced technique for estimating both objective and subjective traits using a Landmark-Augmented Convolutional Neural Network (LACNN). They performed their experiments over a new Face Attributes Dataset (a subset of the PubFig dataset). The results showed that the LACNN brings consistent improvement, indicating that it improves classification performance for both objective and subjective traits and generates responses with more detailed information compared to a non-augmented baseline. In the same year, Dhall and Hoey,62 suggested a multivariate method for inferring users' personality impressions from Twitter profile photos. The relationship between the Big-Five personality factors derived through analysis of users' tweets and the suggested photo-based profile framework was evaluated. In their work, they considered the face as well as the background. Both hand-crafted (Face-HOG, Face-LPQ, Scene-CENTRIST) and deep learning (Face-VGG, Scene-ImageNet) features were computed from the face region as well as the background of users' profile photos. A kernel partial least squares (KPLS) method was used for inferring the Big-Five traits. The results showed that scene descriptors had a high correlation with the Big-Five traits derived from analysis of users' tweets (particularly openness). Also, when compared to hand-crafted features, deep learning features combined with KPLS regression perform the best.
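The descriptor-clustering step used by Yan et al. is essentially a bag-of-visual-words encoding; the sketch below shows the generic pipeline (k-means codebook, histogram encoding, SVM) on toy data under our own assumptions, not their exact implementation.

```python
# Sketch: building mid-level features by k-means clustering of local descriptors
# (a generic bag-of-visual-words pipeline in the spirit of Ref. 60, not their
# exact implementation).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_codebook(descriptor_sets, n_words=64):
    """descriptor_sets: list of (n_i, d) arrays of local face descriptors."""
    all_desc = np.vstack(descriptor_sets)
    return KMeans(n_clusters=n_words, n_init=10, random_state=0).fit(all_desc)

def encode(descriptors, codebook):
    """Histogram of codeword assignments = one mid-level feature vector."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)    # L1-normalize

# Toy data: 100 "faces", each with 50 local descriptors of dimension 32.
rng = np.random.default_rng(0)
faces = [rng.normal(size=(50, 32)) for _ in range(100)]
labels = rng.integers(0, 2, size=100)     # e.g., high/low trustworthiness

codebook = build_codebook(faces)
X = np.array([encode(f, codebook) for f in faces])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.score(X, labels))
```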
In 2017, McCurrie et al.,63 performed experiments over a unique ground-truth-free dataset of 6300 grayscale facial pictures from the AFLW dataset, labeled with four attributes (dominance, trustworthiness, age, and IQ), to develop efficient automatic predictors for social attribute assignment. They used an online psychophysics testing platform (TestMyBrain.org), which allowed them to collect data from a larger number of raters drawn from a potentially more reliable and geographically variable source. A CNN regression framework was used to train predictive models, and five deep architectures (modified VGGNet19, modified VGGNet16, MOON, and two shallower customized architectures with few convolutional layers
and different regularization levels) were compared. Of these, the MOON (Mixed Objective Optimization Network) architecture performed slightly better and was therefore used as the base for the final models. The results demonstrated that the models correlate significantly with human crowd ratings and indicate that deep architectures contribute little to no improvement. In the same year, Zhang et al.,64 analyzed the correlation between personality and visual characteristics (color, expressions) using statistical approaches. They performed their experiments on 2000 images collected from Google, Bing, and Baidu, rated by 54 people in terms of the Big-Five traits. For color features, the HSV color space model was used: they calculated average saturation, average brightness, and hue in HSV space. They used the Microsoft Emotion API for the automatic extraction of three types of expressions (positive, negative, and neutral) from a person's image. The results showed that different features have varying degrees of correlation with personality, and that certain features like happiness, neutrality, and saturation influence personality impressions directly.
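For concreteness, the sketch below computes the kind of average-saturation, average-brightness, and hue statistics described above; it is our own minimal OpenCV version, with a hypothetical input file name, not the authors' code.

```python
# Sketch: the average-saturation / average-brightness / hue statistics that
# Ref. 64 computed in HSV space (our own minimal version, using OpenCV).
import cv2
import numpy as np

def hsv_color_features(image_path):
    img = cv2.imread(image_path)                  # BGR image
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    return {
        "mean_saturation": float(s.mean()) / 255.0,   # scale to [0, 1]
        "mean_brightness": float(v.mean()) / 255.0,
        # Hue is circular (OpenCV range 0-179), so average it on the unit circle.
        "mean_hue_deg": float(np.degrees(np.angle(
            np.exp(1j * np.radians(h.astype(float) * 2)).mean())) % 360),
    }

print(hsv_color_features("portrait.jpg"))         # hypothetical input file
```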
Current Issues: The majority of works that use still photos as input data are primarily concerned with facial characteristics.56–58 Analyzing the relationship between facial features and personality traits is fraught with difficulties: there are numerous distinct facial characteristics, many of which are difficult to identify, and discrete facial traits have weak effects that become statistically significant only when large samples are used. Works using realistic images for personality prediction may face further problems because the basic features of an image depend upon the subject's choices, which are themselves personality-dependent: one of the primary reasons why people take and share images is to communicate their personality to others. Isolating each variable's influence from the plethora of other factors appears to be a challenging task. Context is usually ignored in the case of still images. As stated in Ref. 65, context plays a crucial role in emotion perception from images, which is perfectly consistent with research on APP. Although some studies suggest that background information such as haircuts or apparel should be ignored,66,67 people can use such information to actively influence how others perceive them. Also, the unpredictability of human raters' evaluations of personality traits is an additional hurdle for studies attempting to uncover face-personality relationships. So, to generate credible estimations of personality qualities for each image, a large number of human raters is required.
Possible Solutions: When the attention is on the face as well as its surrounding regions, the issues associated with using still images as input data for personality prediction can be overcome to some extent. The work in Ref. 62 concentrated on image regions outside the facial area, but the topic needs to be explored further. Analyzing body language,68 a hot topic in computer vision, may enhance apparent personality characteristic analysis. Including a variety of non-verbal signs like hand position, smile type, gaze direction, movement of the head, body motion, and morphological characteristics of the face can improve the efficiency of works using still images, because these are crucial indicators of a person's emotional and cognitive
inner state. Furthermore, research demonstrates that data labeling annotators are influenced by context; thus, including context can advance research using still images. Also, recent studies have attempted to take a more holistic approach to the subject, looking into people's subjective perceptions of personality based on integral facial images.69
In 2016, Gürpinar et al.,74 utilized perceptual cues (facial expressions and scene) for apparent personality analysis on the ChaLearn First Impression dataset. A Kernel Extreme Learning Machine regressor was utilized, which accepts a combination of scene and facial expression features as input. The proposed framework achieved an accuracy of 90.94% in the ChaLearn First Impression ECCV challenge.
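To illustrate the regressor named above, here is a minimal closed-form kernel ELM sketch on toy data, assuming an RBF kernel and a ridge-style regularizer C; it is our own didactic version, not the cited system.

```python
# Sketch: kernel extreme learning machine (ELM) regression in its closed form,
# illustrating the kind of regressor used in Ref. 74 (our own minimal version).
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KernelELM:
    def __init__(self, C=10.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        """X: (n, d) features; T: (n, m) targets, e.g., five trait scores."""
        self.X = X
        K = rbf_kernel(X, X, self.gamma)
        # Closed-form output weights: alpha = (I/C + K)^(-1) T
        self.alpha = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self

    def predict(self, Xnew):
        return rbf_kernel(Xnew, self.X, self.gamma) @ self.alpha

rng = np.random.default_rng(0)
X, T = rng.normal(size=(200, 16)), rng.uniform(size=(200, 5))  # toy Big-Five targets
model = KernelELM().fit(X[:150], T[:150])
print(model.predict(X[150:]).shape)       # (50, 5) predicted trait scores
```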
In 2017, Ventura et al.,75 carried out experiments over the ChaLearn LAP 2016 dataset to investigate why CNN models are so adept at recognizing first impressions automatically. The results revealed that the majority of the discriminative information required to infer personality traits is found in the face, and that the CNN's internal representation specifically analyzes important face regions such as the mouth, nose, and eyes. Finally, they analyzed the role of Action Units (AUs) in inferring personality traits and demonstrated how certain AUs influence facial trait judgments.
In the same year, Bekhouche et al.,76 introduced a unique strategy for Big-Five trait estimation as well as for assessing features of job candidates from videos of their faces. They carried out their experiments over the ChaLearn LAP 2016 database. In their work, they extracted texture features from face regions using LPQ and BSIF descriptors and their fusion strategies; the outputs of these two descriptors were then represented as Pyramid Multi-Level (PML) texture features. To estimate the score of each Big-Five trait, these PML features were fed to five support vector regressors (SVR), one SVR per personality trait. The output of these SVRs was then fed into Gaussian Process Regression (GPR). The experiments achieved good results as claimed by the authors, suggesting that, even though deep learning-based algorithms can yield excellent performance, temporal facial texture methodologies are still useful.
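The two-stage pipeline just described (one SVR per trait, refined by GPR) can be sketched as follows; this is a simplified toy-data version under our own assumptions, with random features standing in for PML descriptors.

```python
# Sketch: one SVR per Big-Five trait, with the SVR outputs refined by Gaussian
# Process Regression, mirroring the two-stage pipeline of Ref. 76 (simplified;
# feature extraction is replaced by random toy data).
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor

TRAITS = ["O", "C", "E", "A", "N"]
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))            # stand-in for PML texture features
Y = rng.uniform(size=(300, 5))            # one score per Big-Five trait

# Stage 1: an independent SVR for each trait.
svrs = [SVR(kernel="rbf").fit(X, Y[:, t]) for t in range(5)]
stage1 = np.column_stack([svr.predict(X) for svr in svrs])

# Stage 2: a GPR per trait, fed with the five SVR outputs.
gprs = [GaussianProcessRegressor().fit(stage1, Y[:, t]) for t in range(5)]
final = np.column_stack([gpr.predict(stage1) for gpr in gprs])
print(dict(zip(TRAITS, final[0].round(3))))
```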
In 2020, Armendáriz et al.,77 performed experiments on the New Portrait Personality Dataset (a modified ChaLearn First Impression dataset) to predict the personality of individuals from their portraits. Five different variations of the dataset were utilized. Several CNN models were utilized to extract features from portraits that are indicators of personality traits. To assess the dataset's performance, two pilot models were utilized. Both regression and classification tasks were performed, using two regressor models and two classification models. To boost the performance of the proposed models, transfer learning and feature encoding were investigated. An autoencoder was used to enhance feature extraction; it was trained on images from the Selfie dataset (which contains 46 836 selfie photos tagged with 36 attributes grouped into numerous categories such as hair, shape, and gender) and allows the network to learn various patterns observed in selfies and represent them in a consolidated manner. To transfer face detection expertise to the proposed model, two FaceNet models were utilized. The outcomes revealed that the proposed framework requires only a single portrait for assessing personality traits, that feature extraction is automatic, that both tasks, feature extraction as well as personality trait classification, are performed by a single model, and that, compared to human judgment, greater performance and accuracy are achieved in predicting 4 of the 5 traits of the Big-Five model.
In the same year, Suen et al.,78 presented AVI-AI, an asynchronous video interview (AVI) framework that uses artificial intelligence, in the form of a TensorFlow-based CNN model, for decision making, anticipating job applicants' communicative abilities and personality attributes automatically. For this purpose, they recruited 114 individuals (57 evaluators, 57 respondents), and their findings revealed that the proposed model accurately predicts an applicant's communicative abilities as well as their Big-Five personality traits (except extraversion and conscientiousness), as evaluated by the study's experienced human evaluators.
In 2021, Song et al.,79 conducted automatic personality profiling tests on real (VHQ) and perceived personality (first impression) datasets. They used a semi-supervised learning approach, introducing a new rank loss and domain adaptation approach, and then placed a convolution layer in every skip layer of the trained domain facial neural network (DFNN), trained uniquely for each individual. As a result, the learned weights adapt to the associated person's facial behavior, and they suggested that these weights be used as a person-specific descriptor, with encouraging outcomes. They also demonstrated that the actions performed by the person in the video are important, that the blending of multiple tasks achieves the maximum level of accuracy, and that the dynamics of multiple scales are more illuminating than those of a single one.
Current Issues: With the advent of the video modality, it is now possible to add temporal information to the extensively utilized spatial parameters. Compared to still photos, spatio-temporal information took first impression analysis to a new depth, opening up a wider range of possibilities. But predicting personality from spatio-temporal data turned out to be a study area that has not been fully explored. This could be because of the difficult and time-consuming task of creating accurate labels for a large amount of data over time, especially when considering deep learning-based methodologies, which to our knowledge have not yet been used in this area. Continuous prediction of first impressions was explored by Çeliktutan and Gunes.72 But their continuous prediction could be viewed as a frame-based regressor, with each frame considered separately; as a result, the scene's dynamics are not fully explored. The works in Refs. 74 and 76 use statistics generated over the frame sequence to represent short video clips globally. Although such a technique does not analyze the frames individually, it nevertheless ignores the data's temporal progression. The work in Ref. 78 is a good step towards developing automatic interview agents for the recruitment process, but the results are not as good as they could be because of the small dataset and the limited number of features used.
The work in Ref. 49 describes facial expression output statistics as a dynamic signal observed over short periods of time. It might be seen as an advanced step in terms of analyzing first impressions dynamically on a temporal scale. Using relatively simple motion pattern representations,71,78 temporal information has also been exploited to improve results. But motion-based techniques are still in their
infancy. Only a few works, included in Sec. 3.3, have suggested modeling spatio-temporal information by employing more complex and dynamic methodologies like LSTMs80 or 3D-CNNs.80 As a result, there are still unanswered questions about the topic's influence (in data labeling or prediction), its advantages, and effective temporal information modeling techniques. Also, in most works personality is predicted from short video clips, which is a challenging task. The literature revealed that characteristics assessed at the start of each video clip are better predictors of viewers' impressions, validating the hypothesis that first impressions are formed through brief encounters.7 However, more research is required to corroborate the authors' hypothesis, to check whether the same effect occurs when diverse nonverbal sources are utilized, and whether each data type's ideal slice duration and position are the same.
Possible Solutions: According to this work, the number of techniques developed for image sequences is significantly smaller than for the multimodal category (audiovisual, Sec. 3.3). This could be due to the advantages of adding more data or to the numerous disciplines being examined. Only a few of the works discussed above, like Refs. 72 and 74, have expanded their work by including both audio and visual data,81,82 focusing on the advantages of incorporating acoustic characteristics into the framework. Despite this, temporal data is not fully exploited by the majority of works. Future research should include feature representations that can keep track of the data's temporal evolution, such as the Dynamic Image Networks83 utilized in action recognition. Additional work in this subject is therefore needed to overcome the issues of image sequences by fully using spatio-temporal data. The framework's efficiency can be improved by including audiovisual data as well as advanced frameworks.
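As an example of a representation that preserves temporal evolution, the sketch below computes a “dynamic image” with approximate rank pooling, the idea underlying the Dynamic Image Networks cited above; it is our own minimal version using the published weighting approximation.

```python
# Sketch: collapsing a frame sequence into a single "dynamic image" with
# approximate rank pooling (the idea behind Dynamic Image Networks, Ref. 83):
# each frame t of T gets weight 2t - T - 1.
import numpy as np

def dynamic_image(frames):
    """frames: (T, H, W, C) array; returns one (H, W, C) temporal summary."""
    T = len(frames)
    weights = 2.0 * np.arange(1, T + 1) - T - 1       # -(T-1), ..., (T-1)
    return np.tensordot(weights, frames.astype(float), axes=1)

clip = np.random.default_rng(0).integers(0, 256, size=(30, 64, 64, 3))
di = dynamic_image(clip)
print(di.shape)    # (64, 64, 3): a single image encoding the clip's dynamics
```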
with Big-Five traits, which was used to train binary classifiers (SVM) for each Big-Five trait, and another large unlabelled dataset, the CNG corpus containing European Portuguese children's speech, which was used to iteratively refine the initial models through semi-supervised learning. The Game-of-Nines dataset was used as a test set to investigate how to determine the Big-Five personality traits of Portuguese children using personality models derived from a speech corpus of French adults, and to evaluate the overall efficiency of the proposed semi-supervised learning strategy. They used various feature sets, including the baseline features from the sub-challenge,92 eGeMAPS features, and knowledge-based features. Despite the substantial mismatch caused by language and age differences, the results showed that it is feasible to achieve improvements across age and language by utilizing a semi-supervised approach and knowledge-based features. This is an extension of the work performed in Ref. 93, where experiments were performed over the SSPNET (training data) and Game-of-Nines (testing data) corpora to analyze the utilization of a heterogeneous speech corpus for automatic personality prediction using the Big-Five model; the results revealed that there exists a consistent collection of acoustic-prosodic features for the Extraversion and Agreeableness traits in both adult and child speech, providing information on how to detect personality traits in spontaneous speech across age and language.
In the same year, Su et al.,94 performed experiments over SSPNET speakers to automatically predict personality from speech signals utilizing a wavelet-based multi-resolution technique and convolutional neural networks. In their work, they first utilized a wavelet transformation to decompose speech signals into signals at different levels of resolution. Then, at each resolution, acoustic features were extracted using the OpenSMILE toolkit, after which these features were fed to a CNN that generates BFI-10 profiles. Finally, the perceived trait is determined by feeding these BFI-10 profiles to five artificial neural networks (ANNs), one for each of the Big-Five traits. The results showed that an average accuracy of 71.97% was acquired utilizing this method, outperforming the baseline method of the INTERSPEECH 2012 speaker trait sub-challenge and a traditional ANN-based method.
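The first stage of that pipeline, decomposing a speech signal into per-resolution signals with a wavelet transform, can be sketched as follows using PyWavelets; the toy sine signal and the energy "feature" are our own illustrative stand-ins for real speech and openSMILE features.

```python
# Sketch: wavelet-based multi-resolution decomposition of a speech signal,
# the first stage of Ref. 94's pipeline (per-level feature extraction with
# openSMILE and the CNN/ANN stages are out of scope here).
import numpy as np
import pywt

def multiresolution_signals(signal, wavelet="db4", levels=4):
    """Return one reconstructed signal per resolution level."""
    coeffs = pywt.wavedec(signal, wavelet, level=levels)
    out = []
    for keep in range(len(coeffs)):
        # Zero out all bands except one, then reconstruct that band alone.
        masked = [c if i == keep else np.zeros_like(c) for i, c in enumerate(coeffs)]
        out.append(pywt.waverec(masked, wavelet)[: len(signal)])
    return out

sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
speech = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 1800 * t)  # toy signal
for i, band in enumerate(multiresolution_signals(speech)):
    print(f"level {i}: energy = {np.sum(band ** 2):.1f}")   # features per resolution
```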
In 2018, Gilpin et al.,95 performed experiments to predict whether a speaker's Big-Five traits are in the low or high range from a moderately limited amount of data and features, utilizing SVM and HMM classifiers. First, they constructed SVM and HMM classifiers for automatic personality prediction using the SSPNET speech corpus. Then they determined the correlation between features and speaker subgroups, and finally they assessed the performance of the SVM on a User Study Dataset containing 15 speech clips recorded by three unique speakers and annotated by 12 assessors. The results confirmed that 3 out of 5 traits are assessed with high accuracy using an SVM classifier, even when it is trained on a limited feature set and the dataset is quite small. In the same year, Zhu et al.,96 presented a novel skip-frame LSTM method for predicting speakers' personalities from speech in Mandarin. They performed experiments on the Mandarin subset of the BIT multi-language corpus. In their work, they categorized each of the Big-Five traits into
six sub-traits, resulting in a total of 30 traits, and as a benchmark they used a conventional SVM method with standard prosodic feature extraction. The results revealed that, as in the French corpus, the Extraversion trait is the easiest to predict, yet the high accuracy for Agreeableness contrasts with the French corpus, which may be because of different cultural backgrounds. The relationship between traits and sub-traits was also investigated, and the results showed that the Big-Five traits are easier to predict than their sub-traits, which are depictions of personality at a higher resolution. Finally, the skip-frame LSTM outperforms the traditional SVM framework even on a small corpus because it can use low-level features for direct inference of personality traits.
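The skip-frame idea, subsampling every k-th frame of the feature sequence before a recurrent model, can be sketched as below; this is a minimal PyTorch stand-in under our own assumptions about dimensions, not the cited architecture.

```python
# Sketch: subsampling every k-th frame of an acoustic feature sequence before
# an LSTM classifier (a minimal stand-in for Ref. 96's skip-frame LSTM).
import torch
import torch.nn as nn

class SkipFrameLSTM(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, n_traits=5, skip=3):
        super().__init__()
        self.skip = skip
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_traits)   # high/low logit per trait

    def forward(self, x):                          # x: (batch, frames, feat_dim)
        x = x[:, :: self.skip, :]                  # keep every skip-th frame
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])

model = SkipFrameLSTM()
frames = torch.randn(8, 300, 40)                   # 8 clips, 300 frames of 40-d features
print(model(frames).shape)                         # torch.Size([8, 5])
```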
In 2020, Koutsombogera et al.,97 performed experiments over the MULTISIMO corpus for automatic recognition and perception of individuals' personality from their dialogue. They analyzed participants' speech and transcripts to extract audio and linguistic characteristics and used both self- and informant-assessed reports. They utilized different models, such as AdaBoost, Naive Bayes, Logistic Regression, and Random Forest, for this task and analyzed their performance using both kinds of reports. The results showed that context is less important than acoustics, because models based solely on acoustic features, or models that combine acoustic and linguistic features, outperform models based solely on linguistic features; there is also no preference for a particular model or feature set for either personality recognition or perception, as various models outperform each other for various traits. In the same year, Liu et al.,98 presented an approach that integrates a log-likelihood-based Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) method for grouping annotations with a technique for extracting crucial audio cues (YAAFE), to improve accuracy while reducing the complexity of conventional speaker prediction algorithms. They evaluated their approach on the SSPNET dataset, and the findings revealed that it outperformed baseline methods.
Recently, in 2021, Zaferani et al.,99 presented an advanced approach for the automatic extraction of features to characterize the well-known Big-Five dimensions of personality. They conducted their research on the SSPNET corpus and used data augmentation techniques to increase the size of the dataset; then, to boost classification results, an asymmetric auto-encoder was used to extract important features automatically. The results revealed that the proposed system brings significant classification improvements when compared with a traditional stacked auto-encoder, a CNN, and other published works.
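The sketch below illustrates the asymmetric auto-encoder idea, an encoder deeper than its decoder used purely as an automatic feature extractor; the layer sizes and dimensions are our own illustrative assumptions, not the published architecture.

```python
# Sketch: an asymmetric auto-encoder (deeper encoder than decoder) used as an
# automatic feature extractor, in the spirit of Ref. 99 (sizes illustrative).
import torch
import torch.nn as nn

class AsymmetricAE(nn.Module):
    def __init__(self, in_dim=384, code_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(              # deep encoder
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, code_dim))
        self.decoder = nn.Sequential(              # shallow decoder -> asymmetry
            nn.Linear(code_dim, in_dim))

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = AsymmetricAE()
x = torch.randn(16, 384)                           # toy acoustic feature vectors
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)            # reconstruction objective
print(code.shape, float(loss))                     # codes feed a downstream classifier
```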
Current Issues: The datasets available for the audio modality are small, making it difficult to fully exploit these techniques. The majority of works are based on classical machine learning approaches,84–86 in which feature extraction is still a manual process that takes time and is laborious. In Ref. 96, deep learning algorithms are applied despite the small dataset; however, the results are not consistently improved, and training the LSTM network is difficult.
audiovisual, bag-of-words, demographic, and sentiment features, and performed classification using a logistic regression model. Results showed that the significance of the feature sets varies across traits. In the same year, Alam and Riccardi,105 reached similar conclusions when addressing personality impressions using the same dataset and won the WCPR challenge. In addition to the audiovisual features provided with the dataset, they also analyzed linguistic, psycholinguistic, and emotional features and generated classification models using Sequential Minimal Optimization (SMO) for Support Vector Machines (SVM) for each feature set. In the same year, Farnadi et al.,106 analyzed the use of multivariate regression techniques to address the problem of personality perception using audio, video, and text features.
In 2015, Chávez-Martínez et al.,107 analyzed the problem of mood and personality impression over the YouTube vlog dataset using verbal and non-verbal features. They also used the idea of compound facial expressions to incorporate high-level facial features. The inference problem was solved using a multi-level classifier. The results showed that combining mood and trait labels enhances performance compared to a single-label approach.
In 2016, deep learning methods began to be used for automatic personality analysis, particularly when dealing with multimodal data. The goal of the ChaLearn Looking at People 2016 First Impression Challenge was to predict continuous trait scores from short video clips.5 This challenge was based on multi-modal data including audio and video modalities; 7 of the 9 finalist teams, including the top three, took full advantage of the audio modality, and 4 teams, including the best three,80,108,109 used deep learning methods for first impression recognition (ECCV challenge). Later, Gürpinar et al.,81 extended their previous work74 by using audio, scene, and facial features. The results revealed that the proposed framework attained an accuracy of 0.913 in the ChaLearn ICPR challenge on First Impression Recognition. In the same year, Çeliktutan and Gunes82 extended their work72 and explored how judgements about personality vary over time and in different situational settings. For their work, they extracted audio-visual features and then used a Bidirectional LSTM network to map these features onto the continuous annotations. Lastly, the outputs of the audio and visual regression models were combined using decision-level fusion. The study revealed that training regression models from visual cues and audio-visual annotations, as well as from visual annotations and audio-visual cues paired with audio cues using decision-level fusion, yields the best prediction results. In the same year, Joshi et al.,110 carried out experiments over the SEMAINE corpus and analyzed diverse factors related to the estimation of personality in a human-machine interaction environment evaluated by external analysts and assessors. In their work, they analyzed the relationship between various traits and the influence of context on trait perception. Finally, they estimated the credibility of raters to locate a fix for the issues caused by subjective bias, and then proposed a weighted model to deal with observer bias.
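The decision-level fusion used in the audio-visual work above can be sketched as a simple weighted combination of per-modality predictions; the weights here are illustrative assumptions, not values from the cited papers.

```python
# Sketch: decision-level (late) fusion of per-modality regressors, as in the
# audio-visual combination described above (weights are illustrative).
import numpy as np

def late_fusion(pred_audio, pred_visual, w_audio=0.4, w_visual=0.6):
    """Weighted average of per-modality trait predictions (n_samples, n_traits)."""
    assert abs(w_audio + w_visual - 1.0) < 1e-9
    return w_audio * pred_audio + w_visual * pred_visual

rng = np.random.default_rng(0)
pa = rng.uniform(size=(10, 5))     # audio model's Big-Five predictions
pv = rng.uniform(size=(10, 5))     # visual model's Big-Five predictions
print(late_fusion(pa, pv)[0])      # fused trait scores for the first sample
```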
Current Issues: Many of the works and the data they rely on are not publicly accessible. According to a recent study,82 the works presented in the literature employ different evaluation criteria, so they are not directly comparable. External observers typically form perceptions based on thin slices taken from video samples.5 As a result, deciding which segment of the entire video sequence will be analyzed is a typical issue that must be addressed. It is still ambiguous whether narrow-slice impressions for the prediction task would generalize across the entire video. Questions like “Can the slice of a video clip that most efficiently reveals first impressions be selected automatically? Will it remain consistent throughout the video?” might be topics for future research.
4. Comparative Analysis
We have reviewed the work on personality perception from the past ten years, and in this section we compare that work to find open research challenges and limitations (discussed in the next section). Here we discuss the datasets, features, techniques used, and performance achieved in the state-of-the-art.
Table 2. Description of publicly available datasets used in personality computing.

FERET (Ref. 117), 1993–1996. Modality: facial images. Content: 14 051 static images, 1199 subjects, 365 duplicate images, color = 8-bit grey, resolution = 256 × 384. Task: face recognition. Strengths: first standard facial database; allows algorithm comparison; advanced the face recognition state-of-the-art. Limitations: contains only static images, so it provides no temporal dynamics; focuses only on facial features.

AMI Corpus (Ref. 118), 2005–2006. Modality: audio-visual, textual. Content: 196 meetings, 213 subjects, 100 hours of meetings, both scenario and non-scenario, overview and close-up cameras, audio from close-talking and far-field microphones. Task: dialogue acts, group activity recognition. Strengths: attempts to include computer vision features; contains rich annotations and a cultural mix (metadata); contains both induced and naturally occurring meetings. Limitations: despite its impressive size, the corpus is substantially “specialist”, contextually and compositionally specific; mainly focuses on linguistic annotations.

Mission Survival Corpus II (Ref. 119), 2007. Modality: audio-visual. Content: 13 meetings, 52 subjects, recording lengths between 28 min 14 s and 34 min 09 s, average length = 31 min 10 s, total length = 6 h 47 min 08 s. Task: behavior classification. Strengths: addresses weaknesses of the Mission Survival Corpus (short meeting durations, lack of personality assessments and meeting quality evaluations); the content is unscripted. Limitations: limited size; features are extracted from one specific discourse content, which limits its usefulness in general corpus linguistic research.

SSPNET (Refs. 92, 120), 2010–2012. Modality: audio. Content: 96 news bulletins, 640 speech clips of 10 sec or less, 1 speaker per clip, 322 unique individuals, 307 professional & 309 non-professional speakers. Task: personality trait prediction from speech. Strengths: largest dataset for speech-based personality perception; provides a rich collection of real-world data; personality score metadata is associated with each clip. Limitations: focuses only on non-verbal features; ratings are provided by human judges, and the low agreement between them can be the main source of error.

ELEA (Ref. 116), 2011. Modality: audio-visual. Content: 40 meetings of 15 sec each, including 3 or 4 members. Task: emergent leadership recognition. Strengths: includes both self-reported and perceived personality annotations. Limitations: limited size; not all meetings provide both audio-visual streams.

YouTube vlog dataset (Refs. 103, 121, 122), 2011. Modality: audio-visual. Content: 442 vlogs, each 1 min long, 208 males & 234 females, each vlog covers only 1 participant. Task: apparent personality trait analysis. Strengths: contains rich real-life interactions and metadata along with personality impressions; not limited to a specific context. Limitations: limited number of samples; no interactions; metadata not properly described (only gender information).

SEMAINE (Ref. 123), 2012. Modality: audio-visual. Content: 959 conversations, 150 subjects, each clip 5 min long, size = 780 × 580, contains RGB and grey images and both frontal and profile views. Task: face-to-face interaction of humans with virtual agents of different emotional styles. Strengths: rich in emotion and annotation; contains metadata; recording quality is high; quantity is significant; the clips are long enough to discern patterns over time. Limitations: the number of total utterances is small; lack of multi-party conversation.

ChaLearn First Impression (Ref. 5), 2016. Modality: audio-visual. Content: 10 000 15-sec videos selected from 3000 YouTube channels, RGB images, 30 fps, size = 1280 × 720. Task: apparent personality recognition. Strengths: largest public labeled dataset for personality analysis; not limited to specific content; pairwise comparisons reduce bias problems. Limitations: labeled with only Big-Five traits; magnifies bias; no interaction exists, only a single individual conversing with a camera.

ChaLearn First Impression v2 (Ref. 124), 2017. Modality: audio-visual, hirability impressions, audio transcript. Content: extension of the ChaLearn First Impression dataset. Task: hirability impressions and apparent personality recognition. Strengths: along with Big-Five traits, transcriptions, interview annotations, gender, and ethnicity are present. Limitations: imbalanced in ethnicity; different perception biases exist; videos are not from a specific recruitment setting, so behavior can differ slightly from a job interview; raters have no experience in recruitment.

MULTISIMO Corpus (Ref. 125), 2016–2018. Modality: audio-visual, textual. Content: 23 collaborative sessions, 49 participants (46 players and 3 facilitators), average duration = 10 minutes, total duration = 4 hrs. Task: modeling the behavior of speakers using multimodal data in multiparty social interactions. Strengths: rich in modalities; contains both self and perceived annotations along with metadata. Limitations: limited in size; most utterances are dyadic; context-specific.
Table 3. Commonly used performance evaluation metrics (CM = classification metric, RM = regression metric).

Accuracy (CM): the number of correct predictions made by the model out of total predictions. Advantages: simple to compute and comprehend; a single value sums up the model's abilities; can be used for comparing models. Disadvantages: inappropriate for unbalanced data and for models providing probabilistic values; does not consider the types of incorrect predictions.

Confusion matrix (CM): tabulates the classification model's predictions on a test data set against predefined actual values. Advantages: shows where the model is confused while making predictions; indicates model errors as well as error types; considers incorrect predictions; works with imbalanced data. Disadvantages: class probabilities are not provided; not possible to compare models; difficult to check under- or overfitting conditions, as no single measure is provided.

Precision (CM): the number of positive-category predictions that actually belong to the positive category. Advantages: measures the model's efficiency; describes the relevance of the model's classifications; determines model performance on the minority class of an imbalanced dataset. Disadvantages: focuses only on the positive class; suitable only when the goal is limiting false positives and accurate identification of the negative class is not required; not used for model comparison.

Recall (CM): the number of positive-category predictions made out of all positive examples in the dataset. Advantages: measures the model's efficiency; returns all relevant results classified accurately by the model; assesses minority-class coverage in an imbalanced dataset. Disadvantages: focuses only on the positive class; suitable only when the goal is limiting false negatives and accurate identification of the negative class is not required; not used for model comparison.

Unweighted Average Recall (UAR) (CM): the unweighted average of the per-category recalls achieved by the model. Advantages: determines the most reliable model's quality; used for comparing models; focuses on both positive and negative classes; mostly used with extremely imbalanced datasets. Disadvantages: a more complex metric; gives equal weight to each class, so it is not suitable when certain classes take precedence over others.

Area under curve (AUC) (CM): the capability of the classifier to distinguish between categories; a summary of the ROC curve, which plots the performance of a classification model across classification thresholds. Advantages: assesses the model's ability to differentiate between classes regardless of the threshold used; determines the validity of prediction ranking; can be used for model comparison. Disadvantages: does not consider predicted probabilities; positive and negative classes are equally weighted; does not perform well under extreme dataset imbalance.

Mean Absolute Error (MAE) (RM): the absolute differences between actual and predicted values are averaged. Advantages: determines the dataset's average error magnitude; easy to understand; has the same unit as the output; most resilient to outliers. Disadvantages: not a differentiable function; mathematical evaluation and numerical optimization are difficult.

Mean Squared Error (MSE) (RM): the squared differences between actual and predicted values are averaged. Advantages: determines the error variance of a dataset; a differentiable function; easy to analyze mathematically and optimize numerically. Disadvantages: not resilient to outliers; has the squared unit of the output.

Root mean squared error (RMSE) (RM): an accuracy metric calculated by taking the square root of the mean of all squared errors. Advantages: determines the standard deviation of errors in a dataset; quick to calculate; has the same unit as the output; a differentiable function. Disadvantages: prevalent but difficult to comprehend; sensitive to large errors; less resilient to outliers.

Coefficient of determination (R2) (RM): explains how much variability can be attributed to the relationship between one factor and another related factor. Advantages: determines the proportion of outcome variation represented by the regression model; provides a baseline model for model comparison. Disadvantages: always increases when additional features are introduced into the dataset; does not give an error estimation; does not determine whether a model is good or bad.
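For reference, the standard definitions of the main metrics in Table 3, with y_i the actual value, ŷ_i the predicted value, ȳ the mean of the actual values, n the number of samples, and K the number of classes:

\begin{align*}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert, \qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2, \qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}, \\
R^2 &= 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad
\mathrm{UAR} = \frac{1}{K}\sum_{k=1}^{K}\mathrm{Recall}_k.
\end{align*}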
Ref. 58: Selfie image dataset; Big-Five; visual features; LIBSVM with RBF kernel; RMSE up to 0.7 (in APR) and up to 0.5 (in APP) achieved, depending upon the particular trait and features used.

Ref. 61: New Face Attributes dataset; objective traits (gender, ethnicity, hair color, makeup, and age) and subjective traits (emotion, attractive, humorous, and chubby); facial features; CNN (baseline AlexNet) and Landmark-Augmented CNN (LACNN); prediction accuracy of 60 to 98% achieved.

Ref. 62: Twitter profile pictures dataset; Big-Five; face, holistic-level scene descriptors; Kernel Partial Least Squares Regression (KPLS); RMSE up to 0.37 (avg) achieved, depending upon the descriptors used.

Ref. 64: Image dataset; Big-Five; visual features, facial expressions; statistical analysis method; negative samples as low as 0 and positive samples as high as 1184, depending upon the particular trait, range, and expression used.
Ref. 74: ChaLearn First Impression dataset; Big-Five; face, scene; Kernel Extreme Learning Machine (ELM) regression; Mean Average Accuracy (MAA) = 0.9094.
Table 6. (Continued)
Ref. 98: SSPNET; Big-Five; audio features (YAAFE); BIRCH, KNN, LR, SVM; average accuracy = 71.76%.
Ref. 103: YouTube vlog dataset; Big-Five; LIWC, N-gram analysis, audiovisual nonverbal cues, facial expressions; SVM (linear, RBF kernel), Random Forest; R2 up to 0.31 achieved, depending upon the particular trait and features used.

Ref. 105: YouTube vlog dataset; Big-Five; audio-visual, linguistic, psycholinguistic, emotional features; SVM with different kernels (trained using Sequential Minimal Optimization); average F1-score = 0.76.

Ref. 106: YouTube vlog dataset; Big-Five; audio, video, textual features (emotion, linguistic); various multivariate regression techniques (single target, multi-target stacking, multi-target stacking controlling, ensemble of regressor chains, ensemble of regressor chains corrected, multi-objective random forest); RMSE up to 0.64, R2 up to 37%.

Ref. 107: YouTube vlog dataset; Big-Five, mood; verbal and non-verbal audiovisual features, proposed facial features; multi-level classifier; macro-average up to 65.9%, exact accuracy = 27.3%, depending upon the feature set used.

Ref. 108: ChaLearn first impression dataset; Big-Five; audio-visual features; for the visual modality a VGG-face model (for pretraining), DAN, and DAN + ResNet; for the audio modality a linear regressor; MAA = 0.912968.
Table 7. (Continued)
Ref. 109: ChaLearn first impression dataset; Big-Five; audio-visual features; Deep Residual Networks (ResNet + FC); MAA = 0.910933.

Ref. 81: ChaLearn first impression dataset; Big-Five; audio, face, scene; deep CNNs (for face: VGGNet, FER 2013; for scene: VGG-19), Kernel ELM; MAA = 0.913.

Ref. 112: ChaLearn first impressions dataset v2 (CVPR'17); Big-Five; face, ambient, audio, transcriptions; CNN (ResNet and VGGish networks), LSTM; MAA = 0.9188.

Ref. 113: ChaLearn first impression dataset v2; Big-Five; visual (face, scene), audio, text; CR-NET (ResNet34), BELL-LOSS, ETR regressor; MAA = 0.9188.

Ref. 114: ChaLearn first impression dataset v2, SIAP dataset; Big-Five; audiovisual features, speech transcripts; ResNet, CNN-GRU, LSTNet; average MAE up to 0.153 (for SIAP) & 0.085 (for the first impression dataset).

Ref. 115: ChaLearn first impression dataset v2; Big-Five; visual (face, scene), audio, text; VGGish, ResNet, ELMO, LSTM; mean accuracy = 91.8%.
(i) Publicly Available Datasets: There are a limited number of datasets publicly available for the task of personality computing, and those datasets do not include a wide variety of challenging factors like ethnic background, cultural factors, etc. It is noticeable from the literature that most works either constructed their own dataset58,59 or adapted to their requirements datasets that were already developed for other purposes.60,62 The main obstacle in the case of APP is that, to deal with the methodological issues linked to the perception process, various annotators need to rank or label every subject present in the dataset to render a logical and reliable evaluation. Building a personality corpus therefore needs a lot of manpower, time, and resources, which generally restricts the size of APP corpora. Another issue is that the labels associated with an available dataset generally do not become public, and recreating the published outcomes can be a big task.
Future direction: To accelerate advancement in the field of personality computing, the primary need is to develop new, large, public datasets with diverse characteristics. Datasets that provide a combined examination of genuine and perceived personality, that include psychological states or contextual settings more realistically, or that support continual forecasts can benefit the field of personality computing by enabling innovations and new directions. Also, if the datasets come from the same environment as the ones to be utilized (for example, data samples for evaluating job interviews should come from a recruitment context), then it will be possible to make better predictions, which will improve performance.
(ii) Subjective Bias: In the case of APP, multiple annotators need to annotate all subjects present in the dataset, and when different raters provide different ratings for the personality of the same individual, this creates uncertainty in the process of APP. Also, there are no reliable insights about the trustworthiness of ratings, so it is not apparent how to handle the issue when the ratings are crowd-sourced. Hence, low agreement between annotators is the key problem in APP that directly influences the performance of the task.85 Recent investigations have, to some extent, proposed a compelling method to manage raters' bias by performing pairwise comparisons instead of assessing each subject individually.5,59 Pairwise comparison brings several benefits, including removing the need for annotators to set up absolute benchmarks or scales for labelling subjects in the dataset, recognizing the intensity of each sample in terms of its relational gap from other instances (resulting in a more accurate ranking), and preventing formerly labelled samples from biasing subsequent ratings. One way to turn such comparisons into scores is sketched below.
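A minimal sketch, assuming a Bradley–Terry model: per-subject trait scores are recovered from a matrix of pairwise preference counts via the standard MM updates. This is one generic way to exploit pairwise annotations, not the procedure used in Refs. 5 and 59.

```python
# Sketch: turning pairwise "A seems more extraverted than B" comparisons into
# per-subject scores with a simple Bradley-Terry model.
import numpy as np

def bradley_terry(n_subjects, wins, n_iters=200):
    """wins[i][j] = number of times subject i was preferred over subject j."""
    W = np.asarray(wins, dtype=float)
    N = W + W.T                                    # comparisons per pair
    p = np.ones(n_subjects)
    for _ in range(n_iters):                       # standard MM updates
        for i in range(n_subjects):
            mask = N[i] > 0
            denom = (N[i, mask] / (p[i] + p[mask])).sum()
            p[i] = W[i].sum() / max(denom, 1e-12)
        p /= p.sum()
    return p                                       # relative trait strengths

wins = [[0, 3, 4], [1, 0, 2], [0, 2, 0]]           # toy 3-subject annotation matrix
print(bradley_terry(3, wins).round(3))
```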
Future direction: To address the issue of subjective bias, numerous characteristics such as age, gender, and ethnicity111 should be included during personality analysis, and new methods for dealing with this issue should be developed.
above or below the median as two classes) for mapping features onto personality traits. When the Big-Five personality model is used, which is a dimensional representation of personality, these methodologies convert it into a categorical one; but such categories serve no purpose in psychology, since psychology mainly focuses on the comparisons among people that we make every day and that have realistic applications, so it is better to rank individuals according to their personality traits.86 Additionally, the results of APR and APP approaches should be accepted only when they fulfil proper certainty measures.
Machine and deep learning techniques: Machine or deep learning approaches are used to map the characteristics of a subject onto personality traits. The majority of the works introduced in the literature mainly focus on single-task scenarios and have used handcrafted features and machine learning approaches for personality prediction,56,57,60,92 whereas a few recent works focus on utilizing deep learning approaches5 to predict apparent personality using multimodal data. These techniques bring several advantages: they extract features automatically, they have the ability to analyze the entire scenario and situational context rather than just a small collection of predefined features, and they support superior spatio-temporal modeling.
Future direction: Future research should focus on technological advancements in order to strengthen the link between features and personality traits using various methodologies, and should aim for greater integration of the computational and humanities disciplines, yielding psychological and behavioral methods for personality identification and externalization. Also, progress in this topic could be aided by the introduction of novel and advanced deep learning techniques for personality prediction.
(v) Personality Models: Most of the existing works concentrate only on the Big-Five personality model, with little emphasis on other models. Despite being one of the most effective models for personality identification, the Big-Five model has limitations, such as being atheoretical, overly descriptive, and lacking references to personality development over time.
Future direction: Future studies should not be limited to the Big-Five approach; other models must be included in the datasets. Also, enhanced feature sets should be used in personality-related models to provide a better representation of personality factors across a variety of data modalities.
(vi) Cultural Dependence: Annotators from different cultural backgrounds attribute different traits to the personality of the same individual, which means APP is dependent on cultural background.86,96 In personality computing, however, this effect has rarely been taken into consideration: automatic systems in general either disregard the issue by not considering the culture of both subjects and annotators, or restrict the research to one culture only to avoid multicultural effects. Furthermore, most of the existing datasets concentrate only on the Big-Five personality model, with little emphasis on other models.
6. Research Methodology
The general research methodology steps for recognizing first impressions are discussed in this section. To address any problem, a comprehensive strategy must be followed efficiently. Figure 6 depicts the general research methodology used in first impression recognition. Every study begins with a thorough examination of existing approaches; the flaws of existing approaches are then analyzed so that they can be solved and taken into account when proposing a new approach. The research methodology can be explained by the following steps (i)–(vii); a sketch of this pipeline follows the list:
(i) The first step is to choose a dataset that is appropriate for achieving research
goals.
(ii) After that, the pre-processing step will be used to remove any redundant or undesired values from the dataset.
(iii) The third step is to choose data modalities.
(iv) Then from the selected data modalities, features will be extracted.
(v) Then, to map features to personality traits, regression or classification will be
performed using machine/deep learning approaches.
(vi) If a multimodal personality analysis is employed, the results of all modalities
will be combined.
(vii) The final step is to assign predicted personality values to the subjects present
in the dataset.
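A minimal runnable sketch of steps (i)–(vii) on toy data follows; every dataset, feature, and model choice here is an illustrative assumption that a real study would replace.

```python
# Sketch: the seven methodology steps as a runnable skeleton with toy data.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# (i) dataset selection: 50 subjects, two modalities of raw "signals" + labels.
dataset = {"audio": rng.normal(size=(50, 100)),
           "visual": rng.normal(size=(50, 200)),
           "traits": rng.uniform(size=(50, 5))}

# (ii) pre-processing: drop redundant (zero-variance) columns.
def preprocess(X):
    return X[:, X.std(axis=0) > 1e-8]

# (iii)+(iv) modality selection and feature extraction (toy: summary statistics).
def extract_features(X):
    return np.column_stack([X.mean(axis=1), X.std(axis=1), X.min(axis=1), X.max(axis=1)])

predictions = {}
for modality in ("audio", "visual"):                          # (iii)
    feats = extract_features(preprocess(dataset[modality]))   # (iv)
    model = Ridge().fit(feats, dataset["traits"])             # (v) regression
    predictions[modality] = model.predict(feats)

fused = np.mean(list(predictions.values()), axis=0)           # (vi) combine modalities
print("Predicted Big-Five for subject 0:", fused[0].round(2)) # (vii) assign to subjects
```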
raters’ credibility a proper and logical way of annotating personality traits should
be analyzed. Also, technological advancement is needed to identify meaningful con-
nections between features and personality traits appropriately, and to provide tight
reconciliation among human science and personality computing approaches.
A few recent works have started utilizing deep learning approaches to predict
personality from multimodal data; these methods have proven efficient and have
begun to produce authentic predictions. Deep learning brings several advantages:
it works well with large datasets and extracts features automatically, so no manual
feature engineering is needed. Still, more work needs to be done with deep
architectures. In the future, to increase the performance of personality prediction
systems, we will propose an approach using multimodal fusion of the best-suited
features along with bias reduction.
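To make the fusion idea concrete, the sketch below shows feature-level (early)
fusion of two modalities inside a small network, one common deep learning
pattern for this task. The input dimensions, layer sizes, and architecture are
illustrative assumptions, not the model of any specific work cited here.

import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    def __init__(self, audio_dim=64, visual_dim=128, n_traits=5):
        super().__init__()
        # Project each modality into a shared 32-dim space before fusing.
        self.audio_proj = nn.Linear(audio_dim, 32)
        self.visual_proj = nn.Linear(visual_dim, 32)
        # A joint head regresses the trait scores from the fused vector.
        self.head = nn.Sequential(
            nn.ReLU(),
            nn.Linear(64, n_traits),
            nn.Sigmoid(),  # keeps each predicted trait score in [0, 1]
        )

    def forward(self, audio, visual):
        # Early fusion: concatenate per-modality embeddings, then regress.
        fused = torch.cat([self.audio_proj(audio), self.visual_proj(visual)], dim=-1)
        return self.head(fused)

model = EarlyFusionNet()
scores = model(torch.randn(8, 64), torch.randn(8, 128))  # a batch of 8 clips
print(scores.shape)  # torch.Size([8, 5]): one Big-Five vector per clip

The sigmoid output constrains each trait score to [0, 1], matching normalized
trait annotations; an unconstrained linear output would also work if the targets
were standardized instead.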
References
1. D. C. Funder, Personality, Annual Review of Psychology 52 (2001) 197–221.
2. G. Matthews, I. J. Deary and M. C. Whiteman, Personality Traits, 3rd edn. (Cam-
bridge University Press, 2009).
3. Wikipedia, Personality computing — Wikipedia, the free encyclopedia, http://en.
wikipedia.org/w/index.php?title=Personality%20computing&oldid=984709168
(2022), [Online; accessed 25 September 2022].
4. A. Vinciarelli and G. Mohammadi, A survey of personality computing, IEEE Trans-
actions on Affective Computing 5(3) (2014) 273–291.
5. V. Ponce-López, B. Chen, M. Oliu, C. Corneanu, A. Clapés, I. Guyon, X. Baró,
H. J. Escalante and S. Escalera, ChaLearn LAP 2016: First round challenge on
first impressions — Dataset and results, in Computer Vision — ECCV 2016 Work-
shops, eds. G. Hua and H. Jégou (Springer International Publishing, Cham, 2016),
pp. 400–418.
6. Wikipedia, First impression (psychology) — Wikipedia, the free encyclopedia
http://en.wikipedia.org/w/index.php?title=First%20impression%20(psychology)&
oldid=1109623588, (2022), [Online; accessed 25 September 2022].
7. J. Willis and A. Todorov, First impressions: Making up your mind after a 100-ms
exposure to a face, Psychological Science 17(7) (2006) 592–598, PMID: 16866745.
8. I. J. Deary, The trait approach to personality, in The Cambridge Handbook of Per-
sonality Psychology, eds. P. J. Corr and G. Matthews, Cambridge Handbooks in
Psychology, Vol. 1. (Cambridge University Press, 2009), pp. 89–109.
9. L. R. Goldberg, The structure of phenotypic personality traits, American Psycho-
logist 48 (1993) 26–34.
10. A. K. Woodend, Continuity of cardiovascular care, Can. J. Cardiovasc. Nurs. 15(4)
(2005) 3–4.
11. A. E. Abele and B. Wojciszke, Agency and communion from the perspective of self
versus others, J. Pers. Soc. Psychol. 93 (2007) 751–763.
12. R. R. McCrae, The Five-factor model of personality traits: Consensus and contro-
versy, in The Cambridge Handbook of Personality Psychology, eds. P. J. Corr and
G. Matthews (Cambridge University Press, 2012) pp. 148–161.
13. G. J. Boyle and E. Helmes, Methods of personality assessment, in The Cambridge
Handbook of Personality Psychology, eds. P. J. Corr and G. Matthews (Cambridge
University Press, 2012) pp. 110–126.
14. P. T. Costa, Jr. and R. R. McCrae, Domains and facets: Hierarchical personality
assessment using the revised NEO personality inventory, J. Pers. Assess. 64 (1995)
21–50.
15. R. R. McCrae and P. T. Costa, Jr., A contemplated revision of the NEO Five-Factor
Inventory, Personality and Individual Differences 36(3) (2004) 587–596.
16. B. Rammstedt and O. P. John, Measuring personality in one minute or less: A
10-item short version of the big five inventory in English and German, Journal of
Research in Personality 41(1) (2007) 203–212.
17. S. D. Gosling, P. J. Rentfrow and W. B. Swann, A very brief measure of the big-five
personality domains, Journal of Research in Personality 37(6) (2003) 504–528.
18. S. Dhelim, N. Aung, M. A. Bouras, H. Ning and E. Cambria, A survey on personality-
aware recommendation systems, Artificial Intelligence Review 55 (2022) 2409–2454.
19. J. Li, M. X. Zhou, H. Yang and G. Mark, Confiding in and listening to virtual
agents: The effect of personality, in Proc. of the 22nd Int. Conf. on Intelligent User
Interfaces (IUI ’17 ) (Association for Computing Machinery, New York, NY, USA,
2017), pp. 275–286.
20. S. T. Völkel, R. Schödel, D. Buschek, C. Stachl, V. Winterhalter, M. Bühner and
H. Hussmann, Developing a personality model for speech-based conversational agents
using the psycholexical approach, in Proc. of the 2020 CHI Conf. on Human Factors
in Computing Systems (CHI ’20) (Association for Computing Machinery, New York,
NY, USA, 2020), pp. 1–14.
21. A. Cichocki, A. P. Kuleshov and J. Dauwels, Future trends for human-AI collabora-
tion: A comprehensive taxonomy of AI/AGI using multiple intelligences and learning
styles, Intell. Neuroscience 2021 (2021).
22. M. A. Salehinezhad, Personality and mental health, in Essential Notes in Psychiatry,
ed. V. Olisah (IntechOpen, Rijeka, 2012).
23. M. Gulhane and T. Sajana, Human behavior prediction and analysis using machine
learning — A review, Turkish Journal of Computer and Mathematics Education
12(5) (2021) 870–876.
24. I. Naim, M. I. Tanveer, D. Gildea and M. E. Hoque, Automated analysis and predic-
tion of job interview performance, IEEE Transactions on Affective Computing 9(2)
(2018) 191–204.
25. M. A. Coleman, Profile analyst: Advanced job candidate matching via automatic
skills linking (2016).
26. L. S. Nguyen and D. Gatica-Perez, Hirability in the wild: Analysis of online conver-
sational video resumes, IEEE Transactions on Multimedia 18(7) (2016) 1422–1437.
27. A. N. Finnerty, S. Muralidhar, L. S. Nguyen, F. Pianesi and D. Gatica-Perez, Stress-
ful first impressions in job interviews, in Proc. of the 18th ACM Int. Conf. on
Multimodal Interaction (ICMI ’16) (Association for Computing Machinery, New
York, NY, USA, 2016), pp. 325–332.
28. R. Mishra, R. Rodriguez and V. Portillo, An AI based talent acquisition and bench-
marking for job (2020).
29. H. Malik, H. Dhillon, R. Goecke and R. Subramanian, Characterizing hirability via
personality and behavior (2020).
30. Y.-S. Su, H.-Y. Suen and K.-E. Hung, Predicting behavioral competencies auto-
matically from facial expressions in real-time video-recorded interviews, Journal of
Real-Time Image Processing 18 (2021) 1011–1021.
31. J. A. Recio-Garcia, G. Jimenez-Diaz, A. A. Sanchez-Ruiz and B. Diaz-Agudo, Per-
sonality aware recommendations to groups, in Proc. of the Third ACM Conf. on
Recommender Systems (RecSys ’09) (Association for Computing Machinery, New
York, NY, USA, 2009), pp. 325–328.
64. J. Zhang and Z. Wei, Statistical analysis of the relationship between visual features
and personality, in 2017 Second Int. Conf. on Mechanical, Control and Computer
Engineering (ICMCCE ) (2017), pp. 232–235.
65. R. Kosti, J. M. Alvarez, A. Recasens and A. Lapedriza, Emotion recognition in
context, in 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)
(IEEE, 2017), pp. 1960–1968.
66. C. Y. Olivola and A. Todorov, Elected in 100 milliseconds: Appearance-based trait
inferences and voting, Journal of Nonverbal Behavior 34 (2010) 83–110.
67. M. Walker, F. Jiang, T. Vetter and S. Sczesny, Universals and cultural differences in
forming personality trait judgments from faces, Social Psychological and Personality
Science 2(6) (2011) 609–617.
68. F. Noroozi, C. A. Corneanu, D. Kamińska, T. Sapiński, S. Escalera and G.
Anbarjafari, Survey on emotional body gesture recognition, IEEE Transactions on
Affective Computing 12(2) (2021) 505–523.
69. A. Kachur, E. Osin, D. Davydov, K. Shutilov and A. Novokshonov, Assessing the
big five personality traits using real-life static facial images, Scientific Reports 10
(2020) 8487.
70. J.-I. Biel, L. Teijeiro-Mosquera and D. Gatica-Perez, FaceTube: Predicting person-
ality from facial expressions of emotion in online conversational video, in Proc. of
the 14th ACM Int. Conf. on Multimodal Interaction (ICMI ’12 ) (Association for
Computing Machinery, New York, NY, USA, 2012), pp. 53–56.
71. O. Aran and D. Gatica-Perez, Cross-domain personality prediction: From video blogs
to small group meetings (2013).
72. O. Çeliktutan and H. Gunes, Continuous prediction of perceived traits and social
dimensions in space and time, in 2014 IEEE Int. Conf. on Image Processing (ICIP )
(IEEE, 2014), pp. 4196–4200.
73. L. Teijeiro-Mosquera, J. Biel, J. Alba-Castro and D. Gatica-Perez, What your face
vlogs about: Expressions of emotion and big-five traits impressions in YouTube, IEEE
Transactions on Affective Computing 6 (2015) 193–205.
74. F. Gürpınar, H. Kaya and A. Salah, Combining deep facial and ambient features for
first impression estimation, in Computer Vision — ECCV 2016 Workshops (2016),
pp. 1–14.
75. C. Ventura, D. Masip and A. Lapedriza, Interpreting CNN models for apparent
personality trait regression, in 2017 IEEE Conf. on Computer Vision and Pattern
Recognition Workshops (CVPRW) (IEEE, 2017), pp. 1705–1713.
76. S. E. Bekhouche, F. Dornaika, A. Ouafi and A. Taleb-Ahmed, Personality traits and
job candidate screening via analyzing facial videos, in 2017 IEEE Conf. on Computer
Vision and Pattern Recognition Workshops (CVPRW ) (IEEE, 2017), pp. 1660–1663.
77. M. A. Moreno-Armendáriz, C. A. D. Martı́nez, H. Calvo and M. Moreno-Sotelo,
Estimation of personality traits from portrait pictures using the five-factor model,
IEEE Access 8 (2020) 201649–201665.
78. H.-Y. Suen, K.-E. Hung and C.-L. Lin, Intelligent video interview agent used to pre-
dict communication skill and perceived personality traits, Human-centric Computing
and Information Sciences 10 (2020) 3.
79. S. Song, S. Jaiswal, E. Sanchez, G. Tzimiropoulos, L. Shen and M. Valstar, Self-
supervised learning of person-specific facial dynamics for automatic personality
recognition, IEEE Transactions on Affective Computing (2021) 1–1.
80. A. Subramaniam, V. Patel, A. Mishra, P. Balasubramanian and A. Mittal, Bi-modal
first impressions recognition using temporally ordered deep audio and stochastic
visual features, in Computer Vision — ECCV 2016 Workshops, eds. G. Hua and
H. Jégou (Springer International Publishing, Cham, 2016), pp. 337–348.
81. F. Gürpınar, H. Kaya and A. A. Salah, Multimodal fusion of audio, scene, and
face features for first impression estimation, in 2016 23rd Int. Conf. on Pattern
Recognition (ICPR) (2016), pp. 43–48.
82. O. Çeliktutan and H. Gunes, Automatic prediction of impressions in time and across
varying context: Personality, attractiveness and likeability, IEEE Transactions on
Affective Computing 8(1) (2017) 29–42.
83. H. Bilen, B. Fernando, E. Gavves and A. Vedaldi, Action recognition with dynamic
image networks, IEEE Transactions on Pattern Analysis and Machine Intelligence
40(12) (2018) 2799–2813.
84. G. Mohammadi and A. Vinciarelli, Humans as feature extractors: Combining
prosody and personality perception for improved speaking style recognition, in 2011
IEEE Int. Conf. on Systems, Man, and Cybernetics (IEEE, 2011), pp. 363–366.
85. G. Mohammadi and A. Vinciarelli, Automatic personality perception: Prediction of
trait attribution based on prosodic features extended abstract, in 2015 Int. Conf. on
Affective Computing and Intelligent Interaction (ACII ) (2015), pp. 484–490.
86. G. Mohammadi, A. Origlia, M. Filippone and A. Vinciarelli, From speech to per-
sonality: Mapping voice quality and intonation into personality differences, in Proc.
of the 20th ACM Int. Conf. on Multimedia (MM ’12) (Association for Computing
Machinery, New York, NY, USA, 2012), pp. 789–792.
87. F. Valente, S. Kim and P. Motlicek, Annotation and recognition of personality traits
in spoken conversations from the AMI meetings corpus, in Proc. of Interspeech 2,
(Interspeech, 2012).
88. C.-J. Liu, C.-H. Wu and Y.-H. Chiu, BFI-based speaker personality perception using
acoustic-prosodic features, in 2013 Asia-Pacific Signal and Information Processing
Association Annual Summit and Conf. (APSIPA, 2013), pp. 1–6.
89. S. Jothilakshmi, J. Sangeetha and R. Brindha, Speech based automatic personality
perception using spectral features, International Journal of Speech Technology 20
(2017) 43–50.
90. S. Jothilakshmi and R. Brindha, Speaker trait prediction for automatic personality
perception using frequency domain linear prediction features, in 2016 Int. Conf. on
Wireless Communications, Signal Processing and Networking (WiSPNET ) (2016),
pp. 2129–2132.
91. R. Solera-Ureña, H. Moniz, F. Batista, V. Cabarrão, A. Pompili, R. Astudillo, J.
Campos, A. Paiva and I. Trancoso, A semi-supervised learning approach for acoustic-
prosodic personality perception in under-resourced domains, in Proc. of INTER-
SPEECH 2017 (Stockholm, Sweden, 2017), pp. 929–933.
92. B. Schuller, S. Steidl, A. Batliner, E. Noeth, A. Vinciarelli, F. Burkhardt, R. van
Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi and B. Weiss, The INTER-
SPEECH 2012 speaker trait challenge, in 13th Annual Conf. of the International
Speech Communication Association 2012, INTERSPEECH 2012 1 (2012).
93. R. Solera-Ureña, H. Moniz, F. Batista, R. F. Astudillo, J. Campos, A. Paiva and
I. Trancoso, Acoustic-prosodic automatic personality trait assessment for adults and
children, in Advances in Speech and Language Technologies for Iberian Languages,
eds. A. Abad, A. Ortega, A. Teixeira, C. Garcı́a Mateo, C. D. Martı́nez Hinarejos,
F. Perdigão, F. Batista and N. Mamede (Springer International Publishing, Cham,
2016), pp. 192–201.
94. M.-H. Su, C.-H. Wu, K.-Y. Huang, Q.-B. Hong and H.-M. Wang, Personality trait
perception from speech signals using multiresolution analysis and convolutional neu-
ral networks, in 2017 Asia-Pacific Signal and Information Processing Association
Annual Summit and Conference (APSIPA ASC ) (2017), pp. 1532–1536.
110. J. Joshi, H. Gunes and R. Goecke, Automatic prediction of perceived traits using
visual cues under varied situational context, in 2014 22nd Int. Conf. on Pattern
Recognition (ICPR) (2014), pp. 2855–2860.
111. R. D. P. Principi, C. Palmero, J. C. S. J. Junior and S. Escalera, On the effect of
observed subject biases in apparent personality analysis from audio-visual signals
(2019).
112. S. Aslan and U. Güdükbay, Multimodal video-based apparent personality recognition
using long short-term memory and convolutional neural networks (2019).
113. Y. Li, J. Wan, Q. Miao, S. Escalera, H. Fang, H. Chen, X. Qi and G. Guo, CR-Net: A
deep classification-regression network for multimodal apparent personality analysis,
International Journal of Computer Vision 128 (Dec 2020) 2763–2780.
114. D. Giritlioğlu, B. Mandira, S. F. Yilmaz, C. U. Ertenli, B. F. Akgür, M. Kınıklıoğlu,
A. G. Kurt, E. Mutlu, Ş. C. Gürel and H. Dibeklioğlu, Multimodal analysis
of personality traits on videos of self-presentation and induced behavior, Journal
on Multimodal User Interfaces 15 (Dec 2021) 337–358.
115. S. Aslan, U. Güdükbay and H. Dibeklioğlu, Multimodal assessment of apparent
personality using feature attention and error consistency constraint, Image and Vi-
sion Computing 110 (2021) 104163.
116. D. Sanchez-Cortes, O. Aran, M. S. Mast and D. Gatica-Perez, Identifying emer-
gent leadership in small groups using nonverbal communicative cues, in Int. Conf.
on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal
Interaction (ICMI-MLMI ’10 ) (Association for Computing Machinery, New York,
NY, USA, 2010).
117. P. Phillips, H. Wechsler, J. Huang and P. J. Rauss, The FERET database and
evaluation procedure for face-recognition algorithms, Image and Vision Computing
16(5) (1998) 295–306.
118. J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec,
V. Karaiskos, W. Kraaij, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska,
I. McCowan, W. Post, D. Reidsma and P. Wellner, The AMI meeting corpus: A pre-
announcement, in Machine Learning for Multimodal Interaction, eds. S. Renals and
S. Bengio (Springer Berlin Heidelberg, Berlin, Heidelberg, 2006), pp. 28–39.
119. N. Mana, B. Lepri, P. Chippendale, A. Cappelletti, F. Pianesi, P. Svaizer and
M. Zancanaro, Multimodal corpus of multi-party meetings for automatic social be-
havior analysis and personality traits detection, in Proc. of the 2007 Workshop on
Tagging, Mining and Retrieval of Human Related Activity Information (TMR ’07 )
(Association for Computing Machinery, New York, NY, USA, 2007), pp. 9–14.
120. G. Mohammadi, A. Vinciarelli and M. Mortillaro, The voice of personality: Mapping
nonverbal vocal behavior into trait attributions, in Proc. of the 2nd Int. Workshop on
Social Signal Processing (SSPW ’10 ) (Association for Computing Machinery, New
York, NY, USA, 2010), pp. 17–20.
121. J.-I. Biel and D. Gatica-Perez, The YouTube lens: Crowdsourced personality im-
pressions and audiovisual analysis of vlogs, IEEE Transactions on Multimedia 15(1)
(2013) 41–55.
122. J.-I. Biel and D. Gatica-Perez, Vlogcast yourself: Nonverbal behavior and attention
in social media, in Int. Conf. on Multimodal Interfaces and the Workshop on Machine
Learning for Multimodal Interaction (ICMI-MLMI ’10) (Association for Computing
Machinery, New York, NY, USA, 2010).
123. G. McKeown, M. Valstar, R. Cowie, M. Pantic and M. Schroder, The SEMAINE
database: Annotated multimodal records of emotionally colored conversations be-
tween a person and a limited agent, IEEE Transactions on Affective Computing
3(1) (2012) 5–17.