Personality
Abstract
Analysts, managers, and policymakers are interested in predictive analytics capable of offering better foresight. It is
generally accepted that in forecasting scenarios involving organizational policies or consumer decision-making,
personal characteristics – including personality – may be an important predictor of downstream outcomes. The
inclusion of personality features in forecasting models has been hindered by the fact that traditional measurement
mechanisms are often infeasible. Text-based personality detection has garnered attention due to the public availability
of digital textual traces. However, the text machine learning space has bifurcated into two branches: feature-based
methods relying on manually crafted human intuition, or deep learning language models that leverage big data and
compute – the main commonality being that neither branch generates accurate personality assessments, thereby
making personality measures infeasible for downstream forecasting applications. In this study, we propose
DeepPerson, a design artifact for text-based personality detection that bridges these two branches by leveraging
concepts from relevant psycholinguistic theories in conjunction with advanced deep learning strategies. DeepPerson
incorporates novel transfer learning and hierarchical attention network methods that employ psychological concepts
and data augmentation in conjunction with person-level linguistic information. We evaluate the utility of the proposed
artifact using an extensive design evaluation on three personality data sets, in comparison with state-of-the-art methods
proposed in academia and industry. DeepPerson is able to improve detection of personality dimensions by 10 to 20
percentage points relative to the best comparison methods. Using case studies in the finance and health domains, we
show that more accurate text-based personality detection can translate into significant improvements in downstream
applications such as forecasting future firm performance or predicting pandemic infection rates. Our findings have
important implications for research at the intersection of design and data science, and practical implications for
managers focused on enabling, producing, or consuming predictive analytics.
Keywords: Personality Text Mining, Predictive Analytics, Deep Learning, Design Science, NLP, Psychometrics
1 Introduction
We live in an era of great socio-economic uncertainty. At the same time, datafication, democratization,
consumerization, and the ubiquity of social media have created a seemingly insatiable appetite for real-time
analysis, insights, forecasts, and scrutiny of organizational policies, decisions, and performance. Across
time zones, industry sectors, and professions, everyone from financial analysts and epidemiologists to policy makers and think tanks is interested in better insight and foresight. As part of this global sense-making narrative during turbulent times, the importance of styles and traits has once again come front and
center (Crayne and Medeiros 2020; Guest et al. 2020). Personality traits affect life choices, business
decisions, suitability for certain jobs, health and well-being, protective behaviors, and numerous other
preferences (Goldberg 1990; Majumder et al. 2017; Wang et al. 2019b). This is true for top-level
management at publicly traded companies (Hambrick and Mason 1984; Hambrick 2007), political leaders
of national and state-level governments (Crayne and Medeiros 2020), everyday online consumers
Acknowledgements: This work was funded in part through U.S. NSF grant IIS-2039915 and Oracle for Research
Yang, Lau, and Abbasi - Forthcoming in Information Systems Research (ISR), 2022
(Adamopoulos et al. 2018), and employees adopting new technologies (Devaraj et al. 2008) or seeking to
avoid phishing attacks (Parrish et al. 2009). Simply put, automated personality detection can provide rich
predictors that can enhance agility and foresight in an array of downstream predictive analytics applications.
For instance, previous empirical studies have shown that executives’ personality traits influence their
decision-making (Nadkarni and Herrmann 2010; Riaz, Riaz, and Batool 2012) and leadership styles (Judge
et al. 2002; Judge, Piccolo, and Kosalka 2009). These studies underscore the possible relation between
leaders’ personalities and strategic and tactical organizational decision-making – with implications for
financial forecasting of firm policies and performance (Peterson et al. 2003). In human resource contexts,
personality measures could predict a candidate’s suitability for a particular job role and/or teamwork
performance (LePine and Van Dyne 2001). In digital marketing and online personalization settings,
personality can inform product/music recommendations and effectiveness of word-of-mouth (Celli et al.
2013; Farnadi et al. 2013; Adamopoulos et al. 2018). Personality is a type of psychometric dimension –
psychometrics are constructs related to attitudes, traits, and beliefs. In the management, marketing, and
information systems (IS) literature, the Big Five personality traits (Goldberg 1990) have been used to
examine the impact of personality on various outcomes (Devaraj et al. 2008). Like other psychometric
dimensions, one obstacle to larger-scale empirical analysis or predictive modeling using personality is that
traditional measurement methods – namely, surveys or manual coding of text – are often invasive and
infeasible at scale (Peterson et al. 2003; Ahmad et al. 2020a; Hambrick 2007; Crayne and Medeiros 2020).
Given the difficulties in obtaining traditional psychometric data (Hambrick 2007), natural language
processing (NLP) methods may represent an alternative mechanism for measuring personality through user-
generated content (Ahmad et al. 2020a). However, the text machine learning (ML) space has bifurcated
into two branches: feature-based machine learning relying largely on manually crafted human intuition
(Pratama and Sarno 2015; Tadesse et al. 2018), or deep learning language models relying heavily on big
data and compute (Majumder et al. 2017; Yu and Markov 2017). The main commonality between the two is that neither branch generates accurate personality assessments, thereby making such measures
infeasible for downstream analytics and policy applications. Accordingly, the research objective of this
study is to develop a design artifact for text-based personality detection that bridges the schism by
leveraging concepts from relevant psycholinguistic theories in conjunction with advanced deep learning
strategies.
Following the design science approach (Gregor and Hevner 2013; Hevner et al. 2004), we use a kernel
theory from psycholinguistics to develop a robust middle-ground framework called DeepPerson that
couples principled, domain-adapted NLP artifacts (i.e., embeddings, encoders, and attention networks) with
state-of-the-art end-to-end deep learning concepts for enhanced predictive power. Design science research
questions typically center on the efficacy of design elements within a proposed artifact (Abbasi and Chen
2008) and how the artifact can “increase some measure of operational utility” (Gregor and Hevner 2013; p.
343). Accordingly, our research questions focus on personality detection capabilities and the downstream
RQ1: Relative to existing NLP methods, how effectively can DeepPerson detect personality
dimensions from user-generated text?
To answer these questions, we performed two sets of evaluation. In the first, we examined the personality
detection capabilities of DeepPerson and comparison methods. Results reveal that our framework allows
markedly more accurate detection of personality factors from text relative to existing methods developed
in academia and industry, including 10% to 30% improvements over IBM Personality Insights (Liu et al.
2016), Google BERT (Devlin et al. 2019), and Facebook’s RoBERTa (Liu et al. 2019). More importantly,
our second evaluation involving two case studies shows that this enhanced performance translates into
personality variables that can significantly improve forecasting capabilities in finance and health contexts.
The main contributions of our work are three-fold. First, we propose a novel framework for measuring
personality from text. Second, as part of our framework, we design novel transfer learning and hierarchical
attention network methods. The proposed self-taught personality detection fine-tuning (SPDFiT) method
can overcome the labeled data bottleneck encountered in most psychometric NLP problems by generating
numerous pseudo-labeled training examples to enhance end-to-end model training. The word-layer-person
hierarchical attention network (wlpHAN) uses word and concept layer embeddings coupled with person-
level embeddings to capture key personality cues appearing in text. Third, using a two-part evaluation, we
show that more accurate NLP-based personality detection can translate into significant improvements in
downstream predictive analytics applications such as forecasting future firm performance or predicting
pandemic infection rates. Most notably, as we demonstrate in our evaluation, this is not the case for state-
of-the-art methods which are generally incapable of producing meaningful text-based personality measures.
Our work has important implications for IS research – we believe NLP at the intersection of design and
data science represents a critical opportunity to develop novel, impactful artifacts that amalgamate socio-
technical concepts (Abbasi et al. 2016). Furthermore, our work has practical implications for managers
focused on enabling, producing, or consuming analytics in a broad array of contexts where the inclusion of
personality information for key decision or policy-makers may facilitate enhanced insight and foresight.
The remainder of the article is organized as follows. In the ensuing section, we discuss prior work on
personality, describe state-of-the-art NLP methods for personality detection, and introduce key research
gaps. In section 3, we introduce our proposed framework, using a design science approach. Section 4
presents evaluation results for our framework relative to existing NLP methods. Section 5 uses an empirical
case study to demonstrate the downstream value proposition of enhanced personality measurement,
afforded by our proposed design artifact, for two important forecasting problems in the finance and health
domains. The implications of our work, and concluding remarks, appear in Section 6.
2 Related Work
Prior IS research has studied the importance of personality. It has been shown to influence technology
adoption (Devaraj et al. 2008) and impact online word-of-mouth (Adamopoulos et al. 2018). Personality
traits can also impact susceptibility to phishing attacks (Parrish et al. 2009) and influence how users react
to online recommendations (Celli et al. 2013). Majumder et al. (2017) define personality as the combination
of personal behavior, motivation, and thought-patterns. In the field of psychology, the Big Five personality
traits (often called the five-factor model) have been widely used to characterize individuals’ personalities.
Unlike human emotions, individuals’ personalities have been found to be relatively stable over time
(Cobb-Clark and Schurer 2012), generally unaffected by adverse events. In studies focused on senior
executives, personality traits have been found to influence decision-making style (Nadkarni and Herrmann
2010; Riaz et al. 2012). For instance, Riaz et al. (2012) suggested that extroversion was positively associated
with a spontaneous decision-making style, while openness was related to intuitive decision-making. The
relation between agreeableness or conscientiousness and decision-making style has also been examined
(Nygren and White 2005). Other studies have explored the relationship between personality and rational
decision-making (Hough and Ogilvie 2005). As one example, extroversion has been associated with
effective leadership (Judge et al. 2002) and transformational leadership (Judge, Piccolo, and Kosalka 2009).
Research has also linked the Big Five personality dimensions to downstream implications – Peterson et al. (2003) conducted one of the first studies that examined the relationship between CEOs’ personality traits
and firm performance using a small sample of personality information elicited from 17 executives.
It is worth noting that research examining causal relations between personalities and outcomes has, in certain circumstances, encountered questions related to reverse causality (Hambrick 2007). For instance,
certain types of personalities might be more conducive to being appointed or elected into leadership roles,
or more indicative of the strategic directions that a particular organization wished to take (Hambrick 2007).
While these concerns are well-founded in causal modeling contexts, they do not lessen the potential value of personality measures as predictors. Prior research has carefully delineated between prediction and explanation (Shmueli and Koppius 2011). As our
evaluation results presented in section 5 and Appendix C reveal, personality dimensions are significant and
powerful predictors of future outcomes with performance/policy implications. For prediction contexts, this simply means that the underlying mechanisms contributing to their viability as key predictors of future outcomes need not be fully disentangled.
The bigger limitation for use of personality dimensions in prediction contexts has been the paucity of
available psychometric data (Ahmad et al. 2020a). Traditional survey and manual annotation techniques
are time-consuming and not well-suited for large-scale prediction (Hambrick 2007; Crayne and Medeiros
2020). However, with the growth of online user generated content, there is a wealth of social media, online
reviews, and public health 3.0 content. In the context of personality and leadership, social executives (Wang
et al. 2021) are increasingly communicating with key stakeholders through social media (Heavey et al.
2020). NLP methods applied to such social media text represent a viable approach for measuring
personality dimensions (Back et al. 2010; Tadesse et al. 2018). This research avenue is also consistent with
the perspective espoused by prior IS design science work related to business analytics, which has called for
design artifacts related to text and social media (Chen et al. 2012; Abbasi et al. 2018). In the following
section, we discuss the limitations of current automated NLP efforts related to personality mining from text.
Automated NLP research focusing on text categorization problems can be broadly grouped into two areas:
manual feature engineering approaches and deep learning methods that leverage big data and/or extensive
compute. Although prior work on automated text-based personality detection has focused more on feature-
based techniques, as we discuss below, both categories of methods offer complementary advantages.
Researchers have examined various linguistic features for detecting individuals’ personality traits.
These features were generally coupled with ML classifiers such as multinomial Naive Bayes (MNB), k-
nearest neighbors (KNN), support vector machines (SVM), and gradient boosted trees (Pratama and Sarno
2015; Tadesse et al. 2018). For instance, Gill and Oberlander (2003) observed that individuals with the
openness trait tend to use words related to insight, while those with the neuroticism tendency are more
likely to use concrete and common words when composing messages. The neuroticism trait has also been
associated with usage of words with negative appraisal and affect (Mairesse et al. 2007). Mehl et al. (2006)
found that men with the conscientiousness trait tended to use more filler words, while the same did not hold
true for females. The syntactic patterns of messages have also been found to contain important personality
cues (Mairesse et al. 2007). Automated feature-based detection methods have attempted to leverage these
manually inferred insights, and related lexicons, as feature-based inputs for machine learning (ML)
classifiers. For example, the Linguistic Inquiry and Word Count (LIWC) lexicon and the MRC Psycholinguistic Database have been used in prior work geared towards automated ML-based scoring of social media text (Farnadi et al. 2013; Tausczik and Pennebaker 2010; Vinciarelli and
Mohammadi 2014; Adamopoulos et al. 2018). In addition to lexicons, bag-of-word and part-of-speech tag
n-grams have also been used to detect personality traits (Wright and Chin 2014; Pratama and Sarno 2015).
Tadesse et al. (2018) used structured programming for linguistic cue extraction (SPLICE), encompassing
sentiment, readability, and self-evaluation features, to detect individuals’ personalities. The predictive
power of such linguistic features could be bootstrapped by resampling methods such as synthetic minority oversampling (SMOTE) (Wang et al. 2019a). Guan et al. (2020) proposed a Personality2Vec model in which they ran random walks over user content similarity graphs defined using cosine similarity.
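To make the feature-based branch concrete, the following minimal sketch (ours, not taken from the cited papers) derives LIWC-style category-proportion features from an invented three-category lexicon and fits a small logistic-regression classifier in place of the SVM/Naive Bayes learners discussed above. All lexicon entries, documents, and labels are illustrative placeholders:

```python
import numpy as np

# Hypothetical mini-lexicon mapping LIWC-style categories to cue words.
# Real systems use the full LIWC / MRC resources; these entries are
# illustrative only.
LEXICON = {
    "insight": {"think", "know", "consider", "realize"},
    "negemo":  {"hate", "worried", "awful", "sad"},
    "social":  {"friend", "party", "talk", "we"},
}

def lexicon_features(doc: str) -> np.ndarray:
    """Proportion of tokens falling in each lexicon category."""
    tokens = doc.lower().split()
    n = max(len(tokens), 1)
    return np.array([sum(t in words for t in tokens) / n
                     for words in LEXICON.values()])

# Toy training set: 1 = high neuroticism, 0 = low (labels are invented).
docs = ["I am worried and sad it will be awful",
        "we talk and party with every friend",
        "I hate this awful worried feeling",
        "consider what we know and think"]
labels = np.array([1, 0, 1, 0])

X = np.vstack([lexicon_features(d) for d in docs])

# Tiny logistic-regression classifier trained by gradient descent,
# standing in for the SVM / Naive Bayes learners used in prior work.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad = p - labels
    w -= 0.5 * X.T @ grad / len(docs)
    b -= 0.5 * grad.mean()

preds = (1 / (1 + np.exp(-(X @ w + b))) > 0.5).astype(int)
```

The pipeline shape – hand-built lexicon counts feeding a shallow classifier – is the essential limitation the paper highlights: the learner sees only the categories a human thought to encode.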
Recently, deep learning-based methods have been employed to detect individuals’ personality traits
based on their social media posts (Agastya et al. 2019; Ahmad et al. 2020b; Leonardi et al. 2020). In
particular, it was found that deep CNNs outperformed classical machine learning classifiers in personality
detection (Majumder et al. 2017; Yu and Markov 2017; Sun et al. 2018). The main advantage of deep CNNs is that they can utilize word embeddings to capture richer contextual information appearing in documents, thereby allowing the models to generate abstract document representations. For
personality detection, these capabilities have been further enhanced by combining CNNs with attention
networks. For instance, Xue et al. (2018) exploited word-level attention by aggregating the embeddings of
words surrounding a target word, whereas Lynn et al. (2018) applied word- and message-level attention. A
limitation of the use of learned word embeddings coupled with generic attention-based CNNs, GCNs, and
LSTMs in the personality detection space has been their inability to capture linguistic cues manifesting at
different granularities including person-level characteristics, psychological concepts, syntactic and word-
level patterns.
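The convolutional representation-learning idea can be sketched as follows: a 1-D convolution over word embeddings followed by max-over-time pooling maps variable-length documents to fixed-size abstract vectors. This is a simplified illustration with random embeddings and filters, not any of the cited architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy word embeddings: vocabulary of 10 words, 8-dimensional vectors.
# In practice these would be pretrained (e.g., GloVe/word2vec) embeddings.
vocab_size, emb_dim = 10, 8
embeddings = rng.normal(size=(vocab_size, emb_dim))

def conv_max_pool(token_ids, filters):
    """1-D convolution over the embedded sequence, then max-over-time
    pooling: each filter yields one feature, so documents of any length
    map to the same fixed-size 'abstract representation'."""
    x = embeddings[token_ids]                 # (seq_len, emb_dim)
    n_filters, width, _ = filters.shape
    feats = np.full(n_filters, -np.inf)
    for f in range(n_filters):
        for i in range(len(token_ids) - width + 1):
            window = x[i:i + width]           # (width, emb_dim)
            feats[f] = max(feats[f], np.tanh(np.sum(window * filters[f])))
    return feats

# Four convolutional filters, each spanning a 3-word window.
filters = rng.normal(size=(4, 3, emb_dim))

doc_vec = conv_max_pool([1, 4, 2, 7, 3, 0, 5], filters)   # 7-token document
doc_vec_short = conv_max_pool([2, 3, 4, 5], filters)      # 4-token document
```

Both documents, despite different lengths, yield 4-dimensional vectors – one feature per filter – which downstream classification layers can consume.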
2.3 Related NLP Methods: Language Models, Transfer Learning, and Attention
In essence, deep learning has shifted the NLP model-building paradigm from manually weighting low-level
linguistic features to automated learning of semantic and syntactic representations. Pre-trained, general-
purpose language models that attempt to learn broad linguistic patterns and relations applicable to an array
of text categorization tasks epitomize this shift. These models leverage the classic concept of transfer
learning – improving classification performance for a target task in a target domain by acquiring prior
classification knowledge from one or more source tasks in corresponding source domains (Pan and Yang
2009; Torrey and Shavlik 2010). Deep learning has taken transfer learning to a new level, allowing larger
models (millions of parameters) trained on larger source data (millions of general-purpose documents).
Examples include universal language models such as ULMFiT (Howard and Ruder 2018), deep
contextualized representations such as ELMo (Peters et al. 2018), and powerful transformers capable of
learning longer sequential patterns, such as BERT (Devlin et al. 2019). ULMFiT uses inductive transfer
learning to fine-tune the learning rates at different layers of a deep recurrent neural network (RNN) for
enhanced NLP classification (Howard and Ruder 2018). ELMo utilizes different levels of abstraction
knowledge captured at various layers of a deep Bi-LSTM to boost performance (Peters et al. 2018).
Similarly, BERT (Devlin et al. 2019) transfers prior knowledge (based on source data) to the bottom layers
of a deep transformer network, and then allows the top layers to be fine-tuned using a small number of
labelled training examples from the target domain and task. Recently, Leonardi et al. (2020) performed
text-based personality detection using the BERT transformer embeddings as input for a basic multi-layer
neural network. A more common domain-adaptation strategy has been to further pre-train BERT models
on task-specific corpora (unsupervised) before fine-tuning on the supervised training data (since the original
model was trained on Wikipedia and BookCorpus). For instance, BioBERT further pre-trained the BERT-
Base model on billions of tokens from PubMed articles (Lee et al. 2020), whereas SciBERT did the same
on over a million computer science and biomedical papers from Semantic Scholar (Beltagy et al. 2019).
FinBERT is further pre-trained on corporate filings, financial analyst reports, and earnings conference call
transcripts (Huang et al. 2020). In our evaluation section, we also include a BERT model further pre-trained
on data more closely aligned with personality detection (we call this benchmark method PersonaBERT).
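The layer-wise transfer intuition behind BERT-style fine-tuning can be illustrated without the actual transformer. In the sketch below, a “pretrained” bottom layer (standing in for the transferred source knowledge) is frozen, while only the task head is updated on target-task data; the network, data, and labels are toy stand-ins of our own invention:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Pretrained" bottom layer: in BERT-style transfer these weights encode
# general linguistic knowledge learned from the source corpus.
W_bottom = rng.normal(size=(16, 8))          # frozen encoder
W_top = rng.normal(size=(8, 1)) * 0.01       # task head, fine-tuned

X = rng.normal(size=(32, 16))                # target-task inputs
y = (X @ rng.normal(size=16) > 0).astype(float).reshape(-1, 1)

W_bottom_before = W_bottom.copy()

for _ in range(200):
    h = np.tanh(X @ W_bottom)                # frozen representation
    p = 1 / (1 + np.exp(-h @ W_top))
    # Only the top layer receives gradient updates (fine-tuning); the
    # bottom layer is never touched, preserving transferred knowledge.
    W_top -= 0.1 * h.T @ (p - y) / len(X)

acc = ((p > 0.5) == y).mean()                # training accuracy of the head
```

The same recipe scales up to real language models: freeze (or gently update) the lower layers that carry source-domain knowledge, and fit the upper layers on the small supervised target set.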
Apart from pre-training language models, another transfer learning approach is to fine-tune deep
learning models using data augmentation methods (Lee 2013; Laine and Aila 2016; Xie et al. 2020).
Examples include unsupervised data augmentation (UDA) (Xie et al. 2019) and Self-Ensembling (Laine
and Aila 2016). These methods utilize consistency regularization to avoid disruption from the data
augmentation process. A limitation of pseudo-labeling methods in general has been the quality of data
generated – which often produces noisy signals that offset the predictive power gains (Lee 2013). This issue
can certainly come into play on social media and user-generated text, where data quality is often lower.
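A minimal sketch of threshold-based pseudo-labeling – the core of the self-taught approach (Lee 2013) – follows; the linear “base model” and all data are invented placeholders. Filtering by prediction confidence is the standard guard against the noisy-signal problem noted above:

```python
import numpy as np

rng = np.random.default_rng(2)

def predict_proba(X, w):
    """Stand-in base model: class probabilities from a linear classifier
    (in practice this would be the model trained on the labeled set)."""
    p = 1 / (1 + np.exp(-(X @ w)))
    return np.column_stack([1 - p, p])

# Weights assumed to come from training on the small labeled set.
w = rng.normal(size=5)

# Large pool of unlabeled user-generated documents (as feature vectors).
X_unlabeled = rng.normal(size=(1000, 5))

probs = predict_proba(X_unlabeled, w)
confidence = probs.max(axis=1)
hard_labels = probs.argmax(axis=1)

# Keep only confident predictions as pseudo-labels; low-confidence
# examples are discarded rather than allowed to inject noise.
THRESHOLD = 0.9
mask = confidence >= THRESHOLD
X_pseudo, y_pseudo = X_unlabeled[mask], hard_labels[mask]
```

The retained `(X_pseudo, y_pseudo)` pairs are then appended to the labeled training data and the model is retrained, typically over several such rounds.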
A related machine learning advancement of interest to personality detection has been attention
mechanisms. As noted, some prior personality detection methods have used basic one-dimensional attention, such as AttRCNN (Xue et al. 2018), which exploits word-level attention by aggregating the embeddings of words surrounding a target word. The aspect-oriented sentiment analysis literature has
also used one-dimensional aspect attention for words within a phrase surrounding opinion source/target
keywords, including aspect-aware functions (Zhou et al. 2019) such as dot-product, concat, and general
attention. Recognizing that for many tasks, text patterns manifest at the message versus word levels, the
state-of-the-art has been hierarchical attention networks (HAN) and self-attention based extensions such as
hierarchical convolutional attention networks (HCAN) (Gao et al. 2018). The Msg-Attn approach (Lynn et al. 2018) employs word- and message-level attention for personality detection. However, personality is a
person-centric trait manifesting collectively in terms of the psychological concepts conveyed (Goldberg
1990; Cobb-Clark and Schurer 2012). Existing attention mechanisms ignore key person-level information and the organic concept construct, instead focusing on the more arbitrary “message” unit of information.
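The one-dimensional scoring functions mentioned above – dot-product, “general” (bilinear), and “concat” (additive) attention, in Luong-style terminology – can be sketched as follows; all dimensions, vectors, and weight matrices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d = 6
words = rng.normal(size=(5, d))    # encoder states for 5 words
query = rng.normal(size=d)         # e.g., an aspect / trait query vector

# Three classic attention scoring functions:
dot_scores = words @ query                         # dot-product
W_a = rng.normal(size=(d, d))
general_scores = words @ (W_a @ query)             # "general" (bilinear)
W_c = rng.normal(size=(d, 2 * d))
v = rng.normal(size=d)
concat_scores = np.array([v @ np.tanh(W_c @ np.concatenate([h, query]))
                          for h in words])         # "concat" (additive)

# Each scoring rule yields normalized weights; the attended summary is
# a weighted average of the word states.
weights = softmax(dot_scores)
summary = weights @ words
```

Whichever scoring rule is used, the output is a convex combination of word states – a single level of granularity, which is precisely the limitation discussed in the text.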
The performance of existing machine-learning-based automated personality detection methods has been
inadequate. Gjurković et al. (2021) observed that feature-based text classification methods’ predictions
often had correlation rates of under 0.2 with gold-standard Big Five traits. Accuracies for industry-leading
personality detectors such as IBM Personality Insights have been observed to be equally low (Jayaratne and
Jayatilleke 2020). Similarly, a recent survey found that deep learning-based methods attained mean
accuracies of 58-63% when detecting Big Five traits from text (Mehta et al. 2020). They acknowledge this
poor performance as a bottleneck for downstream use and utility of automated detection methods (Mehta
et al. 2020; p. 2333-2334), noting: “If an individual’s personality could be predicted with a little more reliability, there is scope for integrating personality detection in almost all agents dealing with human- […]”
We believe the issue is one of representational richness – effective personality detection necessitates
machine learning with enhanced expressive power. There is a need to include rich psychological concepts,
methods to capture patterns at different granularities, and techniques for overcoming limitations in available
psychological/directional training data for individuals. In order to illustrate this limitation in the state-of-
the-art, Table 1 summarizes existing methods covered in sections 2.2-2.3 in terms of four important
dimensions: the type of method, the language representations, use of attention mechanisms, and transfer
learning. In some respects, existing methods are limited by the Goldilocks principle – each type of method
generally does well on one of these dimensions, resulting in a smorgasbord of opportunities and limitations.
Feature-based methods use rich, domain-specific lexicons, but are limited in the extensiveness of patterns
learned due to reliance on feature-based machine learning classifiers. Deep learning personality detectors
use more robust sequential, spatial, and convolutional representational learning, even incorporating basic
attention, but lack inclusion of rich psychological concepts, multi-level attention, person-centric patterns,
or transfer learning. Language models use powerful self-attention, but do not consider patterns at different
granularities and are designed for standard word tokens. Relevant hierarchical/aspect attention methods use general
word embeddings, do not go beyond word-sentence-message level attention, and have typically not been
used in conjunction with transfer learning. Similarly, relevant transfer learning methods have their
limitations, namely learning from noisy data such as user-generated social media (Lee 2013). However,
integrating psycho-linguistic concepts, state-of-the art deep learning artifacts for multi-granularity
patterns/attention, and personality-appropriate transfer learning is non-trivial. As one example, even IBM
moved away from LIWC in recent years towards GloVe word embeddings (Jayaratne and Jayatilleke 2020,
p.115347), noting “Earlier versions of the service used the LIWC psycholinguistic dictionary with its
machine-learning model. However, the open-vocabulary approach outperforms the LIWC-based model.”
Table 1 Strengths and Limitations of Prior Personality Detection and Related NLP Methods
The IS discipline has a rich history of design research utilizing concepts from language,
communication, and psychology (Woo 2001; Lyytinen 1985), including machine learning work geared
towards NLP artifacts (Abbasi and Chen 2008; Abbasi et al. 2018; Li et al. 2020). There is no question that
recent advancements in deep learning, namely language models driven by transformers (Devlin et al. 2019),
have disrupted NLP design research. In essence, the domain-adapted feature engineering paradigm that was
pervasive for many years in text categorization studies – where researchers developed and applied carefully
constructed knowledge bases and lexical thesauri – has seemingly been rendered extinct by models capable
of employing millions, even billions of parameters tuned on massive text corpora (Brown et al. 2020).
However, we believe this demise has been grossly exaggerated. From a design science perspective, if we
define the effectiveness of an artifact based on its level of operational utility (Gregor and Hevner 2013),
neither existing feature-based and deep learning personality detectors nor general-purpose language models are
well suited for text-based personality detection. As we later demonstrate, existing NLP methods in both
branches fail to produce personality measures that can improve downstream prediction outcomes. In fact,
we evaluate and markedly outperform every bolded method presented in Table 1. NLP artifacts are
inherently socio-technical, and opportunities for human-centered machine learning persist (Abbasi et al.
2016). There is a need to couple the power of state-of-the-art machine learning NLP methods with
principled, theory-driven domain adaptation. This is precisely the research gap we aim to address with our
proposed framework.
Many prior design science studies have used kernel theories to guide the design of novel artifacts (Li et al.
2020). According to Walls et al. (1992), kernel theories are derived from the natural and social sciences
and are used to govern meta-requirements. Arazy et al. (2010) stated that theories from those domains are
rarely used as-is because their scope and granularity are often inadequate for a specific design problem. As
noted, a fundamental problem with the state-of-the-art for NLP-based personality detection is a lack of
representational richness. Existing manual feature engineering approaches lack the breadth of patterns
needed to effectively capture personality traces from text, whereas the deep learning-based language models
are better suited for learning general NLP patterns, but lack contextualization. By focusing on the meta-
functions of language, Systemic Functional Linguistic Theory (SFLT) provides a theoretical lens for how
to think about representational richness in language (Halliday 2004). SFLT, which has been used in prior
IS design work (Abbasi and Chen, 2008), argues that language encompasses three core meta-functions
(Halliday 2004): ideational, interpersonal, and textual. The ideational meta-function stems from the notion
that language provides a mechanism for describing “human experience,” including experiential and logical
ideas and concepts (Halliday 2004; p. 29). The interpersonal meta-function relates to “enacting our personal
and social relationships” – it is both interactive and personal. The textual meta-function concerns the organization of language itself into coherent, contextualized text (Halliday 2004).
Table 2 shows how we use SFLT as a kernel theory to guide the design of DeepPerson, our middle-
ground framework that combines problem domain adapted design with advanced machine learning
techniques. Our main design intuition is that enhancing text-based personality detection necessitates
effective representation of the ideational, interpersonal, and textual meta-functions of language as they
relate to personality trait traces appearing in natural language. The middle-ground domain adaptation is realized through our word-layer-person hierarchical attention network (wlpHAN), which includes word-level information, a broader text layer for syntax/semantics/concepts, and person-level information, as well as our novel transfer learning method for learning robust personality traces.
Building on the design guidelines in Table 2, Figure 1 shows an overview of the proposed DeepPerson
framework, which includes three main components: CNN-LSTM, wlpHAN, and transfer learning via
SPDFiT. The CNN-LSTM network consists of a convolutional neural network (CNN)-based character
encoder and two multi-layer Bi-LSTM networks. The first Bi-LSTM takes the character encoder and word
CNN embeddings as input. This component is intended to capture language usage related to the logical
ideational (Word CNN) and textual (character encoder) meta-functions (Kim et al. 2016; Mairesse et al.
2007). The second Bi-LSTM incorporates the psychological concept encoder to capture personality traces related to the experiential aspect of the ideational meta-function (Pennebaker and King 1999).

wlpHAN uses word and layer-level attention to capture personality cues appearing at various linguistic
granularities for better representation of the ideational meta-function of language. Moreover, since personality traits are speaker-level constructs, wlpHAN also employs a person-level embedding for measuring an individual’s cues across documents in order to better capture person-specific facets of the interpersonal meta-function of language.
Finally, since rich psychometric dimensions such as personality traits entail careful examination of
context, semantics, lexicogrammar, and expression (Halliday 2004), limited training data can pose a major
bottleneck (Chen et al. 2018). Accordingly, we propose Self-taught Personality Detection Fine-tuning
(SPDFiT), a novel inductive transfer learning method that uses a domain adapted pseudo-labeling data
augmentation technique to expand available training data by employing massive unlabeled domain-specific
data to fine-tune the wlpHAN component. In other words, SPDFiT enables the transfer of domain-specific
knowledge from similar source problem domains to enhance the target task of personality detection.
Before delving into the detailed formulations and intuition behind CNN-LSTM, wlpHAN, and SPDFiT,
we present an example to illustrate the enhanced representational richness afforded by these key
components of DeepPerson. The wlpHAN component is able to weight syntactic and semantic elements
input by the CNN-LSTM at different layers of the attention network, as shown in Figure 2. The illustration
depicts the highly weighted elements for detecting the “extroversion” (EXT) and “unconscientiousness”
(UNCON) personality dimensions, from two tweets respectively, for the former U.S. president. An
individual with the “extroversion” personality trait tends to be attention-seeking, sociable, and playful –
while the “unconscientiousness” personality trait is often associated with being reckless and impulsive
(Goldberg 1990). By using wlpHAN (e.g., word and layer-level attention coupled with the personal
embeddings) in conjunction with the CNN-LSTM, our proposed framework can correctly detect these (and
other) personality trait “digital traces” manifesting in documents based on word usage, syntax/semantic
(synsem) usage, and psychological concepts (e.g., self-focus, positive emotion, affect, and social process).
While SPDFiT is not explicitly depicted in the example, it has a moderating effect on the accuracy and
quality of patterns derived. We later empirically demonstrate the predictive power of each component via
aggregate level ablation analysis and instance-level error analysis, including how the concept and syntactic-semantic layers contribute to detection.
We utilize a Bi-LSTM network known as “Embeddings from Language Models” (ELMo), which has been successfully applied to NLP tasks (Peters et al. 2018). Each term $t$ of a sentence is first fed into the CNN-based character encoder to produce the corresponding encoding $x_t^{Encoder}$. The encoded term sequences are then input into the first multi-layer Bi-LSTM network that captures implicit syntactic patterns embedded in documents. Each Bi-LSTM cell produces two hidden outputs, namely $\overleftarrow{h}_{t,l}^{Synsem}$ and $\overrightarrow{h}_{t,l}^{Synsem}$. In particular, $\overleftarrow{h}_{t,l}^{Synsem}$ represents the hidden output of term $t$ at the $l$th layer, and $\overrightarrow{h}_{t,l}^{Synsem}$ represents the hidden output of term $t$ for the opposite direction. Hence, the aggregated output of the multi-layer Bi-LSTM network is as follows:

$$H_t^{Synsem} = \{x_t^{Encoder}, \overleftarrow{h}_{t,l}^{Synsem}, \overrightarrow{h}_{t,l}^{Synsem} \mid l = 1, \ldots, L^{Synsem}\} = \{h_{t,l}^{Synsem} \mid l = 0, \ldots, L^{Synsem}\} \qquad (1)$$

where $h_{t,0}^{Synsem} = x_t^{Encoder}$ holds when $l = 0$, and $h_{t,l}^{Synsem}$ represents the combination of $\overleftarrow{h}_{t,l}^{Synsem}$ and $\overrightarrow{h}_{t,l}^{Synsem}$ at each hidden layer. The size of the output vector of the Bi-LSTM network is 1024.
As noted, psychological concepts are an important element of the experiential aspect of the ideational
meta-function in the context of personality detection (Pennebaker and King 1999). Accordingly, we propose
a psychological concept embedding to enhance representational richness for personality detection. The
psychological concepts pertaining to each term are identified by using existing psycholinguistic resources
(e.g., LIWC, MRC, and SPLICE). This mapping from word/tokens to psychological concepts is a critical
mechanism for enabling domain-adapted learning that leverages human knowledge and expertise in
conjunction with robust algorithms. As shown in Figure 1, a concept embedding is produced via the psychological concept encoder powered by existing psycholinguistic resources. Let $x_t^{Concept}$ denote the concept embedding of a term $t$. The second multi-layer Bi-LSTM network is designed to capture the sequential relationships among concepts expressed in a document, with the output denoted as:

$$H_t^{Concept} = \{x_t^{Concept}, \overleftarrow{h}_{t,l}^{Concept}, \overrightarrow{h}_{t,l}^{Concept} \mid l = 1, \ldots, L^{Concept}\} = \{h_{t,l}^{Concept} \mid l = 0, \ldots, L^{Concept}\} \qquad (2)$$

where $h_{t,l}^{Concept}$ represents the combination of $\overleftarrow{h}_{t,l}^{Concept}$ and $\overrightarrow{h}_{t,l}^{Concept}$ at each hidden layer, and $L^{Concept}$ is the number of layers of the Bi-LSTM network. The output dimension of $H_t^{Concept}$ is the same as that of $H_t^{Synsem}$. Finally, the two Bi-LSTM networks are aggregated:

$$H_t^{Combined} = \{h_{t,l} \mid l = 0, \ldots, L^{Combined}\}, \text{ where } L^{Combined} = L^{Synsem} + L^{Concept} \qquad (3)$$
Although the CNN-LSTM network can generate rich syntactic and semantic representations, previous work
in social psychology has shown that individuals’ psychological states are related to their personalities
(Pennebaker and King 1999), and traces of these can appear at different granularities within text. Attention
mechanisms can help capture personality cues appearing at various linguistic levels for better representation
of such psychological state information related to the ideational meta-function of language - which can
manifest at the word, phrase, clause, sentence, and cross-sentence levels. However, existing attention
networks mainly deal with word-based or sentence-based attention (Yang et al. 2016; Gao et al. 2018; Jing
2019). Accordingly, our proposed wlpHAN employs attention at the word and layer levels, as well as a
personal embedding to capture speaker level linguistic cues associated with personality traits (which are
part of the inter-personal meta-function from an SFLT perspective). As we later demonstrate empirically,
the inclusion of layer and person-level attention enhances personality detection capabilities.
The architectural design of the proposed attention network is outlined in Figure 3. The output from each
layer of the multi-layer Bi-LSTMs in the CNN-LSTM network is input to the wlpHAN, which infers
appropriate weights for various psycholinguistic elements appearing in different granularities within
documents. Let $T$ denote the set of terms of a document $m$. For each term $t \in T$, an annotation set $H_t$ is generated by each Bi-LSTM network according to Equation 3, including both multilayer concept embeddings and multilayer syntactic and semantic embeddings. Let $h_{t,l}$ be the hidden output corresponding to term $t$ input into the $l$th layer of the attention network. Similar to the approach proposed by Yang et al. (2016), our attention network assigns a higher weight to a layer if $h_{t,l}$ is similar to the context vector $u_w$, as measured by the inner product of these vectors, where $u_w$ is randomly initialized. A softmax function is then applied to normalize the weights inferred by the attention network. Let $\alpha_{t,l}$ represent the derived attention score for term $t$ at the $l$th layer of the attention network. The annotation $r_t$ of term $t$ is the attention-weighted sum of the layer outputs:

$$r_t = \sum_l \alpha_{t,l} h_{t,l}, \quad \text{where } \alpha_{t,l} = \frac{\exp(h_{t,l}^{\top} u_w)}{\sum_{l'} \exp(h_{t,l'}^{\top} u_w)} \qquad (4)$$
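As a sanity check on Equation 4, this self-contained NumPy sketch computes the softmax layer weights and the weighted annotation; the layer count, hidden size, and variable names are toy assumptions:

```python
import numpy as np

def layer_attention(H, u_w):
    # Equation 4: score each layer's hidden output h_{t,l} against the
    # context vector u_w via inner products, softmax-normalize, and
    # take the weighted sum as the term annotation r_t.
    H = np.asarray(H)                       # (L+1, d): one row per layer
    scores = H @ u_w
    alpha = np.exp(scores - scores.max())   # numerically stabilized softmax
    alpha = alpha / alpha.sum()
    r_t = (alpha[:, None] * H).sum(axis=0)
    return r_t, alpha

rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))   # 3 layers, hidden size 4 (toy numbers)
u_w = rng.normal(size=4)      # randomly initialized context vector
r_t, alpha = layer_attention(H, u_w)
```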
Given the term annotation $r_t$, a single Bi-LSTM layer is invoked to incorporate the contextual information of a document into the word-level representation. For each term $t$, the corresponding hidden output generated by the Bi-LSTM layer of the attention network is denoted $h_t$, and it is defined as the concatenation of the forward and backward hidden states:

$$h_t = \{\overleftarrow{h}_t, \overrightarrow{h}_t\} = \{\overleftarrow{LSTM}(r_t), \overrightarrow{LSTM}(r_t)\} \qquad (5)$$
The attention mechanism applied to the word-level is similar to that applied to the layer-level. The
word-level input $h_t$ is first fed into a fully-connected layer to derive the partial document representation $d_t$ for each term $t$. Then, a context vector $u_d$ is constructed, and its similarity with $d_t$ is measured in terms of the inner product of the corresponding vectors. A softmax function is then applied to normalize the weights inferred by the word-level attention mechanism. Let $\alpha_t$ denote the overall attention score for term $t$. The final document representation $d_m$ is derived by summing the weighted term-based partial document representations $d_t$:

$$d_m = \sum_t \alpha_t d_t, \quad \text{where } d_t = \tanh(W_d h_t + b_d) \text{ and } \alpha_t = \frac{\exp(d_t^{\top} u_d)}{\sum_{t'} \exp(d_{t'}^{\top} u_d)} \qquad (6)$$
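Equation 6 can be sketched in the same way; the shapes of the weight matrix and vectors below are illustrative assumptions, not the paper's actual dimensions:

```python
import numpy as np

def word_attention(h, W_d, b_d, u_d):
    # Equation 6: project each term's hidden output through a tanh
    # fully-connected layer, score against context vector u_d with a
    # softmax, and sum into the document representation d_m.
    d = np.tanh(h @ W_d.T + b_d)            # partial document reps d_t, (T, k)
    scores = d @ u_d
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    d_m = (alpha[:, None] * d).sum(axis=0)
    return d_m, alpha

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 6))    # 5 terms, hidden size 6 (toy)
W_d = rng.normal(size=(4, 6))  # projection to a 4-d partial representation
b_d = np.zeros(4)
u_d = rng.normal(size=4)
d_m, alpha = word_attention(h, W_d, b_d, u_d)
```

Because $d_m$ is a convex combination of tanh outputs, each component stays within $[-1, 1]$.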
To account for person-level contextual factors, the document representation $d_m$ is passed into a single-layer Bi-LSTM network that acts as a person-level encoder (Feng et al. 2019). The associated personal embeddings are especially important since social media posts are often short and devoid of sufficient broader text cues related to personality traits. The resulting person-level context-aware representation is scored against the Big-Five personality categories $C = \{EXT, NEU, AGR, CON, OPN\}$. Let $D$ denote the set of documents contributed by an individual. The probability that a document $m$ reflects a personality trait $c$ is inferred according to Equation 8, and the individual's trait profile is aggregated over $D$ according to Equation 9. To train the proposed hierarchical attention-based deep learning model, we adopt the common cross-entropy loss function (Majumder et al. 2017). Further model details appear in Appendix A.
$$p(c|m, \theta) = \frac{\exp(W_{mc}^{(Person)} d_m + b_m)}{\sum_{c \in C} \exp(W_{mc}^{(Person)} d_m + b_m)} \qquad (8)$$

$$\forall c \in C: \; profile[c] = \frac{\sum_{m \in D} p(c|m, \theta)}{|D|} \qquad (9)$$
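A minimal sketch of Equations 8 and 9, with hypothetical weights `W` and `b` and toy 4-d document representations: a softmax over traits per document, then an average over the person's documents.

```python
import numpy as np

TRAITS = ["EXT", "NEU", "AGR", "CON", "OPN"]

def trait_posteriors(d_m, W, b):
    # Equation 8: softmax over the Big-Five categories for one document.
    logits = W @ d_m + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

def person_profile(doc_reps, W, b):
    # Equation 9: the person's trait profile is the document-level
    # posteriors averaged over all of their documents D.
    return np.mean([trait_posteriors(d, W, b) for d in doc_reps], axis=0)

rng = np.random.default_rng(2)
W = rng.normal(size=(5, 4))                    # one row of weights per trait
b = np.zeros(5)
docs = [rng.normal(size=4) for _ in range(3)]  # three documents by one user
profile = person_profile(docs, W, b)
```

Since the profile averages probability distributions, it is itself a distribution over the five traits.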
Effectively training supervised deep learning models usually entails use of a large number of labeled
training examples (Chen et al. 2018). While the first two components of DeepPerson are designed to provide
powerful personality detection capabilities, the paucity of available labeled data for psychometric NLP
tasks such as personality detection can be a major impediment (Ahmad et al. 2020a; Hambrick 2007). From
an SFLT perspective (Halliday 2004), learning is difficult if there is not enough contextual, semantic,
expression, and lexicogrammar content to sequence over (i.e., for the CNN-LSTM) and pay attention to
(e.g., for the wlpHAN). State-of-the-art NLP language models such as ULMFiT (Howard and Ruder 2018),
ELMo (Peters et al. 2018), and BERT (Devlin et al. 2019) bolster the amount of data upon which sequence
and attention weights can be learned by utilizing inductive transfer learning to pre-train deep neural
networks. While these methods work well for a breadth of NLP problems, their propensity to adapt to a
specific domain or task (e.g., psychometric NLP) is constrained by the availability of labeled training
examples necessary to fine-tune the models. To alleviate this problem, we design a novel inductive transfer
learning method named self-taught personality detection fine-tuning (SPDFiT) for generating pseudo-
labeled training examples to enhance the fine-tuning of the first two components of DeepPerson.
The basic intuition behind SPDFiT is as follows. First, it utilizes existing psycholinguistic resources to derive a good representation $\vec{t}$ for each unlabeled document $d_m^{(u)} \in D^{(u)}$, where $D^{(u)}$ is an unlabeled domain-specific corpus. Second, it estimates the prior probability $p(\vec{t}|c)$ based on a small number of labeled training examples $d_n^{(l)} \in D^{(l)}$. Third, the posterior probability $p(c|\vec{t})$ (i.e., a pseudo-label) is derived using Bayes' theorem. Fourth, a novel entropy-based measure $s_m \in [0,1]$ is applied to assess the
quality of each pseudo-labeled training example. Finally, pseudo-labeled examples are selected for model
fine-tuning with selection probabilities proportional to their quality measure $s_m$. This measure is also used
to dynamically adjust the learning rate of the Stochastic Gradient Descent (SGD) process to ensure that the
model can incorporate the quantity (of data) and quality (of labeling) tradeoff as part of its learning.
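The first three steps can be illustrated with a deliberately simplified sketch: per-class diagonal Gaussians stand in for the estimated $p(\vec{t}|c)$ (the paper approximates these via Gibbs sampling), and Bayes' rule yields the pseudo-label distribution. All names and numbers below are toy assumptions:

```python
import numpy as np

def pseudo_label(t_vec, means, stds, priors):
    # Bayes-rule pseudo-labeling: class-conditional log-likelihoods from
    # per-class diagonal Gaussians, combined with class priors, then
    # normalized into a posterior distribution over classes.
    log_post = []
    for mu, sd, pr in zip(means, stds, priors):
        ll = -0.5 * np.sum(((t_vec - mu) / sd) ** 2 + np.log(2 * np.pi * sd ** 2))
        log_post.append(ll + np.log(pr))
    log_post = np.array(log_post)
    p = np.exp(log_post - log_post.max())
    p = p / p.sum()
    return int(np.argmax(p)), p  # pseudo-label and its class distribution

# Two toy classes in a 2-d psycholinguistic feature space.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
stds = [np.ones(2), np.ones(2)]
priors = [0.5, 0.5]
label, dist = pseudo_label(np.array([2.8, 3.1]), means, stds, priors)
```

The returned class distribution is exactly what the entropy-based quality measure $s_m$ is later computed over.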
At a high level, SPDFiT works with the CNN-LSTM and wlpHAN components within the DeepPerson
framework as follows: (1) a large unlabeled data set from a similar source NLP domain (e.g., the 1B Word
benchmark collection (Chelba et al. 2013)) is used to pre-train the CNN-LSTM network; (2) SPDFiT is
used to generate pseudo-labeled examples from a large unlabeled social media corpus (i.e., the Go et al.
(2009) Sentiment140 corpus) for initial fine-tuning of the whole model; and (3) we apply a small number of
labeled training examples from the training set to further fine-tune the model. While state-of-the-art
inductive transfer methods such as ULMFiT (Howard and Ruder 2018), ELMo (Peters et al. 2018), and
BERT (Devlin et al. 2019) include steps (1) and (3) above for model pre-training and fine-tuning, these
methods do not use pseudo-labeling (step 2). Conceptually, this step is a critical domain-adaptation bridge between powerful (generic) universal language modeling and task-specific contextualization using seed manually labeled data rich in human insight. As we later demonstrate empirically, this step allows SPDFiT to substantially improve detection performance.
The detailed formulations are as follows. The CNN-LSTM network is first pre-trained on the 1B Word
benchmark collection (Chelba et al. 2013). CNN-LSTM generates two term-based probability distributions:
the forward distribution $p(w_t|w_1, w_2, \ldots, w_{t-1})$ and the backward distribution $p(w_t|w_{t+1}, \ldots, w_{|T|})$, where $w_t$ denotes the $t$th term. For each document, we jointly maximize the likelihood of the forward and the backward directions:

$$\Theta_{new} = \arg\max \Big( \sum_{t=1}^{|T|} \big( \log p(w_t|w_1, w_2, \ldots, w_{t-1}; \Theta_{old}) + \log p(w_t|w_{t+1}, \ldots, w_{|T|}; \Theta_{old}) \big) \Big) \qquad (10)$$
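For a single toy document, the bidirectional objective in Equation 10 reduces to summing forward and backward per-term log-probabilities; the probability values below are made up purely for illustration:

```python
import math

def joint_ll(forward_probs, backward_probs):
    # Equation 10 objective for one document: sum of forward and backward
    # per-term log-likelihoods under the current parameters.
    return sum(math.log(pf) + math.log(pb)
               for pf, pb in zip(forward_probs, backward_probs))

# Toy per-term probabilities for a 3-term document.
fwd = [0.2, 0.5, 0.4]
bwd = [0.3, 0.4, 0.5]
score = joint_ll(fwd, bwd)
```

Training adjusts the parameters so that this summed log-likelihood, aggregated over the corpus, increases.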
The computational details of the SPDFiT method are shown in Algorithm 1. It first utilizes existing
psycholinguistic resources (e.g., LIWC, MRC, and SPLICE) to extract discriminative features (e.g.,
psychological features) from a large unlabeled social media data set (i.e., line 3 of Algorithm 1). Meanwhile,
the model parameters of the Gaussian distribution are approximated through Gibbs sampling (i.e., line 6 of
Algorithm 1). Then, the proposed algorithm computes the prior probability $p(\vec{t}|c)$ according to the
estimated Gaussian distribution (i.e., line 7 of Algorithm 1). For the unlabeled social media data set, the
proposed algorithm infers the probability distribution of personality categories according to Bayes'
theorem (i.e., line 11 of Algorithm 1). In particular, each unlabeled training example is assigned the
personality category with the highest probability (i.e., pseudo-labeling) in line 12 of Algorithm 1. SPDFiT
employs Bayesian learning since it is a solid decision theoretic framework that offers an intuitive and
principled way of combining prior evidence (e.g., psycholinguistic patterns) to infer the most probable
outcomes (pseudo-labels) (Haussler et al. 1994), and has been used effectively in prior deep learning
contexts involving limited labeled data (Gal et al. 2017). As we demonstrate empirically in our ensuing design evaluation, this pseudo-labeling step contributes substantially to overall performance.
Quality is always an important consideration with semi-supervised and unsupervised approaches such
as pseudo-labeling (Lee 2013). Based on the maximum likelihood assumption, pseudo-labeled training
examples with relatively large probabilities with respect to a certain class are more likely to be assigned the
correct class labels. Accordingly, we use an information theoretic metric ($s_m \in [0,1]$) to estimate the quality of pseudo-labeled training examples (i.e., line 13 of Algorithm 1). In information theory, entropy, denoted $H(S) = \sum_{i=1}^{|S|} -p_i \log_2 p_i$, has been widely used to measure the uncertainty of a system $S$, where a probability distribution $\varphi$ is often used to characterize the various states $i$ of the system $S$. Given the class distributions of pseudo-labeled training examples (i.e., $\varphi_m$), the instances with relatively low entropy (i.e., low uncertainty or high quality) are more likely to be selected for fine-tuning the proposed deep learning model. Let $\varphi_{max}$ denote the most uncertain pseudo-labeling (i.e., a uniform probability distribution) of any unlabeled example and $\varphi_m$ denote the probability distribution of pseudo-labels for an arbitrary unlabeled example $m$. The proposed information theoretic metric for estimating the certainty (quality) of pseudo-labeled training examples is defined as $s_m = \frac{H(\varphi_{max}) - H(\varphi_m)}{H(\varphi_{max})}$. Further, this quality metric
is also used to control the learning rate of the SGD process during model fine-tuning (i.e., lines 17-18 of
Algorithm 1). Hence, the pseudo-labeled training examples with relatively high certainty scores will trigger
higher learning rates in the SGD process, and thereby exert greater influence during model fine-tuning.
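The quality metric $s_m$ and its use as a learning-rate control can be sketched as follows; scaling the rate linearly by $s_m$ is our simplification of the dynamic adjustment in lines 17-18 of Algorithm 1:

```python
import math

def entropy(dist):
    # Shannon entropy in bits, skipping zero-probability states.
    return -sum(p * math.log2(p) for p in dist if p > 0)

def quality(dist):
    # s_m = (H(phi_max) - H(phi_m)) / H(phi_max): 0 for a uniform
    # (most uncertain) pseudo-label distribution, 1 for a one-hot one.
    h_max = math.log2(len(dist))  # entropy of the uniform distribution
    return (h_max - entropy(dist)) / h_max

base_lr = 0.01
confident = [0.9, 0.05, 0.05]
uncertain = [1 / 3, 1 / 3, 1 / 3]

# High-certainty pseudo-labels trigger higher learning rates and thus
# exert more influence during fine-tuning (an assumed linear scaling).
lr_confident = base_lr * quality(confident)
lr_uncertain = base_lr * quality(uncertain)
```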
4 Design Evaluation
Following the design science approach, we evaluate the operational utility of our proposed artifact in two
ways (Gregor and Hevner 2013). First, we use a design evaluation to show that the DeepPerson framework,
grounded in SFLT, outperforms existing feature and deep learning methods for text-based detection of
personality dimensions. As part of this evaluation, we also show that this performance lift is attributable to
the effectiveness of its key components, namely, wlpHAN and SPDFiT. Our second evaluation uses
empirical case studies to demonstrate the downstream implications of these performance deltas. We show
that text personality variables developed using DeepPerson can significantly improve forecasting in
financial and health contexts where executive decision-making can shape outcomes. Our design evaluation
is discussed in the remainder of this section (Section 4) while one of the case studies appears in Section 5.
To evaluate the design of DeepPerson, we used three well-known benchmark collections, namely
PANDORA (Gjurković et al. 2021), myPersonality (Celli et al. 2013) and the Essays data set (Mairesse et
al. 2007). PANDORA is a large-scale collection of 3,000,566 Reddit comments from 1,568 users and their
corresponding personality traits elicited using surveys involving the same Big-five constructs (Goldberg
1990). The myPersonality data set contains 10,000 status updates contributed by 250 Facebook users (Celli
et al. 2013), and their accompanying Big Five personality survey results. In contrast, the Essays corpus
contains 2,479 essays that capture a total of 1.9 million words composed by 2,479 psychology students
(Mairesse et al. 2007). Similarly, students’ personality traits were elicited by using questionnaires that
incorporated the Big-five constructs. Table 3 depicts basic descriptive statistics for each of the data sets.
We benchmarked DeepPerson against an extensive set of comparison methods used in prior personality detection studies, as well as state-of-the-art universal language models (all
previously discussed in Table 1). Feature-based methods included KNN coupled with LIWC categories
(Farnadi et al. 2013), SVM using word n-grams (Wright and Chin 2014), gradient boosted trees (Tadesse
et al. 2018), and the synthetic minority over-sampling and Tomek Link (SMOTETomek) personality
detector (Wang et al. 2019a). As noted in our discussion of related work, such LIWC and n-gram-based
features input into classical machine learning methods have been used extensively for personality detection
(Iacobelli et al. 2011). Our deep learning-based benchmark personality detectors included CNN-1
(Majumder et al. 2017), CNN-2 (Yu and Markov 2017), Gated Recurrent Unit (GRU) network (Yu and
Markov 2017), AttRCNN (Xue et al. 2018), LSTM+CNN (Sun et al. 2018), and the graph convolutional
networks GCN (Wang et al. 2020). We also included IBM Personality Insights (Liu et al. 2016),
Personality2Vec (Guan et al. 2020), and the well-known BERT neural language model developed at Google
(Devlin et al. 2019), which has outperformed other methods for many NLP tasks. BERT-Base was simply
fine-tuned on our training data sets (no further pre-training). Conversely, PersonaBERT further pre-trained
the BERT-Base model from checkpoints using the same Sentiment140 and 1BWord corpora used by
DeepPerson, before fine-tuning on our training data sets. BERT+NN used the BERT-Base transformer as an encoder with a neural network classifier layered on top (Leonardi et al. 2020).
Consistent with previous studies (Farnadi et al. 2013; Alam et al. 2013; Majumder et al. 2017; Yu and
Markov 2017; Wang et al. 2019a), the personality label of a post/document was considered to be a binarized
(median split) representation of the survey-based gold-standard personality label of the user who
contributed the post/document – hence, personality detection was considered a binary classification
problem. The class label 𝑐𝑐 ∈ {0,1} was assumed for each of the Big Five dimensions, and in each run, a
personality detector classified whether a document contained that particular personality dimension.
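The median-split binarization described above can be sketched as below; whether scores exactly at the median map to 1 is our assumption, and the cited studies may break ties differently:

```python
import statistics

def median_split(scores):
    # Binarize continuous survey scores: 1 if at/above the median, else 0.
    med = statistics.median(scores)
    return [1 if s >= med else 0 for s in scores]

ext_scores = [2.1, 3.4, 4.8, 3.9, 2.7]  # toy survey scores for EXT
labels = median_split(ext_scores)
```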
Following the common evaluation process for machine learning models involving user-centric data
(Prechelt 1998; Ahmad et al. 2020a), our data set was divided into a training set (50% of users), a validation
set (25% of users), and a test set (25% of users). Training was performed on all documents associated with
users in the training set, parameter tuning occurred on the validation users’ documents, and models were
evaluated on the test users’ documents. In order to make the evaluation more robust, a repeated random
sub-sampling validation process was invoked where the training-validation-testing user splits were
randomly shuffled ten times. For design evaluation, standard document classification metrics such as
precision, recall, F-score, accuracy, and AUC were macro-averaged across the Big-five personality categories
(Alam et al. 2013). We also report performance on each of the five dimensions, separately. Moreover, we
adopted a non-parametric statistical test, namely the Wilcoxon signed-rank test (Wilcoxon 1992) to evaluate
the statistical significance of the different performance scores achieved by various models. DeepPerson was
implemented on the ELMo architecture in PyTorch. Consistent with prior studies, a grid search was used to
tune parameters on the validation set. A mini-batch size of 500 and a dropout rate of 0.5 were used.
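The user-level 50/25/25 split with ten random shuffles can be sketched as follows; the seeding scheme and helper names are ours:

```python
import random

def user_split(user_ids, seed):
    # Shuffle users and split 50/25/25 so that all documents from one
    # user land in exactly one of train/validation/test.
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train, n_val = n // 2, n // 4
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

users = list(range(100))
splits = [user_split(users, seed) for seed in range(10)]  # ten random shuffles
```

Splitting by user (rather than by document) prevents documents from the same individual from leaking across the train and test partitions.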
In this section we describe the overall design evaluation results for DeepPerson relative to the
aforementioned feature-based, deep learning, and language modeling methods. We present results for the
PANDORA and myPersonality data sets related to personality traces appearing in social media posts
(Tables 4 and 5). The results on the Essays data can be found in Appendix B. The first two columns in Tables
4 and 5 depict the category of method and specific method name. The next five columns show F-scores for
individual Big-five dimensions, whereas the last six columns display macro-averaged F-score, precision,
recall, accuracy, AUC, and the percentage improvement in AUC (Imp.).
Table 4. Personality Detection Results on the PANDORA Data Set
Paradigm Method EXT NEU CON AGR OPN Av. F Av. P Av. R Acc AUC Imp.
Transfer
DeepPerson 64.9 64.3 63.8 66.5 66.1 65.1 67.8 62.7 69.9 75.0 +33.7%
Learning
CNN-1 58.6 58.7 57.8 59.9 60.5 59.1 60.6 57.7 65.2 64.0 +14.1%
CNN-2 57.7 57.9 57.1 59.6 58.4 58.1 59.5 56.8 64.4 62.7 +11.8%
Represent. AttRCNN 59.2 59.0 57.0 61.9 60.5 59.5 61.0 58.2 65.6 64.6 +15.2%
Learning Msg-Attn 56.2 56.8 55.6 58.3 57.3 56.9 57.9 55.9 63.4 60.5 +7.8%
GRU 56.3 54.1 53.7 58.3 55.1 55.5 56.2 54.9 61.8 57.8 +3.0%
LSTM+CNN 57.0 57.2 56.5 59.1 58.0 57.5 58.8 56.3 64.0 61.2 +9.1%
GCN 56.7 56.3 56.2 58.4 56.8 56.9 58.0 55.8 63.4 60.5 +7.8%
PersonaBERT 58.2 58.3 57.5 60.1 59.5 58.7 60.2 57.4 65.0 63.4 +13.0%
Language BERT-Base 55.2 56.7 56.7 58.8 57.9 57.1 58.3 55.9 63.7 61.5 +9.6%
Model BERT+NN 55.5 56.9 57.0 58.3 58.2 57.2 58.5 56.0 63.8 60.6 +8.0%
RoBERTa 58.4 58.0 57.9 59.7 60.1 58.8 60.3 57.5 65.0 63.2 +12.7%
IBM 55.2 53.0 52.9 49.7 47.6 53.4 52.8 54.1 57.5 56.1 -
KNN 56.3 55.6 53.9 57.6 57.3 56.1 57.1 55.2 63.0 58.6 +4.5%
Feature- SVM 56.2 55.7 54.9 56.6 51.9 55.1 56.0 54.2 61.7 56.9 +1.4%
based XGBoost 56.2 56.7 54.2 57.6 56.2 56.2 57.2 55.3 62.9 58.9 +5.0%
Personality2Vec 58.3 58.2 58.0 60.2 58.4 58.6 60.0 57.3 64.8 62.9 +12.1%
SMOTETomek 57.4 56.8 55.7 57.4 53.3 56.1 57.2 55.1 62.5 59.4 +5.9%
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, “AUC” refer to macro-averaged F-score, precision, recall, accuracy, and area under
the ROC curve w.r.t. the five personality categories. “Imp.” refers to percentage improvement in terms of AUC. All numbers are shown
in % format. CNN-1 (Majumder et al. 2017), CNN-2 (Yu and Markov 2017), GRU (Yu and Markov 2017), BERT-Base (Devlin et
al. 2019), BERT+NN (Leonardi et al. 2020), RoBERTa (Liu et al. 2019), KNN (Farnadi et al. 2013), SVM (Wright and Chin 2014),
XGBoost (Tadesse et al. 2018), AttRCNN (Xue et al. 2018), Msg-Attn (Lynn et al. 2020), GCN (Wang et al. 2020), Personality2Vec
(Guan et al. 2020), SMOTETomek (Wang et al. 2019a), LSTM+CNN (Sun et al. 2018).
Table 5. Personality Detection Results on the myPersonality Data Set
Paradigm Method EXT NEU CON AGR OPN Av. F Av. P Av. R Acc AUC Imp.
Transfer
DeepPerson 67.4 66.8 66.6 66.3 67.7 67.0 69.5 64.8 70.3 70.7 +25.1%
Learning
CNN-1 58.2 57.0 58.0 54.3 59.5 57.4 58.8 56.1 62.2 62.7 +11.0%
CNN-2 57.0 55.7 56.0 53.3 58.1 56.0 57.4 54.8 61.0 60.2 +6.5%
Represent. AttRCNN 58.7 58.5 57.7 60.3 59.8 59.0 60.5 57.6 64.2 63.3 +12.0%
Learning Msg-Attn 56.0 56.9 56.1 53.5 57.2 55.9 56.8 55.2 51.1 61.2 +8.3%
GRU 55.1 50.3 53.6 51.2 57.2 53.5 54.2 52.8 59.1 59.0 +4.4%
LSTM+CNN 56.0 55.9 55.9 53.3 60.3 56.3 56.6 56.0 54.3 62.5 +10.6%
GCN 57.1 56.1 54.7 54.9 60.0 56.5 56.4 56.6 58.8 61.2 +8.3%
PersonaBERT 59.5 53.2 57.0 55.1 60.6 57.1 58.3 56.0 62.1 61.2 +8.3%
Language BERT-Base 57.2 53.9 56.0 53.2 60.6 56.2 58.1 54.4 61.6 60.6 +7.3%
Model BERT+NN 57.6 53.9 56.1 53.1 60.5 56.2 59.0 54.1 61.0 60.1 +6.4%
RoBERTa 58.7 55.2 56.7 56.3 59.9 57.4 59.4 55.5 62.5 62.7 +11.0%
IBM 56.2 51.0 52.2 45.5 42.3 49.4 50.1 48.8 53.0 56.5 -
KNN 56.5 58.3 54.1 54.2 58.3 56.3 57.8 55.0 54.4 62.0 +9.7%
Feature- SVM 54.5 54.8 51.5 52.2 60.6 54.7 54.7 54.8 51.6 61.0 +8.0%
based XGBoost 57.3 57.0 54.3 55.1 56.2 56.0 57.2 55.0 55.3 60.9 +7.8%
Personality2Vec 57.8 57.0 59.0 55.0 58.4 57.5 58.0 56.9 57.8 61.7 +9.2%
SMOTETomek 54.7 55.2 53.4 52.3 60.1 55.8 56.2 54.2 47.5 61.0 +8.0%
As shown in Tables 4 and 5, DeepPerson outperforms all comparison methods in terms of AUC, macro
F-score, precision, recall, and accuracy. These performance deltas are consistent across individual
personality dimensions. DeepPerson outperforms the best
comparison methods, namely AttRCNN (Xue et al. 2018), CNN-1 (Majumder et al. 2017) and
PersonaBERT, by 5 to 15 percentage points across all measures. Using IBM Personality Insights (i.e., the
weakest comparison method) as a reference point for percentage lift in AUC, DeepPerson is +25% to
+33% higher on the two data sets. This is roughly 13 to 20 relative percentage points higher than the
best comparison methods, respectively. The Wilcoxon signed-rank tests reveal that DeepPerson’s gains are
significant; for instance, DeepPerson significantly outperforms CNN-1 ($W = 0$, $p < .01$) on EXT, NEU, CON, AGR, and OPN.
While not depicted here, the results on the Essays data are comparable: DeepPerson significantly
outperforms all comparison methods (see Appendix B of the Online Supplement). Finally, since our
ultimate goal for downstream tasks is to try to approximate a user’s personality dimensions (averaged over
all document-level scores), we also report results for user-level approximation on PANDORA and
myPersonality in Appendix B (Tables B3/B4). DeepPerson attains Pearson’s correlation values that are at
least 10-18 points higher than the best comparison method, and MSE values that are also at least 10% lower.
The results seem to support the efficacy of middle-ground frameworks that harness rich domain knowledge
and context-relevant NLP theory in conjunction with powerful state of the art machine learning approaches.
In the ensuing section, we use ablation analysis to show that the performance of DeepPerson is attributable
to its key components that support the SFLT-based design guidelines: CNN-LSTM, wlpHAN, and SPDFiT.
Two key components of DeepPerson are the wlpHAN attention network and the pseudo-labeling SPDFiT
transfer learning method. In order to evaluate their additive impact on DeepPerson, we ran experiments
where wlpHAN was removed and SPDFiT was replaced with other baseline methods. The results on the
PANDORA data are presented in Table 6 – the myPersonality results can be found in Appendix B (Table
B1). DeepPerson devoid of wlpHAN appears as the first setting: CNN-LSTM (SPDFiT). The absence of
wlpHAN does reduce AUC by about 5 percentage points (relative to the first row in Table 4), underscoring
the importance of wlpHAN. The second and third settings depict DeepPerson with wlpHAN and SPDFiT
removed. In these settings, the CNN-LSTMs were pre-trained using the 1B Word benchmark collection
(Chelba et al. 2013) before fine-tuning with the PANDORA training data, and in the case of row two (i.e.,
1BWord+Sentiment140), further pre-trained with the Sentiment140 corpus (Go et al. 2009). More details
of the experiments, along with basic descriptive statistics, are reported in Appendix I of the online supplement.
In settings 4-5, SPDFiT was replaced with other state-of-the-art transfer learning methods: UDA (Xie
et al. 2020) and Self-Ensembling (Laine and Aila 2016). We implemented UDA and Self-Ensembling using
an open-source back-translation tool for data augmentation (Edunov et al. 2018). UDA used a loss function
based on KL divergence while Self-Ensembling employed mean square error as the loss function. Since
UDA and Self-Ensembling are not specifically designed for personality detection tasks, to have a fair
comparison, they employed the same exact psychological lexicons as SPDFiT (i.e., the LIWC, MRC, and
SPLICE). Settings 6-8 depict alternative pseudo-labeling methods that utilize logistic regression (Lee
2013), Lasso regression (Hastie et al. 2009), or Ridge regression for pseudo-labeling. Unlike SPDFiT, these
pseudo-labeling methods are not equipped with a quality assessment metric to filter out low-quality labels.
We also included three BERT (Devlin et al. 2019) settings, the aforementioned BERT-Base and
Yang, Lau, and Abbasi - Forthcoming in Information Systems Research (ISR), 2022
PersonaBERT, plus an intermediate setting only further pre-trained on 1BWord (but not Sentiment140)
before being fine-tuned on PANDORA training data (settings 9-11). In setting 12, we replaced CNN-LSTM
with just a Bi-LSTM. Finally, setting 13 used Doc2vec (Le and Mikolov 2014) – like BERT-Base, this
setting too signified the impact of no domain-specific pre-training.
Table 6. Ablation Results on the PANDORA Data
Method EXT NEU CON AGR OPN Av. F Av. P Av. R Acc AUC Imp.
1. CNN-LSTM (SPDFiT) 62.1 61.8 61.4 63.7 62.7 62.3 64.5 60.4 67.7 70.0 20.1%
2. CNN-LSTM (1BWord+Sentiment140) 58.1 58.7 57.6 61.0 59.5 59.0 60.4 57.6 65.1 63.6 9.1%
3. CNN-LSTM (1BWord) 57.7 57.9 57.0 59.7 58.9 58.2 59.6 57.0 64.6 62.3 6.9%
4. CNN-LSTM (UDA) 59.2 59.2 58.5 61.2 61.0 59.8 61.5 58.3 65.8 65.1 11.7%
5. CNN-LSTM (Self-Ensembling) 59.4 59.1 58.2 61.0 60.8 59.7 61.3 58.2 65.8 65.2 11.8%
6. CNN-LSTM (Logistic) 58.9 59.1 58.1 61.6 60.4 59.6 61.2 58.1 65.6 64.9 11.3%
7. CNN-LSTM (LASSO) 58.8 59.0 58.0 61.5 60.1 59.5 61.0 58.0 65.5 64.7 11.0%
8. CNN-LSTM (Ridge) 58.8 59.1 58.1 61.4 60.3 59.5 61.1 58.1 65.6 64.7 11.0%
9. PersonaBERT 58.2 58.3 57.5 60.1 59.5 58.7 60.2 57.4 65.0 63.4 8.7%
10. BERT (1BWord) 56.7 56.7 55.8 61.0 58.9 57.8 59.0 56.8 64.2 61.3 5.1%
11. BERT (Base) 55.2 56.7 56.7 58.8 57.9 57.1 58.3 55.9 63.7 60.2 3.3%
12. Bi-LSTM (1BWord) 56.0 55.9 56.6 57.3 57.1 56.6 57.7 55.5 63.2 59.1 1.4%
13. Doc2Vec (Pretrained) 54.6 55.0 55.9 56.6 57.2 55.8 56.8 54.9 62.5 58.3 -
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, and “AUC” refer to macro-averaged F-score, precision, recall, accuracy, and AUC w.r.t.
the five personality categories. “Imp.” refers to relative improvement in terms of AUC over setting 13. All numbers are shown in % format.
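The aggregate columns can be recomputed from the per-trait entries: "Av. F" is the unweighted mean over the five traits, and "Imp." is the relative AUC improvement over the Doc2Vec baseline (setting 13). A quick check using the row-1 and row-13 values from Table 6 (since the per-trait entries are themselves rounded, recomputed averages can differ by ±0.1 in general):

```python
# Per-trait F-scores for setting 1, CNN-LSTM (SPDFiT), from Table 6
f_scores = [62.1, 61.8, 61.4, 63.7, 62.7]
avg_f = sum(f_scores) / len(f_scores)  # macro-averaged F-score -> 62.3

# "Imp." column: relative AUC improvement over the Doc2Vec baseline (setting 13)
auc_spdfit, auc_doc2vec = 70.0, 58.3
imp = (auc_spdfit - auc_doc2vec) / auc_doc2vec * 100  # -> 20.1%
```

Both values match the "Av. F" and "Imp." entries reported for setting 1.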
The improvement column in Table 6 shows that DeepPerson devoid of wlpHAN improves AUC by
20% (F-score by +11.7%) compared with the Doc2Vec (pretrained) approach, and is at least +8% better
than all ablation settings in terms of relative percentage improvement. The exclusion of SPDFiT after
wlpHAN has already been removed (settings 2-8) degrades performance by 5-7 points in terms of AUC
(relative improvement of at least +8%). This includes alternative pseudo-labeling methods (settings 6-8)
as well as state-of-the-art transfer learning methods like UDA and Self-Ensembling (settings 4-5).
Although not depicted, with SPDFiT and wlpHAN, this relative
delta is about +28%. SPDFiT (setting 1) also outperforms all BERT models (settings 9-11), including when
further pre-trained on the same domain-specific corpora (and fine-tuned on personality training data), by at
least 11% in terms of relative percentage improvement. Finally, CNN-LSTM (setting 2) outperforms the
use of Bi-LSTM (setting 12), suggesting that even without wlpHAN and SPDFiT, the CNN-LSTM setting
still works well. Collectively, the results of this first ablation analysis underscore the importance of all three
key components of DeepPerson, and SPDFiT in particular. Wilcoxon signed-rank tests show that these
differences are statistically significant.
An important consideration for transfer learning approaches is the amount of unlabeled data needed to
garner enhanced predictive power. We performed additional analysis to examine the impact of the
proportion of pseudo-labeled data on the performance of SPDFiT. We varied the percentage of unlabeled
training examples from 10% to 100% (i.e., 100% denotes the full unlabeled data set), in increments of 10%.
In order to isolate the impact of just using unlabeled data, for all methods evaluated, no fine-tuning was
performed on labeled training data. Hence, unlabeled data was used to train the models, which were then
evaluated on the PANDORA and myPersonality test data across the various folds. For each increment,
DeepPerson and comparison methods were trained for 20 epochs. The top two charts in Figure 4 depict
plots of the classification performance when using SPDFiT versus comparison transfer-learning
alternatives. The results reveal that SPDFiT is able to garner fairly good results when using as little as 50%
of the full unlabeled training set – moreover, it outperforms all comparison methods in terms of overall F-
score when using 40% or more of the unlabeled data on PANDORA or 30% or more of the data on
myPersonality. The bottom two charts depict the performance of SPDFiT on the five individual personality
dimensions. Though not shown here, SPDFiT outperformed all comparison methods on all five dimensions
when using just 50% of the unlabeled data. Given the wide range over which SPDFiT works well, we
believe the results further underscore the robustness of the SPDFiT component of DeepPerson.
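The experimental protocol above – training on incrementally larger fractions of the unlabeled pool and evaluating each increment – can be sketched as the following loop. The `train_fn`/`eval_fn` stand-ins are hypothetical placeholders for the actual SPDFiT training (20 epochs) and test-fold evaluation:

```python
import random

def fraction_sweep(unlabeled_pool, train_fn, eval_fn, seed=42):
    """Train on growing fractions (10%..100%) of the unlabeled pool and
    record test performance at each increment."""
    rng = random.Random(seed)
    pool = list(unlabeled_pool)
    rng.shuffle(pool)                      # fix one ordering so fractions nest
    results = {}
    for pct in range(10, 101, 10):
        subset = pool[: max(1, len(pool) * pct // 100)]
        model = train_fn(subset)           # e.g., 20 epochs of SPDFiT-style training
        results[pct] = eval_fn(model)      # e.g., macro F-score on the test fold
    return results

# Toy stand-ins: "training" just counts examples, "evaluation" returns that count
res = fraction_sweep(range(100), train_fn=len, eval_fn=lambda m: m)
```

Fixing a single shuffled ordering ensures each larger fraction is a superset of the smaller ones, so performance differences across increments reflect added data rather than resampling noise.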
For the second ablation analysis, we examined the effectiveness of the word-, layer-, and person-based
components of wlpHAN (depicted in Figure 3). For all settings, DeepPerson was invoked without the
SPDFiT module to better isolate the performance impact of wlpHAN. In particular, we compared the
detection performance of CNN-LSTM with full wlpHAN (setting 1) against a word-based attention only
(i.e., no layer or person-level attention, setting 4), one with synsem+word (no concept embedding in the
layer level attention – setting 3), and one with synsem+concept+word but no person-level encoder (setting
2). As noted in our related work section, incorporating psychological concepts into our deep learning model
might be construed as somewhat analogous to aspect-level sentiment classification (Cheng et al. 2017;
Wang et al. 2019b; Li et al. 2019; Galassi et al. 2020). Accordingly, in settings 5-7, in place of
wlpHAN we substituted three aspect attention methods based on the notion of aspect-aware functions (Zhou
et al. 2019): Dot-Product Attention (DPA), Concat Attention (CA) and General Attention (GA). In settings
8-12, we swapped out wlpHAN for other state-of-the-art attention networks such as HAN (Yang et al.
2016), SATT-LSTM (Jing 2019), HCAN (Gao et al. 2018), AttRCNN (Xue et al. 2018), and Message-level
Attention (Msg-Attn).
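The three aspect-aware scoring functions differ only in how a hidden state h is combined with an aspect/query vector a. One common formulation is sketched below (vector sizes and weight shapes are illustrative assumptions, not the exact configurations of Zhou et al. 2019):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # illustrative hidden size
h = rng.standard_normal(d)             # hidden state
a = rng.standard_normal(d)             # aspect / query vector
W = rng.standard_normal((d, d))        # weights for "general" attention
Wc = rng.standard_normal((d, 2 * d))   # weights for "concat" attention
v = rng.standard_normal(d)

def score_dot(h, a):                   # Dot-Product Attention (DPA): h . a
    return h @ a

def score_general(h, a, W):            # General Attention (GA): h^T W a
    return h @ W @ a

def score_concat(h, a, Wc, v):         # Concat Attention (CA): v^T tanh(W[h; a])
    return v @ np.tanh(Wc @ np.concatenate([h, a]))

scores = [score_dot(h, a), score_general(h, a, W), score_concat(h, a, Wc, v)]
# Each function returns a scalar score; a softmax over all positions' scores
# would then yield the attention weights.
```

All three are drop-in replacements for one another at the scoring step, which is what makes them natural ablation substitutes for wlpHAN's attention layers.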
As shown in Table 7 (settings 2-4), the syntax/semantic layer, concept, and person level encoders each
contribute about 2 percentage points to wlpHAN’s overall AUC. wlpHAN also outperforms other state-of-
the-art attention networks depicted in settings 8-12 such as HAN, SATT-LSTM, AttRCNN, Msg-Attn, and
HCAN by 3 to 6 percentage points. Further, when replacing wlpHAN with aspect-level attention networks
(i.e., settings 5-7 in Table 7), performance degrades by 5 to 6 percentage points. The relative percentage
improvements for wlpHAN compared to all existing attention models range from 5% to 11%, with all
differences significant (p-values < 0.01). This performance improvement can be attributed to wlpHAN’s capability to
incorporate syntax, psychological concepts, and person-level contextual information into the personality
detection process – these are all elements shown to be important for personality detection and are well-
aligned with our SFLT-based design guidelines (Gill and Oberlander 2003; Mairesse et al. 2007).
Table 7. Ablation Results for wlpHAN on the PANDORA Data
Method EXT NEU CON AGR OPN Av. F Av. P Av. R Acc AUC Imp.
1. CNN-LSTM (wlpHAN) 62.4 61.5 61.6 64.5 64.0 62.8 64.9 60.9 68.2 70.3 +20.0%
2. CNN-LSTM (SynSem+Concept+Word) 61.1 60.6 60.0 62.7 62.6 61.4 63.3 59.7 67.1 68.0 +16.0%
3. CNN-LSTM (SynSem+Word) 60.3 59.7 59.4 61.5 61.6 60.5 62.1 59.0 66.3 66.2 +13.0%
4. CNN-LSTM (Word) 58.3 58.6 57.8 60.6 60.0 59.0 60.4 57.8 65.1 64.0 +9.2%
5. Aspect-Attention (DPA) 57.7 57.1 62.3 58.4 62.6 59.6 61.0 58.3 65.5 64.7 +10.4%
6. Aspect-Attention (CA) 56.8 57.5 60.3 58.7 61.6 59.0 60.3 57.7 65.1 63.9 +9.0%
7. Aspect-Attention (GA) 57.2 58.6 61.7 58.7 61.5 59.5 61.1 58.1 65.4 64.8 +10.6%
8. HAN 56.2 57.5 55.7 58.3 57.3 57.0 58.1 56.0 63.5 60.7 +3.6%
9. SATT-LSTM 55.2 55.9 54.8 57.5 56.3 55.9 56.8 55.1 62.7 58.6 -
10. HCAN 56.2 57.1 56.1 58.8 57.9 57.2 58.4 56.2 63.7 61.1 +4.3%
11. AttRCNN 59.2 59.0 57.0 61.9 60.5 59.5 61.0 58.2 65.6 64.6 +10.2%
12. Msg-Attn 56.2 56.8 55.6 58.3 57.3 56.9 57.9 55.9 63.4 60.5 +3.2%
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, and “AUC” refer to macro-averaged F-score, precision, recall, accuracy, and AUC w.r.t.
the five personality categories. “Imp.” refers to relative improvement in terms of AUC over setting 9. All numbers are shown in % format.
As noted in Figure 2 and related discussion, and shown empirically with ablation results presented in Tables
6 and 7, the psychological concepts and patterns derived using CNN-LSTM coupled with wlpHAN (with
performance boosted by SPDFiT) are critical to the performance of DeepPerson relative to the state-of-the-
art. To delve deeper into these results, we conducted a series of pair-wise comparisons of instance-level
error rates for DeepPerson versus CNN-1, CNN-2, PersonaBERT, and AttRCNN. In each comparison, we
identified the 25% of instances on PANDORA with the widest prediction error margins between
DeepPerson and each comparison method (i.e., the cases where DeepPerson was most accurate relative to
the comparison method in terms of MSE or MAE). For these instances, we then used the following additive
ablation settings to identify how various components of DeepPerson contributed to these deltas: CNN-LSTM
(Word), CNN-LSTM (+SynSem), CNN-LSTM (+Concept), CNN-LSTM (+wlpHAN), and CNN-LSTM
(+SPDFiT), which is the full DeepPerson. Further, this analysis was
performed within each of the Big Five traits (i.e., for all five DVs) to allow better understanding of how
learned patterns/components improve identification of different personality traits. The results for MSE
appear in Figure 5. Note that the y-axis shows relative improvements compared to the previous component.
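The instance-selection step described above – isolating the 25% of instances where DeepPerson's error most undercuts a comparison method's – can be sketched as follows (a minimal version using per-instance squared errors; the error arrays are illustrative assumptions):

```python
import numpy as np

def widest_margin_instances(deepperson_err, comparison_err, frac=0.25):
    """Return indices of the `frac` of instances where the comparison
    method's error most exceeds DeepPerson's (i.e., where DeepPerson
    is most accurate relative to the comparison)."""
    margin = np.asarray(comparison_err) - np.asarray(deepperson_err)
    k = max(1, int(len(margin) * frac))
    return np.argsort(margin)[::-1][:k]    # top-k widest margins

# Toy per-instance squared errors for 8 instances
dp = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2])
cm = np.array([0.5, 0.2, 0.6, 0.3, 0.9, 0.1, 0.4, 0.3])
idx = widest_margin_instances(dp, cm, frac=0.25)
# margins: [0.4, 0.0, 0.5, 0.0, 0.7, 0.0, 0.0, 0.1] -> top 25% = instances 4 and 2
```

The additive ablations are then re-run on just these retained instances to attribute the error deltas to individual components.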
Looking at the bar charts, we can see that just using CNN-LSTMs with the word representation
underperforms AttRCNN on all five dimensions (even on these instances where overall lifts are highest for
DeepPerson). Similarly, lifts versus CNN-1, CNN-2, and PersonaBERT are also modest on these instances
where DeepPerson as a whole is most dominant. Interestingly, adding synsem and concept patterns, the
personal embeddings in wlpHAN, and SPDFiT all cause large incremental improvements. It is worth noting
that the synsem and concept embeddings complement each other. While both have sizable lifts for all five
traits, the former is most effective on the conscientiousness and extraversion traits (green and red bars) and
the latter on agreeableness and openness (blue and orange). The results also show that the personal
embedding lift is most pronounced compared to PersonaBERT, and we see the SPDFiT moderating “boost”
across all five traits, in all four comparisons. By comparing results on instances most likely driving relative
deltas for DeepPerson against four of the best benchmarks, on all five traits, the results underscore how
DeepPerson uses representational richness via its three main components to better infer personality from
digital textual traces.
The error analysis in Figure 5 shows the importance of synsem and concept embeddings for improving
detection of all five personality traits. In order to illustrate the types of syntactic/semantic (synsem) and
concept patterns learned by DeepPerson, previously highlighted in Figure 2, we performed two additional
analyses. In the first, we identified user-trait tuples for which DeepPerson yielded accurate personality
dimension scores (averaged across all their documents) and AttRCNN had high error rates. We then
extracted key concept patterns for these users by identifying wlpHAN tokens with high attention scores in
the multi-layer concept embeddings. The results for three example users with high respective EXT, NEU,
and CON appear in Table 8. The concept pattern tags correspond to categories in LIWC. Interestingly,
many of the key concept patterns learned are consistent with those observed manually in prior text-based
personality analysis. For instance, extroverts (EXT) tend to make positive references to friends and social
processes, individuals with neuroticism (NEU) often describe their feelings and exhibit a wider range of
emotions including anger and anxiety, and those that are conscientious (CON) make references to work
and achievement.
Table 9 shows some of the most prevalent synsem patterns for these same traits. For the synsem
patterns, we added part-of-speech tag annotations ex post (using the Penn Treebank tagset), to better illustrate the
syntactic elements of the synsem patterns. These patterns complement the concept embedding based ones.
For instance, extroverts make greater use of coordinating conjunctions (CC) and punctuation that allow
conveyance of additional information, neuroticism manifests in greater usage of first-person
pronouns (PRP), and conscientious writers make greater use of adjectives (JJ) for detail. These results
illustrate the types of personality cues learned by DeepPerson (and highlighted by wlpHAN), which relate
to ideational, textual, and interpersonal meta-functions alluded to in SFLT. Overall, the ablation and error
analysis results lend credence to the utility of our CNN-LSTM, wlpHAN and SPDFiT components, and
further highlight the overall efficacy of our DeepPerson framework. In the ensuing section, we show that
these performance deltas can also translate into downstream value in two forecasting case studies.
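The ex post part-of-speech analysis above amounts to tallying Penn Treebank tag frequencies in each user's text and comparing them across traits. A minimal sketch, assuming tokens have already been tagged by an off-the-shelf tagger (the sample tokens below are illustrative, not drawn from the study's data):

```python
from collections import Counter

def tag_frequencies(tagged_tokens):
    """Relative frequency of each Penn Treebank tag in a token stream."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

# Illustrative pre-tagged tokens as (word, Penn Treebank tag) pairs
tagged = [("I", "PRP"), ("and", "CC"), ("my", "PRP$"),
          ("careful", "JJ"), ("friends", "NNS"), ("plan", "VBP"),
          ("detailed", "JJ"), ("trips", "NNS")]
freqs = tag_frequencies(tagged)
# e.g., adjectives (JJ) account for 2 of 8 tokens -> 0.25
```

Comparing such per-user frequency profiles across high/low trait groups surfaces patterns like the CC, PRP, and JJ tendencies noted above.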
The enhanced NLP-based personality detection afforded by DeepPerson is only valuable if the generated
personality dimension variables can lead to improved descriptive insights or better predictive foresight. We
test the latter – the ability of DeepPerson generated Big-Five personality variables to improve forecasting
in financial and health contexts with implications for business analytics and policy, respectively. In this
section, we use DeepPerson to compute Big-Five personality scores for senior executives at S&P 1500
firms based on their Twitter posts. We then use these personality variables, along with other features, to
forecast future firm financial performance metrics. In a second case appearing in Appendix H, we score the
personalities of world and state-level leaders (executives) based on their tweets, and use this information
to forecast pandemic-related health outcomes.
In the remainder of this section, we demonstrate that senior executives’ personality traits derived using
DeepPerson can significantly improve our ability to predict firms’ policy and financial outcomes - relative
to existing personality methods and exclusion of personality information entirely. Such forecasts are of
interest to many stakeholder groups, including investors (FinTech) and corporate headhunters (workforce
analytics). We focus on the personality traits of senior executives who are employed by the constituent
firms of the S&P Composite 1500 Index, which encompasses large corporations, mid-size firms, and small
firms. Consistent with prior IS studies (Shi et al. 2016), we retrieved information about senior executives
at S&P-1500 firms from the company pages of CrunchBase. Using definitions (and job titles) for senior
executives as explicated in prior studies (Masli et al. 2016; Medcof 2007), we managed to gather
information related to senior executives at 425 of the S&P-1500 firms. This included names, Twitter
accounts, education levels, etc. for employees who had c-suite job titles. These senior executives’
demographic and compensation information were also retrieved from the Executive Compensation
database. Among the identified senior executives, we selected those who were employed between 1990 and
2017, and who possessed Twitter accounts, resulting in 352 executives: 219 CEOs, 40 CFOs, 22 CXOs,
188 directors, 62 presidents, and 10 chairmen. All tweets composed by the identified executives between
2006 and 2017 were retrieved. Retweeted content, URLs, and images were excluded. This resulted in an
Following the experimental procedure described in Section 4, DeepPerson was fine-tuned using the
training set of the myPersonality (Facebook) corpus before it was invoked to derive the Big-Five personality
dimension scores based on executives’ Twitter posts. Prior leader personality studies note the benefits of
using models trained on larger sets of general social media data, such as the ability to use personality labels
from hundreds or thousands of users for training (Hrazdil et al. 2020). Further, prior work does not note
differences in personality trait linguistic patterns and cues based on one’s personal status or professional
standing (Mairesse et al. 2007). Consistent with prior work, we assume that personalities are relatively
stable during the aforementioned analysis period (Cobb-Clark and Schurer 2012). Following the
methodology adopted by Bertrand and Schoar (2003), we collected annual financial indicators related to
firms’ policy and financial outcomes for 1990-2017 using the Compustat database. These indicators were
investment (INVEST), cash flow (CF), cash holdings (CH), leverage (LEVER), interest coverage (IC), the
ratio of selling, general and administrative expenses (SG&A), the ratio of dividends and earnings over
incomes (D&E), and return on assets (ROA) (Bertrand and Schoar 2003). The basic descriptive statistics of
the dependent variables and predictor variables/features used in our case study are shown in Table 10.
According to Henderson et al. (2006), senior executives usually learn and exert influence rapidly during
their initial employment period. Accordingly, we focus on examining if personalities of senior executives
may predict firms’ policy and financial outcomes during their initial tenure (i.e., short-term impact). To
measure firms’ outcomes, consistent with prior studies (Dubofsky and Varadarajan 1987; Li and Simerly
1998), we calculate the first two-year average of each chosen financial indicator after a senior executive
has joined a firm. More specifically, the average of the logarithm of the annual measures was used to reduce
skewness (Chu et al. 2013). Only firm-year observations in which a single senior executive joined the
firm during the two-year observation period were retained. This resulted in 519 total firm-executive-biennial
observations in our data set. Following Bonsall et al. (2017), we eliminated instances for a given firm or
financial DV if any of the DVs or IVs of interest were missing in that first two-year period. The resulting
DV counts are reported in Table 10.
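The dependent-variable construction described above – the average of the logarithm of the annual indicator over the executive's first two years – can be sketched as follows (the indicator values are illustrative):

```python
import math

def two_year_log_average(year1_value, year2_value):
    """Average of the log of the annual financial indicator over the
    first two years after a senior executive joins the firm
    (logging reduces skewness, per Chu et al. 2013)."""
    return (math.log(year1_value) + math.log(year2_value)) / 2

# Illustrative: an indicator of 2.0 and 8.0 in the first two years
dv = two_year_log_average(2.0, 8.0)
# equals log(4.0), the log of the geometric mean of the two years
```

Note that averaging logs is equivalent to taking the log of the geometric mean, which damps the influence of occasional extreme annual values.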
Given our stated objective of demonstrating the utility of personality dimensions generated using
DeepPerson for predicting firms’ policy and financial outcomes, it was important to incorporate a robust
set of accompanying predictor variables (i.e., features) and forecasting models such that performance lifts
due to DeepPerson were atop reasonable baseline models. Consistent with prior work forecasting financial
measures, we used two well-known predictive regression methods well-suited for inferring non-linear
patterns: random forest regression (RFR) and gradient boosted decision trees (GBDT), both available in the
Scikit-learn package (Pedregosa et al. 2011). We formalize our prediction tasks as follows:
Ind_it = f(EXT, NEU, AGR, CON, OPN, Baseline Features)    (11)
where Ind_it is the logarithm of the first two-year average for each chosen financial indicator after a
senior executive has joined a firm, and f(·) is a non-linear function capturing the relationship between the
predictor variables (i.e., personality traits and baseline features) and dependent variables (i.e., financial
indicators). For our baseline feature set, in addition to lagged (t-1) performance and (t-1) policy indicator
values as features, we also incorporated relevant lagged financial measures used in prior studies (Bonsall
et al. 2017). These included logarithms of total assets, return on assets (ROA), and cash flow (Bertrand and
Schoar 2003; Barth et al. 2001). In order to capture industry-specific variations, firm Standard Industrial
Classification (SIC) codes were included as a feature. Executives’ personal characteristics used in prior
studies were also incorporated as features, including: age, gender, income, education level, and reputation
(Bertrand and Schoar 2003; Brick et al. 2006; Weng and Chen 2017). Adapting the methodology proposed
by Weng and Chen (2017), reputation was estimated by counting the frequency of appearance of the
executive’s name in news articles retrieved from Google. In order to account for baseline semantic
information embedded in executives' tweet text, we also included the sentiment of the tweets given by
LIWC as well as their top-10 topics extracted using Latent Dirichlet Allocation (i.e., from the document-
topic vector) (Blei et al. 2003). We report the statistics of the top-5 topics in Table 10. Finally, basic social
media-based features such as the number of tweets, followers, and favorites were also included.
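The forecasting setup above – fitting RFR and GBDT on personality plus baseline features and scoring MSE/MAE on later instances – can be sketched with Scikit-learn as follows. The data here is synthetic stand-in noise, not the case-study data; the 20-estimator setting mirrors the configuration reported later in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)

# Synthetic stand-in for the design matrix: five personality scores plus
# baseline features (lagged financials, demographics, topics, etc.)
n, n_features = 200, 12
X = rng.standard_normal((n, n_features))
y = X[:, 0] * 0.5 + X[:, 5] * 0.3 + rng.standard_normal(n) * 0.1

# Chronological-style split: train on earlier instances, test on later ones
X_tr, X_te, y_tr, y_te = X[:150], X[150:], y[:150], y[150:]

results = {}
for name, model in [("RFR", RandomForestRegressor(n_estimators=20, random_state=0)),
                    ("GBDT", GradientBoostingRegressor(n_estimators=20, random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = (mean_squared_error(y_te, pred),
                     mean_absolute_error(y_te, pred))
```

Running the same pipeline with and without the five personality columns yields the paired error scores used to quantify the personality features' contribution.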
We ran the aforementioned regression models either with or without the DeepPerson personality
dimensions as features. The models devoid of personality features included all other variables discussed
(i.e., financial, personal, and social media sentiment/topic). We also compared performance using
personality dimensions generated with DeepPerson relative to methods benchmarked earlier in our design
evaluation: CNN-1, CNN-2, and PersonaBERT. In all experiments, the widely-used mean square error
(MSE) and mean absolute error (MAE) metrics were employed to measure predictive power. The
improvement in performance brought about by inclusion of personality features was once again computed
as follows: Imp = (MSE_baseline − MSE_experimental) / MSE_baseline × 100%. Consistent with our design
evaluation, all models were trained on a training split and tested on subsequent instances. Once again, the
non-parametric Wilcoxon signed-rank test was used to assess statistical significance.
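The improvement metric can be computed directly; a quick check with illustrative MSE values (not results from the study):

```python
def improvement(mse_baseline, mse_experimental):
    """Relative reduction in error when personality features are added:
    Imp = (MSE_baseline - MSE_experimental) / MSE_baseline * 100%."""
    return (mse_baseline - mse_experimental) / mse_baseline * 100

# Illustrative: a baseline MSE of 0.50 dropping to 0.45 with personality features
imp = improvement(0.50, 0.45)
# -> a 10.0% improvement
```

The same formula applies verbatim with MAE in place of MSE.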
Tables 11 and 12 show the percentage improvements in MSE and MAE, respectively, and statistical
significances when adding DeepPerson-based personality features to the baseline feature set devoid of
personality information, as well as the results when using CNN-1, CNN-2, and PersonaBERT Big Five
personality features. The tables report results for GBDT and RFR each run with 20 estimators. In general,
the inclusion of the DeepPerson-based personality dimension features improves MSE or MAE by 4% to
15% for each of the 8 possible dependent variables (6 policy indicators and 2 performance indicators). The
average improvements using DeepPerson are in the 6.1% to 14.3% range across the two models and
MSE/MAE metrics. Performance gains for all 8 dependent variables attributable to inclusion of the five
DeepPerson-based personality dimensions were significant (p-values < 0.05). These results suggest that the
personality measures derived using DeepPerson can enhance predictive power in firm policy and
performance forecasting contexts. Next, when comparing the results for DeepPerson-based personality
dimensions versus those derived using comparison detection methods such as CNN-1, CNN-2, and BERT,
there are three important takeaways worth highlighting. First, the RFR and GBDT models using personality
features derived via DeepPerson improve MSE and MAE by an average of 4% to 14% over the comparison
methods. Second, among the three benchmark comparison methods, features generated using BERT and
CNN-1 improve average results across the eight firm policy and performance prediction tasks (with average
lifts of 2% to 8%). However, on average, the use of CNN-2 garners little to no improvement.
Third, we also comparatively evaluated the classical ARIMA model widely used in predicting financial
time series data (Mohamed et al. 2010). Similar to GBDT and RFR, ARIMA parameters were tuned
extensively, including the order of the auto-regressive term, the degree of differencing, and the order of
the moving average. The last row of Tables 11 and 12 shows the MSE and MAE score percentages for
ARIMA relative to the GBDT-DeepPerson model. ARIMA performed significantly worse across all 8 firm
policy and performance indicators, with almost 17% worse MSE as a whole (all p-values < 0.05). While
these results were obtained using cross-validation, we also performed a single chronological training-testing split as
a robustness check. Those results, in Appendix G, are consistent with results appearing here. Collectively,
these results further underscore the value of the personality dimensions derived using DeepPerson.
As a robustness check, we repeated the empirical case study using only executives and data from the
S&P-500 and garnered similar results. We also examined the impact of specific Big-Five dimensions as
features to see which traits are the strongest predictors. We also conducted a sensitivity analysis to evaluate
the minimal number of executives’ tweets required to produce significant prediction improvement. These
results appear in Appendices C, D, and E, respectively. In Appendix F, we show that these downstream
results also hold for DeepPerson ablation settings examined in Sections 4.3 and 4.4. As noted earlier, a
second downstream predictive application of DeepPerson in the context of COVID-19 forecasting appears
in Appendix H. Collectively, our results show that downstream forecasting models utilizing personality
dimensions scored by DeepPerson can dramatically enhance their results, whereas this is not the case when
using benchmark personality detectors or classic time series forecasting methods. As shown in the user-
level results in Appendix B, personality scores generated with DeepPerson are better correlated with survey-
based personality measurements relative to comparison methods. The imprecision of comparison text-based
personality detection methods may lead to incorrect personality traits (i.e., noisy features). It is generally
believed that noisy features tend to jeopardize the performance of a prediction model (John et al. 1994). In
other words, the design evaluation deltas reported in the prior section do translate into operational utility
in downstream forecasting applications.
From a design science perspective, we make three contributions. First, we propose a novel DeepPerson
framework that makes personality detection from text possible, practical, and valuable. Second, as part of
our framework, we propose two novel machine learning artifacts, namely the self-taught personality
detection fine-tuning (SPDFiT) transfer learning approach, and the word-layer-person attention network.
Third, through a robust design evaluation and two case studies, we offer empirical insights on the extent of
operational utility afforded by DeepPerson and its key components, including for downstream forecasting
tasks in financial and health contexts. Our results also have at least four important implications for IS
research and practice.
1) Debunking the “Brute Force AI” Fallacy – In recent years, with the rise of Big Data and cloud
computing, it has been suggested that large-scale deep learning models encompassing billions of parameters
tuned using millions of documents can address most NLP problems. The idea that such generic language
models are “all you need” has been perpetuated by industry research related to powerful artifacts such as
BERT and GPT-3 (Devlin et al. 2019; Brown et al. 2020). However, due to the pace of change and lack of
thorough benchmarking, the efficacy and utility of such artifacts for a breadth of NLP tasks might be
overstated (Zimbra et al. 2018). Our findings suggest that not only are such language models markedly less
effective for personality detection than DeepPerson, they are often unable to offer statistical or practical
significance for downstream forecasting contexts. This is consistent with recent studies that have warned
generic language models are like “stochastic parrots” that might be getting too big by over relying on the
sheer number of word tokens used during pre-training (Bender et al. 2021). Case in point, BERT-Base and
PersonaBERT relied on 3.3 and 4.1 billion tokens, respectively, whereas DeepPerson only used 800 million.
RoBERTa used ten times as much data as BERT (an estimated 30 billion-plus tokens). As we foreshadowed
earlier, we believe the demise of artifacts grounded in principled domain adaptation has been overstated.
2) Design Science as a Mechanism for Middle-ground Frameworks – In contexts where limited labeled
data related to the target task is available, brute force learning strategies are less effective. In such cases,
representation engineering that adapts machine learning artifacts such as encoders, embeddings, attention
mechanisms, and custom transfer learning schemes can present opportunities for effective domain
adaptation (Abbasi et al. 2019). By serving as a mechanism for balancing the tradeoffs between data and
intuition, socio and technical factors, inductive versus deductive insights, and general versus domain-
specific learning, design science represents a robust approach for developing middle-ground frameworks
that harness the power of human cumulative tradition in concert with powerful artificial intelligence.
3) The Importance of Personality for Predicting Policy – We show that when done correctly,
personality dimensions can improve our foresight related to prediction of policy indicators and outcomes.
The inclusion of personality measures derived by DeepPerson enhanced forecasts for financial policy
indicators by 6 to 14 percentage points on average. Similarly, DeepPerson attained the biggest lifts for
health pandemic forecasting relative to alternative epidemiological and data-driven models examined (see
Appendix H). Recently, many predictive analytics researchers have noted the challenges related to
forecasting complex policy-related outcomes, including noisy input data and the need for a diversity of
models (Hutson, 2020; Bertozzi et al. 2020). Our results suggest that the traits of leaders tasked with
informing policy-related decisions might be another important input for such models. In addition to
influencing decisions directly, leaders’ traits may often reflect the characteristics of the organizations or
populations they lead and represent – for example, advisory boards and employees in firms, or the general
public and government in states and countries (Hambrick 2007). Whereas the reverse causal relationship
between leader personality and outcomes of organizations might be debated in empirical causal inference
studies, in prediction contexts (Shmueli and Koppius 2011), our study suggests that the personality of
executives might serve as a rich low-dimensional feature representation for forecasting policy-related
outcomes. 4) Towards Proactive Personalization – Our results also have implications for the broader
movement towards “proactive personalization.” In personalized marketing,
personality information can enrich predictive models related to various stages of the customer lifecycle
including acquisition, retention, and expansion (Gupta et al. 2006; Brown et al. 2015). As cybersecurity
moves from reactive to proactive, personality measures could enhance predictive user models in human-in-
the-loop frameworks (Parrish et al. 2009; Bravo-Lillo et al. 2010). In human capital management contexts,
workforce analytics models already leveraging survey-based personality measures could be made timelier
with NLP-based personality scores (Ryan and Herleman 2015). In precision medicine, with the trend
towards public health 3.0 (DeSalvo et al. 2017), personality information can help better align preventative
interventions with individual patient characteristics (Friedman 2000). For instance, the conscientiousness
trait has been found to be predictive of health and longevity, from childhood to old age (Friedman et al.
2014). Higher extraversion is linked to greater likelihood of seeking preventative screenings (Aschwanden
et al. 2019). Lower conscientiousness and high neuroticism have been associated with greater vaccine
hesitancy (Murphy et al. 2021; Aschwanden et al. 2021). Personality could provide a mechanism for
measuring heterogeneity in user intent (Ahmad et al. 2022). NLP-based personality detection could inform each of these applications.
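To make concrete how the low-dimensional personality representation discussed above might enter a downstream predictive model, the sketch below appends five Big Five scores (such as those a detector like DeepPerson could produce) to a set of conventional predictors before fitting a regressor. All variable names, the synthetic data, and the linear model are illustrative assumptions, not the paper's actual forecasting pipeline.

```python
# Illustrative sketch (NOT the paper's pipeline): Big Five personality
# scores appended as five extra features for a downstream forecaster.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500

# Conventional predictors (e.g., lagged indicators) -- purely synthetic here.
X_base = rng.normal(size=(n, 8))

# Hypothetical Big Five scores in [0, 1], one row per executive:
# openness, conscientiousness, extraversion, agreeableness, neuroticism.
big_five = rng.uniform(size=(n, 5))

# Synthetic outcome in which conscientiousness (column 1) genuinely matters.
y = X_base @ rng.normal(size=8) + 2.0 * big_five[:, 1] \
    + rng.normal(scale=0.1, size=n)

# Augmented design matrix: 8 base features + 5 personality features.
X_aug = np.hstack([X_base, big_five])

Xb_tr, Xb_te, Xa_tr, Xa_te, y_tr, y_te = train_test_split(
    X_base, X_aug, y, test_size=0.3, random_state=0)

r2_base = LinearRegression().fit(Xb_tr, y_tr).score(Xb_te, y_te)
r2_aug = LinearRegression().fit(Xa_tr, y_tr).score(Xa_te, y_te)
print(f"R^2 without personality: {r2_base:.3f}; with personality: {r2_aug:.3f}")
```

On this synthetic data, where the outcome depends on conscientiousness, the augmented model scores higher out of sample; the point of the sketch is only that personality enters as a handful of additional columns, not as a high-dimensional text representation.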
Our work is not without its limitations. Bias is an important consideration for NLP models (Lalor et al.
2022). Future work on personality detection across languages, and with multimedia input including
audio and video, would also be beneficial. Our design evaluation focused on social media postings, forum
messages, and lengthier texts (essays). Other relevant documents might warrant exploration, including
speech transcripts and written articles. Nevertheless, we believe this work has important implications for
research at the intersection of design and data science that integrates socio-technical concepts into novel
domain-adapted machine learning artifacts, and for practitioners who enable, produce, or consume
predictive analytics where the inclusion of personality information may enhance insight and foresight.
7 References
Abbasi, A., and Chen, H. 2008. CyberGate: A Design Framework and System for Text Analysis of Computer-mediated
Communication. MIS Quarterly, 32(4): 811-837.
Abbasi, A., Zhou, Y., Deng, S., and Zhang, P. 2018. Text Analytics to Support Sense-Making in Social Media: A Language-Action
Perspective. MIS Quarterly 42(2): 427–64.
Abbasi, A., Sarker, S., and Chiang, R. H. 2016. Big Data Research in Information Systems: Toward an Inclusive Research
Agenda. Journal of the Association for Information Systems, 17(2), 3.
Abbasi, A., Kitchens, B., and Ahmad, F. 2019. The Risks of AutoML and How to Avoid Them, Harvard Business Review, digital
article: https://hbr.org/2019/10/the-risks-of-automl-and-how-to-avoid-them
Adamopoulos, P., Ghose, A., and Todri, V. 2018. The Impact of User Personality Traits on Word of Mouth: Text-Mining Social
Media Platforms, Information Systems Research 29 (3): 612–40.
Agastya, I. M. A., Handayani, D. O. D., and Mantoro, T. 2019. A Systematic Literature Review of Deep Learning Algorithms for
Personality Trait Recognition. In 5th Intl. Conf. on Computing Engineering and Design, 1-6.
Ahmad, F., Abbasi, A., Li, J., Dobolyi, D. G., Netemeyer, R., Clifford, G., and Chen, H. 2020. A Deep Learning Architecture for
Psychometric Natural Language Processing, ACM Trans. on Information Systems 38(1), no. 6.
Ahmad, F., Abbasi, A., Kitchens, B., Adjeroh, D. A., and Zeng, D. 2022. Deep Learning for Adverse Event Detection from Web
Search. IEEE Transactions on Knowledge and Data Engineering, forthcoming.
Ahmad, H., Asghar, M. Z., Khan, A. S., and Habib, A. 2020b. A Systematic Literature Review of Personality Trait Classification from
Textual Content, Open Computer Science 10: 175–193.
Alam, F., Stepanov, E. A., and Riccardi, G. 2013. Personality Traits Recognition on Social Network-Facebook. In Seventh
International AAAI Conference on Weblogs and Social Media, 1–4.
Arazy, O., Kumar, N., and Shapira, B. 2010. A Theory-driven Design Framework for Social Recommender Systems, Journal of
the Association for Information Systems, 11(9), 2.
Aschwanden, D., Gerend, M. A., Luchetti, M., Stephan, Y., Sutin, A. R., and Terracciano, A. 2019. Personality traits and preventive
cancer screenings in the Health Retirement Study. Preventive medicine, 126, 105763.
Aschwanden, D., Strickhouser, J. E., Sesker, A. A., Lee, J. H., Luchetti, M., ... and Terracciano, A. 2021. Psychological and
behavioural responses to coronavirus disease 2019: The role of personality. European Journal of Personality, 35(1), 51-66.
Back, M. D., Stopfer, J. M., Vazire, S., Gaddis, S., Schmukle, S. C., Egloff, B., and Gosling, S. D. 2010. Facebook Profiles Reflect
Actual Personality, Not Self-Idealization. Psychological Science 21 (3): 372–74.
Barth, M. E., Cram, D. P., and Nelson, K. K. 2001. Accruals and the Prediction of Future Cash Flows. The Accounting Review 76
(1): 27–58.
Beltagy, I., Lo, K., and Cohan, A. 2019. SciBERT: A Pretrained Language Model for Scientific Text, In Proceedings of the
Conference on Empirical Methods in Natural Language Processing, pp. 3615-3620.
Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. 2021. On the Dangers of Stochastic Parrots: Can Language
Models Be Too Big? In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 610-623.
Bertozzi, A. L., Franco, E., Mohler, G., Short, M. B., and Sledge, D. 2020. The challenges of modeling and forecasting the spread
of COVID-19. Proceedings of the National Academy of Sciences, 117(29), 16732-16738.
Bertrand, M., and Schoar, A. 2003. Managing with Style: The Effect of Managers on Firm Policies. The Quarterly Journal of
Economics 118 (4): 1169–1208.
Blei D. M., Ng A. Y., and Jordan, M. I. 2003. Latent Dirichlet Allocation, Journal of Machine Learning Research 3: 993-1022.
Bonsall, S. B., Holzman, E. R., and Miller, B. P. 2017. Managerial Ability and Credit Risk Assessment. Management Science,
63(5): 1425-1449.
Bravo-Lillo, C., Cranor, L. F., Downs, J., and Komanduri, S. 2010. Bridging the Gap in Computer Security Warnings: A Mental
Model Approach, IEEE Security & Privacy, 9(2), 18-26.
Brown, D. E., Abbasi, A., and Lau, R. Y. 2015. Predictive analytics: Predictive modeling at the micro level. IEEE Intelligent
Systems, 30(3), 6-8.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... and Agarwal, S. 2020. Language Models are Few-
shot Learners. arXiv preprint arXiv:2005.14165.
Brick, I. E., Palmon, O., and Wald, J. K. 2006. CEO Compensation, Director Compensation, and Firm Performance: Evidence of
Cronyism? Journal of Corporate Finance 12 (3): 403–23.
Celli, F., Pianesi, F., Stillwell, D., Kosinski, M., and others. 2013. Workshop on Computational Personality Recognition (Shared
Task). In Proc. 7th International AAAI Conference on Weblogs and Social Media, 2–5.
Chelba, C., Mikolov, T., Schuster, M., Ge, Q., Brants, T., and Koehn, P. 2013. One Billion Word Benchmark for Measuring
Progress in Statistical Language Modeling. CoRR abs/1312.3005. http://arxiv.org/abs/1312.3005.
Chen, D., Wang, W., Gao, W., and Zhou, Z. 2018. Tri-Net for Semi-Supervised Deep Learning. In Proc. 27th
International Joint Conference on Artificial Intelligence, 2014–20. Stockholm, Sweden: AAAI.
Chen, H., Chiang, R. H. L., and Storey, V. C. 2012. Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly
36 (4): 1165–88.
Cheng, J., Zhao, S., Zhang, J., King, I., Zhang, X., and Wang, H. 2017. Aspect-level Sentiment Classification with HEAT (Hierarchical
Attention) Network. In Proc. ACM Conf. on Information and Knowledge Management, 97-106.
Chu, C. I., Chatterjee, B., and Brown, A. 2013. The Current Status of Greenhouse Gas Reporting by Chinese Companies,
Managerial Auditing Journal 28 (2): 114–39.
Cobb-Clark, D., and Schurer, S. 2012. The Stability of Big-Five Personality Traits. Economics Letters 115: 11–15.
Crayne, M. P., and Medeiros, K. E. 2020. Making Sense of Crisis: Charismatic, Ideological, and Pragmatic Leadership in Response
to Covid-19. The American Psychologist.
DeSalvo, K. B., Wang, Y. C., Harris, A., Auerbach, J., Koo, D., and O’Carroll, P. 2017. Public Health 3.0: A Call to Action for
Public Health to Meet the Challenges of the 21st Century. Preventing Chronic Disease, 14.
Devaraj, S., Easley, R. F., and Crant, J. M. 2008. Research Note—How Does Personality Matter? Relating the Five-Factor Model to
Technology Acceptance and Use. Information Systems Research 19(1): 93-105.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. 2019. BERT: Pre-Training of Deep Bidirectional Transformers for Language
Understanding. In Proc. 2019 Conference of the NAACL-HLT, 4171–4186.
Dubofsky, P., and Varadarajan, P. R. 1987. Diversification and Measures of Performance: Additional Empirical Evidence.
Academy of Management Journal 30 (3): 597–608.
Edunov, S., Ott, M., Auli, M., and Grangier, D. 2018. Understanding Back-Translation at Scale. In Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing, 489-500.
Farnadi, G., Zoghbi, S., Moens, M. F., and Cock, M. D. 2013. Recognising Personality Traits Using Facebook Status Updates. In
Seventh International AAAI Conference on Weblogs and Social Media, 14–18.
Feng, S., Wang, Y., Liu, L., Wang, D., and Yu, G. 2019. Attention Based Hierarchical LSTM Network for Context-Aware Microblog
Sentiment Classification. World Wide Web 22 (1): 59–81.
Friedman, H. S. 2000. Long‐term Relations of Personality and Health: Dynamisms, Mechanisms, Tropisms. Journal of Personality,
68(6), 1089-1107.
Friedman, H. S., and Kern, M. L. 2014. Personality, well-being, and health. Annual Review of Psychology, 65, 719-742.
Gal, Y., Islam, R., and Ghahramani, Z. 2017. Deep bayesian active learning with image data. In International Conference on
Machine Learning (pp. 1183-1192). PMLR.
Galassi A., Lippi M., and Torroni P. 2020. Attention in Natural Language Processing, IEEE Transactions on Neural Networks and
Learning Systems 1-18.
Gao, S., Ramanathan, A., and Tourassi, G. 2018. Hierarchical Convolutional Attention Networks for Text Classification. In
Proceedings of the Third Workshop on Representation Learning for NLP, 11–23.
Gill, A. J., and Oberlander, J. 2003. Perception of E-Mail Personality at Zero-Acquaintance: Extraversion Takes Care of Itself;
Neuroticism Is a Worry. In Proc. of the Cognitive Science Society, 25: 456–61.
Gjurković M., Karan M., Vukojević I., Bošnjak M., and Šnajder J. 2021. PANDORA Talks: Personality and Demographics on
Reddit, Proceedings of the Ninth International ACL Workshop on Natural Language Processing for Social Media, 138–152.
Go, A., Bhayani, R., and Huang, L. 2009. Twitter Sentiment Classification Using Distant Supervision. CS224N Project, Stanford.
https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf.
Goldberg, L. R. 1990. An Alternative Description of Personality: The Big-Five Factor Structure. Journal of Personality and Social
Psychology 59 (6): 1216–29.
Gregor, S., and Hevner, A. 2013. Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly 37
(2): 337–55.
Guan Z., Wu B., Wang B., and Liu H. 2020. Personality2vec: Network Representation Learning for Personality, In 2020 IEEE
Fifth International Conference on Data Science in Cyberspace (DSC) 30-37.
Guest, J. L., Rio, C. D., and Sanchez, T. 2020. The Three Steps Needed to End the Covid-19 Pandemic: Bold Public Health
Leadership, Rapid Innovations, and Courageous Political Will. JMIR Public Health 6 (2): e19043.
Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., ... and Sriram, S. 2006. Modeling customer lifetime value.
Journal of Service Research, 9(2), 139-155.
Halliday, M. A. K., and Hasan, R. 2004. An Introduction to Functional Grammar, 3rd ed., revised by C. Matthiessen.
Hambrick, D. C. 2007. Upper Echelons Theory: An Update. Academy of Management Review 32 (2): 334–43.
Hambrick, D. C., and Mason, P. A. 1984. Upper Echelons: The Organization as a Reflection of Its Top Managers. Academy of
Management Review 9 (2): 193–206.
Hastie T., Tibshirani R., and Friedman J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer Science & Business Media.
Haussler, D., Kearns, M., and Schapire, R. E. 1994. Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and
the VC Dimension. Machine Learning 14: 83–113.
Heavey, C., Simsek, Z., Kyprianou, C., and Risius, M. 2020. How Do Strategic Leaders Engage with Social Media? A theoretical
framework for research and practice. Strategic Management Journal 41 (8): 1490–1527.
Henderson, A. D., Miller, D., and Hambrick, D. C. 2006. How Quickly Do CEOs Become Obsolete? Industry Dynamism, CEO
Tenure, and Company Performance. Strategic Management Journal 27 (5): 447–60.
Hevner, A., March, S., Park, J., and Ram, S. 2004. Design Science in IS Research, MIS Quarterly 28 (1): 75–105.
Hough, J. R., and Ogilvie, D. T. 2005. An Empirical Test of Cognitive Style and Strategic Decision Outcomes. Journal of
Management Studies 42 (2): 417–48.
Howard, J., and Ruder, S. 2018. Fine-Tuned Language Models for Text Classification. CoRR abs/1801.06146.
Hrazdil, K., Novak, J., Rogo, R., Wiedman, C., and Zhang, R. 2020. Measuring executive personality using machine-learning
algorithms: A new approach and validation tests. Journal of Business Finance and Accounting.
Huang, A., Wang, H., and Yang, Y. 2020. FinBERT—A Deep Learning Approach to Extracting Textual Information, Available at
SSRN, 3910214.
Hutson, M. 2020. The mess behind the models: Too many of the COVID-19 models led policymakers astray. Here's how
tomorrow's models will get it right. IEEE Spectrum, 57(10), 30-35.
Iacobelli, F., Gill, A. J., Nowson, S., and Oberlander, J. 2011. Large Scale Personality Classification of Bloggers. In Affective
Computing and Intelligent Interaction, 568–77. Springer.
Jayaratne, M., and Jayatilleke, B. 2020. Predicting Personality using Answers to Open-ended Interview Questions, IEEE Access, 8,
115345-115355.
Jing, R. 2019. A Self-Attention Based LSTM Network for Text Classification. Journal of Physics: Conference Series, 1207: 012008.
John, G. H., Kohavi, R., and Pfleger, K. 1994. Irrelevant Features and the Subset Selection Problem. In Proceedings of the Eleventh
International Conference on Machine Learning, 121-129.
Judge, T. A., Bono, J. E., Ilies, R., and Gerhardt, M. W. 2002. Personality and Leadership: A Qualitative and Quantitative Review.
Journal of Applied Psychology 87 (4): 765–80.
Judge, T. A., Piccolo, R. F., and Kosalka, T. 2009. The Bright and Dark Sides of Leader Traits: A Review and Theoretical Extension
of the Leader Trait Paradigm. The Leadership Quarterly 20 (6): 855–75.
Kim, Y., Jernite, Y., Sontag, D., and Rush, A. M. 2016. Character-Aware Neural Language Models. In Proceedings of the Thirtieth
AAAI Conference on Artificial Intelligence, 2741–9.
Laine S. and Aila T. 2016. Temporal Ensembling for Semi-supervised Learning. arXiv preprint arXiv:1610.02242.
Lalor, J. P., Yang, Y., Smith, K., Forsgren, N., and Abbasi, A. 2022. Benchmarking Intersectional Biases in NLP. In Proceedings
of the Association for Computational Linguistics.
Le, Q., and Mikolov, T. 2014. Distributed Representations of Sentences and Documents. In International Conference on Machine
Learning, 32:1188–96.
Lee D. H. 2013. Pseudo-label: The simple and Efficient Semi-supervised Learning Method for Deep Neural Networks, In ICML
Workshop on Challenges in Representation Learning.
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., and Kang, J. 2020. BioBERT: a pre-trained biomedical language
representation model for biomedical text mining, Bioinformatics, 36(4), 1234-1240.
Leonardi S., Monti D., Rizzo G., and Morisio, M. 2020. Multilingual Transformer-Based Personality Traits Estimation,
Information 11(4): 179.
LePine, J. A., and Van Dyne, L. 2001. Voice and cooperative behavior as contrasting forms of contextual performance: evidence of
differential relationships with big five personality characteristics and cognitive ability. Journal of Applied Psychology, 86(2):
326.
Li, M., and Simerly, R. L. 1998. The Moderating Effect of Environmental Dynamism on the Ownership and Performance
Relationship. Strategic Management Journal 19 (2): 169–79.
Li, J., Larsen, K., and Abbasi, A. 2020. TheoryOn: A Design Framework and System for Unlocking Behavioral Knowledge through
Ontology Learning, MIS Quarterly, 44(4), 1733-177.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT
Pretraining Approach. arXiv preprint arXiv:1907.11692.
Liu Z., Wang, Y., Mahmud, J., Akkiraju, R., Schoudt, J., Xu, A., and Donovan, B. 2016. To Buy or Not to Buy? Understanding
the Role of Personality Traits in Predicting Consumer Behaviors. In Spiro E., Ahn YY. (eds) Social Informatics. Lecture
Notes in Computer Science, 10047: 337–346.
Lynn V., Balasubramanian N., and Schwartz H. A. 2020. Hierarchical Modeling for User Personality Prediction: The Role of
Message-Level Attention. In Proc. 58th Annual Meeting of the ACL, 5306–5316.
Mairesse, F., Walker, M. A., Mehl, M. R., and Moore, R. K. 2007. Using Linguistic Cues for the Automatic Recognition of
Personality in Conversation and Text. Journal of Artificial Intelligence Research 30: 457–500.
Majumder, N., Poria, S., Gelbukh, A., and Cambria, E. 2017. Deep Learning-Based Document Modeling for Personality Detection
from Text. IEEE Intelligent Systems 32 (2): 74–79.
Masli, A., Richardson, V. J., Watson, M. W., and Zmud, R. W. 2016. Senior Executives’ IT Management Responsibilities: Serious
IT-Related Deficiencies and CEO/CFO Turnover. MIS Quarterly 40 (3): 687–708.
Medcof, J. W. 2007. CTO Power. Research-Technology Management 50 (4): 23–31.
Mehl, M. R., Gosling, S. D., and Pennebaker, J. W. 2006. Personality in Its Natural Habitat: Manifestations and Implicit Folk
Theories of Personality in Daily Life. Journal of Personality and Social Psychology 90 (5): 862.
Mehta, Y., Majumder, N., Gelbukh, A., and Cambria, E. 2020. Recent Trends in Deep Learning Based Personality Detection,
Artificial Intelligence Review, 53(4), 2313-2339.
Mohamed, N., Ahmad, M. H., Ismail, Z., and others. 2010. Short Term Load Forecasting Using Double Seasonal ARIMA Model. In
Proceedings of the Regional Conference on Statistical Sciences, 10:57–73.
Murphy, J., Vallières, F., Bentall, R. P., Shevlin, M., McBride, O., Hartman, T. K., ... and Hyland, P. 2021. Psychological
characteristics associated with COVID-19 vaccine hesitancy and resistance in Ireland and the United Kingdom. Nature
Communications, 12(1), 1-15.
Nadkarni, S., and Herrmann, P. 2010. CEO Personality, Strategic Flexibility, and Firm Performance: The Case of the Indian
Business Process Outsourcing Industry. Academy of Management Journal 53 (5): 1050–73.
Nygren, T. E., and White, R. J. 2005. Relating Decision Making Styles to Predicting Self-efficacy and a Generalized Expectation
of Success and Failure. In Proc. Human Factors and Ergonomics Society Meeting, 432–34.
Pan, S. J., and Yang, Q. 2009. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22 (10):
1345–59.
Parrish Jr, J. L., Bailey, J. L., and Courtney, J. F. 2009. A Personality Based Model for Determining Susceptibility to Phishing
Attacks. Little Rock: University of Arkansas, 285-296.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., and Blondel, M. 2011. Scikit-Learn: Machine
Learning in Python. Journal of Machine Learning Research 12: 2825–30.
Pennebaker, J. W., and King, L. A. 1999. Linguistic Styles: Language Use as an Individual Difference. Journal of Personality and
Social Psychology 77 (6): 1296.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. 2018. Deep Contextualized Word
Representations. In Proceedings of NAACL-HLT, 2227–37.
Peterson, R. S., Smith, D. B., Martorana, P. V., and Owens, P. D. 2003. The Impact of Chief Executive Officer Personality on Top
Management Team Dynamics: One Mechanism by Which Leadership Affects Organizational Performance. Journal of
Applied Psychology 88 (5): 795–808.
Pratama, B. Y., and Sarno, R. 2015. Personality Classification Based on Twitter Text Using Naive Bayes, KNN and SVM. In the
IEEE International Conference on Data and Software Engineering (ICoDSE), 170–74. IEEE.
Prechelt, L. 1998. Automatic Early Stopping Using Cross Validation, Neural Networks 11 (4): 761–67.
Riaz, M. N., Riaz, M. A., and Batool, N. 2012. Personality Types as Predictors of Decision Making Styles. Journal of Behavioural
Sciences 22 (2): 99–114.
Ryan, J., and Herleman, H. 2015. A Big Data Platform for Workforce Analytics. In Big Data at Work: The Data Science Revolution
and Organizational Psychology, Chapter 2, 19-42.
Shi, Z., Lee, G. M., and Whinston, A. B. 2016. Toward a Better Measure of Business Proximity: Topic Modeling for Industry
Intelligence. MIS Quarterly 40 (4): 1035–56.
Shmueli, G. and Koppius, O. 2011. Predictive Analytics in Information Systems Research. MIS Quarterly 35(3): 553–72.
Sun X., Liu B., Cao J., Luo J., and Shen X. 2018. Who am I? Personality Detection Based on Deep Learning for Texts, IEEE
International Conference on Communications: 1–6.
Tadesse, M. M., Lin, H., Xu, B., and Yang, L. 2018. Personality Predictions Based on User Behavior on the Facebook Social Media
Platform. IEEE Access 6: 61959–69.
Tausczik, Y. R., and Pennebaker, J. W. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis
Methods. Journal of Language and Social Psychology 29 (1): 24–54.
Torrey, L., and Shavlik, J. 2010. Transfer Learning. In Handbook of Research on Machine Learning Applications and Trends:
Algorithms, Methods, and Techniques, 242–64. IGI Global.
Vinciarelli, A., and Mo, G. 2014. Survey of Personality Computing. IEEE Trans. Affective Computing 5: 273–91.
Walls, J. G., Widmeyer, G. R., and El Sawy, O. A. 1992. Building an Information System Design Theory for Vigilant
EIS, Information Systems Research, 3(1), 36-59.
Wang, Q., Lau, R. Y. K., and Xie, H. 2021. The Impact of Social Executives on Firms’ Mergers and Acquisitions Strategies: A
Difference-in-Difference Analysis. Journal of Business Research 123: 343–354.
Wang Z., Wu C. H., Li Q. B., Yan B., and Zheng, K. F. 2020. Encoding Text Information with Graph Convolutional Networks for
Personality Recognition, Applied Sciences (Switzerland) 10(12): 4081.
Wang Z., Wu C., Zheng K., Niu X., and Wang X. 2019. SMOTETomek-Based Resampling for Personality Recognition. IEEE
Access 7: 129678–129689.
Wang Y., Chen Q., Ahmed M., Li Z., Pan W., and Liu H. 2019. Joint Inference for Aspect-level Sentiment Analysis by Deep
Neural Networks and Linguistic Hints, IEEE Transactions on Knowledge and Data Engineering 1-12.
Weng, P. S., and Chen, W. Y. 2017. Doing Good or Choosing Well? Corporate Reputation, CEO Reputation, and Corporate
Financial Performance. The North American Journal of Economics and Finance 39: 223–40.
Wilcoxon, F. 1992. Individual Comparisons by Ranking Methods. Breakthroughs in Statistics, 196–202. Springer.
Wright, W. R., and Chin, D. N. 2014. Personality Profiling from Text: Introducing Part-of-Speech N-Grams. In International
Conference on User Modeling, Adaptation, and Personalization, 243–53. Springer.
Xie Q., Dai Z., Hovy E., Luong M. T., and Le Q. V. 2020. Unsupervised Data Augmentation for Consistency Training, In
Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS).
Xue D., Wu L., Hong Z., Guo S., Gao L., Wu Z., Zhong X., and Sun J. 2018. Deep Learning-based Personality Recognition from
Text Posts of Online Social Networks, Applied Intelligence 48(11): 4232–4246.
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy, E. 2016. Hierarchical Attention Networks for Document Classification.
In Proc. of the 2016 Conference of the NAACL: Human Language Technologies, 1480–9.
Yu, J., and Markov, K. 2017. Deep Learning Based Personality Recognition from Facebook Status Updates. In Proceedings of the
8th IEEE International Conference on Awareness Science and Technology (iCAST), 383–87.
Zhou, J., Huang, J. X., Chen, Q., Hu, Q. V., Wang, T., and He, L. 2019. Deep Learning for Aspect-level Sentiment Classification:
Survey, Vision, and Challenges. IEEE Access 7: 78454-78483.
Zimbra, D., Abbasi, A., Zeng, D., and Chen, H. 2018. The State-of-the-art in Twitter Sentiment Analysis: A Review and Benchmark
Evaluation, ACM Transactions on Management Information Systems (TMIS), 9(2), 1-29.