
Article in Information Systems Research · February 2022


DOI: 10.1287/isre.2022.1111

Yang, Lau, and Abbasi - Forthcoming in Information Systems Research (ISR), 2022

Getting Personal: A Deep Learning Artifact for Text-based Measurement of Personality

Kai Yang, Raymond Y. K. Lau, and Ahmed Abbasi

Abstract
Analysts, managers, and policymakers are interested in predictive analytics capable of offering better foresight. It is
generally accepted that in forecasting scenarios involving organizational policies or consumer decision-making,
personal characteristics – including personality – may be an important predictor of downstream outcomes. The
inclusion of personality features in forecasting models has been hindered by the fact that traditional measurement
mechanisms are often infeasible. Text-based personality detection has garnered attention due to the public availability
of digital textual traces. However, the text machine learning space has bifurcated into two branches: feature-based
methods relying on manually crafted human intuition, or deep learning language models that leverage big data and
compute – the main commonality being that neither branch generates accurate personality assessments, thereby
making personality measures infeasible for downstream forecasting applications. In this study, we propose
DeepPerson, a design artifact for text-based personality detection that bridges these two branches by leveraging
concepts from relevant psycholinguistic theories in conjunction with advanced deep learning strategies. DeepPerson
incorporates novel transfer learning and hierarchical attention network methods that employ psychological concepts
and data augmentation in conjunction with person-level linguistic information. We evaluate the utility of the proposed
artifact using an extensive design evaluation on three personality data sets, in comparison with state-of-the-art methods
proposed in academia and industry. DeepPerson is able to improve detection of personality dimensions by 10 to 20
percentage points relative to the best comparison methods. Using case studies in the finance and health domains, we
show that more accurate text-based personality detection can translate into significant improvements in downstream
applications such as forecasting future firm performance or predicting pandemic infection rates. Our findings have
important implications for research at the intersection of design and data science, and practical implications for
managers focused on enabling, producing, or consuming predictive analytics.

Keywords: Personality Text Mining, Predictive Analytics, Deep Learning, Design Science, NLP, Psychometrics

1 Introduction

We live in an era of great socio-economic uncertainty. At the same time, datafication, democratization,

consumerization, and the ubiquity of social media have created a seemingly insatiable appetite for real-time

analysis, insights, forecasts, and scrutiny of organizational policies, decisions, and performance. Across

time zones, industry sectors, and professions, everyone from financial analysts and epidemiologists to

policy makers and think tanks is interested in better insight and foresight. As part of this global sense-

making narrative during turbulent times, the importance of styles and traits has once again come front and

center (Crayne and Medeiros 2020; Guest et al. 2020). Personality traits affect life choices, business

decisions, suitability for certain jobs, health and well-being, protective behaviors, and numerous other

preferences (Goldberg 1990; Majumder et al. 2017; Wang et al. 2019b). This is true for top-level

management at publicly traded companies (Hambrick and Mason 1984; Hambrick 2007), political leaders

of national and state-level governments (Crayne and Medeiros 2020), everyday online consumers

Acknowledgements: This work was funded in part through U.S. NSF grant IIS-2039915 and Oracle for Research

(Adamopoulos et al. 2018), and employees adopting new technologies (Devaraj et al. 2008) or seeking to

avoid phishing attacks (Parrish et al. 2009). Simply put, automated personality detection can provide rich

predictors that can enhance agility and foresight in an array of downstream predictive analytics applications.

For instance, previous empirical studies have shown that executives’ personality traits influence their

decision-making (Nadkarni and Herrmann 2010; Riaz, Riaz, and Batool 2012) and leadership styles (Judge

et al. 2002; Judge, Piccolo, and Kosalka 2009). These studies underscore the possible relation between

leaders’ personalities and strategic and tactical organizational decision-making – with implications for

financial forecasting of firm policies and performance (Peterson et al. 2003). In human resource contexts,

personality measures could predict a candidate’s suitability for a particular job role and/or teamwork

performance (LePine and Van Dyne 2001). In digital marketing and online personalization settings,

personality can inform product/music recommendations and effectiveness of word-of-mouth (Celli et al.

2013; Farnadi et al. 2013; Adamopoulos et al. 2018). Personality is a type of psychometric dimension –

psychometrics are constructs related to attitudes, traits, and beliefs. In the management, marketing, and

information systems (IS) literature, the Big Five personality traits (Goldberg 1990) have been used to

examine the impact of personality on various outcomes (Devaraj et al. 2008). Like other psychometric

dimensions, one obstacle to larger-scale empirical analysis or predictive modeling using personality is that

traditional measurement methods – namely, surveys or manual coding of text – are often invasive and

infeasible at scale (Peterson et al. 2003; Ahmad et al. 2020a; Hambrick 2007; Crayne and Medeiros 2020).

Given the difficulties in obtaining traditional psychometric data (Hambrick 2007), natural language

processing (NLP) methods may represent an alternative mechanism for measuring personality through user-

generated content (Ahmad et al. 2020a). However, the text machine learning (ML) space has bifurcated

into two branches: feature-based machine learning relying largely on manually crafted human intuition

(Pratama and Sarno 2015; Tadesse et al. 2018), or deep learning language models relying heavily on big

data and compute (Majumder et al. 2017; Yu and Markov 2017). The main commonality between the two

is that neither branch generates accurate personality assessments, thereby making such measures

infeasible for downstream analytics and policy applications. Accordingly, the research objective of this


study is to develop a design artifact for text-based personality detection that bridges the schism by

leveraging concepts from relevant psycholinguistic theories in conjunction with advanced deep learning

strategies.

Following the design science approach (Gregor and Hevner 2013; Hevner et al. 2004), we use a kernel

theory from psycholinguistics to develop a robust middle-ground framework called DeepPerson that

couples principled, domain-adapted NLP artifacts (i.e., embeddings, encoders, and attention networks) with

state-of-the-art end-to-end deep learning concepts for enhanced predictive power. Design science research

questions typically center on the efficacy of design elements within a proposed artifact (Abbasi and Chen

2008) and how the artifact can “increase some measure of operational utility” (Gregor and Hevner 2013; p.

343). Accordingly, our research questions focus on personality detection capabilities and the downstream

implications of better text-based personality measurement.

RQ1: Relative to existing NLP methods, how effectively can DeepPerson detect personality
dimensions from user-generated text?

RQ2: Can enhanced personality measurement significantly improve downstream forecasting
outcomes?

To answer these questions, we performed two sets of evaluation. In the first, we examined the personality

detection capabilities of DeepPerson and comparison methods. Results reveal that our framework allows

markedly more accurate detection of personality factors from text relative to existing methods developed

in academia and industry, including 10% to 30% improvements over IBM Personality Insights (Liu et al.

2016), Google BERT (Devlin et al. 2019), and Facebook’s RoBERTa (Liu et al. 2019). More importantly,

our second evaluation involving two case studies shows that this enhanced performance translates into

personality variables that can significantly improve forecasting capabilities in finance and health contexts.

The main contributions of our work are three-fold. First, we propose a novel framework for measuring

personality from text. Second, as part of our framework, we design novel transfer learning and hierarchical

attention network methods. The proposed self-taught personality detection fine-tuning (SPDFiT) method

can overcome the labeled data bottleneck encountered in most psychometric NLP problems by generating

numerous pseudo-labeled training examples to enhance end-to-end model training. The word-layer-person


hierarchical attention network (wlpHAN) uses word and concept layer embeddings coupled with person-

level embeddings to capture key personality cues appearing in text. Third, using a two-part evaluation, we

show that more accurate NLP-based personality detection can translate into significant improvements in

downstream predictive analytics applications such as forecasting future firm performance or predicting

pandemic infection rates. Most notably, as we demonstrate in our evaluation, this is not the case for state-

of-the-art methods, which are generally incapable of producing meaningful text-based personality measures.

Our work has important implications for IS research – we believe NLP at the intersection of design and

data science represents a critical opportunity to develop novel, impactful artifacts that amalgamate socio-

technical concepts (Abbasi et al. 2016). Furthermore, our work has practical implications for managers

focused on enabling, producing, or consuming analytics in a broad array of contexts where the inclusion of

personality information for key decision or policy-makers may facilitate enhanced insight and foresight.

The remainder of the article is organized as follows. In the ensuing section, we discuss prior work on

personality, describe state-of-the-art NLP methods for personality detection, and introduce key research

gaps. In Section 3, we introduce our proposed framework, using a design science approach. Section 4

presents evaluation results for our framework relative to existing NLP methods. Section 5 uses empirical

case studies to demonstrate the downstream value proposition of enhanced personality measurement,

afforded by our proposed design artifact, for two important forecasting problems in the finance and health

domains. The implications of our work, and concluding remarks, appear in Section 6.

2 Related Work

2.1 The Importance of Measuring Personality

Prior IS research has studied the importance of personality. It has been shown to influence technology

adoption (Devaraj et al. 2008) and impact online word-of-mouth (Adamopoulos et al. 2018). Personality

traits can also impact susceptibility to phishing attacks (Parrish et al. 2009) and influence how users react

to online recommendations (Celli et al. 2013). Majumder et al. (2017) define personality as the combination

of personal behavior, motivation, and thought-patterns. In the field of psychology, the Big Five personality


traits (often called the five-factor model) have been widely used to characterize individuals’ personalities

with respect to five dimensions (Goldberg 1990):

1. extroversion (EXT): attention-seeking, sociable, playful versus introversion (e.g., shy)
2. neuroticism (NEU): helplessness, depressive, anxious versus emotional stability (e.g., calm)
3. agreeableness (AGR): friendly, cooperative versus disagreeableness (e.g., suspicious)
4. conscientiousness (CON): self-disciplined versus unconscientiousness (e.g., rash, careless)
5. openness (OPN): creative, imaginative, insightful versus conservatism (e.g., unimaginative).

Unlike human emotions, individuals’ personalities have been found to be relatively stable over time

(Cobb-Clark and Schurer 2012), generally unaffected by adverse events. In studies focused on senior

executives, personality traits have been found to influence decision-making style (Nadkarni and Herrmann

2010; Riaz et al. 2012). For instance, Riaz et al. (2012) suggested that extroversion was positively associated

with a spontaneous decision-making style, while openness was related to intuitive decision-making. The

relation between agreeableness or conscientiousness and decision-making style has also been examined

(Nygren and White 2005). Other studies have explored the relationship between personality and rational

decision-making (Hough and Ogilvie 2005). As one example, extroversion has been associated with

effective leadership (Judge et al. 2002) and transformational leadership (Judge, Piccolo, and Kosalka 2009).

Research has also linked the Big Five personality dimensions to downstream implications – Peterson et al.

(2003) conducted one of the first studies that examined the relationship between CEOs’ personality traits

and firm performance using a small sample of personality information elicited from 17 executives.

It is worth noting that research examining causal relations related to personalities and outcomes has,

in certain circumstances, encountered questions related to reverse causality (Hambrick 2007). For instance,

certain types of personalities might be more conducive to being appointed or elected into leadership roles,

or more indicative of the strategic directions that a particular organization wished to take (Hambrick 2007).

While these concerns are well-founded in causal modeling contexts, they do not lessen the potential value

proposition of measuring personalities, or of incorporating such measures in predictive contexts. Prior IS

research has carefully delineated between prediction and explanation (Shmueli and Koppius 2011). As our

evaluation results presented in Section 5 and Appendix C reveal, personality dimensions are significant and


powerful predictors of future outcomes with performance/policy implications. For prediction contexts, this

simply means that the underlying mechanisms contributing to their viability as key predictors of future

downstream outcomes might encompass personal, organizational, contextual, or environmental factors.

The bigger limitation for use of personality dimensions in prediction contexts has been the paucity of

available psychometric data (Ahmad et al. 2020a). Traditional survey and manual annotation techniques

are time-consuming and not well-suited for large-scale prediction (Hambrick 2007; Crayne and Medeiros

2020). However, with the growth of online user generated content, there is a wealth of social media, online

reviews, and public health 3.0 content. In the context of personality and leadership, social executives (Wang

et al. 2021) are increasingly communicating with key stakeholders through social media (Heavey et al.

2020). NLP methods applied to such social media text represent a viable approach for measuring

personality dimensions (Back et al. 2010; Tadesse et al. 2018). This research avenue is also consistent with

the perspective espoused by prior IS design science work related to business analytics, which has called for

design artifacts related to text and social media (Chen et al. 2012; Abbasi et al. 2018). In the following

section, we discuss the limitations of current automated NLP efforts related to personality mining from text.

2.2 Automated NLP-based Personality Detection

Automated NLP research focusing on text categorization problems can be broadly grouped into two areas:

manual feature engineering approaches and deep learning methods that leverage big data and/or extensive

compute. Although prior work on automated text-based personality detection has focused more on feature-

based techniques, as we discuss below, both categories of methods offer complementary advantages.

Researchers have examined various linguistic features for detecting individuals’ personality traits.

These features were generally coupled with ML classifiers such as multinomial Naive Bayes (MNB), k-

nearest neighbors (KNN), support vector machines (SVM), and gradient boosted trees (Pratama and Sarno

2015; Tadesse et al. 2018). For instance, Gill and Oberlander (2003) observed that individuals with the

openness trait tend to use words related to insight, while those with the neuroticism tendency are more

likely to use concrete and common words when composing messages. The neuroticism trait has also been

associated with usage of words with negative appraisal and affect (Mairesse et al. 2007). Mehl et al. (2006)


found that men with the conscientiousness trait tended to use more filler words, while the same did not hold

true for women. The syntactic patterns of messages have also been found to contain important personality

cues (Mairesse et al. 2007). Automated feature-based detection methods have attempted to leverage these

manually inferred insights, and related lexicons, as feature-based inputs for machine learning (ML)

classifiers. For example, the Linguistic Inquiry and Word Count (LIWC) and Medical Research Council

Psycholinguistic Database (MRC) lexicons have been used in prior work geared towards automated ML-

based scoring of social media text (Farnadi et al. 2013; Tausczik and Pennebaker 2010; Vinciarelli and

Mohammadi 2014; Adamopoulos et al. 2018). In addition to lexicons, bag-of-word and part-of-speech tag

n-grams have also been used to detect personality traits (Wright and Chin 2014; Pratama and Sarno 2015).

Tadesse et al. (2018) used structured programming for linguistic cue extraction (SPLICE), encompassing

sentiment, readability, and self-evaluation features, to detect individuals’ personalities. The predictive

power of such linguistic features could be bootstrapped by resampling methods such as the synthetic

minority oversampling technique (SMOTE) (Wang et al. 2019a). Guan et al. (2020) proposed a Personality2Vec

model in which they ran random walks over user content similarity graphs defined using cosine similarity

applied to LIWC category vectors of users’ text.
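
As a rough illustration of the feature-based approach described above, the following Python sketch builds lexicon category vectors for short texts and compares them with cosine similarity. It is a minimal sketch under stated assumptions: the lexicon `TOY_LEXICON`, its categories, and the example texts are invented for illustration (LIWC itself is a licensed resource and is not reproduced here), and the function names are hypothetical rather than taken from any of the cited methods.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon: category -> cue words. These entries are made up
# for illustration and are NOT taken from LIWC or MRC.
TOY_LEXICON = {
    "insight": {"think", "know", "realize", "understand"},
    "negative_affect": {"sad", "angry", "worried", "afraid"},
    "social": {"friend", "party", "talk", "we"},
}
CATEGORIES = sorted(TOY_LEXICON)


def category_vector(text):
    """Return the share of tokens that fall in each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in TOY_LEXICON.items():
            if token in words:
                counts[category] += 1
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[c] / total for c in CATEGORIES]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


user_a = category_vector("I think we should talk to a friend")
user_b = category_vector("I know we realize things at the party")
user_c = category_vector("I am sad and worried and afraid")
```

In a feature-based pipeline, vectors like these would typically be passed to a classifier such as an SVM, or, as in similarity-graph approaches, used to link users whose category profiles are close.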

Recently, deep learning-based methods have been employed to detect individuals’ personality traits

based on their social media posts (Agastya et al. 2019; Ahmad et al. 2020b; Leonardi et al. 2020). In

particular, deep convolutional neural networks (CNNs) were found to outperform classical machine learning classifiers in personality

detection (Majumder et al. 2017; Yu and Markov 2017; Sun et al. 2018). The main advantages of deep

CNNs are that they can utilize word embeddings to capture richer contextual information appearing in

documents, thereby allowing the models to generate rich abstract representations of documents. For

personality detection, these capabilities have been further enhanced by combining CNNs with attention

networks. For instance, Xue et al. (2018) exploited word-level attention by aggregating the embeddings of

words surrounding a target word, whereas Lynn et al. (2018) applied word- and message-level attention. A

limitation of the use of learned word embeddings coupled with generic attention-based CNNs, GCNs, and

LSTMs in the personality detection space has been their inability to capture linguistic cues manifesting at


different granularities including person-level characteristics, psychological concepts, syntactic and word-

level patterns.

2.3 Related NLP Methods: Language Models, Transfer Learning, and Attention

In essence, deep learning has shifted the NLP model-building paradigm from manually weighting low-level

linguistic features to automated learning of semantic and syntactic representations. Pre-trained, general-

purpose language models that attempt to learn broad linguistic patterns and relations applicable to an array

of text categorization tasks epitomize this shift. These models leverage the classic concept of transfer

learning – improving classification performance for a target task in a target domain by acquiring prior

classification knowledge from one or more source tasks in corresponding source domains (Pan and Yang

2009; Torrey and Shavlik 2010). Deep learning has taken transfer learning to a new level, allowing larger

models (millions of parameters) to be trained on larger source data (millions of general-purpose documents).

Examples include universal language models such as ULMFiT (Howard and Ruder 2018), deep

contextualized representations such as ELMo (Peters et al. 2018), and powerful transformers capable of

learning longer sequential patterns, such as BERT (Devlin et al. 2019). ULMFiT uses inductive transfer

learning to fine-tune the learning rates at different layers of a deep recurrent neural network (RNN) for

enhanced NLP classification (Howard and Ruder 2018). ELMo utilizes different levels of abstraction

knowledge captured at various layers of a deep Bi-LSTM to boost performance (Peters et al. 2018).

Similarly, BERT (Devlin et al. 2019) transfers prior knowledge (based on source data) to the bottom layers

of a deep transformer network, and then allows the top layers to be fine-tuned using a small number of

labelled training examples from the target domain and task. Recently, Leonardi et al. (2020) performed

text-based personality detection using the BERT transformer embeddings as input for a basic multi-layer

neural network. A more common domain-adaptation strategy has been to further pre-train BERT models

on task-specific corpora (unsupervised) before fine-tuning on the supervised training data (since the original

model was trained on Wikipedia and BookCorpus). For instance, BioBERT further pre-trained the BERT-

Base model on billions of tokens from PubMed articles (Lee et al. 2020), whereas SciBERT did the same

on over a million computer science and biomedical papers from Semantic Scholar (Beltagy et al. 2019).


FinBERT is further pre-trained on corporate filings, financial analyst reports, and earnings conference call

transcripts (Huang et al. 2020). In our evaluation section, we also include a BERT model further pre-trained

on data more closely aligned with personality detection (we call this benchmark method PersonaBERT).

Apart from pre-training language models, another transfer learning approach is to fine-tune deep

learning models using data augmentation methods (Lee 2013; Laine and Aila 2016; Xie et al. 2020).

Examples include unsupervised data augmentation (UDA) (Xie et al. 2019) and Self-Ensembling (Laine

and Aila 2016). These methods utilize consistency regularization to avoid disruption from the data

augmentation process. A limitation of pseudo-labeling methods in general has been the quality of data

generated – which often produces noisy signals that offset the predictive power gains (Lee 2013). This issue

can certainly come into play on social media and user-generated text, where data quality is often lower.
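
The noise concern with pseudo-labeling can be made concrete with a small Python sketch of generic confidence-thresholded pseudo-labeling. This is only an illustration of the general idea, not the method proposed in this paper or any of the cited augmentation methods. The function names, the cue-word scoring rule in `toy_predict_proba`, and the 0.9 threshold are all assumptions introduced here.

```python
def select_pseudo_labels(unlabeled, predict_proba, threshold=0.9):
    """Keep unlabeled items whose top predicted class probability meets the threshold.

    predict_proba is any callable mapping an item to a dict {label: probability}.
    """
    selected = []
    for item in unlabeled:
        probs = predict_proba(item)
        label, confidence = max(probs.items(), key=lambda kv: kv[1])
        if confidence >= threshold:
            selected.append((item, label))
    return selected


def toy_predict_proba(text):
    """Hypothetical stand-in for a trained model: scores extroversion from cue words."""
    cues = {"party", "friends", "fun"}
    tokens = text.lower().split()
    hits = sum(token in cues for token in tokens)
    p_high = min(0.5 + 0.25 * hits, 0.99)
    return {"high_EXT": p_high, "low_EXT": 1.0 - p_high}


unlabeled_posts = [
    "party with friends was fun",  # three cue words: confident prediction, kept
    "went to the store today",     # no cue words: uncertain prediction, dropped
]
pseudo_labeled = select_pseudo_labels(unlabeled_posts, toy_predict_proba, threshold=0.9)
```

A lower threshold admits more pseudo-labeled examples but also more label noise, which is the trade-off noted above for user-generated text.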

A related machine learning advancement of interest to personality detection has been attention

mechanisms. As noted, some prior personality detection methods have used basic one-dimensional

attention, such as AttRCNN (Xue et al. 2018), which exploits word-level attention by aggregating

the embeddings of words surrounding a target word. The aspect-oriented sentiment analysis literature has

also used one-dimensional aspect attention for words within a phrase surrounding opinion source/target

keywords, including aspect-aware functions (Zhou et al. 2019) such as dot-product, concat, and general

attention. Recognizing that for many tasks, text patterns manifest at the message versus word levels, the

state-of-the-art has been hierarchical attention networks (HAN) and self-attention based extensions such as

hierarchical convolutional attention networks (HCAN) (Gao et al. 2018). The Msg-Attn approach (Lynn et al. 2018)

employs word- and message-level attention for personality detection. However, personality is a

person-centric trait manifesting collectively in terms of the psychological concepts conveyed (Goldberg

1990; Cobb-Clark and Schurer 2012). Existing attention mechanisms ignore key person-level information

and the organic concept construct, instead focusing on the more arbitrary “message” unit of information.
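
To make the idea of attention at several levels more concrete, the following Python sketch pools toy word vectors into message vectors, and message vectors into a single person-level vector, using softmax-weighted sums. It is a deliberately simplified sketch: real hierarchical attention networks add learned encoders and projection layers, and this is not the architecture proposed in this paper. The embeddings, the two context vectors, and the function names are made-up assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, context):
    """Weight each vector by softmax(dot(vector, context)); return (weighted sum, weights)."""
    scores = [sum(x * c for x, c in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    dim = len(context)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


# Toy 2-d "word embeddings" for two messages written by one person (made-up values).
messages = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]],
]
word_context = [1.0, 0.0]     # stands in for a learned word-level context vector
message_context = [0.0, 1.0]  # stands in for a learned message-level context vector

# Level 1: words -> one vector per message. Level 2: messages -> one vector per person.
message_vectors = [attention_pool(words, word_context)[0] for words in messages]
person_vector, message_weights = attention_pool(message_vectors, message_context)
```

The same pooling function is reused at each level; only the context vector changes, which is what lets each level attend to different cues.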

2.4 Limitations of Current Personality Detection and General NLP Methods

The performance of existing machine-learning-based automated personality detection methods has been

inadequate. Gjurković et al. (2021) observed that feature-based text classification methods’ predictions


often had correlation rates of under 0.2 with gold-standard Big Five traits. Accuracies for industry-leading

personality detectors such as IBM Personality Insights have been observed to be equally low (Jayaratne and

Jayatilleke 2020). Similarly, a recent survey found that deep learning-based methods attained mean

accuracies of 58-63% when detecting Big Five traits from text (Mehta et al. 2020). They acknowledge this

poor performance as a bottleneck for downstream use and utility of automated detection methods (Mehta

et al. 2020; p. 2333-2334), noting “If an individual’s personality could be predicted with a little more

reliability, there is scope for integrating personality detection in almost all agents dealing with human-

machine interactions such as voice assistants, robots, cars, etc.”
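
For readers less familiar with the two kinds of figures quoted above, the following Python sketch shows how a correlation with gold-standard trait scores and a high/low classification accuracy might be computed. The scores and the 3.0 cutoff are invented for illustration and are not data from any cited study. The function names are hypothetical.

```python
import math


def pearson_r(predicted, gold):
    """Pearson correlation between two equal-length, non-constant numeric sequences."""
    n = len(gold)
    mean_p = sum(predicted) / n
    mean_g = sum(gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(predicted, gold))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_g = sum((g - mean_g) ** 2 for g in gold)
    return cov / math.sqrt(var_p * var_g)


def binary_accuracy(predicted_labels, gold_labels):
    """Share of items whose predicted high/low label matches the gold label."""
    matches = sum(p == g for p, g in zip(predicted_labels, gold_labels))
    return matches / len(gold_labels)


# Made-up scores for five people on one trait (not data from any study).
gold_scores = [2.0, 3.5, 4.0, 1.5, 3.0]
predicted_scores = [2.5, 3.0, 3.5, 2.0, 2.0]

r = pearson_r(predicted_scores, gold_scores)

# Binarize at a hypothetical cutoff of 3.0 to mimic high/low trait classification.
gold_labels = [score >= 3.0 for score in gold_scores]
predicted_labels = [score >= 3.0 for score in predicted_scores]
acc = binary_accuracy(predicted_labels, gold_labels)
```

Correlation rewards getting the ordering and spread of continuous scores right, whereas accuracy only checks which side of the cutoff each person falls on, so the two measures are not directly comparable.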

We believe the issue is one of representational richness – effective personality detection necessitates

machine learning with enhanced expressive power. There is a need to include rich psychological concepts,

methods to capture patterns at different granularities, and techniques for overcoming limitations in available

psychological/directional training data for individuals. In order to illustrate this limitation in the state-of-

the-art, Table 1 summarizes existing methods covered in Sections 2.2-2.3 in terms of four important

dimensions: the type of method, the language representations, use of attention mechanisms, and transfer

learning. In some respects, existing methods are limited by the Goldilocks principle – each type of method

generally does well on one of these dimensions, resulting in a smorgasbord of opportunities and limitations.

Feature-based methods use rich, domain-specific lexicons, but are limited in the extensiveness of patterns

learned due to reliance on feature-based machine learning classifiers. Deep learning personality detectors

use more robust sequential, spatial, and convolutional representational learning, even incorporating basic

attention, but lack inclusion of rich psychological concepts, multi-level attention, person-centric patterns,

or transfer learning. Language models use powerful self-attention, but do not consider patterns at different

granularities and are designed for standard word tokens. Relevant hierarchical/aspect attention methods use general

word embeddings, do not go beyond word-sentence-message level attention, and have typically not been

used in conjunction with transfer learning. Similarly, relevant transfer learning methods have their

limitations, namely learning from noisy data such as user-generated social media (Lee 2013). However,

integrating psycho-linguistic concepts, state-of-the-art deep learning artifacts for multi-granularity


patterns/attention, and personality-appropriate transfer learning is non-trivial. As one example, even IBM

moved away from LIWC in recent years towards GloVe word embeddings (Jayaratne and Jayatilleke 2020,

p.115347), noting “Earlier versions of the service used the LIWC psycholinguistic dictionary with its

machine-learning model. However, the open-vocabulary approach outperforms the LIWC-based model.”

Category: Feature-based Personality Detection
  Example Papers/Methods: IBM (Liu et al. 2016); KNN (Farnadi et al. 2013); SVM (Wright and Chin 2014); XGBoost (Tadesse et al. 2018); SMOTETomek (Wang et al. 2019a); Personality2Vec (Guan et al. 2020)
  Linguistic Representations: LIWC category feature vectors, GloVe word embeddings, or learned n-grams.
  Attention Mechanisms: No use of attention. Feature or random walk patterns are learned.
  Transfer Learning: No use of transfer learning. All patterns are learned on task-specific training data.

Category: Deep Learning for Personality Detection
  Example Papers/Methods: CNN-1 (Majumder et al. 2017); CNN-2 (Yu and Markov 2017); GRU (Yu and Markov 2017); LSTM+CNN (Sun et al. 2018); AttRCNN (Xue et al. 2018); Msg-Attn (Lynn et al. 2020); GCN (Wang et al. 2020)
  Linguistic Representations: One-hot representations of words, word2vec applied to training data, or pre-trained GloVe word embeddings.
  Attention Mechanisms: GRUs and LSTMs use gates for retention. AttRCNN inputs word embeddings into GRUs with attention layers. Msg-Attn uses word- and message-level attention.
  Transfer Learning: No use of transfer learning. All patterns are learned on task-specific training data.

Category: Language Models
  Example Papers/Methods: BERT (Devlin et al. 2019); Domain-adapted BERT (Lee et al. 2021; Beltagy et al. 2019; Huang et al. 2020); BERT+NN (Leonardi et al. 2020)
  Linguistic Representations: Contextualized word embeddings learned via transformer encoders.
  Attention Mechanisms: BERT models use bi-directional self-attention with multi-headed attention.
  Transfer Learning: BERT uses 3.3 billion tokens from BooksCorpus and Wikipedia. Domain-adapted BERTs such as SciBERT, BioBERT, and FinBERT are pre-trained on task-specific corpora.

Category: Hierarchical and Aspect Attention
  Example Papers/Methods: HAN (Yang et al. 2016); HCAN (Gao et al. 2018); SATT-LSTM (Jing 2019); Aspect Attention (Zhou et al. 2019)
  Linguistic Representations: Word and sentence embeddings learned from text.
  Attention Mechanisms: Either word and sentence, word and message, aspect, or self-attention.
  Transfer Learning: No use of transfer learning. All patterns are learned on task-specific training data.

Category: Transfer Learning
  Example Papers/Methods: Self-Ensembling (Laine and Aila 2016); UDA (Xie et al. 2020)
  Linguistic Representations: Word or contextualized embeddings.
  Attention Mechanisms: Not explored.
  Transfer Learning: Transfer learning methods that can generate pseudo-labels.

Table 1 Strengths and Limitations of Prior Personality Detection and Related NLP Methods

The IS discipline has a rich history of design research utilizing concepts from language,

communication, and psychology (Woo 2001; Lytinen 1985), including machine learning work geared

towards NLP artifacts (Abbasi and Chen 2008; Abbasi et al. 2018; Li et al. 2020). There is no question that

recent advancements in deep learning, namely language models driven by transformers (Devlin et al. 2019),

have disrupted NLP design research. In essence, the domain-adapted feature engineering paradigm that was

pervasive for many years in text categorization studies – where researchers developed and applied carefully

constructed knowledge bases and lexical thesauri – has seemingly been rendered extinct by models capable

of employing millions, even billions of parameters tuned on massive text corpora (Brown et al. 2020).


However, we believe this demise has been grossly exaggerated. From a design science perspective, if we

define the effectiveness of an artifact based on its level of operational utility (Gregor and Hevner 2013),

neither existing feature-based and deep learning personality detectors nor general-purpose language models are

well suited for text-based personality detection. As we later demonstrate, existing NLP methods in both

branches fail to produce personality measures that can improve downstream prediction outcomes. In fact,

we evaluate and markedly outperform every bolded method presented in Table 1. NLP artifacts are

inherently socio-technical, and opportunities for human-centered machine learning persist (Abbasi et al.

2016). There is a need to couple the power of state-of-the-art machine learning NLP methods with

principled, theory-driven domain adaptation. This is precisely the research gap we aim to address with our

proposed framework.

3 A Deep Learning Framework for Personality Detection

Many prior design science studies have used kernel theories to guide the design of novel artifacts (Li et al.

2020). According to Walls et al. (1992), kernel theories are derived from the natural and social sciences

and are used to govern meta-requirements. Arazy et al. (2010) stated that theories from those domains are

rarely used as-is because their scope and granularity are often inadequate for a specific design problem. As

noted, a fundamental problem with the state-of-the-art for NLP-based personality detection is a lack of

representational richness. Existing manual feature engineering approaches lack the breadth of patterns

needed to effectively capture personality traces from text, whereas the deep learning-based language models

are better suited for learning general NLP patterns, but lack contextualization. By focusing on the meta-

functions of language, Systemic Functional Linguistic Theory (SFLT) provides a theoretical lens for how

to think about representational richness in language (Halliday 2004). SFLT, which has been used in prior

IS design work (Abbasi and Chen 2008), argues that language encompasses three core meta-functions

(Halliday 2004): ideational, interpersonal, and textual. The ideational meta-function stems from the notion

that language provides a mechanism for describing “human experience,” including experiential and logical

ideas and concepts (Halliday 2004; p. 29). The interpersonal meta-function relates to “enacting our personal


and social relationships” – it is both interactive and personal. The textual meta-function focuses on “the

construction of text” as “an enabling or facilitating function” (Halliday 2004; p. 30).

| SFLT-based Design Guidelines | DeepPerson Framework Component | Middle-Ground Research Gaps Explored |
|---|---|---|
| Effectively representing the ideational meta-function of language entails consideration for experiential and logical concepts conveyed in text. | Psychological Concept Encoder – The encoder leverages well-established psychometric dictionaries, lexical thesauri, and carefully crafted self-evaluation features in conjunction with task/data specific learning through the “embeddings from language models” idea. | By combining manually crafted psychometric resources capable of capturing experiential ideas related to psychological processes (e.g., affective, cognitive, perceptual, and personal) with generic language models for logical concepts, DeepPerson can better represent the ideational meta-function. |
| The ideational meta-function manifests at different levels, including word, phrase, clause, sentence, and across sentences. | Word-Layer-Person Hierarchical Attention Network (wlpHAN) – The network uses multiple attention levels to capture personality cues appearing at various linguistic granularities including concept and syntax patterns. | Hierarchical attention has received limited focus in personality detection. Furthermore, prior hierarchical attention work has focused on word or sentence-level attention. |
| The interpersonal meta-function states that capturing person-specific characteristics entails accounting for speaker cues. | Personal Embeddings – The aforementioned hierarchical attention network also employs a person-level embedding for measuring an individual’s cues across documents. | Incorporating user level characteristics across documents is important for personality detection, but has received limited attention in prior studies. |
| The textual meta-function requires consideration of character and morpheme level patterns. | CNN Character Encoder – The encoder captures spatial patterns at the character level to account for symbols and informal language commonly used in online social media. | Character CNNs have been used in prior NLP studies (e.g., Ahmad et al. 2020a), including ones appearing in IS (Li et al. 2020). Nevertheless, character encoders are important to capture syntactic and morphological patterns related to the textual meta-function. |
| The three language meta-functions are instantiated through user-generated text via context, semantics, and expression. Due to the richness of, and variance in, language usage, limitations on available labeled data can impede the ability to derive robust linguistic patterns. | Self-taught Personality Detection Fine-tuning (SPDFiT) – This inductive transfer learning method uses a novel domain adapted pseudo-labeling data augmentation technique with an entropy-based quality metric to expand the available psychometric NLP training data in a high-fidelity manner. | State-of-the-art inductive transfer learning methods do not include any domain-adapted labeling techniques, and consequently, underperform on text-based personality detection tasks. Existing data augmentation methods lack appropriate quality control, resulting in noisily generated data. |

Table 2 Design Guidelines for DeepPerson Framework

Table 2 shows how we use SFLT as a kernel theory to guide the design of DeepPerson, our middle-

ground framework that combines problem domain adapted design with advanced machine learning

techniques. Our main design intuition is that enhancing text-based personality detection necessitates

effective representation of the ideational, interpersonal, and textual meta-functions of language as they

relate to personality trait traces appearing in natural language. The middle-ground domain adaptation

happens as a result of incorporating psychological encoders, a proposed word-layer-person hierarchical


attention network (wlpHAN) that includes word, a broader text layer for syntax/semantics/concepts, and

person-level information, and our novel transfer learning method for learning robust personality traces.

Building on the design guidelines in Table 2, Figure 1 shows an overview of the proposed DeepPerson

framework, which includes three main components: CNN-LSTM, wlpHAN, and transfer learning via

SPDFiT. The CNN-LSTM network consists of a convolutional neural network (CNN)-based character

encoder and two multi-layer Bi-LSTM networks. The first Bi-LSTM takes the character encoder and word

CNN embeddings as input. This component is intended to capture language usage related to the logical

ideational (Word CNN) and textual (character encoder) meta-functions (Kim et al. 2016; Mairesse et al.

2007). The second Bi-LSTM incorporates the psychological concept encoder to capture personality traces

related to the experiential ideational facet (Pennebaker and King 1999).

Figure 1 The Overall Architecture of the Proposed Deep Learning Model

wlpHAN uses word and layer-level attention to capture personality cues appearing at various linguistic

granularities for better representation of the ideational meta-function of language. Moreover, since


personality traits are speaker-level constructs, wlpHAN also employs a person-level embedding for

measuring an individual’s cues across documents in order to better capture person-specific facets of the

interpersonal meta-function of language (from an SFLT perspective).

Finally, since rich psychometric dimensions such as personality traits entail careful examination of

context, semantics, lexicogrammar, and expression (Halliday 2004), limited training data can pose a

bottleneck (Chen et al. 2018). Accordingly, we propose Self-taught Personality Detection Fine-tuning

(SPDFiT), a novel inductive transfer learning method that uses a domain adapted pseudo-labeling data

augmentation technique to expand available training data by employing massive unlabeled domain-specific

data to fine-tune the wlpHAN component. In other words, SPDFiT enables the transfer of domain-specific

knowledge from similar source problem domains to enhance the target task of personality detection.

Before delving into the detailed formulations and intuition behind CNN-LSTM, wlpHAN, and SPDFiT,

we present an example to illustrate the enhanced representational richness afforded by these key

components of DeepPerson. The wlpHAN component is able to weight syntactic and semantic elements

input by the CNN-LSTM at different layers of the attention network, as shown in Figure 2. The illustration

depicts the highly weighted elements for detecting the “extroversion” (EXT) and “unconscientiousness”

(UNCON) personality dimensions, from two tweets respectively, for the former U.S. president. An

individual with the “extroversion” personality trait tends to be attention-seeking, sociable, and playful –

while the “unconscientiousness” personality trait is often associated with being reckless and impulsive

(Goldberg 1990). By using wlpHAN (e.g., word and layer-level attention coupled with the personal

embeddings) in conjunction with the CNN-LSTM, our proposed framework can correctly detect these (and

other) personality trait “digital traces” manifesting in documents based on word usage, syntax/semantic

(synsem) usage, and psychological concepts (e.g., self-focus, positive emotion, affect, and social process).

While SPDFiT is not explicitly depicted in the example, it has a moderating effect on the accuracy and

quality of patterns derived. We later empirically demonstrate the predictive power of each component via

aggregate level ablation analysis and instance-level error analysis, including how the concept and syntactic-

semantic embeddings learned contribute to the overall effectiveness of DeepPerson.


Figure 2 Visualization of Weighted Elements at Various Layers of the Attention Network

3.1 CNN-LSTM Network for Detecting Hidden Personality Traits

We utilize a Bi-LSTM network known as “Embeddings from Language Models” (ELMo), which has been successfully applied to NLP tasks (Peters et al. 2018). Each term $t$ of a sentence is first fed into the CNN-based character encoder to produce the corresponding encoding $x_t^{Embedding}$. The encoded term sequences are then input into the first multi-layer Bi-LSTM network that captures implicit syntactic patterns embedded in documents. Each Bi-LSTM cell produces two hidden outputs, namely $\overleftarrow{h}_{t,l}^{Syntax}$ and $\overrightarrow{h}_{t,l}^{Syntax}$. In particular, $\overleftarrow{h}_{t,l}^{Syntax}$ represents the hidden output of term $t$ at the $l$th layer, and $\overrightarrow{h}_{t,l}^{Syntax}$ represents the hidden output of term $t$ for the opposite direction. Hence, the aggregated output of the multi-layer Bi-LSTM network is as follows.

$$H_t^{Syntax} = \{x_t^{Embedding}, \overleftarrow{h}_{t,l}^{Syntax}, \overrightarrow{h}_{t,l}^{Syntax} \mid l = 1, \ldots, L_{Syntax}\} = \{h_{t,l}^{Syntax} \mid l = 0, \ldots, L_{Syntax}\} \quad (1)$$

where $h_{t,0}^{Syntax} = x_t^{Embedding}$ holds when $l = 0$, and $h_{t,l}^{Syntax}$ represents the combination of $\overleftarrow{h}_{t,l}^{Syntax}$ and $\overrightarrow{h}_{t,l}^{Syntax}$ of each hidden layer. The size of the output vector of the Bi-LSTM network is 1024.


As noted, psychological concepts are an important part of the experiential facet of the ideational

meta-function in the context of personality detection (Pennebaker and King 1999). Accordingly, we propose

a psychological concept embedding to enhance representational richness for personality detection. The

psychological concepts pertaining to each term are identified by using existing psycholinguistic resources

(e.g., LIWC, MRC, and SPLICE). This mapping from word/tokens to psychological concepts is a critical

mechanism for enabling domain-adapted learning that leverages human knowledge and expertise in

conjunction with robust algorithms. As shown in Figure 1, a concept embedding is produced via the psychological concept encoder powered by existing psycholinguistic resources. Let $x_t^{Concept}$ denote the concept embedding of a term $t$. The second multi-layer Bi-LSTM network is designed to capture the sequential relationships among concepts expressed in a document, with the output denoted as:

$$H_t^{Concept} = \{x_t^{Concept}, \overleftarrow{h}_{t,l}^{Concept}, \overrightarrow{h}_{t,l}^{Concept} \mid l = 1, \ldots, L_{Concept}\} = \{h_{t,l}^{Concept} \mid l = 0, \ldots, L_{Concept}\} \quad (2)$$

where $h_{t,0}^{Concept} = x_t^{Concept}$ holds when $l = 0$, $h_{t,l}^{Concept}$ represents the combination of $\overleftarrow{h}_{t,l}^{Concept}$ and $\overrightarrow{h}_{t,l}^{Concept}$ of each hidden layer, and $L_{Concept}$ is the number of layers of the Bi-LSTM network. The output dimension of $H_t^{Concept}$ is the same as that of $H_t^{Syntax}$. Finally, the two Bi-LSTM networks are aggregated:

$$H_t^{Combined} = \{h_{t,l} \mid l = 0, \ldots, L_{Combined}\}, \quad \text{where} \quad L_{Combined} = L_{Syntax} + L_{Concept} \quad (3)$$

3.2 The Word-Layer-Person Hierarchical Attention Network (wlpHAN)

Although the CNN-LSTM network can generate rich syntactic and semantic representations, previous work

in social psychology has shown that individuals’ psychological states are related to their personalities

(Pennebaker and King 1999), and traces of these can appear at different granularities within text. Attention

mechanisms can help capture personality cues appearing at various linguistic levels for better representation

of such psychological state information related to the ideational meta-function of language - which can

manifest at the word, phrase, clause, sentence, and cross-sentence levels. However, existing attention

networks mainly deal with word-based or sentence-based attention (Yang et al. 2016; Gao et al. 2018; Jing

2019). Accordingly, our proposed wlpHAN employs attention at the word and layer levels, as well as a


personal embedding to capture speaker level linguistic cues associated with personality traits (which are

part of the interpersonal meta-function from an SFLT perspective). As we later demonstrate empirically,

the inclusion of layer and person-level attention enhances personality detection capabilities.

The architectural design of the proposed attention network is outlined in Figure 3. The output from each

layer of the multi-layer Bi-LSTMs in the CNN-LSTM network is input to the wlpHAN, which infers

appropriate weights for various psycholinguistic elements appearing in different granularities within

documents. Let $T$ denote the set of terms of a document $m$. For each term $t \in T$, an annotation set $H_t$ is generated by each Bi-LSTM network according to Equation 3, including both multilayer concept embeddings and multilayer syntactic and semantic embeddings. Let $h_{t,l}$ be the hidden output corresponding to term $t$ input into the $l$th layer of the attention network. Similar to the approach proposed by Yang et al. (2016), our attention network assigns a higher weight to a layer if $h_{t,l}$ is similar to the context vector $u_w$, as measured by the inner product of these vectors, where $u_w$ is randomly initialized. A Softmax function is then applied to normalize the weights inferred by the attention network (Equation 4). Let $\alpha_{t,l}$ represent the derived attention score for term $t$ at the $l$th layer of the attention network. The annotation $r_t$ of term $t$ is the weighted sum of all hidden annotations of the set $H_t$.

Figure 3 The Word-Layer-Person Hierarchical Attention Network (wlpHAN)


$$r_t = \sum_{l} \alpha_{t,l} h_{t,l}, \quad \text{given} \quad \alpha_{t,l} = \frac{\exp(h_{t,l}^{\top} u_w)}{\sum_{l'} \exp(h_{t,l'}^{\top} u_w)} \quad (4)$$
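The layer-level attention in Equation 4 can be illustrated with a minimal sketch in plain Python. It assumes the hidden outputs of one term are already available as equal-length lists; the function name `layer_attention` and the toy values are hypothetical and are not taken from the DeepPerson implementation, where the context vector would be learned rather than fixed.

```python
import math

def layer_attention(hidden_layers, context):
    """Return (weights, r_t) for a single term.

    hidden_layers: list of equal-length vectors h_{t,l}, one per layer l.
    context: the context vector u_w (same length as each h_{t,l}).
    """
    # Inner product h_{t,l}^T u_w scores each layer against the context vector.
    scores = [sum(h_i * u_i for h_i, u_i in zip(h, context)) for h in hidden_layers]
    # Softmax normalization; subtracting the max keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # r_t is the attention-weighted sum of the layer outputs.
    dim = len(context)
    r_t = [sum(w * h[i] for w, h in zip(weights, hidden_layers)) for i in range(dim)]
    return weights, r_t

# Toy input: three layers with two-dimensional hidden outputs (illustrative values).
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
u_w = [1.0, 1.0]
weights, r_t = layer_attention(layers, u_w)
```

In this toy case the third layer has the largest inner product with the context vector, so it receives the largest weight and dominates the annotation of the term.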

Given the term annotation $r_t$, a single Bi-LSTM layer is invoked to incorporate the contextual information of a document into the word-level representation. For each term $t$, the corresponding hidden output generated by the Bi-LSTM layer of the attention network is denoted $h_t$, and it is defined as the concatenation of the hidden output $\overleftarrow{h}_t$ and the hidden output of the opposite direction $\overrightarrow{h}_t$ of this layer.

$$h_t = \{\overleftarrow{h}_t, \overrightarrow{h}_t\} = \{\overleftarrow{LSTM}(r_t), \overrightarrow{LSTM}(r_t)\} \quad (5)$$

The attention mechanism applied to the word-level is similar to that applied to the layer-level. The word-level input $h_t$ is first fed into a fully-connected layer to derive the partial document representation $d_t$ for each term $t$. Then, a context vector $u_d$ is constructed, and its similarity with $d_t$ is measured in terms of the inner product of the corresponding vectors. A Softmax function is then applied to normalize the weights inferred by the word-level attention mechanism. Let $\alpha_t$ denote the overall attention score for term $t$. The final document representation $d_m$ is derived as the attention-weighted sum of the term-based partial document representations $d_t$.

$$d_m = \sum_{t} \alpha_t d_t, \quad \text{given} \quad d_t = \tanh(W_d h_t + b_d) \quad \text{and} \quad \alpha_t = \frac{\exp(d_t^{\top} u_d)}{\sum_{t'} \exp(d_{t'}^{\top} u_d)} \quad (6)$$

To account for person-level contextual factors, the document representation 𝑑𝑑𝑚𝑚 is passed into a single-

layer Bi-LSTM network that acts as a person-level encoder (Feng et al. 2019). The associated personal

embeddings are especially important since social media posts are often short and devoid of sufficient

broader text cues related to personality traits. Our person-level context-aware representation is as follows:

$$d_m^{(Person)} = \{\overleftarrow{LSTM}(d_m), \overrightarrow{LSTM}(d_m)\} \quad (7)$$

Finally, this representation $d_m^{(Person)}$ is fed into a Softmax layer to generate a probability distribution against the Big-five personality categories $C = \{EXT, NEU, AGR, CON, OPN\}$. Let $D$ denote the set of documents composed by an individual. The probability that a document $m \in D$ is composed by the individual with a personality trait $c$ is inferred according to Equation 8. Moreover, the individual’s personality score $\overrightarrow{person}$ with respect to the Big-five personality categories is estimated according to


Equation 9. To train the proposed hierarchical attention-based deep learning model, we adopt the common

cross entropy loss function (Majumder et al. 2017). Further model details appear in Appendix A.

$$p(c \mid m, \theta) = \frac{\exp(W_{mc}\, d_m^{(Person)} + b_m)}{\sum_{c' \in C} \exp(W_{mc'}\, d_m^{(Person)} + b_m)} \quad (8)$$

$$\forall c \in C: \; person[c] = \frac{\sum_{m \in D} p(c \mid m, \theta)}{|D|} \quad (9)$$
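Equations 8 and 9 can be read together as "softmax per document, then average over the author's documents." The following sketch assumes the five per-document logits (standing in for the affine term inside Equation 8) are already computed; the function names and the example logits are hypothetical.

```python
import math

TRAITS = ["EXT", "NEU", "AGR", "CON", "OPN"]

def softmax(logits):
    """Normalize logits into a probability distribution (Equation 8)."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def person_scores(doc_logits):
    """Average per-document trait probabilities over all documents (Equation 9).

    doc_logits: one list of five logits per document m in D.
    """
    doc_probs = [softmax(logits) for logits in doc_logits]
    n_docs = len(doc_probs)
    return {
        trait: sum(p[i] for p in doc_probs) / n_docs
        for i, trait in enumerate(TRAITS)
    }

# Two hypothetical documents by the same author.
scores = person_scores([[2.0, 0.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0, 2.0]])
```

Because each per-document distribution sums to one, the averaged person-level scores also sum to one, so an author whose documents disagree ends up with a flatter profile than any single document suggests.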

3.3 Self-Taught Personality Detection Fine-tuning

Effectively training supervised deep learning models usually entails use of a large number of labeled

training examples (Chen et al. 2018). While the first two components of DeepPerson are designed to provide

powerful personality detection capabilities, the paucity of available labeled data for psychometric NLP

tasks such as personality detection can be a major impediment (Ahmad et al. 2020a; Hambrick 2007). From

an SFLT perspective (Halliday et al. 2004), learning is difficult if there isn’t enough contextual, semantic,

expression, and lexicogrammar content to sequence over (i.e., for the CNN-LSTM) and pay attention to

(e.g., for the wlpHAN). State-of-the-art NLP language models such as ULMFiT (Howard and Ruder 2018),

ELMo (Peters et al. 2018), and BERT (Devlin et al. 2019) bolster the amount of data upon which sequence

and attention weights can be learned by utilizing inductive transfer learning to pre-train deep neural

networks. While these methods work well for a breadth of NLP problems, their propensity to adapt to a

specific domain or task (e.g., psychometric NLP) is constrained by the availability of labeled training

examples necessary to fine-tune the models. To alleviate this problem, we design a novel inductive transfer

learning method named self-taught personality detection fine-tuning (SPDFiT) for generating pseudo-

labeled training examples to enhance the fine-tuning of the first two components of DeepPerson.

The basic intuition behind SPDFiT is as follows. First, it utilizes existing psycholinguistic resources to derive a good representation $\vec{t}$ for each unlabeled document $d_m^{(u)} \in D^{(u)}$, where $D^{(u)}$ is an unlabeled domain-specific corpus. Second, it estimates the prior probability $p(\vec{t} \mid c)$ based on a small number of labeled training examples $d_n^{(l)} \in D^{(l)}$. Third, the posterior probability $p(c \mid \vec{t})$ (i.e., a pseudo-label) is derived using Bayes theorem. Fourth, a novel entropy-based measure $s_m \in [0,1]$ is applied to assess the


quality of each pseudo-labeled training example. Finally, pseudo-labeled examples are selected for model

fine-tuning with selection probabilities proportional to their quality measure $s_m$. This measure is also used

to dynamically adjust the learning rate of the Stochastic Gradient Descent (SGD) process to ensure that the

model can incorporate the quantity (of data) and quality (of labeling) tradeoff as part of its learning.

At a high level, SPDFiT works with the CNN-LSTM and wlpHAN components within the DeepPerson

framework as follows: (1) a large unlabeled data set from a similar source NLP domain (e.g., the 1B Word

benchmark collection (Chelba et al. 2013)) is used to pre-train the CNN-LSTM network; (2) SPDFiT is

used to generate pseudo-labeled examples from a large unlabeled social media corpus (i.e., the Go et al.

(2009) Sentiment140 corpus) for initial fine-tuning of the whole model, (3) we apply a small number of

labeled training examples from the training set to further fine-tune the model. While state-of-the-art

inductive transfer methods such as ULMFiT (Howard and Ruder 2018), ELMo (Peters et al. 2018), and

BERT (Devlin et al. 2019) include steps (1) and (3) above for model pre-training and fine-tuning, these

methods do not use pseudo-labeling (step 2). Conceptually, this is a critical domain-adaptation bridge

between powerful (generic) universal language modeling and task-specific contextualization using seed

manually labeled data rich in human insight. As we later demonstrate empirically, this step allows SPDFiT

to markedly outperform state-of-the-art models developed in industry and academia.

The detailed formulations are as follows. The CNN-LSTM network is first pre-trained on the 1B Word

benchmark collection (Chelba et al. 2013). CNN-LSTM generates two term-based probability distributions:

the forward distribution $p(w_t \mid w_1, w_2, \ldots, w_{t-1})$ and the backward distribution $p(w_t \mid w_{t+1}, \ldots, w_{|T|})$, where $w_t$ is the $t$th term of a document. For each document, we jointly maximize the log-likelihood of the forward and the backward probability distributions as follows:

$$\Theta_{new} = \operatorname{argmax}\Big(\sum_{t=1}^{|T|} \big(\log p(w_t \mid w_1, w_2, \ldots, w_{t-1}; \Theta_{old}) + \log p(w_t \mid w_{t+1}, \ldots, w_{|T|}; \Theta_{old})\big)\Big) \quad (10)$$

The computational details of the SPDFiT method are shown in Algorithm 1. It first utilizes existing psycholinguistic resources (e.g., LIWC, MRC, and SPLICE) to extract discriminative features (e.g., psychological features) from the labeled training set and from a large unlabeled social media data set (i.e., lines 3 and 10 of Algorithm 1). Meanwhile,


the model parameters of the Gaussian distribution are approximated through Gibbs sampling (i.e., line 6 of

Algorithm 1). Then, the proposed algorithm computes the prior probability $p(\vec{t} \mid c)$ according to the

estimated Gaussian distribution (i.e., line 7 of Algorithm 1). For the unlabeled social media data set, the

proposed algorithm infers the probability distribution of personality categories according to the Bayes

theorem (i.e., line 11 of Algorithm 1). In particular, each unlabeled training example is assigned the

personality category with the highest probability (i.e., pseudo-labeling) in line 12 of Algorithm 1. SPDFiT

employs Bayesian learning since it is a solid decision theoretic framework that offers an intuitive and

principled way of combining prior evidence (e.g., psycholinguistic patterns) to infer the most probable

outcomes (pseudo-labels) (Haussler et al. 1994), and has been used effectively in prior deep learning

contexts involving limited labeled data (Gal et al. 2017). As we demonstrate empirically in our ensuing

evaluation, it outperforms other learning approaches such as logistic regression-based pseudo-labelling.

Algorithm 1 - Self-taught Personality Detection Fine-tuning: SPDFiT

Input: A labeled training set $D^{(l)}$ with $N$ documents and $L$ features, a large unlabeled training set $D^{(u)}$ with $M$ documents and $L$ features, a set of psycholinguistic resources $Lexicons$, a set of personality categories $C$, the learning rate $r$ for SGD
Output: DeepPerson with initially fine-tuned parameters $\theta$
1. Let $\vec{t} = \langle t_1, t_2, \ldots, t_L \rangle$, where $t_i$ is the $i$th feature of the feature vector $\vec{t}$
2. FOR each labeled document $d_n^{(l)} \in D^{(l)}$ DO
3.     Extract features of $d_n^{(l)}$ using psycholinguistic resources: $\vec{t} = extract(d_n^{(l)}, Lexicons)$, where $\vec{t} \in \mathbb{R}^{L}$
4. END FOR
5. FOR each personality category $c \in C$ DO
6.     Estimate parameters $(\mu_c, \Sigma_c)$ of the Gaussian distribution $\mathcal{N}(\mu_c, \Sigma_c)$ by Gibbs Sampling
7.     Compute the prior probability $p(\vec{t} \mid c) \sim \mathcal{N}(\mu_c, \Sigma_c)$, where $\mu_c \in \mathbb{R}^{L}$, $\Sigma_c \in \mathbb{R}^{L \times L}$
8. END FOR
9. FOR each unlabeled document $d_m^{(u)} \in D^{(u)}$ DO
10.     Extract features of $d_m^{(u)}$ using psycholinguistic resources: $\vec{t} = extract(d_m^{(u)}, Lexicons)$, where $\vec{t} \in \mathbb{R}^{L}$
11.     Compute posterior probabilities: $\forall c \in C: \; p(c \mid \vec{t}) = \frac{p(\vec{t} \mid c, \mu_c, \Sigma_c)\, p(c)}{\sum_{c' \in C} p(\vec{t} \mid c', \mu_{c'}, \Sigma_{c'})\, p(c')}$
12.     Set pseudo label $l_m = \operatorname{argmax}_c \, p(c \mid \vec{t})$
13.     Compute entropy-based quality score: $s_m = \frac{H(\varphi_{max}) - H(\varphi_m)}{H(\varphi_{max})}$
14.     Stochastic selection of pseudo-labeled training instance $(d_m^{(u)}, l_m)$ based on $s_m$
15.     IF $d_m^{(u)}$ is selected THEN
16.         Predict personality label $p(c \mid \theta, d_m^{(u)})$ by invoking the DeepPerson framework
17.         Compute the gradient: $g = \nabla_{\theta} \mathcal{L}(p(c \mid \theta, d_m^{(u)}), l_m)$
18.         Update parameter: $\theta = \theta - r * s_m * g$
19.     END IF
20. END FOR
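Lines 5-12 of Algorithm 1 (class-conditional Gaussians followed by a Bayes posterior and an argmax pseudo-label) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: it fits a diagonal Gaussian from sample moments instead of estimating a full covariance by Gibbs sampling, uses class frequencies for $p(c)$, and runs on made-up two-dimensional feature vectors. All function names are hypothetical.

```python
import math

def fit_gaussian(rows, var_floor=1e-6):
    """Per-feature mean and variance for one class (diagonal covariance)."""
    n = len(rows)
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    var = [max(sum((r[j] - mean[j]) ** 2 for r in rows) / n, var_floor)
           for j in range(dim)]
    return mean, var

def log_density(x, mean, var):
    """Log of the diagonal Gaussian density at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def posterior(x, params, priors):
    """Bayes theorem: p(c|x) is proportional to p(x|c) p(c)."""
    log_joint = {c: log_density(x, *params[c]) + math.log(priors[c])
                 for c in params}
    top = max(log_joint.values())  # shift for numerical stability
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}

# Hypothetical labeled feature vectors (e.g., two lexicon-category rates).
labeled = {
    "EXT": [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]],
    "NEU": [[0.1, 0.9], [0.2, 0.8], [0.0, 1.0]],
}
params = {c: fit_gaussian(rows) for c, rows in labeled.items()}
n_total = sum(len(rows) for rows in labeled.values())
priors = {c: len(rows) / n_total for c, rows in labeled.items()}

post = posterior([0.85, 0.15], params, priors)  # features of an unlabeled document
pseudo_label = max(post, key=post.get)
```

The posterior dictionary plays the role of the class distribution for the unlabeled document, and its argmax is the pseudo-label assigned in line 12.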


Quality is always an important consideration with semi-supervised and unsupervised approaches such

as pseudo-labeling (Lee 2013). Based on the maximum likelihood assumption, pseudo-labeled training

examples with relatively large probabilities with respect to a certain class are more likely to be assigned the

correct class labels. Accordingly, we use an information theoretic metric ($s_m \in [0,1]$) to estimate the quality of pseudo-labeled training examples (i.e., line 13 of Algorithm 1). In information theory, “entropy,” denoted $H(S) = \sum_{i=1}^{|S|} -p_i \log_2 p_i$, has been widely used to measure the uncertainty of a system $S$, where a probability distribution $\varphi$ is often used to characterize various states $i$ of the system $S$. Given the class distributions of pseudo-labeled training examples (i.e., $\varphi_m$), the instances with relatively low entropy (i.e., low uncertainty or high quality) are more likely to be selected for fine-tuning the proposed deep learning model. Let $\varphi_{max}$ denote the most uncertain pseudo-labeling (i.e., an even probability distribution) of any unlabeled examples and $\varphi_m$ denote the probability distribution of pseudo-labeling for an arbitrary unlabeled example $m$. The proposed information theoretic metric for estimating the certainty (quality) of pseudo-labeled training examples is defined as follows: $s_m = \frac{H(\varphi_{max}) - H(\varphi_m)}{H(\varphi_{max})}$. Further, this quality metric is also used to control the learning rate of the SGD process during model fine-tuning (i.e., lines 17-18 of Algorithm 1). Hence, the pseudo-labeled training examples with relatively high certainty scores will trigger higher learning rates in the SGD process, and thereby exert greater influence during model fine-tuning.
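A small worked example may help make the quality score concrete. The sketch below computes $s_m$ for a confident and a maximally uncertain pseudo-label distribution, then applies the score-scaled update from line 18 of Algorithm 1. Two choices here are assumptions rather than details given in the text: an example is kept with probability equal to $s_m$ (the paper only says selection is proportional to it), and the parameters and gradient are toy two-element lists.

```python
import math
import random

def entropy(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def quality_score(probs):
    """s_m = (H(phi_max) - H(phi_m)) / H(phi_max), with phi_max uniform."""
    h_max = math.log2(len(probs))  # entropy of the uniform distribution
    return (h_max - entropy(probs)) / h_max

def scaled_sgd_step(theta, grad, rate, s_m):
    """theta <- theta - r * s_m * g, so low-quality labels move theta less."""
    return [t - rate * s_m * g for t, g in zip(theta, grad)]

confident = [0.96, 0.01, 0.01, 0.01, 0.01]
uncertain = [0.2, 0.2, 0.2, 0.2, 0.2]
s_confident = quality_score(confident)
s_uncertain = quality_score(uncertain)

# Stochastic selection (assumed rule): keep an example with probability s_m.
rng = random.Random(0)
selected = rng.random() < s_confident

theta = scaled_sgd_step([1.0, -1.0], grad=[0.5, -0.5], rate=0.1, s_m=s_confident)
```

A uniform distribution yields a score of essentially zero, so such an example is almost never selected and would barely move the parameters even if it were; a sharply peaked distribution yields a score close to one and receives nearly the full learning rate.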

4 Design Evaluation

Following the design science approach, we evaluate the operational utility of our proposed artifact in two

ways (Gregor and Hevner 2013). First, we use a design evaluation to show that the DeepPerson framework,

grounded in SFLT, outperforms existing feature and deep learning methods for text-based detection of

personality dimensions. As part of this evaluation, we also show that this performance lift is attributable to

the effectiveness of its key components, namely, wlpHAN and SPDFiT. Our second evaluation uses

empirical case studies to demonstrate the downstream implications of these performance deltas. We show

that text personality variables developed using DeepPerson can significantly improve forecasting in


financial and health contexts where executive decision-making can shape outcomes. Our design evaluation

is discussed in the remainder of this section (Section 4) while one of the case studies appears in Section 5.

4.1 Data Sets and Evaluation Procedures

To evaluate the design of DeepPerson, we used three well-known benchmark collections, namely

PANDORA (Gjurković et al. 2021), myPersonality (Celli et al. 2013) and the Essays data set (Mairesse et

al. 2007). PANDORA is a large-scale collection of 3,000,566 Reddit comments from 1,568 users and their

corresponding personality traits elicited using surveys involving the same Big-five constructs (Goldberg

1990). The myPersonality data set contains 10,000 status updates contributed by 250 Facebook users (Celli

et al. 2013), and their accompanying Big Five personality survey results. In contrast, the Essays corpus

contains 2,479 essays that capture a total of 1.9 million words composed by 2,479 psychology students

(Mairesse et al. 2007). Similarly, students’ personality traits were elicited by using questionnaires that

incorporated the Big-five constructs. Table 3 depicts basic descriptive statistics for each of the data sets.

                      PANDORA (Reddit)               myPersonality (Facebook)       Essays
                      1,568 users, 3,000,068 posts   250 users, 9,917 updates       2,479 users, 2,479 documents
                      Mean     STD      Min-Max      Mean     STD      Min-Max      Mean     STD      Min-Max
# Posts per User      1917.5   4242.7   1-52406      39.7     43.6     1-223        1.0      0        1-1
EXT                   0.37     0.30     0-1          3.29     0.86     1.33-5.00    0.52     0.50     0-1
NEU                   0.50     0.32     0-1          2.63     0.78     1.25-4.75    0.50     0.50     0-1
AGR                   0.42     0.31     0-1          3.60     0.67     1.65-5.00    0.53     0.50     0-1
CON                   0.40     0.30     0-1          3.52     0.74     1.45-5.00    0.51     0.50     0-1
OPN                   0.63     0.28     0-1          4.07     0.58     2.25-5.00    0.52     0.50     0-1
# Words per Post      0.39     70.2     1-5306       14.74    12.76    1-113        663.1    267.5    34-3836
# Noun per Post       0.65     12.1     0-334        2.81     2.68     0-37         80.66    34.5     5-294
# Verb per Post       0.24     4.2      0-64         0.81     1.24     0-12         41.10    19.1     1-178
# Adj per Post        0.32     6.2      0-168        1.02     1.35     0-36         37.78    16.8     2-165
# Adv per Post        0.34     5.6      0-78         0.99     1.39     0-15         63.66    29.2     3-290
# Concept per Post    0.17     10.1     0-52         11.62    6.80     0-39         45.74    3.4      22-51

Table 3 Basic Descriptive Statistics of the Three Adopted Data Sets

In our main evaluation, we compared DeepPerson against feature-based and deep learning methods

used in prior personality detection studies, as well as state of the art universal language models (all

previously discussed in Table 1). Feature-based methods included KNN coupled with LIWC categories

(Farnadi et al. 2013), SVM using word n-grams (Wright and Chin 2014), gradient boosted trees (Tadesse

et al. 2018), and the synthetic minority over-sampling and Tomek Link (SMOTETomek) personality

detector (Wang et al. 2019a). As noted in our discussion of related work, such LIWC and n-gram-based

features input into classical machine learning methods have been used extensively for personality detection

(Iacobelli et al. 2011). Our deep learning-based benchmark personality detectors included CNN-1

(Majumder et al. 2017), CNN-2 (Yu and Markov 2017), Gated Recurrent Unit (GRU) network (Yu and

Markov 2017), AttRCNN (Xue et al. 2018), LSTM+CNN (Sun et al. 2018), and the graph convolutional

networks GCN (Wang et al. 2020). We also included IBM Personality Insights (Liu et al. 2016),

Personality2Vec (Guan et al. 2020), and the well-known BERT neural language model developed at Google

(Devlin et al. 2019), which has outperformed other methods for many NLP tasks. BERT-Base was simply

fine-tuned on our training data sets (no further pre-training). Conversely, PersonaBERT further pre-trained

the BERT-Base model from checkpoints using the same Sentiment140 and 1BWord corpora used by

DeepPerson, before fine-tuning on our training data sets. BERT+NN used the BERT-Base transformer

embeddings as input for a multi-layer neural network (Leonardi et al. 2020).

Consistent with previous studies (Farnadi et al. 2013; Alam et al. 2013; Majumder et al. 2017; Yu and

Markov 2017; Wang et al. 2019a), the personality label of a post/document was considered to be a binarized

(median split) representation of the survey-based gold-standard personality label of the user who

contributed the post/document – hence, personality detection was considered a binary classification

problem. The class label $c \in \{0,1\}$ was assumed for each of the Big Five dimensions, and in each run, a

personality detector classified whether a document contained that particular personality dimension.
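
The median-split labeling described above can be sketched as follows. This is a minimal illustration, assuming that scores strictly above the median map to class 1 (the handling of scores equal to the median is not specified in the text); the user identifiers, scores, and function names are hypothetical.

```python
from statistics import median

def median_split(user_scores):
    """Map each user's continuous trait score to a 0/1 label.
    Assumption: scores strictly above the median receive label 1."""
    cutoff = median(user_scores.values())
    return {user: int(score > cutoff) for user, score in user_scores.items()}

def label_documents(doc_owner, user_labels):
    """Every document inherits the label of the user who wrote it."""
    return {doc: user_labels[user] for doc, user in doc_owner.items()}

# Hypothetical survey-based extraversion scores for four users.
ext_scores = {"u1": 3.2, "u2": 4.1, "u3": 2.5, "u4": 3.9}
user_labels = median_split(ext_scores)
doc_labels = label_documents({"d1": "u1", "d2": "u2", "d3": "u2"}, user_labels)
print(user_labels)
print(doc_labels)
```

One such labeling is produced per Big Five dimension, so each document ends up with five independent binary targets.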

Following the common evaluation process for machine learning models involving user-centric data

(Prechelt 1998; Ahmad et al. 2020a), our data set was divided into a training set (50% of users), a validation

set (25% of users), and a test set (25% of users). Training was performed on all documents associated with

users in the training set, parameter tuning occurred on the validation users’ documents, and models were

evaluated on the test users’ documents. In order to make the evaluation more robust, a repeated random

sub-sampling validation process was invoked where the training-validation-testing user splits were

randomly shuffled ten times. For design evaluation, standard document classification metrics such as

precision, recall, F-score, accuracy, and AUC were macro-averaged across the Big Five personality categories

(Alam et al. 2013). We also report performance on each of the five dimensions, separately. Moreover, we

adopted a non-parametric statistical test, namely the Wilcoxon signed-rank test (Wilcoxon 1992) to evaluate

the statistical significance of the different performance scores achieved by various models. DeepPerson was

implemented on the ELMo architecture in PyTorch. Consistent with prior studies, a grid search was used to

tune parameters on the validation set. A mini-batch size of 500 and dropout rate of 0.5 were used.
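
The user-level splitting procedure described above can be sketched in a few lines of Python. The key point is that users, not documents, are partitioned, so no author contributes text to more than one partition. The identifiers, seeds, and helper names below are hypothetical, and the sketch omits any stratification that may have been applied.

```python
import random

def split_users(user_ids, seed):
    """Shuffle users and split them 50/25/25 into train/validation/test."""
    users = sorted(user_ids)
    random.Random(seed).shuffle(users)
    n_train = len(users) // 2
    n_val = len(users) // 4
    return (set(users[:n_train]),
            set(users[n_train:n_train + n_val]),
            set(users[n_train + n_val:]))

def docs_for(users, doc_owner):
    """Collect all documents written by the given users."""
    return [doc for doc, user in doc_owner.items() if user in users]

# Hypothetical corpus: 20 users with 5 documents each.
user_ids = [f"u{i}" for i in range(20)]
doc_owner = {f"d{i}": f"u{i % 20}" for i in range(100)}

# Repeated random sub-sampling: ten independent user-level shuffles.
splits = [split_users(user_ids, seed) for seed in range(10)]
train, val, test = splits[0]
print(len(docs_for(train, doc_owner)), len(docs_for(val, doc_owner)),
      len(docs_for(test, doc_owner)))
```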

4.2 Comparing DeepPerson to Benchmark NLP Methods

In this section we describe the overall design evaluation results for DeepPerson relative to the

aforementioned feature-based, deep learning, and language modeling methods. We present results for the

PANDORA and myPersonality data sets related to personality traces appearing in social media posts

(Tables 4 and 5). The results on the Essays data can be found in Appendix B. The first two columns in Tables

4 and 5 depict the category of method and specific method name. The next five columns show F-scores for

individual Big Five dimensions, whereas the last six columns display macro-averaged F-score, precision,

recall, accuracy, AUC, and percentage improvement in AUC.
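
As a quick check on how the macro-averaged columns relate to the per-trait columns, the sketch below averages the five per-trait F-scores reported for DeepPerson on PANDORA in Table 4. It assumes the macro-average is the unweighted mean over the five traits, which is consistent with the reported "Av. F" value; the function name is illustrative.

```python
def macro_average(per_trait_scores):
    """Unweighted mean of a metric over the Big Five traits."""
    return sum(per_trait_scores.values()) / len(per_trait_scores)

# Per-trait F-scores for DeepPerson on PANDORA, as reported in Table 4.
deepperson_f = {"EXT": 64.9, "NEU": 64.3, "CON": 63.8, "AGR": 66.5, "OPN": 66.1}
print(round(macro_average(deepperson_f), 1))  # 65.1, matching the Av. F column
```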

Paradigm             Method           EXT   NEU   CON   AGR   OPN   Av. F  Av. P  Av. R  Acc    AUC    Imp.
Transfer Learning    DeepPerson       64.9  64.3  63.8  66.5  66.1  65.1   67.8   62.7   69.9   75.0   +33.7%
Represent. Learning  CNN-1            58.6  58.7  57.8  59.9  60.5  59.1   60.6   57.7   65.2   64.0   +14.1%
                     CNN-2            57.7  57.9  57.1  59.6  58.4  58.1   59.5   56.8   64.4   62.7   +11.8%
                     AttRCNN          59.2  59.0  57.0  61.9  60.5  59.5   61.0   58.2   65.6   64.6   +15.2%
                     Msg-Attn         56.2  56.8  55.6  58.3  57.3  56.9   57.9   55.9   63.4   60.5   +7.8%
                     GRU              56.3  54.1  53.7  58.3  55.1  55.5   56.2   54.9   61.8   57.8   +3.0%
                     LSTM+CNN         57.0  57.2  56.5  59.1  58.0  57.5   58.8   56.3   64.0   61.2   +9.1%
                     GCN              56.7  56.3  56.2  58.4  56.8  56.9   58.0   55.8   63.4   60.5   +7.8%
Language Model       PersonaBERT      58.2  58.3  57.5  60.1  59.5  58.7   60.2   57.4   65.0   63.4   +13.0%
                     BERT-Base        55.2  56.7  56.7  58.8  57.9  57.1   58.3   55.9   63.7   61.5   +9.6%
                     BERT+NN          55.5  56.9  57.0  58.3  58.2  57.2   58.5   56.0   63.8   60.6   +8.0%
                     RoBERTa          58.4  58.0  57.9  59.7  60.1  58.8   60.3   57.5   65.0   63.2   +12.7%
Feature-based        IBM              55.2  53.0  52.9  49.7  47.6  53.4   52.8   54.1   57.5   56.1   -
                     KNN              56.3  55.6  53.9  57.6  57.3  56.1   57.1   55.2   63.0   58.6   +4.5%
                     SVM              56.2  55.7  54.9  56.6  51.9  55.1   56.0   54.2   61.7   56.9   +1.4%
                     XGBoost          56.2  56.7  54.2  57.6  56.2  56.2   57.2   55.3   62.9   58.9   +5.0%
                     Personality2Vec  58.3  58.2  58.0  60.2  58.4  58.6   60.0   57.3   64.8   62.9   +12.1%
                     SMOTETomek       57.4  56.8  55.7  57.4  53.3  56.1   57.2   55.1   62.5   59.4   +5.9%
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, “AUC” refer to macro-averaged F-score, precision, recall, accuracy, and area under
the ROC curve w.r.t five personality categories. “Imp.” refers to percentage improvement in terms of AUC. All numbers are shown
in % format. CNN-1 (Majumder et al. 2017), CNN-2 (Yu and Markov 2017), GRU (Yu and Markov 2017), BERT-Base (Devlin et
al. 2019), BERT+NN (Leonardi et al. 2020), RoBERTa (Liu et al. 2019), KNN (Farnadi et al. 2013), SVM (Wright and Chin 2014),
XGBoost (Tadesse et al. 2018), AttRCNN (Xue et al. 2018), Msg-Attn (Lynn et al. 2020), GCN (Wang et al. 2020), Personality2Vec
(Guan et al. 2020), SMOTETomek (Wang et al. 2019a), LSTM+CNN (Sun et al. 2018).

Table 4 Evaluation of DeepPerson and Comparison Methods on PANDORA (Reddit)
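
The "Imp." column can be reproduced from the AUC column. The sketch below assumes, following the table notes and the use of IBM Personality Insights as the reference row, that the improvement is the relative AUC lift over that reference; the function name is illustrative, and the AUC values are those reported for PANDORA in Table 4.

```python
def relative_auc_lift(auc, reference_auc):
    """Percentage improvement in AUC relative to a reference method."""
    return (auc / reference_auc - 1.0) * 100.0

ibm_auc = 56.1  # reference row (IBM) on PANDORA
for name, auc in [("DeepPerson", 75.0), ("AttRCNN", 64.6), ("CNN-1", 64.0)]:
    print(name, f"{relative_auc_lift(auc, ibm_auc):+.1f}%")
```

These reproduce the reported +33.7%, +15.2%, and +14.1% entries.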

Paradigm             Method           EXT   NEU   CON   AGR   OPN   Av. F  Av. P  Av. R  Acc    AUC    Imp.
Transfer Learning    DeepPerson       67.4  66.8  66.6  66.3  67.7  67.0   69.5   64.8   70.3   70.7   +25.1%
Represent. Learning  CNN-1            58.2  57.0  58.0  54.3  59.5  57.4   58.8   56.1   62.2   62.7   +11.0%
                     CNN-2            57.0  55.7  56.0  53.3  58.1  56.0   57.4   54.8   61.0   60.2   +6.5%
                     AttRCNN          58.7  58.5  57.7  60.3  59.8  59.0   60.5   57.6   64.2   63.3   +12.0%
                     Msg-Attn         56.0  56.9  56.1  53.5  57.2  55.9   56.8   55.2   51.1   61.2   +8.3%
                     GRU              55.1  50.3  53.6  51.2  57.2  53.5   54.2   52.8   59.1   59.0   +4.4%
                     LSTM+CNN         56.0  55.9  55.9  53.3  60.3  56.3   56.6   56.0   54.3   62.5   +10.6%
                     GCN              57.1  56.1  54.7  54.9  60.0  56.5   56.4   56.6   58.8   61.2   +8.3%
Language Model       PersonaBERT      59.5  53.2  57.0  55.1  60.6  57.1   58.3   56.0   62.1   61.2   +8.3%
                     BERT-Base        57.2  53.9  56.0  53.2  60.6  56.2   58.1   54.4   61.6   60.6   +7.3%
                     BERT+NN          57.6  53.9  56.1  53.1  60.5  56.2   59.0   54.1   61.0   60.1   +6.4%
                     RoBERTa          58.7  55.2  56.7  56.3  59.9  57.4   59.4   55.5   62.5   62.7   +11.0%
Feature-based        IBM              56.2  51.0  52.2  45.5  42.3  49.4   50.1   48.8   53.0   56.5   -
                     KNN              56.5  58.3  54.1  54.2  58.3  56.3   57.8   55.0   54.4   62.0   +9.7%
                     SVM              54.5  54.8  51.5  52.2  60.6  54.7   54.7   54.8   51.6   61.0   +8.0%
                     XGBoost          57.3  57.0  54.3  55.1  56.2  56.0   57.2   55.0   55.3   60.9   +7.8%
                     Personality2Vec  57.8  57.0  59.0  55.0  58.4  57.5   58.0   56.9   57.8   61.7   +9.2%
                     SMOTETomek       54.7  55.2  53.4  52.3  60.1  55.8   56.2   54.2   47.5   61.0   +8.0%

Table 5 Evaluation of DeepPerson and Comparison Methods on myPersonality (Facebook)


The results appearing in Tables 4 and 5 reveal that DeepPerson significantly outperforms all

comparison methods in terms of AUC, macro F-score, precision, recall, and accuracy. These performance

deltas are consistent across individual personality dimensions. DeepPerson outperforms the best

comparison methods, namely AttRCNN (Xue et al. 2018), CNN-1 (Majumder et al. 2017) and

PersonaBERT, by 5 to 15 percentage points across all measures. Using IBM Personality Insights (i.e., the

weakest comparison method) as a reference point for percentage lift in AUC, DeepPerson is +25% to

+33% higher on the two data sets. This is nearly 13 to 20 relative percentage points higher than the

best comparison method on each data set, respectively. The Wilcoxon signed-rank tests reveal that DeepPerson’s gains are

significant. For instance, the gains over CNN-1 are significant on EXT, NEU, CON, AGR, and OPN ($W = 0$, $p < .01$).

While not depicted here, the results on the Essay data are comparable - DeepPerson significantly

outperforms all comparison methods (see Appendix B of the Online Supplement). Finally, since our

ultimate goal for downstream tasks is to try to approximate a user’s personality dimensions (averaged over

all document-level scores), we also report results for user-level approximation on PANDORA and

myPersonality in Appendix B (Tables B3/B4). DeepPerson attains Pearson’s correlation values that are at

least 10-18 points higher than the best comparison method, and MSE values that are also at least 10% lower.

The results seem to support the efficacy of middle-ground frameworks that harness rich domain knowledge

and context-relevant NLP theory in conjunction with powerful state of the art machine learning approaches.

In the ensuing section, we use ablation analysis to show that the performance of DeepPerson is attributable

to its key components that support the SFLT-based design guidelines: CNN-LSTM, wlpHAN, and SPDFiT.
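
The user-level approximation mentioned above (averaging document-level scores per user, then comparing against survey scores with Pearson's correlation and MSE) can be sketched as follows. All scores and identifiers are hypothetical, and the helper functions are illustrative rather than the evaluation code behind Appendix B.

```python
from collections import defaultdict
from math import sqrt

def user_level_scores(doc_scores, doc_owner):
    """Average document-level predictions for each user."""
    by_user = defaultdict(list)
    for doc, score in doc_scores.items():
        by_user[doc_owner[doc]].append(score)
    return {user: sum(v) / len(v) for user, v in by_user.items()}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))

def mse(xs, ys):
    """Mean squared error between predictions and gold scores."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical document-level predictions and survey scores.
doc_scores = {"d1": 0.8, "d2": 0.6, "d3": 0.2, "d4": 0.4, "d5": 0.5}
doc_owner = {"d1": "u1", "d2": "u1", "d3": "u2", "d4": "u2", "d5": "u3"}
survey = {"u1": 0.9, "u2": 0.2, "u3": 0.6}

predicted = user_level_scores(doc_scores, doc_owner)
users = sorted(survey)
pred_list = [predicted[u] for u in users]
gold_list = [survey[u] for u in users]
print(round(pearson(pred_list, gold_list), 3), round(mse(pred_list, gold_list), 3))
```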

4.3 Ablation Analysis of SPDFiT

Two key components of DeepPerson are the wlpHAN attention network and the pseudo-labeling SPDFiT

transfer learning method. In order to evaluate their additive impact on DeepPerson, we ran experiments

where wlpHAN was removed and SPDFiT was replaced with other baseline methods. The results on the

PANDORA data are presented in Table 6 – the myPersonality results can be found in Appendix B (Table

B1). DeepPerson devoid of wlpHAN appears as the first setting: CNN-LSTM (SPDFiT). The absence of

wlpHAN does reduce AUC by about 5 percentage points (relative to the first row in Table 4), underscoring

the importance of wlpHAN. The second and third settings depict DeepPerson with wlpHAN and SPDFiT

removed. In these settings, the CNN-LSTMs were pre-trained using the 1B Word benchmark collection

(Chelba et al. 2013) before fine-tuning with the PANDORA training data, and in the case of row two (i.e.,

1BWord+Sentiment140), further pre-trained with the Sentiment140 corpus (Go et al. 2009). More details

of the experiments are reported in Appendix I of the online supplement. We also report the basic descriptive

statistics of the 1B Word and Sentiment140 corpora in Table I1 (Appendix I).

In settings 4-5, SPDFiT was replaced with other state-of-the-art transfer learning methods: UDA (Xie

et al. 2020) and Self-Ensembling (Laine and Aila 2016). We implemented UDA and Self-Ensembling using

an open-source back-translation tool for data augmentation (Edunov et al. 2018). UDA used a loss function

based on KL divergence while Self-Ensembling employed mean square error as the loss function. Since

UDA and Self-Ensembling are not specifically designed for personality detection tasks, to have a fair

comparison, they employed the same exact psychological lexicons as SPDFiT (i.e., the LIWC, MRC, and

SPLICE). Settings 6-8 depict alternative pseudo-labeling methods that utilize logistic regression (Lee

2013), Lasso regression (Hastie et al. 2009), or Ridge regression for pseudo-labeling. Unlike SPDFiT, these

pseudo-labeling methods are not equipped with a quality assessment metric to filter out low-quality labels.

We also included three BERT (Devlin et al. 2019) settings, the aforementioned BERT-Base and

PersonaBERT, plus an intermediate setting only further pre-trained on 1BWord (but not Sentiment140)

before being fine-tuned on PANDORA training data (settings 9-11). In setting 12, we replaced CNN-LSTM

with just a Bi-LSTM. Finally, setting 13 used Doc2Vec (Le and Mikolov 2014) – like BERT-Base, this

setting also reflects the absence of domain-specific pre-training. The BERT, Doc2Vec, and Bi-LSTM

settings did not utilize character-level embeddings.
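
To illustrate the difference between the two consistency losses mentioned for UDA and Self-Ensembling, the sketch below compares KL divergence and mean squared error between a model's predicted class distribution for an unlabeled text and for its back-translated paraphrase. This is a simplified illustration under stated assumptions: the probability vectors are hypothetical, and the full methods involve additional components not shown here.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_squared_error(p, q):
    """Mean squared difference between two probability vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)

# Hypothetical predictions for an original text and its augmented version.
original = [0.8, 0.2]
augmented = [0.6, 0.4]

print(round(kl_divergence(original, augmented), 4))       # KL-based consistency loss
print(round(mean_squared_error(original, augmented), 4))  # MSE-based consistency loss
```

Both losses are zero when the two predictions agree and grow as the model's output changes under augmentation.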

Method                             EXT   NEU   CON   AGR   OPN   Av. F  Av. P  Av. R  Acc    AUC    Imp.
1. CNN-LSTM (SPDFiT)               62.1  61.8  61.4  63.7  62.7  62.3   64.5   60.4   67.7   70.0   20.1%
2. CNN-LSTM (1BWord+Sentiment140)  58.1  58.7  57.6  61.0  59.5  59.0   60.4   57.6   65.1   63.6   9.1%
3. CNN-LSTM (1BWord)               57.7  57.9  57.0  59.7  58.9  58.2   59.6   57.0   64.6   62.3   6.9%
4. CNN-LSTM (UDA)                  59.2  59.2  58.5  61.2  61.0  59.8   61.5   58.3   65.8   65.1   11.7%
5. CNN-LSTM (Self-Ensembling)      59.4  59.1  58.2  61.0  60.8  59.7   61.3   58.2   65.8   65.2   11.8%
6. CNN-LSTM (Logistic)             58.9  59.1  58.1  61.6  60.4  59.6   61.2   58.1   65.6   64.9   11.3%
7. CNN-LSTM (LASSO)                58.8  59.0  58.0  61.5  60.1  59.5   61.0   58.0   65.5   64.7   11.0%
8. CNN-LSTM (Ridge)                58.8  59.1  58.1  61.4  60.3  59.5   61.1   58.1   65.6   64.7   11.0%
9. PersonaBERT                     58.2  58.3  57.5  60.1  59.5  58.7   60.2   57.4   65.0   63.4   8.7%
10. BERT (1BWord)                  56.7  56.7  55.8  61.0  58.9  57.8   59.0   56.8   64.2   61.3   5.1%
11. BERT (Base)                    55.2  56.7  56.7  58.8  57.9  57.1   58.3   55.9   63.7   60.2   3.3%
12. Bi-LSTM (1BWord)               56.0  55.9  56.6  57.3  57.1  56.6   57.7   55.5   63.2   59.1   1.4%
13. Doc2Vec (Pretrained)           54.6  55.0  55.9  56.6  57.2  55.8   56.8   54.9   62.5   58.3   -
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, and “AUC” refer to macro-averaged F-score, precision, recall, accuracy, and AUC w.r.t five
personality categories. “Imp.” refers to improvement in terms of AUC. All numbers are shown in % format.

Table 6 Comparative Evaluation of SPDFiT and its Variants (without wlpHAN)

The improvement column in Table 6 shows that DeepPerson devoid of wlpHAN improves AUC by

20% (F-score by +11.7%) compared with the Doc2Vec (pretrained) approach, and is at least +8% better

than all ablation settings in terms of relative percentage improvement. The exclusion of SPDFiT after

wlpHAN has already been removed (settings 2-8) degrades performance by 5-7 points in terms of AUC

(relative improvement of at least +8%). This includes alternative pseudo-labeling methods such as

CNN-LSTM (Logistic), CNN-LSTM (LASSO), and CNN-LSTM (Ridge) and state-of-the-art transfer learning

methods like UDA and Self-Ensembling. Although not depicted, with SPDFiT and wlpHAN, this relative

delta is about +28%. SPDFiT (setting 1) also outperforms all BERT models (settings 9-11), including when

further pre-trained on the same domain-specific corpora (and fine-tuned on personality training data), by at

least 11% in terms of relative percentage improvement. Finally, CNN-LSTM (setting 2) outperforms the

use of Bi-LSTM (setting 12), suggesting that even without wlpHAN and SPDFiT, the CNN-LSTM setting

still works well. Collectively, the results of this first ablation analysis underscore the importance of all three

key components of DeepPerson, and SPDFiT in particular. Wilcoxon signed-rank tests show that these

deltas are significant (all p-values < 0.01).
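
To show why ten paired runs are enough for significance at the 0.01 level, the sketch below implements an exact two-sided Wilcoxon signed-rank test by enumerating all sign patterns. When one method wins on every one of the ten splits, the statistic is W = 0 and the exact p-value is 2/1024. The per-split scores are hypothetical, and the implementation assumes no zero differences and no tied absolute differences.

```python
from itertools import product

def wilcoxon_signed_rank(xs, ys):
    """Exact two-sided Wilcoxon signed-rank test.
    Assumes no zero differences and no tied absolute differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = {i: r + 1 for r, i in enumerate(order)}
    w_plus = sum(ranks[i] for i, d in enumerate(diffs) if d > 0)
    w_minus = sum(ranks[i] for i, d in enumerate(diffs) if d < 0)
    w = min(w_plus, w_minus)
    n = len(diffs)
    total = n * (n + 1) // 2
    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns that are at least as extreme as the observed W.
    hits = 0
    for signs in product((0, 1), repeat=n):
        pos = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(pos, total - pos) <= w:
            hits += 1
    return w, hits / 2 ** n

# Hypothetical F-scores over ten random user-level splits.
method_a = [65.0, 64.8, 65.3, 64.9, 65.1, 65.4, 64.7, 65.2, 65.0, 64.6]
method_b = [59.0, 59.1, 58.9, 59.3, 58.8, 59.2, 59.4, 58.7, 59.5, 59.6]
w, p = wilcoxon_signed_rank(method_a, method_b)
print(w, p)
```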

Figure 4 Impact of Proportion of Unlabeled Data on Performance for SPDFiT

An important consideration for transfer learning approaches is the amount of unlabeled data needed to

garner enhanced predictive power. We performed additional analysis to examine the impact of the

proportion of pseudo-labeled data on the performance of SPDFiT. We varied the percentage of unlabeled

training examples from 10% to 100% (i.e., 100% denotes the full unlabeled data set), in increments of 10%.

In order to isolate the impact of just using unlabeled data, for all methods evaluated, no fine-tuning was

performed on labeled training data. Hence, unlabeled data was used to train the models, which were then

evaluated on the PANDORA and myPersonality test data across the various folds. For each increment,

DeepPerson and comparison methods were trained for 20 epochs. The top two charts in Figure 4 depict

plots of the classification performance when using SPDFiT versus comparison transfer-learning

alternatives. The results reveal that SPDFiT is able to garner fairly good results when using as little as 50%

of the full unlabeled training set – moreover, it outperforms all comparison methods in terms of overall F-

score when using 40% or more of the unlabeled data on PANDORA or 30% or more of the data on

myPersonality. The bottom two charts depict the performance of SPDFiT on the five individual personality

dimensions. Though not shown here, SPDFiT outperformed all comparison methods on all five dimensions

when using just 50% of the unlabeled data. Given the wide range over which SPDFiT works well, we

believe the results further underscore the robustness of the SPDFiT component of DeepPerson.

4.4 Ablation Analysis of wlpHAN

For the second ablation analysis, we examined the effectiveness of the word-, layer-, and person-based

components of wlpHAN (depicted in Figure 3). For all settings, DeepPerson was invoked without the

SPDFiT module to better isolate the performance impact of wlpHAN. In particular, we compared the

detection performance of CNN-LSTM with full wlpHAN (setting 1) against a word-based attention only

(i.e., no layer or person-level attention, setting 4), one with synsem+word (no concept embedding in the

layer level attention – setting 3), and one with synsem+concept+word but no person-level encoder (setting

2). As noted in our related work section, incorporating psychological concepts into our deep learning model

might be construed as being somewhat analogous to aspect-level sentiment classification (Cheng et al.

2017; Wang et al. 2019b; Li et al. 2019; Galassi et al. 2020). Accordingly, in settings 5-7, in place of

wlpHAN we substituted three aspect attention methods based on the notion of aspect-aware functions (Zhou

et al. 2019): Dot-Product Attention (DPA), Concat Attention (CA) and General Attention (GA). In settings

8-12, we swapped out wlpHAN for other state-of-the-art attention networks such as HAN (Yang et al.

2016), SATT-LSTM (Jing 2019), HCAN (Gao et al. 2018), AttRCNN (Xue et al. 2018), and Message-level

Attention (Msg-Attn) (Lynn et al. 2020).

As shown in Table 7 (settings 2-4), the syntax/semantic layer, concept, and person level encoders each

contribute about 2 percentage points to wlpHAN’s overall AUC. wlpHAN also outperforms other state-of-

the-art attention networks depicted in settings 8-12 such as HAN, SATT-LSTM, AttRCNN, Msg-Attn, and

HCAN by 3 to 6 percentage points. Further, when replacing wlpHAN with aspect-level attention networks

(i.e., settings 5-7 in Table 7), performance degrades by 5 to 6 percentage points. The relative percentage

improvements for wlpHAN compared to all existing attention models are 5% to 11%, with all differences

significant (p-values < 0.01). This performance improvement can be attributed to wlpHAN’s capability to

incorporate syntax, psychological concepts, and person-level contextual information into the personality

detection process – these are all elements shown to be important for personality detection and are well-

aligned with our SFLT-based design guidelines (Gill and Oberlander 2003; Mairesse et al. 2007).
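
The "about 2 percentage points" contribution of each encoder can be read directly off the additive settings in Table 7. The short sketch below computes the AUC gain at each step, using the reported values for settings 4, 3, 2, and 1 in that order; the variable names are illustrative.

```python
# AUC values reported in Table 7, ordered from the simplest to the full setting.
auc_by_setting = [
    ("Word", 64.0),
    ("SynSem+Word", 66.2),
    ("SynSem+Concept+Word", 68.0),
    ("full wlpHAN", 70.3),
]

# Gain in AUC points contributed by each added component.
gains = [(curr[0], round(curr[1] - prev[1], 1))
         for prev, curr in zip(auc_by_setting, auc_by_setting[1:])]
print(gains)
```

The three steps add 2.2, 1.8, and 2.3 AUC points, respectively.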

Method                             EXT   NEU   CON   AGR   OPN   Av. F  Av. P  Av. R  Acc    AUC    Imp.
1. CNN-LSTM (wlpHAN)               62.4  61.5  61.6  64.5  64.0  62.8   64.9   60.9   68.2   70.3   +20.0%
2. CNN-LSTM (SynSem+Concept+Word)  61.1  60.6  60.0  62.7  62.6  61.4   63.3   59.7   67.1   68.0   +16.0%
3. CNN-LSTM (SynSem+Word)          60.3  59.7  59.4  61.5  61.6  60.5   62.1   59.0   66.3   66.2   +13.0%
4. CNN-LSTM (Word)                 58.3  58.6  57.8  60.6  60.0  59.0   60.4   57.8   65.1   64.0   +9.2%
5. Aspect-Attention (DPA)          57.7  57.1  62.3  58.4  62.6  59.6   61.0   58.3   65.5   64.7   +10.4%
6. Aspect-Attention (CA)           56.8  57.5  60.3  58.7  61.6  59.0   60.3   57.7   65.1   63.9   +9.0%
7. Aspect-Attention (GA)           57.2  58.6  61.7  58.7  61.5  59.5   61.1   58.1   65.4   64.8   +10.6%
8. HAN                             56.2  57.5  55.7  58.3  57.3  57.0   58.1   56.0   63.5   60.7   +3.6%
9. SATT-LSTM                       55.2  55.9  54.8  57.5  56.3  55.9   56.8   55.1   62.7   58.6   -
10. HCAN                           56.2  57.1  56.1  58.8  57.9  57.2   58.4   56.2   63.7   61.1   +4.3%
11. AttRCNN                        59.2  59.0  57.0  61.9  60.5  59.5   61.0   58.2   65.6   64.6   +10.2%
12. Msg-Attn                       56.2  56.8  55.6  58.3  57.3  56.9   57.9   55.9   63.4   60.5   +3.2%
Notes. “Av. F”, “Av. P”, “Av. R”, “Acc”, “AUC” refer to macro-averaged F-score, precision, recall, accuracy, AUC w.r.t five
personality categories. “Imp.” refers to improvement in terms of AUC. All numbers are shown in % format.

Table 7 Comparative Evaluation of wlpHAN

4.5 Error Analysis of DeepPerson Versus Benchmark Methods

As noted in Figure 2 and related discussion, and shown empirically with ablation results presented in Tables

6 and 7, the psychological concepts and patterns derived using CNN-LSTM coupled with wlpHAN (with

performance boosted by SPDFiT) are critical to the performance of DeepPerson relative to the state-of-the-

art. To delve deeper into these results, we conducted a series of pair-wise comparisons of instance-level

error rates for DeepPerson versus CNN-1, CNN-2, PersonaBERT, and AttRCNN. In each comparison, we

identified the 25% of instances on PANDORA with the widest prediction error margins between

DeepPerson and each comparison method (i.e., the cases where DeepPerson was most accurate relative to

the comparison method in terms of MSE or MAE). For these instances, we then used the following additive

ablation settings to identify how various components of DeepPerson contributed to these deltas: CNN-

LSTM (word), CNN-LSTM (SynSem+Word), CNN-LSTM (SynSem+Concept+Word), CNN-LSTM

(+wlpHAN), and CNN-LSTM (+SPDFiT), which is the full DeepPerson. Further, this analysis was

performed within each of the Big Five traits (i.e., for all five DVs) to allow better understanding of how

learned patterns/components improve identification of different personality traits. The results for MSE

appear in Figure 5. Note that the y-axis shows relative improvements compared to the previous component.
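
The instance selection step described above can be sketched as follows: for each instance, compute how much smaller DeepPerson's squared error is than the comparison method's, then keep the quarter of instances with the widest margins. The predictions are hypothetical and the function name is illustrative.

```python
def widest_margin_instances(y_true, pred_ours, pred_other, fraction=0.25):
    """Return indices of the instances where our squared error is smallest
    relative to the comparison method's squared error."""
    margins = []
    for i, (y, ours, other) in enumerate(zip(y_true, pred_ours, pred_other)):
        margin = (other - y) ** 2 - (ours - y) ** 2
        margins.append((margin, i))
    margins.sort(reverse=True)
    k = max(1, int(len(margins) * fraction))
    return [i for _, i in margins[:k]]

# Hypothetical gold scores and predictions for eight instances.
y_true = [0.9, 0.2, 0.8, 0.5, 0.1, 0.7, 0.3, 0.6]
deepperson = [0.85, 0.25, 0.75, 0.5, 0.2, 0.65, 0.35, 0.55]
comparison = [0.4, 0.3, 0.7, 0.45, 0.15, 0.2, 0.3, 0.6]

selected = widest_margin_instances(y_true, deepperson, comparison)
print(sorted(selected))
```

The ablation variants are then evaluated only on the selected indices, once per trait and per comparison method.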

Figure 5 Relative MSE Improvement by Adding Components of DeepPerson

Looking at the bar charts, we can see that just using CNN-LSTMs with the word representation

underperforms AttRCNN on all five dimensions (even on these instances where overall lifts are highest for

DeepPerson). Similarly, lifts versus CNN-1, CNN-2, and PersonaBERT are also modest on these instances

where DeepPerson as a whole is most dominant. Interestingly, adding synsem and concept patterns, the

personal embeddings in wlpHAN, and SPDFiT all cause large incremental improvements. It is worth noting

that the synsem and concept embeddings complement each other. While both have sizable lifts for all five

traits, the former is most effective on the conscientiousness and extraversion traits (green and red bars) and

the latter on agreeableness and openness (blue and orange). The results also show that the personal

embedding lift is most pronounced compared to PersonaBERT, and we see the SPDFiT moderating “boost”

across all five traits, in all four comparisons. By comparing results on instances most likely driving relative

deltas for DeepPerson against four of the best benchmarks, on all five traits, the results underscore how

DeepPerson uses representational richness via its three main components to better infer personality from digital

traces and reduce error rates.

Concept Patterns                    Example Concept Text

Trait: EXT; Scores: Actual = 0.85, DeepPerson = 0.90, AttRCNN = 0.46
[posemo posemo]                     Life is much better <posemo> with people to share <posemo> it with!
[posemo friend]                     It's nice <posemo> having a partner <friend> to wrestle life with.
[posemo social social]              I'm just glad <posemo> you're <social> readily about to lend a helping <social> hand when asked

Trait: NEU; Scores: Actual = 0.88, DeepPerson = 0.91, AttRCNN = 0.51
[affect anx]                        I had put 6 hours into the game and never enjoyed <affect> much of it, 30 FPS was very distracting <anx>.
[present negemo anx]                Now we are <present> using Facebook's terrible <negemo> freebooting tendencies in order to avoid <anx> copyright.
[sad present]                       Downright disappointing <sad> that this is <present> how it has to be.
[feel affect]                       It's hard <feel> to stay interested <affect> in something when I can't show that I'm making any progress
[anger feel]                        F*** <anger> me for having hobbies, right? How about you, Mr. Too Cool <feel> For School?

Trait: CON; Scores: Actual = 0.81, DeepPerson = 0.86, AttRCNN = 0.33
[present work time present work]    I've <present> read <work> a little more about Model G and I still <time> have <present> to work <work> out the details, but the models and the theory make sense now.
[present present work]              It does <present> take <present> careful study <work> and a fair amount of self-awareness to confirm the results.
[present family time]               Well all I really can do is go <present> to my home town and see <present> friends and family <family>. I don't have time <time> to go on a vacation for myself.

Table 8 Examples of Concept Patterns Learned by DeepPerson

The error analysis in Figure 5 shows the importance of synsem and concept embeddings for improving

detection of all five personality traits. In order to illustrate the types of syntactic/semantic (synsem) and

concept patterns learned by DeepPerson, previously highlighted in Figure 2, we performed two additional

analyses. In the first, we identified user-trait tuples for which DeepPerson yielded accurate personality

dimension scores (averaged across all their documents) and AttRCNN had high error rates. We then

extracted key concept patterns for these users by identifying wlpHAN tokens with high attention scores in

the multi-layer concept embeddings. The results for three example users with high respective EXT, NEU,

and CON appear in Table 8. The concept pattern tags correspond to categories in LIWC. Interestingly,

many of the key concept patterns learned are consistent with those observed manually in prior text-based

personality analysis. For instance, extroverts (EXT) tend to make positive references to friends and social

processes, individuals with neuroticism (NEU) often describe their feelings and exhibit a wider range of

emotions including anger and anxiety, and those that are conscientious (CON) make references to

responsibilities and time/work related concepts (Mairesse et al. 2007).
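
The concept-pattern extraction described above can be approximated with a small sketch: keep the tokens whose attention weight exceeds a threshold and map them to lexicon categories. LIWC itself is proprietary, so the lexicon below is a hypothetical toy stand-in; the attention weights, threshold, and function name are also assumptions made only for illustration.

```python
# Hypothetical toy lexicon standing in for LIWC categories.
TOY_LEXICON = {
    "better": "posemo", "share": "posemo", "nice": "posemo",
    "partner": "friend", "terrible": "negemo", "avoid": "anx",
}

def concept_pattern(tokens, attention, threshold=0.15):
    """Return lexicon categories of tokens with high attention weights."""
    pattern = []
    for token, weight in zip(tokens, attention):
        category = TOY_LEXICON.get(token.lower())
        if category is not None and weight >= threshold:
            pattern.append(category)
    return pattern

# Example sentence from Table 8 with hypothetical attention weights.
tokens = ["It's", "nice", "having", "a", "partner", "to", "wrestle", "life", "with"]
attention = [0.05, 0.30, 0.05, 0.02, 0.35, 0.03, 0.10, 0.08, 0.02]
print(concept_pattern(tokens, attention))
```

With these illustrative weights, the sentence yields the [posemo friend] pattern shown for the extraversion example in Table 8.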

Table 9 shows some of the most prevalent synsem patterns for these same traits. For the synsem

patterns, we added part-of-speech tag annotations ex post (using the Penn Treebank), to better illustrate the

syntactic elements of the synsem patterns. These patterns complement the concept embedding based ones.

For instance, extroverts make greater use of coordinating conjunctions (CC) and punctuation that allow

conveyance of additional information, neuroticism manifests in the form of greater usage of first-person

pronouns (PP), and conscientious writers make greater use of adjectives (JJ) for detail. These results

illustrate the types of personality cues learned by DeepPerson (and highlighted by wlpHAN), which relate

to ideational, textual, and interpersonal meta-functions alluded to in SFLT. Overall, the ablation and error

analysis results lend credence to the utility of our CNN-LSTM, wlpHAN and SPDFiT components, and

further highlight the overall efficacy of our DeepPerson framework. In the ensuing section, we show that

these performance deltas can also translate into downstream value in two forecasting case studies.
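
To illustrate how part-of-speech patterns of the kind shown in Table 9 can be enumerated, the sketch below counts tag n-grams over a pre-tagged sentence. The tags were assigned by hand here (no tagger is invoked), and the function name is illustrative; this is not the procedure by which DeepPerson learns its synsem embeddings, only a way to surface comparable patterns ex post.

```python
from collections import Counter

def tag_ngrams(tagged_tokens, n):
    """Return all contiguous n-grams of part-of-speech tags."""
    tags = [tag for _, tag in tagged_tokens]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

# Example from Table 9, with hand-assigned Penn Treebank-style tags.
tagged = [("Feeling", "VBG"), ("loved", "VBN"), ("and", "CC"), ("appreciated", "VBN")]
counts = Counter(tag_ngrams(tagged, 2))
print(counts[("CC", "VBN")])
```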

SynSem Patterns     Example SynSem Text

Trait: EXT
[CC VBN]            Feeling loved and (CC) appreciated (VBN)
[RB <p>]            Having a great day so far (RB) , <p> thanks to santa paula noon meetings.
[CC RB VB]          has a LONG day in the field tomorrow and (CC) then (RB) is (VB) escaping Isla Vista for the weekend

Trait: NEU
[VB PP RB]          Did I piss off a gypsy? because there's a fly in my room that won't leave (VB) me (PP) alone (RB).
[PP <p><p><p>]      My (PP) brain is like cake batter. (<p>) . (<p>) . (<p>) . (<p>)
[RB PP VB]          hungry and got no food but it is cold out so (RB) I (PP) don't (VB) want to go out to get it!

Trait: CON
[VB DT JJ RB]       Might be taking the humble food fight (VB) a (DT) little (JJ) too (RB) seriously
[VB JJ NN]          Is wearing (VB) red (JJ) lipstick (NN), watching movies, and her mother screech at the family dog.
[WRB JJ DT VBZ]     first day PhD applications, forgot how (WRB) challenging (JJ) this (DT) is (VBZ).

Table 9 Examples of SynSem Patterns Learned by DeepPerson

5 A Downstream Predictive Application Utilizing Detected Personality Traits

The enhanced NLP-based personality detection afforded by DeepPerson is only valuable if the generated

personality dimension variables can lead to improved descriptive insights or better predictive foresight. We

test the latter – the ability of DeepPerson generated Big-Five personality variables to improve forecasting

in financial and health contexts with implications for business analytics and policy, respectively. In this

section, we use DeepPerson to compute Big-Five personality scores for senior executives at S&P 1500

firms based on their Twitter posts. We then use these personality variables, along with other features, to

forecast future firm financial performance metrics. In a second case appearing in Appendix H, we score the

personalities of world and state-level leaders (executives) based on their tweets, and use this information to

enhance epidemiological forecasts related to the global COVID-19 pandemic.

In the remainder of this section, we demonstrate that senior executives’ personality traits derived using

DeepPerson can significantly improve our ability to predict firms’ policy and financial outcomes, relative

to existing personality detection methods and to excluding personality information entirely. Such forecasts are of

interest to many stakeholder groups, including investors (FinTech) and corporate headhunters (workforce

analytics). We focus on the personality traits of senior executives who are employed by the constituent

firms of the S&P Composite 1500 Index, which encompasses large corporations, mid-size firms, and small

firms. Consistent with prior IS studies (Shi et al. 2016), we retrieved information about senior executives

at S&P-1500 firms from the company pages of CrunchBase. Using definitions (and job titles) for senior

executives as explicated in prior studies (Masli et al. 2016; Medcof 2007), we managed to gather

information related to senior executives at 425 of the S&P-1500 firms. This included names, Twitter

accounts, education levels, etc. for employees who had C-suite job titles. These senior executives’

demographic and compensation information was also retrieved from the Executive Compensation

database. Among the identified senior executives, we selected those who were employed between 1990 and

2017, and who possessed Twitter accounts, resulting in 352 executives: 219 CEOs, 40 CFOs, 22 CXOs,

188 directors, 62 presidents, and 10 chairmen. All tweets composed by the identified executives between

2006 and 2017 were retrieved. Retweeted content, URLs, and images were excluded. This resulted in an

average of 529 tweets per executive (i.e., approximately 186K data points).
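The filtering step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `clean_tweets` helper, the `RT @` prefix convention for retweets, and the URL pattern are hypothetical choices, since the text does not specify how the exclusion was implemented.

```python
import re

# Assumed pattern for links and Twitter-hosted image references.
URL_RE = re.compile(r"(https?://\S+|pic\.twitter\.com/\S+)")


def clean_tweets(tweets):
    """Drop retweets and strip links, keeping only non-empty original text."""
    cleaned = []
    for text in tweets:
        if text.startswith("RT @"):  # assumed retweet marker
            continue
        text = URL_RE.sub("", text)
        text = " ".join(text.split())  # collapse leftover whitespace
        if text:
            cleaned.append(text)
    return cleaned


sample = [
    "RT @someone: great news",
    "Great quarter! https://t.co/abc",
    "https://t.co/only",
]
kept = clean_tweets(sample)
```

In this toy input only the second tweet survives, with its link removed.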

Variable                     Count        Mean         Std         Min         Max

Performance Indicators (DVs)
  D&E                          496        0.04        1.94      -42.77        2.33
  ROA                          479        0.11        0.09       -0.62        0.45
Policy Indicators (DVs)
  Leverage                     519        0.46        0.62       -8.46        3.90
  SG&A                         405        0.28        0.41        0.01        7.84
  Cash Holdings (CH)           497        4.84       31.18        0.00      658.78
  Interest Coverage (IC)       458       54.17      385.35   -1,692.22    6,760.74
  Investment                   488        0.24        0.15        0.00        1.00
  Cash Flow (CF)               486        0.82        3.29      -23.59       49.23
SE Particulars (baseline features)
  Has-MBA                      352        0.01        0.10           0           1
  Income (K)                   352      273.40      172.30        0.00    1,001.92
  Gender                       352        0.98        0.13           0           1
  Age                          314       63.95        8.88       42.00       87.00
  LOG(Reputation)              352        2.84        2.38        0.00       11.72
  EXT                          352        0.36        0.20        0.00        0.80
  NEU                          352        0.19        0.12        0.02        0.76
  CON                          352        0.52        0.11        0.15        0.84
  AGR                          352        0.57        0.20        0.10        0.96
  OPN                          352        0.79        0.15        0.40        1.00
  # Tweets                     352      529.40      302.66          11         856
  # Followers                  352   11,380.46   44,364.89           4     494,000
  # Favourites                 352    2,402.13   17,289.02           0     297,000
  Sentiment - Positive         352        0.54        0.25        0.01        1.00
  Sentiment - Negative         352        0.38        0.22        0.00        1.00
  Topic 1                      352        0.12        0.11        0.00        0.61
  Topic 2                      352        0.10        0.11        0.00        0.51
  Topic 3                      352        0.08        0.10        0.00        0.66
  Topic 4                      352        0.08        0.06        0.00        0.39
  Topic 5                      352        0.06        0.08        0.00        0.48
Table 10 Basic Descriptive Statistics of the Variables Used in this Study

Following the experimental procedure described in Section 4, DeepPerson was fine-tuned using the

training set of the myPersonality (Facebook) corpus before it was invoked to derive the Big-Five personality

dimension scores based on executives’ Twitter posts. Prior leader personality studies note the benefits of

using models trained on larger sets of general social media data, such as the ability to use personality labels

from hundreds or thousands of users for training (Hrazdil et al. 2020). Further, prior work does not note

differences in personality trait linguistic patterns and cues based on one’s personal status or professional

standing (Mairesse et al. 2007). Consistent with prior work, we assume that personalities are relatively

stable during the aforementioned analysis period (Cobb-Clark and Schurer 2012). Following the

methodology adopted by Bertrand and Schoar (2003), we collected annual financial indicators related to

firms’ policy and financial outcomes for 1990-2017 using the Compustat database. These indicators were

investment (INVEST), cash flow (CF), cash holdings (CH), leverage (LEVER), interest coverage (IC), the

ratio of selling, general and administrative expenses (SG&A), the ratio of dividends and earnings over

incomes (D&E), and return on assets (ROA) (Bertrand and Schoar 2003). The basic descriptive statistics of

the dependent variables and predictor variables/features used in our case study are shown in Table 10.

According to Henderson et al. (2006), senior executives usually learn and exert influence rapidly during

their initial employment period. Accordingly, we focus on examining if personalities of senior executives

may predict firms’ policy and financial outcomes during their initial tenure (i.e., short-term impact). To

measure firms’ outcomes, consistent with prior studies (Dubofsky and Varadarajan 1987; Li and Simerly

1998), we calculate the first two-year average of each chosen financial indicator after a senior executive

has joined a firm. More specifically, the average of the logarithm of the annual measures was used to reduce

skewness (Chu et al. 2013). Only those firm-year observations were retained where a single senior executive

joined the firm in each two-year observation period. This resulted in 519 total firm-executive-biennial

observations in our data set. Following Bonsall et al. (2017), we eliminated instances for a given firm or

financial DV if any of the DVs or IVs of interest were missing in that first two-year period. The DV counts

in Table 10 reflect the final number of instances incorporated.
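The construction of each dependent variable can be sketched as follows. This is a minimal illustration, not the study's code: the `two_year_log_average` helper is hypothetical, it assumes the two-year window is the joining year plus the following year, and it assumes strictly positive indicator values so that the logarithm is defined.

```python
import math


def two_year_log_average(annual_values, join_year):
    """Average of log(value) over the assumed first two years of tenure.

    annual_values maps year -> indicator value (assumed positive).
    Returns None when either year is missing, mirroring the rule that
    instances with missing values are eliminated.
    """
    years = (join_year, join_year + 1)
    if any(year not in annual_values for year in years):
        return None
    return sum(math.log(annual_values[year]) for year in years) / 2


# Toy example: an executive joins in 2010.
complete = two_year_log_average({2010: 1.0, 2011: math.e ** 2}, 2010)
missing = two_year_log_average({2010: 1.0}, 2010)
```

Here `complete` is close to 1.0 (the mean of log 1 and log e²), while `missing` is `None` and the instance would be dropped.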

Given our stated objective of demonstrating the utility of personality dimensions generated using

DeepPerson for predicting firms’ policy and financial outcomes, it was important to incorporate a robust

set of accompanying predictor variables (i.e., features) and forecasting models such that performance lifts

due to DeepPerson were atop reasonable baseline models. Consistent with prior work forecasting financial

measures, we used two well-known regression methods suited to inferring non-linear

patterns: random forest regression (RFR) and gradient boosted decision trees (GBDT), both available in the

Scikit-learn package (Pedregosa et al. 2011). We formalize our prediction tasks as follows:

$$\mathit{Ind}_{it} = f(\mathit{EXT}, \mathit{NEU}, \mathit{AGR}, \mathit{CON}, \mathit{OPN}, \mathit{Baseline\ Features}) \qquad (11)$$

where $\mathit{Ind}_{it}$ is the logarithm of the first two-year average for each chosen financial indicator after a

senior executive has joined a firm, and $f(\cdot)$ is a non-linear function capturing the relationship between the

predictor variables (i.e., personality traits and baseline features) and dependent variables (i.e., financial

indicators). For our baseline feature set, in addition to lagged (t-1) performance and (t-1) policy indicator

values as features, we also incorporated relevant lagged financial measures used in prior studies (Bonsall

et al. 2017). These included logarithms of total assets, return on assets (ROA), and cash flow (Bertrand and

Schoar 2003; Barth et al. 2001). In order to capture industry-specific variations, firm Standard Industrial

Classification (SIC) codes were included as a feature. Executives’ personal characteristics used in prior

studies were also incorporated as features, including: age, gender, income, education level, and reputation

(Bertrand and Schoar 2003; Brick et al. 2006; Weng and Chen 2017). Adapting the methodology proposed

by Weng and Chen (2017), reputation was estimated by counting the frequency of appearance of the

executive’s name in news articles retrieved from Google. In order to account for baseline semantic

information embedded in executives' tweet text, we also included the sentiment of the tweets given by

LIWC as well as their top-10 topics extracted using Latent Dirichlet Allocation (i.e., from the

document-topic vector) (Blei et al. 2003). We report statistics for the top five topics in Table 10. Finally, basic social

media-based features such as the number of tweets, followers, and favorites were also included.
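The prediction task in equation (11) can be sketched with Scikit-learn as follows. This is an illustrative sketch on synthetic data, not the study's pipeline: the feature matrices, the target, the `fit_and_score` helper, the random seed, and the 160/40 chronological split are all assumptions, and only the two model families and the use of 20 estimators come from the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error


def fit_and_score(X, y, n_train, seed=0):
    """Fit RFR and GBDT on the first n_train rows; score on the remainder."""
    models = {
        "RFR": RandomForestRegressor(n_estimators=20, random_state=seed),
        "GBDT": GradientBoostingRegressor(n_estimators=20, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X[:n_train], y[:n_train])
        pred = model.predict(X[n_train:])
        scores[name] = {
            "MSE": mean_squared_error(y[n_train:], pred),
            "MAE": mean_absolute_error(y[n_train:], pred),
        }
    return scores


# Synthetic stand-ins: five personality scores in [0, 1] plus four baseline features.
rng = np.random.default_rng(0)
n = 200
personality = rng.uniform(0.0, 1.0, size=(n, 5))  # EXT, NEU, AGR, CON, OPN
baseline = rng.normal(size=(n, 4))                # hypothetical baseline features
y = np.sin(3 * personality[:, 0]) + baseline[:, 0] ** 2 + rng.normal(scale=0.1, size=n)

X = np.hstack([personality, baseline])
scores = fit_and_score(X, y, n_train=160)
```

Dropping the first five columns of `X` and rerunning `fit_and_score` would give the corresponding model without personality features.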

We ran the aforementioned regression models either with or without the DeepPerson personality

dimensions as features. The models devoid of personality features included all other variables discussed

(i.e., financial, personal, and social media sentiment/topic). We also compared performance using

personality dimensions generated with DeepPerson relative to methods benchmarked earlier in our design

evaluation: CNN-1, CNN-2, and PersonaBERT. In all experiments, the widely-used mean square error

(MSE) and mean absolute error (MAE) metrics were employed to measure predictive power. The

improvement in performance brought about by inclusion of personality features was once again computed
as follows: $\mathit{Imp} = \frac{MSE_{baseline} - MSE_{experimental}}{MSE_{baseline}} \times 100\%$. Consistent with our design evaluation, all models

were trained on a training split and tested on subsequent instances. Once again, the non-parametric

Wilcoxon signed-rank test was used to examine statistical significance.
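The improvement measure described above can be illustrated with a small worked example. The helper names and the toy predictions below are assumptions made for illustration; the same ratio is applied here to MAE as well as MSE.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def improvement(baseline_error, experimental_error):
    """Percentage reduction in error relative to the baseline model."""
    return (baseline_error - experimental_error) / baseline_error * 100.0


# Toy data: the baseline model misses by 1.0, the experimental model by 0.5.
y_true = [1.0, 2.0, 3.0]
baseline_pred = [2.0, 3.0, 4.0]
experimental_pred = [1.5, 2.5, 3.5]

imp_mse = improvement(mse(y_true, baseline_pred), mse(y_true, experimental_pred))
imp_mae = improvement(mae(y_true, baseline_pred), mae(y_true, experimental_pred))
```

Halving every error yields a 75% improvement in MSE but only a 50% improvement in MAE, which is one reason the two metrics can report different lifts for the same model.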

Tables 11 and 12 show the percentage improvements in MSE and MAE, respectively, and statistical

significances when adding DeepPerson-based personality features to the baseline feature set devoid of

personality information, as well as the results when using CNN-1, CNN-2, and PersonaBERT Big Five

personality features. The tables report results for GBDT and RFR each run with 20 estimators. In general,

the inclusion of the DeepPerson-based personality dimension features improves MSE or MAE by 4% to

15% for each of the 8 possible dependent variables (6 policy indicators and 2 performance indicators). The

average improvements using DeepPerson are in the 6.1% to 14.3% range across the two models and

MSE/MAE metrics. Performance gains for all 8 dependent variables attributable to inclusion of the five

DeepPerson-based personality dimensions were significant (p-values < 0.05). These results suggest that the

personality measures derived using DeepPerson can enhance predictive power in firm policy and

performance forecasting contexts. Next, when comparing the results for DeepPerson-based personality

dimensions versus those derived using comparison detection methods such as CNN-1, CNN-2, and BERT,

there are three important takeaways worth highlighting. First, the RFR and GBDT models using personality

features derived via DeepPerson improve MSE and MAE by an average of 4% to 14% over the comparison

methods. Second, among the three benchmark comparison methods, features generated using BERT and

CNN-1 improve average results across the eight firm policy and performance prediction tasks (with average

lifts of 2% to 8%). However, on average, the use of CNN-2 garners little to no improvement. Although

CNN-2 enhances forecasting of performance indicators, it markedly underperforms on policy indicators.

(Policy indicators: CH, CF, INVEST, LEVER, IC, SG&A. Performance indicators: D&E, ROA.)

Models       CH        CF        INVEST    LEVER     IC        SG&A      D&E       ROA       Ave.

RFR
DeepPerson   3.23 ∗∗   3.15 ∗    8.61 ∗    5.88 ∗    6.47 ∗∗   10.78 ∗∗  8.53 ∗∗   6.73 ∗∗   6.67
CNN-1        -0.21     -0.92     5.23 ∗    3.80 ∗    2.60      2.36      3.05      4.85      2.60
CNN-2        2.64 ∗    -6.67     3.95      3.04      -6.93     -3.19     5.16 ∗    6.34 ∗∗   0.54
PersonaBERT  2.53 ∗    1.77      4.74 ∗    4.26 ∗∗   3.94 ∗    5.68 ∗    3.18      5.35 ∗    3.93
GBDT
DeepPerson   20.91 ∗∗  7.49 ∗∗   11.91 ∗∗  12.95 ∗∗  8.65 ∗∗   31.74 ∗∗  8.13 ∗∗   12.52 ∗∗  14.29
CNN-1        12.57 ∗∗  1.38      4.40      3.87      0.15      1.84      0.21      2.63      3.38
CNN-2        1.63      -8.61     0.88      -3.36     -9.27     -10.11    6.99 ∗    7.44 ∗    -1.80
BERT         10.28 ∗∗  5.62 ∗∗   6.72 ∗    6.93 ∗    4.58      20.34 ∗∗  5.51      8.97 ∗    8.62
ARIMA        -9.96     -22.75    -9.54     -26.90    -23.20    -7.67     -8.92     -27.48    -17.05
Notes. Each value is a percentage. For each regression model and financial indicator, we estimate the average improvement with
or without incorporating senior executives’ personality traits into the model. The ARIMA row shows the possible improvement for
GBDT relative to the common time-series prediction model (ARIMA). DeepPerson, CNN-1, CNN-2, and BERT refer to predictions
using executives’ personality traits detected by the respective methods. Wilcoxon signed-rank test with 𝑝𝑝 < .01 (∗∗), 𝑝𝑝 < .05 (∗).

Table 11 Percentage Improvement in Performance (MSE) Across Different Personality Detectors

(Policy indicators: CH, CF, INVEST, LEVER, IC, SG&A. Performance indicators: D&E, ROA.)

Models       CH        CF        INVEST    LEVER     IC        SG&A      D&E       ROA       Ave.

RFR
DeepPerson   4.05 ∗∗   3.39 ∗    6.63 ∗∗   5.16 ∗    4.98 ∗    8.15 ∗∗   7.41 ∗∗   9.00 ∗∗   6.10
CNN-1        0.36      -1.83     4.84 ∗    2.60      0.92      2.15 ∗    1.14      3.09      1.66
CNN-2        3.51 ∗    -4.59     3.90 ∗    2.05      -4.27     -1.07     6.43 ∗∗   8.03 ∗    1.75
BERT         3.91 ∗    1.85      4.44 ∗    3.98 ∗    2.10 ∗    3.54 ∗    3.51      3.89 ∗∗   3.40
GBDT
DeepPerson   10.22 ∗∗  6.40 ∗∗   8.31 ∗∗   7.10 ∗    5.27 ∗∗   18.02 ∗∗  8.55 ∗∗   11.78 ∗∗  9.46
CNN-1        3.58 ∗    1.54      4.93 ∗    0.63      0.60      3.92 ∗    0.19      3.04      2.30
CNN-2        1.36      -7.09     1.80      -1.89     -6.44     -11.75    7.89 ∗    5.75 ∗    -1.30
BERT         2.67 ∗    3.66 ∗∗   6.43 ∗    2.90 ∗    4.50      12.88 ∗∗  6.33      6.64 ∗    5.75
ARIMA        -8.98     -16.87    -8.64     -16.85    -11.61    -7.75     -5.67     -20.55    -12.11
Table 12 Percentage Improvement in Performance (MAE) Across Different Personality Detectors

Third, we also comparatively evaluated the classical ARIMA model widely used in predicting financial

time series data (Mohamed et al. 2010). Similar to GBDT and RFR, ARIMA parameters were tuned

extensively, including the order of the auto-regressive component, the degree of differencing, and the order of

the moving average. The last row of Tables 11 and 12 shows the MSE and MAE score percentages for

ARIMA relative to the GBDT-DeepPerson model. ARIMA had significantly worse results across all 8 firm

policy and performance indicators, with almost 17% worse MSE as a whole (all p-values < 0.05). While

these results were obtained using cross-validation, we also performed a single chronological training-testing split as

a robustness check. Those results, in Appendix G, are consistent with results appearing here. Collectively,

these results further underscore the value of the personality dimensions derived using DeepPerson.
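One of the ARIMA parameters tuned above is the order of differencing, usually written d. As a small illustration of what that parameter controls, the sketch below applies d-th order differencing to a series using only the standard library. The `difference` helper is hypothetical and is not the tuning code used in the study.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1] at each pass."""
    values = list(series)
    for _ in range(d):
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# A quadratic trend becomes constant after two rounds of differencing.
quadratic = [1, 4, 9, 16]
first = difference(quadratic, d=1)
second = difference(quadratic, d=2)
```

Each pass shortens the series by one observation, so higher values of d trade trend removal against usable data, which matters for short annual financial series.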

As a robustness check, we repeated the empirical case study using only executives and data from the

S&P-500 and garnered similar results. We also examined the impact of specific Big-Five dimensions as

features to see which traits are the strongest predictors. We also conducted a sensitivity analysis to evaluate

the minimal number of executives’ tweets required to produce significant prediction improvement. These

results appear in Appendices C, D, and E, respectively. In Appendix F, we show that these downstream

results also hold for DeepPerson ablation settings examined in Sections 4.3 and 4.4. As noted earlier, a

second downstream predictive application of DeepPerson in the context of COVID-19 forecasting appears

in Appendix H. Collectively, our results show that downstream forecasting models utilizing personality

dimensions scored by DeepPerson can dramatically enhance their results, whereas this is not the case when

using benchmark personality detectors or classic time series forecasting methods. As shown in the user-

level results in Appendix B, personality scores generated with DeepPerson are better correlated with survey-

based personality measurements relative to comparison methods. The imprecision of comparison text-based

personality detection methods may lead to incorrect personality traits (i.e., noisy features). It is generally

believed that noisy features tend to jeopardize the performance of a prediction model (John et al. 1994). In

other words, the design evaluation deltas reported in the prior section do translate into operational utility in

the form of better foresight in an important business analytics context.

6 Results Discussion, Limitations, and Concluding Remarks

From a design science perspective, we make three contributions. First, we propose a novel DeepPerson

framework that makes personality detection from text possible, practical, and valuable. Second, as part of

our framework, we propose two novel machine learning artifacts, namely the self-taught personality

detection fine-tuning (SPDFiT) transfer learning approach, and the word-layer-person attention network.

Third, through a robust design evaluation and two case studies, we offer empirical insights on the extent of

operational utility afforded by DeepPerson and its key components, including for downstream forecasting

tasks in financial and health contexts. Our results also have at least four important implications for IS

research and practice.

1) Debunking the “Brute Force AI” Fallacy – In recent years, with the rise of Big Data and cloud

computing, it has been suggested that large-scale deep learning models encompassing billions of parameters

tuned using millions of documents can address most NLP problems. The idea that such generic language

models are “all you need” has been perpetuated by industry research related to powerful artifacts such as

BERT and GPT-3 (Devlin et al. 2019; Brown et al. 2020). However, due to the pace of change and lack of

thorough benchmarking, the efficacy and utility of such artifacts for a breadth of NLP tasks might be

overstated (Zimbra et al. 2018). Our findings suggest that not only are such language models markedly less

effective for personality detection than DeepPerson, they are often unable to offer statistical or practical

significance for downstream forecasting contexts. This is consistent with recent studies that have warned that

generic language models are like “stochastic parrots” that might be getting too big by over-relying on the

sheer number of word tokens used during pre-training (Bender et al. 2021). Case in point, BERT-Base and

PersonaBERT relied on 3.3 and 4.1 billion tokens, respectively, whereas DeepPerson only used 800 million.

RoBERTa used ten times as much data as BERT (an estimated 30 billion-plus tokens). As we foreshadowed

earlier, we believe the demise of artifacts grounded in principled domain adaptation has been overstated.

2) Design Science as a Mechanism for Middle-ground Frameworks – In contexts where limited labeled

data related to the target task is available, brute force learning strategies are less effective. In such cases,

representation engineering that adapts machine learning artifacts such as encoders, embeddings, attention

mechanisms, and custom transfer learning schemes can present opportunities for effective domain

adaptation (Abbasi et al. 2019). By serving as a mechanism for balancing the tradeoffs between data and

intuition, socio and technical factors, inductive versus deductive insights, and general versus domain-

specific learning, design science represents a robust approach for developing middle-ground frameworks

that harness the power of human cumulative tradition in concert with powerful artificial intelligence.

3) The Importance of Personality for Predicting Policy – We show that when done correctly,

personality dimensions can improve our foresight related to prediction of policy indicators and outcomes.

The inclusion of personality measures derived by DeepPerson enhanced forecasts for financial policy

indicators by 6% to 14% on average. Similarly, DeepPerson attained the biggest lifts for

health pandemic forecasting relative to alternative epidemiological and data-driven models examined (see

Appendix H). Recently, many predictive analytics researchers have noted the challenges related to

forecasting complex policy-related outcomes, including noisy input data and the need for a diversity of

models (Hutson 2020; Bertozzi et al. 2020). Our results suggest that the traits of leaders tasked with

informing policy-related decisions might be another important input for such models. In addition to

influencing decisions directly, leaders’ traits may often reflect the characteristics of the organizations or

populations they lead and represent – for example, advisory boards and employees in firms, or the general

public and government in states and countries (Hambrick 2007). Whereas the reverse causal relationship

between leader personality and outcomes of organizations might be debated in empirical causal inference

studies, in prediction contexts (Shmueli and Koppius 2011), our study suggests that the personality of

executives might serve as a rich low-dimensional feature representation for forecasting policy-related

indicators and outcomes.

4) Towards Proactive Personalization – Accurate automated personality detection has important

implications for the broader movement towards “proactive personalization.” In personalized marketing,

personality information can enrich predictive models related to various stages of the customer lifecycle

including acquisition, retention, and expansion (Gupta et al. 2006; Brown et al. 2015). As cybersecurity

moves from reactive to proactive, personality measures could enhance predictive user models in human-in-

the-loop frameworks (Parrish et al. 2009; Bravo-Lillo et al. 2010). In human capital management contexts,

workforce analytics models already leveraging survey-based personality measures could be made timelier

with NLP-based personality scores (Ryan and Herleman 2015). In precision medicine, with the trend

towards public health 3.0 (DeSalvo et al. 2017), personality information can help better align preventative

interventions with individual patient characteristics (Friedman 2000). For instance, the conscientiousness

trait has been found to be predictive of health and longevity, from childhood to old age (Friedman and Kern

2014). Higher extraversion is linked to greater likelihood of seeking preventative screenings (Aschwanden

et al. 2019). Lower conscientiousness and high neuroticism have been associated with greater vaccine

hesitancy (Murphy et al. 2021; Aschwanden et al. 2021). Personality could provide a mechanism for

measuring heterogeneity in user intent (Ahmad et al. 2022). NLP-based personality detection could inform

various such proactive intervention personalization use cases.

Our work is not without its limitations. Bias is an important consideration for NLP models (Lalor et al.

2022). Furthermore, future work on personality across languages, and using multimedia input including

audio and video, would be beneficial. Our design evaluation focused on social media postings, forum

messages, and lengthier texts (essays). Other relevant documents might warrant exploration, including

speech transcripts and written articles. Nevertheless, we believe this work has important implications for

research at the intersection of design and data science that integrates social-technical concepts into novel

domain-adapted machine learning artifacts, and for practitioners who enable, produce, or consume

predictive analytics where the inclusion of personality information may enhance insight and foresight.

7 References

Abbasi, A., and Chen, H. 2008. CyberGate: A Design Framework and System for Text Analysis of Computer-mediated
Communication. MIS Quarterly, 32(4): 811-837.
Abbasi, A., Zhou, Y., Deng, S., and Zhang, P. 2018. Text Analytics to Support Sense-Making in Social Media: A Language-Action
Perspective. MIS Quarterly 42(2): 427–64.
Abbasi, A., Sarker, S., and Chiang, R. H. 2016. Big Data Research in Information Systems: Toward an Inclusive Research
Agenda. Journal of the Association for Information Systems, 17(2), 3.
Abbasi, A., Kitchens, B., and Ahmad, F. 2019. The Risks of AutoML and How to Avoid Them, Harvard Business Review, digital
article: https://hbr.org/2019/10/the-risks-of-automl-and-how-to-avoid-them
Adamopoulos, P., Ghose, A., and Todri, V. 2018. The Impact of User Personality Traits on Word of Mouth: Text-Mining Social
Media Platforms, Information Systems Research 29 (3): 612–40.
Agastya, I. M. A., Handayani, D. O. D., and Mantoro, T. 2019. A Systematic Literature Review of Deep Learning Algorithms for
Personality Trait Recognition. In 5th Intl. Conf. on Computing Engineering and Design, 1-6.
Ahmad, F., Abbasi, A., Li, J., Dobolyi, D. G., Netemeyer, R., Clifford, G., and Chen, H. 2020. A Deep Learning Architecture for
Psychometric Natural Language Processing, ACM Trans. on Information Systems 38(1), no. 6.
Ahmad, F., Abbasi, A., Kitchens, B., Adjeroh, D. A., and Zeng, D. 2022. Deep Learning for Adverse Event Detection from Web
Search. IEEE Transactions on Knowledge and Data Engineering, forthcoming.
Ahmad H., Asghar M. Z., Khan A. S., and Habib A. 2020b. A Systematic literature review of personality trait classification from
textual content, Open Computer Science 10:175–193.
Alam, F., Stepanov, E. A., and Riccardi, G. 2013. Personality Traits Recognition on Social Network-Facebook. In Seventh
International AAAI Conference on Weblogs and Social Media, 1–4.
Arazy, O., Kumar, N., and Shapira, B. 2010. A Theory-driven Design Framework for Social Recommender Systems, Journal of
the Association for Information Systems, 11(9), 2.
Aschwanden, D., Gerend, M. A., Luchetti, M., Stephan, Y., Sutin, A. R., and Terracciano, A. 2019. Personality traits and preventive
cancer screenings in the Health Retirement Study. Preventive medicine, 126, 105763.
Aschwanden, D., Strickhouser, J. E., Sesker, A. A., Lee, J. H., Luchetti, M., ... and Terracciano, A. 2021. Psychological and
behavioural responses to coronavirus disease 2019: The role of personality. European Journal of Personality, 35(1), 51-66.
Back, M. D., Stopfer, J. M., Vazire, S., Gaddis, S., Schmukle, S. C., Egloff, B., and Gosling, S. D. 2010. Facebook Profiles Reflect
Actual Personality, Not Self-Idealization. Psychological Science 21 (3): 372–74.
Barth, M. E., Cram, D. P., and Nelson, K. K. 2001. Accruals and the Prediction of Future Cash Flows. The Accounting Review 76
(1): 27–58.
Beltagy, I., Lo, K., and Cohan, A. 2019. SciBERT: A Pretrained Language Model for Scientific Text, In Proceedings of the
Conference on Empirical Methods in Natural Language Processing, pp. 3615-3620.
Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. 2021. On the Dangers of Stochastic Parrots: Can Language
Models Be Too Big? In Proceedings of the ACM Conference on Fairness, Accountability, and Transparency, 610-623.
Bertozzi, A. L., Franco, E., Mohler, G., Short, M. B., and Sledge, D. 2020. The challenges of modeling and forecasting the spread
of COVID-19. Proceedings of the National Academy of Sciences, 117(29), 16732-16738.
Bertrand, M., and Schoar, A. 2003. Managing with Style: The Effect of Managers on Firm Policies. The Quarterly Journal of
Economics 118 (4): 1169–1208.
Blei D. M., Ng A. Y., and Jordan, M. I. 2003. Latent Dirichlet Allocation, Journal of Machine Learning Research 3: 993-1022.
Bonsall, S. B., Holzman, E. R., and Miller, B. P. 2017. Managerial Ability and Credit Risk Assessment. Management Science,
63(5): 1425-1449.
Bravo-Lillo, C., Cranor, L. F., Downs, J., and Komanduri, S. 2010. Bridging the Gap in Computer Security Warnings: A Mental
Model Approach, IEEE Security & Privacy, 9(2), 18-26.
Brown, D. E., Abbasi, A., and Lau, R. Y. 2015. Predictive analytics: Predictive modeling at the micro level. IEEE Intelligent
Systems, 30(3), 6-8.
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... and Agarwal, S. 2020. Language Models are Few-
shot Learners. arXiv preprint arXiv:2005.14165.
Brick, I. E., Palmon, O., and Wald, J. K. 2006. CEO Compensation, Director Compensation, and Firm Performance: Evidence of
Cronyism? Journal of Corporate Finance 12 (3): 403–23.
Celli, F., Pianesi, F., Stillwell, D., Kosinski, M, and others. 2013. Workshop on Computational Personality Recognition (Shared
Task). In Proc. 7th International AAAI Conference on Weblogs and Social Media, 2–5.
Chelba, C., Mikolov, T., Schuster, M., Ge, Q., Brants, T., and Koehn, P. 2013. One Billion Word Benchmark for Measuring
Progress in Statistical Language Modeling. CoRR abs/1312.3005. http://arxiv.org/abs/1312.3005.
Chen, D., Wang, W., Gao, W., and Zhou, Z. 2018. Tri-Net for Semi-Supervised Deep Learning. In Proc. 27th
International Joint Conference on Artificial Intelligence, 2014–20. Stockholm, Sweden: AAAI.
Chen, H., Chiang, R. HL, and Storey, V. C. 2012. Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly
36 (4): 1165–88.
Cheng J., Zhao S., Zhang J., King I., Zhang X., and Wang H. 2017. Aspect-level Sentiment Classification with Heat (Hierarchical
Attention) Network. In Proc. ACM Conf. on Information and Knowledge Management, 97-106.

Chu, C. I., Chatterjee, B., and Brown, A. 2013. The Current Status of Greenhouse Gas Reporting by Chinese Companies,
Managerial Auditing Journal 28 (2): 114–39.
Cobb-Clark, D., and Schurer, S. 2012. The Stability of Big-Five Personality Traits. Economics Letters 115: 11–15.
Crayne, M. P., and Medeiros, K. E. 2020. Making Sense of Crisis: Charismatic, Ideological, and Pragmatic Leadership in Response
to Covid-19. The American Psychologist.
DeSalvo, K. B., Wang, Y. C., Harris, A., Auerbach, J., Koo, D., and O’Carroll, P. 2017. Public Health 3.0: A Call to Action for
Public Health to Meet the Challenges of the 21st Century. Preventing Chronic Disease, 14.
Devaraj, S., Easley, R.F. and Crant, J.M. 2008. Research note—how does personality matter? Relating the five-factor model to
technology acceptance and use. Information Systems Research 19(1): 93-105.
Devlin, J., Chang, M., Lee, K., and Toutanova, K. 2019. BERT: Pre-Training of Deep Bidirectional Transformers for Language
Understanding. In Proc. 2019 Conference of the NAACL-HLT, 4171–4186.
Dubofsky, P., and Varadarajan, P. R. 1987. Diversification and Measures of Performance: Additional Empirical Evidence.
Academy of Management Journal 30 (3): 597–608.
Edunov, S., Ott, M., Auli, M., and Grangier, D. 2018. Understanding Back-Translation at Scale. In Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing, 489-500.
Farnadi, G., Zoghbi, S., Moens, M. F., and Cock, M. D. 2013. Recognising Personality Traits Using Facebook Status Updates. In
Seventh International AAAI Conference on Weblogs and Social Media, 14–18.
Feng, S., Wang, Y., Liu, L., Wang, D., and Yu, G. 2019. Attention Based Hierarchical LSTM Network for Context-Aware Microblog
Sentiment Classification. World Wide Web 22 (1): 59–81.
Friedman, H. S. 2000. Long‐term Relations of Personality and Health: Dynamisms, Mechanisms, Tropisms. Journal of Personality,
68(6), 1089-1107.
Friedman, H. S., and Kern, M. L. 2014. Personality, well-being, and health. Annual Review of Psychology, 65, 719-742.
Gal, Y., Islam, R., and Ghahramani, Z. 2017. Deep bayesian active learning with image data. In International Conference on
Machine Learning (pp. 1183-1192). PMLR.
Galassi A., Lippi M., and Torroni P. 2020. Attention in Natural Language Processing, IEEE Transactions on Neural Networks and
Learning Systems 1-18.
Gao, S., Ramanathan, A., and Tourassi, G. 2018. Hierarchical Convolutional Attention Networks for Text Classification. In
Proceedings of the Third Workshop on Representation Learning for NLP, 11–23.
Gill, A. J., and Oberlander, J. 2003. Perception of E-Mail Personality at Zero-Acquaintance: Extraversion Takes Care of Itself;
Neuroticism Is a Worry. In Proc. of the Cognitive Science Society, 25:456–61.
Gjurković, M., Karan, M., Vukojević, I., Bošnjak, M., and Šnajder, J. 2021. PANDORA Talks: Personality and Demographics on
Reddit. In Proceedings of the Ninth International ACL Workshop on Natural Language Processing for Social Media, 138–152.
Go, A., Bhayani, R., and Huang, L. 2009. Twitter Sentiment Classification Using Distant Supervision. CS224N Project, Stanford.
https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf.
Goldberg, L. R. 1990. An Alternative Description of Personality: The Big-Five Factor Structure. Journal of Personality and Social
Psychology 59 (6): 1216–29.
Gregor, S., and Hevner, A. 2013. Positioning and Presenting Design Science Research for Maximum Impact. MIS Quarterly 37
(2): 337–55.
Guan, Z., Wu, B., Wang, B., and Liu, H. 2020. Personality2vec: Network Representation Learning for Personality. In 2020 IEEE
Fifth International Conference on Data Science in Cyberspace (DSC), 30-37.
Guest, J. L., Rio, C. D., and Sanchez, T. 2020. The Three Steps Needed to End the Covid-19 Pandemic: Bold Public Health
Leadership, Rapid Innovations, and Courageous Political Will. JMIR Public Health 6 (2): e19043.
Gupta, S., Hanssens, D., Hardie, B., Kahn, W., Kumar, V., Lin, N., ... and Sriram, S. 2006. Modeling customer lifetime value.
Journal of Service Research, 9(2), 139-155.
Halliday, M. A. K., and Hasan, R. 2004. An Introduction to Functional Grammar, 3rd ed., revised by C. Matthiessen.
Hambrick, D. C. 2007. Upper Echelons Theory: An Update. Academy of Management Review 32 (2): 334–43.
Hambrick, D. C., and Mason, P. A. 1984. Upper Echelons: The Organization as a Reflection of Its Top Managers. Academy of
Management Review 9 (2): 193–206.
Hastie, T., Tibshirani, R., and Friedman, J. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.
Springer Science & Business Media.
Haussler, D., Kearns, M., and Schapire, R. E. 1994. Bounds on the sample complexity of Bayesian learning using information theory and
the VC dimension. Machine Learning 14: 83–113.
Heavey, C., Simsek, Z., Kyprianou, C., and Risius, M. 2020. How Do Strategic Leaders Engage with Social Media? A theoretical
framework for research and practice. Strategic Management Journal 41 (8): 1490–1527.
Henderson, A. D., Miller, D., and Hambrick, D. C. 2006. How Quickly Do CEOs Become Obsolete? Industry Dynamism, CEO
Tenure, and Company Performance. Strategic Management Journal 27 (5): 447–60.
Hevner, A., March, S., Park, J., and Ram, S. 2004. Design Science in IS Research, MIS Quarterly 28 (1): 75–105.
Hough, J. R., and Ogilvie, D. T. 2005. An Empirical Test of Cognitive Style and Strategic Decision Outcomes. Journal of
Management Studies 42 (2): 417–48.
Howard, J., and Ruder, S. 2018. Fine-Tuned Language Models for Text Classification. CoRR abs/1801.06146.
Hrazdil, K., Novak, J., Rogo, R., Wiedman, C., and Zhang, R. 2020. Measuring executive personality using machine-learning
algorithms: A new approach and validation tests. Journal of Business Finance and Accounting.
Huang, A., Wang, H., and Yang, Y. 2020. FinBERT—A Deep Learning Approach to Extracting Textual Information, Available at
SSRN, 3910214.
Hutson, M. 2020. The mess behind the models: Too many of the COVID-19 models led policymakers astray. Here's how
tomorrow's models will get it right. IEEE Spectrum, 57(10), 30-35.
Iacobelli, F., Gill, A. J., Nowson, S., and Oberlander, J. 2011. Large Scale Personality Classification of Bloggers. In Affective
Computing and Intelligent Interaction, 568–77. Springer.
Jayaratne, M., and Jayatilleke, B. 2020. Predicting Personality using Answers to Open-ended Interview Questions, IEEE Access, 8,
115345-115355.
Jing, R. 2019. A Self-Attention Based LSTM Network for Text Classification. Journal of Physics: Conference Series 1207 (1): 012008.
John, G. H., Kohavi, R., and Pfleger, K. 1994. Irrelevant features and the subset selection problem. In Proceedings of the Eleventh
International Conference on Machine Learning, 121-129.
Judge, T. A., Bono, J. E., Ilies, R., and Gerhardt, M. W. 2002. Personality and Leadership: A Qualitative and Quantitative Review.
Journal of Applied Psychology 87 (4): 765–80.
Judge, T. A., Piccolo, R. F., and Kosalka, T. 2009. The Bright and Dark Sides of Leader Traits: A Review and Theoretical Extension
of the Leader Trait Paradigm. The Leadership Quarterly 20 (6): 855–75.
Kim, Y., Jernite, Y., Sontag, D., and Rush, A. M. 2016. Character-Aware Neural Language Models. In Proceedings of the Thirtieth
AAAI Conference on Artificial Intelligence, 2741–9.
Laine, S., and Aila, T. 2016. Temporal Ensembling for Semi-supervised Learning. arXiv preprint arXiv:1610.02242.
Lalor, J. P., Yang, Y., Smith, K., Forsgren, N., and Abbasi, A. 2022. Benchmarking Intersectional Biases in NLP. In Proceedings
of the Association for Computational Linguistics.
Le, Q., and Mikolov, T. 2014. Distributed Representations of Sentences and Documents. In International Conference on Machine
Learning, 32:1188–96.
Lee, D. H. 2013. Pseudo-label: The Simple and Efficient Semi-supervised Learning Method for Deep Neural Networks. In ICML
Workshop on Challenges in Representation Learning.
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., and Kang, J. 2020. BioBERT: a pre-trained biomedical language
representation model for biomedical text mining, Bioinformatics, 36(4), 1234-1240.
Leonardi, S., Monti, D., Rizzo, G., and Morisio, M. 2020. Multilingual Transformer-Based Personality Traits Estimation.
Information 11 (4): 179.
LePine, J. A., and Van Dyne, L. 2001. Voice and cooperative behavior as contrasting forms of contextual performance: evidence of
differential relationships with big five personality characteristics and cognitive ability. Journal of Applied Psychology 86 (2):
326.
Li, M., and Simerly, R. L. 1998. The Moderating Effect of Environmental Dynamism on the Ownership and Performance
Relationship. Strategic Management Journal 19 (2): 169–79.
Li, J., Larsen, K., and Abbasi, A. 2020. TheoryOn: A Design Framework and System for Unlocking Behavioral Knowledge through
Ontology Learning, MIS Quarterly, 44(4), 1733-177.
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... and Stoyanov, V. 2019. RoBERTa: A Robustly Optimized BERT
Pretraining Approach. arXiv preprint arXiv:1907.11692.
Liu, Z., Wang, Y., Mahmud, J., Akkiraju, R., Schoudt, J., Xu, A., and Donovan, B. 2016. To Buy or Not to Buy? Understanding
the Role of Personality Traits in Predicting Consumer Behaviors. In Spiro E., Ahn YY. (eds) Social Informatics. Lecture
Notes in Computer Science, 10047: 337–346.
Lynn, V., Balasubramanian, N., and Schwartz, H. A. 2020. Hierarchical Modeling for User Personality Prediction: The Role of
Message-Level Attention. In Proc. 58th Annual Meeting of the ACL, 5306–5316.
Mairesse, F., Walker, M. A., Mehl, M. R., and Moore, R. K. 2007. Using Linguistic Cues for the Automatic Recognition of
Personality in Conversation and Text. Journal of Artificial Intelligence Research 30: 457–500.
Majumder, N., Poria, S., Gelbukh, A., and Cambria, E. 2017. Deep Learning-Based Document Modeling for Personality Detection
from Text. IEEE Intelligent Systems 32 (2): 74–79.
Masli, A., Richardson, V. J., Watson, M. W., and Zmud, R. W. 2016. Senior Executives’ IT Management Responsibilities: Serious
IT-Related Deficiencies and CEO/CFO Turnover. MIS Quarterly 40 (3): 687–708.
Medcof, J. W. 2007. CTO Power. Research-Technology Management 50 (4): 23–31.
Mehl, M. R., Gosling, S. D., and Pennebaker, J. W. 2006. Personality in Its Natural Habitat: Manifestations and Implicit Folk
Theories of Personality in Daily Life. Journal of Personality and Social Psychology 90 (5): 862.
Mehta, Y., Majumder, N., Gelbukh, A., and Cambria, E. 2020. Recent Trends in Deep Learning Based Personality Detection,
Artificial Intelligence Review, 53(4), 2313-2339.
Mohamed, N., Ahmad, M. H., Ismail, Z., and others. 2010. Short Term Load Forecasting Using Double Seasonal ARIMA Model. In
Proceedings of the Regional Conference on Statistical Sciences, 10:57–73.
Murphy, J., Vallières, F., Bentall, R. P., Shevlin, M., McBride, O., Hartman, T. K., ... and Hyland, P. 2021. Psychological
characteristics associated with COVID-19 vaccine hesitancy and resistance in Ireland and the United Kingdom. Nature
Communications, 12(1), 1-15.
Nadkarni, S., and Herrmann, P. 2010. CEO Personality, Strategic Flexibility, and Firm Performance: The Case of the Indian
Business Process Outsourcing Industry. Academy of Management Journal 53 (5): 1050–73.
Nygren, T. E., and White, R. J. 2005. Relating Decision Making Styles to Predicting Self-efficacy and a Generalized Expectation
of Success and Failure. In Proc. Human Factors and Ergonomics Society Meeting, 432–34.
Pan, S. J., and Yang, Q. 2009. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22 (10):
1345–59.
Parrish Jr, J. L., Bailey, J. L., and Courtney, J. F. 2009. A Personality Based Model for Determining Susceptibility to Phishing
Attacks. Little Rock: University of Arkansas, 285-296.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., and Blondel, M. 2011. Scikit-Learn: Machine
Learning in Python. Journal of Machine Learning Research 12: 2825–30.
Pennebaker, J. W., and King, L. A. 1999. Linguistic Styles: Language Use as an Individual Difference. Journal of Personality and
Social Psychology 77 (6): 1296.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., and Zettlemoyer, L. 2018. Deep Contextualized Word
Representations. In Proceedings of NAACL-HLT, 2227–37.
Peterson, R. S., Smith, D. B., Martorana, P. V., and Owens, P. D. 2003. The Impact of Chief Executive Officer Personality on Top
Management Team Dynamics: One Mechanism by Which Leadership Affects Organizational Performance. Journal of
Applied Psychology 88 (5): 795–808.
Pratama, B. Y., and Sarno, R. 2015. Personality Classification Based on Twitter Text Using Naive Bayes, KNN and SVM. In The
IEEE International Conference on Data and Software Engineering (ICoDSE), 170–74. IEEE.
Prechelt, L. 1998. Automatic Early Stopping Using Cross Validation, Neural Networks 11 (4): 761–67.
Riaz, M. N., Riaz, M. A., and Batool, N. 2012. Personality Types as Predictors of Decision Making Styles. Journal of Behavioural
Sciences 22 (2): 99–114.
Ryan, J., and Herleman, H. 2015. A Big Data Platform for Workforce Analytics. In Big Data at Work: The Data Science Revolution
and Organizational Psychology, Chapter 2, 19-42.
Shi, Z., Lee, G. M., and Whinston, A. B. 2016. Toward a Better Measure of Business Proximity: Topic Modeling for Industry
Intelligence. MIS Quarterly 40 (4): 1035–56.
Shmueli, G., and Koppius, O. 2011. Predictive Analytics in Information Systems Research. MIS Quarterly 35 (3): 553–72.
Sun, X., Liu, B., Cao, J., Luo, J., and Shen, X. 2018. Who am I? Personality Detection Based on Deep Learning for Texts. In IEEE
International Conference on Communications, 1–6.
Tadesse, M. M., Lin, H., Xu, B., and Yang, L. 2018. Personality Predictions Based on User Behavior on the Facebook Social Media
Platform. IEEE Access 6: 61959–69.
Tausczik, Y. R., and Pennebaker, J. W. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis
Methods. Journal of Language and Social Psychology 29 (1): 24–54.
Torrey, L., and Shavlik, J. 2010. Transfer Learning. In Handbook of Research on Machine Learning Applications and Trends:
Algorithms, Methods, and Techniques, 242–64. IGI Global.
Vinciarelli, A., and Mo, G. 2014. Survey of Personality Computing. IEEE Trans. Affective Computing 5: 273–91.
Walls, J. G., Widmeyer, G. R., and El Sawy, O. A. 1992. Building an Information System Design Theory for Vigilant
EIS, Information Systems Research, 3(1), 36-59.
Wang, Q., Lau, R. Y. K., and Xie, H. 2021. The Impact of Social Executives on Firms’ Mergers and Acquisitions Strategies: A
Difference-in-Difference Analysis. Journal of Business Research 123: 343–354.
Wang, Z., Wu, C. H., Li, Q. B., Yan, B., and Zheng, K. F. 2020. Encoding Text Information with Graph Convolutional Networks for
Personality Recognition. Applied Sciences (Switzerland) 10 (12): 4081.
Wang, Z., Wu, C., Zheng, K., Niu, X., and Wang, X. 2019. SMOTETomek-Based Resampling for Personality Recognition. IEEE
Access 7: 129678–129689.
Wang, Y., Chen, Q., Ahmed, M., Li, Z., Pan, W., and Liu, H. 2019. Joint Inference for Aspect-level Sentiment Analysis by Deep
Neural Networks and Linguistic Hints. IEEE Transactions on Knowledge and Data Engineering 1-12.
Weng, P. S., and Chen, W. Y. 2017. Doing Good or Choosing Well? Corporate Reputation, CEO Reputation, and Corporate
Financial Performance. The North American Journal of Economics and Finance 39: 223–40.
Wilcoxon, F. 1992. Individual Comparisons by Ranking Methods. Breakthroughs in Statistics, 196–202. Springer.
Wright, W. R., and Chin, D. N. 2014. Personality Profiling from Text: Introducing Part-of-Speech N-Grams. In International
Conference on User Modeling, Adaptation, and Personalization, 243–53. Springer.
Xie, Q., Dai, Z., Hovy, E., Luong, M. T., and Le, Q. V. 2020. Unsupervised Data Augmentation for Consistency Training. In
Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS).
Xue, D., Wu, L., Hong, Z., Guo, S., Gao, L., Wu, Z., Zhong, X., and Sun, J. 2018. Deep Learning-based Personality Recognition from
Text Posts of Online Social Networks. Applied Intelligence 48 (11): 4232–4246.
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., and Hovy, E. 2016. Hierarchical Attention Networks for Document Classification.
In Proc. of the 2016 Conference of the NAACL: Human Language Technologies, 1480–9.
Yu, J., and Markov, K. 2017. Deep Learning Based Personality Recognition from Facebook Status Updates. In Proceedings of the
8th IEEE International Conference on Awareness Science and Technology (iCAST), 383–87.
Zhou, J., Huang, J. X., Chen, Q., Hu, Q. V., Wang, T., and He, L. 2019. Deep Learning for Aspect-level Sentiment Classification:
Survey, Vision, and Challenges. IEEE Access 7: 78454–78483.
Zimbra, D., Abbasi, A., Zeng, D., and Chen, H. 2018. The State-of-the-art in Twitter Sentiment Analysis: A Review and Benchmark
Evaluation, ACM Transactions on Management Information Systems (TMIS), 9(2), 1-29.