Predicting Personality From Facebook Data
Abstract
Every day, social media usage, and Facebook usage in particular, grows exponentially. Simply inspecting Facebook usage provides meaningful information concerning users' daily interactions and hence about their personality traits. Numerous studies have been done to harness such streams of Facebook data to obtain accurate predictions of human behavior, social interactions, and personality. The aim of this study is to build a neural network–based predictive model that uses Facebook users' data and activity to predict the Big 5 personality traits. This study combines the inference features highlighted in three relevant studies: number of likes, events, groups, tags, status updates, network size, relationship status, age, and gender. The study was conducted on 7,438 unique Facebook participants obtained from the myPersonality database. The findings of this study show how much of a person's personality can be predicted only by analyzing their Facebook activity. The proposed artificial neural network model was able to correctly classify an individual's personality with 85% prediction accuracy.
Keywords
artificial neural network, deep learning, Facebook, multi-label classification, personality prediction
Different techniques have been applied so far in the literature, and numerous studies have shown that there is a certain linkage between users and their Facebook profiles. This association can be harnessed and applied in different areas such as targeted marketing, psychology, and more (Golbeck et al., 2011).

Using Facebook data to determine a person's personality based upon the Big 5 personality model can be classified as a "multi-label classification" (MLC) problem, in the sense that an individual can possess more than one personality trait. In the Big 5 model, the personality of a person differs in terms of openness, conscientiousness, extraversion, agreeableness, and neuroticism (OCEAN; Costa & McCrae, 1992). Each of these five personality traits corresponds to a classifier. An MLC problem is defined as one where more than one target label is attached to each instance. This method is often applied to tasks such as text categorization, medical diagnosis, music categorization, and semantic scene classification (Tsoumakas & Ioannis, 2006). An individual can be categorized under more than one personality label in an MLC problem.

Different techniques have been proposed to solve such problems, some of which are Multi-label K Nearest Neighbors (ML-KNN; M.-L. Zhang & Zhou, 2007), Artificial Neural Networks (ANN), Naïve Bayes, support vector machines (SVM), Decision Trees, and Logistic Regression (Hall, 2017).

ANN is a type of multi-dimensional regression analysis model, which makes it in various ways better than conventional regression models. The inspiration behind the development of ANN stems from developing an intelligent system that can perform tasks intelligently like a human brain (Devi et al., 2012). Regardless of how complex a system might be, ANN can accurately perform prediction. That is why many researchers use ANN for prediction problems, especially in cases where the problem is too complicated to be expressed in a mathematical formula and where input/output data are available (Bataineh et al., 2016).

This study aims to use ANN to predict personality with a dataset derived from Facebook. The dataset retrieved from the myPersonality database (Kosinski et al., 2015) consists of more than 3 million Facebook users.

Some studies use the linguistic behavior of a person, drawn from their status updates, to predict personality (Tandera et al., 2017), but this research sought to predict personality by analyzing and utilizing the relationship between a user's personality and their Facebook activities.

The personality of an individual is stable through time and situation (Espinosa & Rodríguez, 2004), meaning the personality of an individual does not change online or offline; an individual that is sociable offline will be sociable online. Therefore, the Facebook profile of an individual can reflect actual personality (Back et al., 2010).

There are some studies in the literature that predict Big 5 personality utilizing features such as linguistic features retrieved from written text or speech text (Mohammad & Kiritchenko, 2013). However, the topic of predicting personality on social media has become a popular one.

The back propagation (BP) algorithm for neural networks was typically used in ANN studies, but since the dataset to be analyzed involves a multi-label classification problem, some important characteristics of multi-label learning are not captured with the basic BP algorithm, which does not consider correlations of different labels. A modified BP algorithm better suited for MLC problems is used in this study.

There are significant relationships between an individual's personality and their Facebook activity; that is to say, based on a person's Facebook activity one can get clues to a person's personality (Sumner et al., 2011). This study investigates whether the similarities between an individual's personality and their Facebook activity can be used to predict personality more successfully.

This study contributes to an expanding literature on inferring personality with social media by using a back propagation feed-forward algorithm to analyze Facebook activity data to see if better prediction results can be achieved. Upon the completion of this research, there was no knowledge of any literature that uses neural networks strictly together with Facebook activity, without looking at posts and text, to predict personality. Also, present studies use small-scale datasets for analysis, which might impede the reliability of their results.

This research practically contributes to the field by investigating the linkage between a user's Facebook activity and their personality, using a neural network predictive model to analyze information from the user's Facebook activity data, which will help us to understand the extent of such a relationship and to know if this can help predict a user's personality more accurately. A model that can accurately predict personality may help adaptive applications adjust to user behavior accordingly. Many examples of ANN-driven personalized advertising, customized education, and viewing posts on Facebook are already available.

Related Research

This study involves three important aspects: the study is a multi-label classification problem which uses an artificial neural network to predict the Big 5 personality traits of users from Facebook data. Therefore, the literature is divided into the following sections accordingly: Big 5 personality, multi-label classification, and artificial neural networks and their use in prediction.

Big 5 Personality

There are five major characteristics that define human personality, known as the "Big 5"; this is a well experimented and scrutinized structure for individual personality used by researchers recently (Goldberg, 1992). The Big 5 personality traits are classified as openness, conscientiousness, extroversion, agreeableness, and neuroticism, as shown in Table 1.
Over the years, this Big 5 model has become a standard for personality due to the fact that it came out of prior tests on personality, and the tests also showed that the model's validity was not altered by languages or variations in method analysis (McCrae & John, 1992), therefore resulting in its acceptance. When dealing with the Big 5 personality model, each individual can highly exhibit some of these traits together, meaning that the personality traits are not contrasting to each other. A person can exhibit high symptoms of Agreeableness and Openness while exhibiting little symptoms of Neuroticism. For instance, a person who is an extrovert in real life tends to post a lot about their activities and share their experiences, while a person who is neurotic often tends to be less active and have fewer tags due to their shy nature.

Multi-Label Classification

In machine learning, multi-label classification (MLC) is a form of classification problem but varies from other classification problems in the sense that each sample can have several labels (Tsoumakas & Ioannis, 2006).

This differs from classification problems that can have just one label and never two (i.e., an object can either be classified as dog or cat but never both), which is known as Multi-Class Classification. In MLC, samples are classified into more than one label (i.e., a person can be labeled with both openness and agreeableness; Tsoumakas & Ioannis, 2006).

Algorithm Adaptation uses certain algorithms to directly alter standard classification techniques to perform MLC. This schema treats MLC as a single integrated problem without requiring problem transformation. Some examples of machine learning methods that have adapted this approach in handling MLC are ANN, boosting, decision trees, and KNN (Hall, 2017). In the case of the Big 5 personality traits, which are independent of one another, an individual can exhibit high symptoms of more than one personality trait, hence making it a multi-label learning task, and among other approaches, the ANN approach is presumed to yield better results in predicting personality accurately.
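To make the label structure concrete, a multi-label target is usually represented as a binary indicator matrix with one column per trait, so a single person can carry several labels at once. A minimal illustration (the values below are made up for this example and are not from the study's data):

```python
import numpy as np

# Rows are people, columns are the Big 5 traits in the order
# [openness, conscientiousness, extraversion, agreeableness, neuroticism].
# A row may contain several 1s, which is what makes the task multi-label
# rather than multi-class.
y = np.array([
    [1, 0, 1, 1, 0],   # open, extraverted, agreeable
    [1, 1, 0, 1, 1],   # four traits at once
    [0, 1, 0, 0, 1],   # conscientious and neurotic
])
print(y.sum(axis=1))   # number of labels per person: [3 4 2]
```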
Artificial Neural Network

ANNs are designed to act like biological nervous systems in interacting with objects of the real world; they are large, parallel, interconnected networks made up of nodes, and each node is referred to as a neuron (M.-L. Zhang & Zhou, 2006).

ANN has the ability to learn and to adapt by modifying its internal structure depending on the data that pass through it. It is one of the most successful learning methods and has performed very well in classification (J. Zhang, 2016). ANN provides a variety of techniques to learn from examples and performs very well in pattern recognition. At the moment, various types of neural networks exist, for example mapping networks, radial basis function networks, adaptive resonance theory models, and of course multi-layer feed-forward neural networks (Kalghatgi et al., 2015).

Numerous studies have been carried out in the past using ANN as a tool. Only the studies conducted using ANN for prediction were examined, then studies carried out in the area of ANN prediction for multi-label classification problems, and finally studies relating to ANN in personality prediction.

Using ANN for Prediction

Different models and methods have been proposed for the prediction of various outcomes. In 2010, ANN was used as a tool to predict team performance by analyzing individuals' past achievements and history (Hedberg et al., 2010). The aim of the study was to provide a means by which employers can analyze a prospective team member's track record to understand the effect of that individual on the team. After analysis, training, testing, and evaluation, the model achieved 73.4% prediction accuracy. With this level of accuracy, the study claims that this ANN approach can be applied at other organizational levels including recruitment, which also forms a basis for predicting personality.

Champa and AnandaKumar (2010) conducted a study on human behavior prediction through handwriting analysis. The study uses ANN to analyze various samples of individual handwriting by looking at the baseline, the pen pressure, and the letter "t." The study states that professional handwriting examiners can understand human personality from an individual's handwriting; however, the process is costly and prone to fatigue. The baseline, the pen pressure, and the height of the t-bar in the letter "t" stem were fed into the ANN as inputs, with individual personality traits as outputs. The model was run through various epochs and hidden layers and attained a maximum accuracy of 53%.

Another study by Nkoana (2011) proposes an ANN model for flood prediction and early warning.
In that study, various numbers of trained neural network architectures were evaluated using their mean percentage accuracy. The study implemented 14 neural networks using daily rainfall as the predictive variable over the period from 1995 to 2009. After examining the performance of the neural networks, the Elman recurrent neural network with two hidden layers and two hidden nodes yielded the best result of 58% accuracy. The study claims that ANN with daily rainfall can be used to predict floods. Another study, by Devi et al. (2012), proposes an ANN model for weather prediction. The study collects data on atmospheric pressure, temperature, wind speed, wind direction, humidity, and precipitation and uses them to train a three-layer ANN. The results were compared with the practical working of the meteorological department, and the study claims to have built a model which can successfully predict weather based on the comparison results.

Another interesting study using ANN for future predictions was by Song and Kim (2014); the study feeds the Big 5 personality traits as input into the ANN model to predict individuals' future locations. That study explores the connection between human mobility patterns and personality to train the ANN to predict future locations. The study combined time information and personality as input nodes and locations as output sample training data. The researchers claim to predict human location properly with the help of the personality traits. The study recommends using the reverse of this model in the future, using mobility patterns to predict personality.

Binh and Duy (2017) used ANN as a tool to predict student performance based on the students' learning styles. The study conducted an online survey with the participation of 316 undergraduate students in various courses. Using the data collected and analyzed, an ANN model was built to predict students' performance based on their learning style. The ANN model managed to produce 80.63% classification accuracy, and the study claims that this method can be applied in e-learning environments as adaptive models that can support learners.

Al-Shihi et al. (2018) proposed a model that can be used to predict mobile learning adoption in developing countries. The study integrates constructs such as social learning, flexibility learning, enjoyment learning, and economic learning. The study was conducted on 388 participants from major universities and colleges in Oman, and ANN was used as the tool for prediction. The study suggests that the proposed model can be used to predict and influence mobile learning adoption.

Using ANN for Prediction Through Social Media

Nam et al. (2014) proposed a simpler ANN approach to handle multi-label classification in large-scale multi-label text classification. The proposed method is aimed at being an alternative and better method than the state-of-the-art back propagation multi-label learning approach. In the study, BP-MLC's pairwise ranking loss was replaced with cross entropy, and other features such as the ReLU activation function were used together with AdaGrad optimizers. The study claims that this approach enables the model to converge in just a few steps and that the dropout utilized helps prevent overfitting. The study evaluates the performance of the proposed model against other baseline models. The algorithm trains with a higher convergence speed due to the ReLU activation, and the model also uses dropout to prevent overfitting by randomly dropping individual hidden units, while taking advantage of the inherent correlation of the label space to minimize rank loss.

Liu and Chen (2015) proposed a multi-label approach for sentiment analysis of microblogs. The study compares 11 state-of-the-art ML classification methods and uses eight metrics for evaluation. The comparison was carried out on two microblog datasets. Of the 11 methods evaluated, some performed better than others depending on the scenario. RAkEL (random k-label sets) performs better with HR, while other algorithms performed better on AI. So the different features affected the results of the study, but the results show that one of the dictionaries used in the study, the Dalian University of Technology Sentiment Dictionary, performs best on multi-label classification.

Corani and Scanagatta (2016) proposed a multi-label classifier model which is based on Bayesian networks but performs slightly differently from the baseline Bayesian network. The model addresses the dependencies among the class variables, which are normally overlooked when devising independent classifiers for each of the classes to be predicted. The model works by simultaneously predicting the class variables, which is different from the baseline approach; the study results show that the proposed model outperforms the independent approach when predicting multiple air pollutants.

Kee et al. (2017) proposed a neural network multi-label classification system to predict the arrival time of bus transport. The neural network is built based on historical GPS (Global Positioning System) arrival times, and an ensemble of neural networks is used to improve the reliability of the output. The results of the study show that the proposed model is able to forecast the arrival time up to a reasonable percentage of 75%. The neural network and ensemble model was compared with other algorithms such as decision tree, Random Forest, and Naïve Bayes, and the model proves to be 8% better than the other algorithms. The study suggests further improvement of the model by using power transformation and some other ensemble methods.

Personality Prediction Through Social Media

Wald et al. (2012) proposed a form of machine learning ensemble learning called SelectRUSBoost to predict psychopathy through Twitter data. This method adds feature selection and an imbalance-aware ensemble to tackle high dimensionality.
The study states that when combining ensemble learning, data sampling, and feature selection in SelectRUSBoost, the model is able to reach an AUC (area under the curve) of 0.736, and this performance is only achieved when this model is used. The study states that a model such as this can be used by law enforcement in discovering psychopathic cases through Twitter data. The study also states that though the model can be used with Twitter to predict the incidence of psychopathic situations, the predictions are not sufficient to provoke direct actions but can be used to flag potential risk.

Farnadi et al. (2013) explored the use of machine learning (SVM, NB [Naïve Bayes], KNN) to infer personality just by examining the Facebook status updates of various users. The study strengthens its prediction model by not relying on one source alone but by including training samples from another source (an essay corpus), helping the study show that traits can be generalized across social media platforms. The study investigates 250 users with 9,917 status updates and states that despite having a small dataset the model could still outperform other baseline methods.

Another study, by Kandias et al. (2013), proposes a methodology that detects users who are hostile or have a negative attitude toward the authorities; the study combines a dictionary learning-based approach and machine learning techniques (SVM, NB, LR [logistic regression]). The study analyzed information posted on the YouTube website.

Lima and de Castro (2014) in their study use a semi-supervised classification approach to predict personality from Twitter data. The study takes a different approach from other studies: it does not take the user profile into consideration, and it does not work with single texts as in other studies but works with groups of texts. The study uses the problem transformation method to transform the problem into five binary classification problems. The study used three well-established machine learning algorithms, NB, MLP (multi-layer perceptron), and SVM, to train the proposed system, which was applied to predict personality from tweets and resulted in an 83% prediction accuracy.

Kalghatgi et al. (2015) also investigated Big 5 personality trait prediction by analyzing Twitter data with ANN. The study explores the parallelism between an individual's linguistic information and their Big 5 personality traits and uses the tweets posted by an individual to predict personality. The study also reports that the model does not take the user's Twitter profile into consideration and implements it in Java NetBeans using the Hadoop framework to make predictions for multiple individuals at the same time.

Akshat (2016) investigates using CNN to predict personality from social media images; the study sought to find out if there was any relationship between the outputs and why such a relationship exists. The study results show how powerful neural networks are as a tool for measuring and learning highly nonlinear mappings between input data and output data. The study uses the transformation method to transform the task into a classification task and uses a chance baseline, which guesses just the highest occurring class, for comparison. The model was trained and validated with a split of 80, 10, and 10 for training, testing, and validation.

In 2017, Tandera et al. (2017) carried out a comparative analysis of current deep learning architectures and used accuracy results to compare performance. The study involved using the models to predict Big 5 personality traits from data retrieved from users' Facebook accounts. The datasets used in the study were obtained from two different sources: a myPersonality dataset consisting of 250 users, and data from 150 Facebook users collected manually. The study also uses linguistic features such as LIWC with both closed and open vocabulary approaches. The study reports that the model outperforms other methods with 74.14% average accuracy; although accuracy was low for some traits, the study claims this could be a result of the limited dataset. The experiment results show ANN doing better than other traditional machine learning classification methods.

Again in 2017, Laleh and Shahram propose a model that uses the LASSO algorithm to select the best features and predict the Big 5 personality traits from a user's Facebook data by examining Facebook likes. The study examines the likes of 92,225 users combined with 600 weighted topics; the model also treats the task as a regression problem. The training and test data are split 75% and 25%, and the cross validation method was used to validate the model. Still in 2017 is a study by Iatan which uses a Fuzzy Gaussian Neural Network (FGNN) to predict personality from a user's Facebook account based on publicly available data and compares the results with two other models, a multiple linear regression model and a multi-layer perceptron. The performance of the model was tested using normalized root mean square error. The study results show how the proposed method outperforms the other two methods during training.

Method

There is a relationship between an individual's Facebook profile and their personality; there are some predictive models that take into account the Facebook activities of users and their networks (Bachrach et al., 2012). That study focused on attributes such as number of likes, groups, tags, and friendship networks as features. Another study (Kosinski et al., 2013) also proposed a predictive model that focuses just on the demographic information retrieved from users' Facebook profiles, such as age, gender, and relationship status. All these previous works show a tight relationship between users' Facebook profiles, in terms of usage patterns and demographics, and their personality. Based on this, in this study, a different dataset is used and a combination of two different predictive models from previous work is used to formulate the predictive model for this study. The model used was built on the findings of three earlier studies.
The model is a combination of the features highlighted by Bachrach et al. (2012) and Sumner et al. (2011) with the features highlighted by Kosinski et al. (2013) in their studies, which are: number of likes (total number of times the like button was pressed), number of status updates (the total number of Facebook status updates), number of events (the total number of Facebook events joined or created), number of groups (number of Facebook groups joined), number of tags (total number of Facebook post tags used), network size (total number of friends), relationship status (the status showing the relationship), age (age of the user), and gender (gender of the user). The features with potentially high influence on personality prediction were chosen for this study.

There are some steps that need to be taken when creating an algorithmic model. The first and most crucial step is the pre-processing of the data. This step is what prepares the data for the task to be carried out. Feeding data that have not been properly processed can greatly hamper the results of the model and can throw the model off completely. Before classifying or feeding the data into the ANN, processing must first be done. The next step is the actual processing itself, which involves transforming the data received and then finally feeding them into the ANN for classification. Figure 1 shows the flow diagram of the model used in this study.

The dataset used for this study was obtained from the database provided by the myPersonality project (Kosinski et al., 2015), which consists of Facebook data of more than 4 million participants with given personality labels based on the Big 5 personality model. The myPersonality project was initiated by David Stillwell and Michal Kosinski. It is a Facebook application that collects users' Facebook information from their Facebook profile, while taking privacy issues into consideration, and also allows them to take psychometric tests which calculate measures such as satisfaction with life and the Big 5 personality traits. The data retrieved from the application were processed, analyzed, and then used to create the datasets. The data contain information on users' demographics, activities, and friendship network size. During this study, the following datasets were downloaded:

•• Big 5 model personality scores: These data contained Big 5 personality test scores taken by 3,137,694 Facebook users. They contained scores for the five main traits, Openness, Neuroticism, Agreeableness, Conscientiousness, and Extraversion, scaled from 0 to 5.
•• Facebook activity: These data contained a summary of each user's activities (tagging, posting, joining groups, etc.).

The datasets were downloaded from the myPersonality database and needed to be merged into one file; Microsoft SQL was used to merge the various databases by their unique user ID. The script used to merge the databases can be found in the Appendix.

The different databases did not contain equal numbers of participants, so to merge them, only common participants that could be found in all three databases were kept; those that could not be found in the other databases were dropped. After merging the data, the dataset was left with 1,337,313 rows of unique participants with unique user IDs.
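The authors' MSSQL merge script is in the Appendix and is not reproduced here; the sketch below shows the same join logic in pandas instead. The file names and the "userid" column name are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical exports of the three myPersonality tables described above.
big5 = pd.read_csv("big5_scores.csv")            # personality test scores
activity = pd.read_csv("facebook_activity.csv")  # likes, tags, groups, events, ...
demo = pd.read_csv("demographics.csv")           # age, gender, relationship status

# Inner joins keep only user IDs present in every table, mirroring the
# "only common participants were kept" rule described in the text.
merged = (
    big5.merge(activity, on="userid", how="inner")
        .merge(demo, on="userid", how="inner")
)
merged.to_csv("merged_dataset.csv", index=False)
print(len(merged))  # number of participants present in all three tables
```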
After the various database files were successfully merged, missing values were considered. Missing data can greatly affect the results of a research study because they can lead to biases, affect the findings and generalizability, and result in a great loss of information (Dong & Peng, 2013). To check for missing values in the data, missing value analysis was done using IBM SPSS, and it was observed that the merged file had a lot of missing values; before the database could be used for the study, the missing values had to be addressed. Some of the tables in the dataset had more than 50% missing values, so two methods were used to deal with the missing data: listwise deletion and replacement using the series mean (Humphries, 2013). Another script was written in MSSQL to handle missing values. Starting from the column with the highest proportion of missing values, the script compares the values in the column with the other columns and deletes a row if a missing value is found; this step was repeated across the columns until the missing values were below 10%, eventually reducing the data to 7,438 participants. After the missing values were reduced to 10% or less, the replace-missing-values option using the series mean in SPSS was used to replace the remaining missing values.

Now that the dataset only comprises participants with no missing data, the data could be further processed to be used in the neural network model (see Figure A1).
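A rough pandas equivalent of that two-step cleaning (the authors used MSSQL and SPSS; the 10% threshold follows the text, everything else is an illustrative assumption):

```python
import pandas as pd

df = pd.read_csv("merged_dataset.csv")

# Step 1: listwise deletion, always targeting the currently worst column,
# until every column has less than 10% missing values.
while df.isna().mean().max() > 0.10:
    worst = df.isna().mean().idxmax()   # column with the highest missing rate
    df = df.dropna(subset=[worst])      # drop rows missing that column

# Step 2: series-mean imputation for whatever small gaps remain.
df = df.fillna(df.mean(numeric_only=True))
print(df.shape)  # the paper reports 7,438 participants after this stage
```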
Out of the 7,438 participants, 3,013 (40.5%) were male and 4,425 (59.5%) were female. The majority (57.4%) of the participants were in the age group of 18 to 25 years, followed by those (28.1%) within the age group of 26 to 40 years, then 7.4% aged 18 years and below, 6.4% within 40 to 60 years, and 0.7% aged 60 years and above. The dataset also shows the Big 5 personality traits of the participants: 96% exhibited openness traits and 4% did not, 57% exhibited traits of neuroticism and 43% did not, 91% exhibited agreeableness traits and 9% did not, 87% exhibited conscientiousness traits and 13% did not, and finally 88% exhibited extrovert traits and 12% did not.
Table 2 represents the actual number of participants who displayed each of the Big 5 personality traits after preprocessing the dataset.

Table 2. Big 5 Personality Distribution.

Value   OPE     NEU     AGR     CON     EXT
Yes     7,181   4,209   6,789   6,502   6,567
No      257     3,229   649     936     871

When feeding the data into the neural network, the data should be in tensors of floating-point values (Chollet, 2017). The data also must not take widely different ranges, because that could affect training. To ensure this is not the case, the best approach is to normalize the data by transforming them into vectors of −1 to 1 or 0 to 1.

The values in the dataset are all on different scales, and in order for the neural network to work properly with the data, the data had to be rescaled to become uniform. Since the neural network is to be built using Python, the rescaling of the data was done using Python as well. In Python, data processing and neural networks have been simplified with the help of TensorFlow by Google; every operation was carried out with TensorFlow as the backend.

The steps for rescaling were as follows (a short sketch of these steps is given below):

1. The data were loaded using the pandas module; after the data were loaded, the input values and output values were defined. There were 9 input variables (number of likes, relationship status, number of status updates, number of events, gender, age, network size, number of groups, and number of tags) and 5 output variables, the personality traits (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism).
2. To normalize the data, some of the columns (relationship status, gender) had to be rescaled because they are nominal data. Using a one-hot encoder, the column was split into ten and scaled into binary indicators, which increased the number of input variables to 18.
3. The remaining data were normalized using min-max scaling (0–1) in Python. This was chosen because it helps feed-forward back propagation during the gradient descent calculation.
4. Since our problem is a multi-label classification problem, the output data (y) were further rescaled into binary 0s and 1s with the binarizer function provided by scikit-learn in Python. Values greater than 0.5 were classified as 1, while values less than 0.5 were classified as 0.

Some of the data might be scaled from 1 to 10; with one-hot encoding, if a variable has 10 categories, the encoder takes the first category and turns it into 1 while turning the rest into 0s, then takes the second category, turns it into 1 and the rest into 0s, and continues this process for all the remaining categories. After the dataset has been successfully normalized and transformed, it can be sent on for classification, whose outputs will be either 0 or 1 (see Figure A2).
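A compact sketch of steps 1 through 4 with pandas and scikit-learn; the column names and file name are assumptions, and the authors' exact code is not reproduced in the text.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Binarizer, MinMaxScaler, OneHotEncoder

df = pd.read_csv("clean_dataset.csv")  # hypothetical output of the earlier cleaning steps

trait_cols = ["ope", "con", "ext", "agr", "neu"]              # 5 outputs, scored 0-5
nominal_cols = ["relationship_status", "gender"]              # nominal inputs
numeric_cols = ["likes", "status_updates", "events", "groups",
                "tags", "network_size", "age"]                # numeric inputs

# Step 2: one-hot encode the nominal columns (this is what expands the
# 9 raw inputs to the 18 encoded inputs reported in the paper; the exact
# width depends on the category counts).
onehot = OneHotEncoder().fit_transform(df[nominal_cols]).toarray()

# Step 3: min-max scale the remaining inputs into the 0-1 range.
scaled = MinMaxScaler().fit_transform(df[numeric_cols])
X = np.hstack([scaled, onehot])

# Step 4: rescale the trait scores to 0-1, then binarize at the 0.5 threshold.
traits01 = MinMaxScaler().fit_transform(df[trait_cols])
y = Binarizer(threshold=0.5).fit_transform(traits01)
print(X.shape, y.shape)
```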
Various methods have been used for classification in the past, but in this study a multi-label back propagation neural network technique was utilized for classification. ANN has been widely adopted for multi-label classification. The configuration and parameters of an ANN model need to be selected carefully so as to ensure appropriate generalization and efficient learning. What the model does is that, through a feed-forward process, it updates itself by the back propagation update method and uses the supervised topology to enhance the model. This method is a multi-purpose learning algorithm, very effective and producing great results, but it is also costly in terms of learning requirements. The hidden layer is the layer between the input layer and the output layer. The data are taken from each of the input neurons through the synapses and multiplied by a set of random weights. The summed weighted inputs are then passed through an activation function to the output layer. In this study, two activation functions were used, ReLU and the sigmoid activation function. The first to be used is the ReLU activation function. ReLU is the most used activation function in the world today for sending signals from the first layer to the next layer before the output layer (Sharma, 2017):

A(x) = max(0, x)

Also, since this is a multi-label classification problem and each label's prediction probability needs to be predicted independently of the other class probabilities, the sigmoid activation function is used as the second activation function:

A(x) = 1 / (1 + e^(−x))
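A two-line NumPy rendering of those activations, useful for checking the per-label outputs (illustrative only; the actual model uses Keras's built-in implementations):

```python
import numpy as np

def relu(x):
    # A(x) = max(0, x): passes positive signals through, zeroes out the rest.
    return np.maximum(0.0, x)

def sigmoid(x):
    # A(x) = 1 / (1 + e^(-x)): squashes each output into (0, 1) so every
    # trait gets its own independent probability.
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, 0.5, 3.0, -0.1, 1.2])  # hypothetical pre-activations, one per trait
print(relu(z))
print(sigmoid(z))                           # five independent label probabilities
```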
An ANN task with multiple possible labels per sample, where the labels are not mutually exclusive, meaning that a sample can have multiple labels and is not restricted to just one label, is known as a multi-label classification problem. This problem is well tackled in ANN with a framework known as Keras. In this study, we have a problem with five different labels (openness, agreeableness, neuroticism, conscientiousness, and extraversion); therefore, this study has n samples,

X = {x_1, ..., x_n}

and their corresponding label sets,

y = {y_1, ..., y_n}

with y_i ⊆ {1, 2, 3, 4, 5} and P(c_j | x_i) as the prediction probability for label c_j. The next step is to build a simple ANN with five output nodes, one output for each class. Designing the input and hidden layers is quite straightforward, but designing the output layer for a multi-label problem, and choosing what kind of layer it will be, is quite important. Usually the softmax layer is the choice for multi-class classification problems, but this is not really the best choice for a multi-label problem (Sterbak, 2017).
With softmax, when increasing the score for one label, all others are lowered (it is a probability distribution), which is not a problem when predicting a single label per sample, but for multi-label prediction this is not good.

What is needed is to decompose the multi-label classification task; for this, a sigmoid output layer is needed, consisting of a sigmoid activation function and a binary_crossentropy loss function. The labels are then learned individually, and each label is independent of the other labels' probabilities. There are 18 inputs and 5 outputs, as shown by Figure 2:

P(c_j | x_i) = 1 / (1 + exp(−z_k))

Figure 2. Neural network model (18 input nodes, one hidden layer, and an output layer with one node per trait: openness, neuroticism, agreeableness, extraversion, and conscientiousness).

The study employed the Keras API, developed by Google, which runs on TensorFlow, to build the ANN model. This API allows models to be built easily and provides easy manipulation of data for better learning (Maxwell et al., 2017). Due to the flexibility of TensorFlow, developers can easily experiment with various optimization techniques and algorithms, which helps simplify implementation. This model was built using Python. Python has become one of the most popular languages for data science (Chollet, 2017); it is a language that is comprehensible to a wide range of people, and its syntax is easy to read.
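A minimal Keras sketch of the network described above. The 18 inputs, the single fully connected ReLU hidden layer, the five sigmoid outputs, and the binary cross-entropy loss follow the text; the hidden-layer width, the optimizer, and the training settings shown here are illustrative assumptions, since the paper leaves them to tuning.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(18,)),               # 18 one-hot/min-max encoded features
    layers.Dense(15, activation="relu"),     # hidden width assumed (roughly the rule of thumb below)
    layers.Dense(5, activation="sigmoid"),   # one independent probability per Big 5 trait
])

# binary_crossentropy treats every output node as its own yes/no label,
# which is what makes this a multi-label rather than multi-class setup.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Illustrative training call; X_train and y_train come from the preprocessing sketches.
# model.fit(X_train, y_train, epochs=50, batch_size=32, validation_split=0.2)
```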
In classification techniques, the dataset is separated into training, testing, and validation data so as to be able to determine and monitor the accuracy of the model. The training data consist of 60% of the data; these are examples from which the network can learn patterns and variations to make its decisions, and they are fed to the training model repeatedly. While the training is going on, another 20% of the data are used to validate the quality of training; if this looks good, the remaining 20% of the data, which have not been exposed to the network, are used to test the accuracy of the model.

There are two common approaches used to evaluate the performance of a classification model, K-fold and leave-one-out validation (Wong, 2015). In this study, K-fold cross validation was used to conduct the evaluation of the model.

Results

To carry out the data processing and modeling, Python was used, due to the vast number of machine learning libraries available in the Python language and the incredible data visualization that can also be carried out in the Python environment.

In building the neural network it is important to identify the number of hidden layers and hidden neurons to be used in the network. Depending on the data, it is safe to start with a few hidden layers, preferably one fully connected hidden layer (Chollet, 2017).

The model was set up with a fully connected hidden layer between the input layer and the output layer. ReLU was used as the non-saturating activation function between the first layer and the hidden layer, while the sigmoid activation function was used as the final output activation function. When deciding on the number of hidden neurons, it can be somewhat complicated to select the number of hidden neurons best suited to the task without examining several models. Too few or too many hidden neurons can lead to underfitting or overfitting.

In neural networks, a lot of trial and error is usually done to ascertain the best parameters for the network. There are some rules of thumb for deciding the number of hidden neurons (Heaton, 2008), worked out for this network just after the list:

•• It should be less than two times the input layer size.
•• It should be about two-thirds of the sum of the input and output neurons.
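Plugging this study's layer sizes into those two rules gives a quick sanity range for the hidden layer (the paper still settles the final width by trial and error):

```python
# 18 encoded inputs and 5 trait outputs, per the text above.
n_inputs, n_outputs = 18, 5

upper_bound = 2 * n_inputs                           # rule 1: fewer than 36 hidden neurons
suggestion = round(2 / 3 * (n_inputs + n_outputs))   # rule 2: about 15 hidden neurons

print(f"hidden neurons: try around {suggestion}, stay under {upper_bound}")
```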
In this study, different numbers of hidden neurons were tried and then increased to ascertain the number of neurons with the best performance.

First, the data were imported using the pandas data frame because of the massive functionality the pandas data frame gives for working with data. After the data were successfully imported, the next step was to create a matrix of the features and target variables; this enables the network to identify the input and output, with 9 input variables and 5 output variables. As discussed in the previous section, features such as relationship status were coded from 1 to 10, but leaving them this way would mean 10 is higher than 1, meaning single is higher than divorced, which is not the case. To handle this, dummy variables were created using a function from the scikit-learn library in Python known as OneHotEncoder (Pedregosa et al., 2011).
After one-hot encoding, the transformation and normalization were also done using the scikit-learn library. The dataset is transformed into vectorized form, from 0 to 1, so as to enable the network to better understand the data for classification. The output target dataset was also transformed using 0.5 as its threshold. Scikit-learn is also used to split the dataset into training and test samples.

The next step is the training and testing. This was broken down into two different schemes: the first scheme involved manually splitting the dataset, and the second scheme involved using K-fold cross validation. The study's procedures are represented in Figure 3.

Figure 3. The study flowchart.

Training

In the first scheme, which was split manually, the first part was split into 75% for training and 25% for testing, while the second part was split into 67% for training and 33% for testing, the test portions being data the model had not seen before. The second scheme was done using K-fold cross validation; this was also broken into two parts, K-10 cross validation and K-5 cross validation.
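Both schemes can be expressed in a few lines with scikit-learn; X and y are the encoded feature matrix and binarized trait matrix from the preprocessing sketches, and the shuffling seed is an assumption.

```python
from sklearn.model_selection import KFold, train_test_split

# Scheme 1: two manual hold-out splits (75/25 and 67/33).
X_tr1, X_te1, y_tr1, y_te1 = train_test_split(X, y, test_size=0.25, random_state=0)
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(X, y, test_size=0.33, random_state=0)

# Scheme 2: K-fold cross validation with K = 10 and K = 5;
# a fresh model is trained and scored on every fold (training call omitted).
for k in (10, 5):
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        X_tr, X_te = X[train_idx], X[test_idx]
        y_tr, y_te = y[train_idx], y[test_idx]
```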
In the final process of the BPNN model, the input layer was made up of 18 input neurons to accommodate the 18 features. For the five personality classes (openness, agreeableness, neuroticism, extraversion, and conscientiousness), the output layer was made up of five output neurons. For the sake of optimization, different parameters were set.

Testing

The model performs quite well with the test data and shows good generalizability. This is to say that the model gives compelling generalization abilities when presented with new users' Facebook activity data. Table A2 shows the prediction accuracy results and the hamming loss for all the trained networks. The hamming loss is part of the metrics used for evaluation in this study; it is used to compute accuracy through the contrast between the target data and the predicted data. The hamming loss is the fraction of labels that are incorrectly predicted. The best hamming loss for the model is 14.96%, which implies that more than 85% of the time the model can correctly classify an individual based on his or her Facebook activity. Table A2 presents the scheme 1 and scheme 2 testing results as the prediction accuracy results and the hamming loss for all the trained networks. The learning curves showing the convergence of the network, plotting the loss and the accuracy against the epochs, are shown in Figures A3 to A6, representing Scheme 1 Test 1, Scheme 1 Test 2, Scheme 2 Test 1, and Scheme 2 Test 2, respectively.
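The hamming-loss evaluation is available in scikit-learn, applied here under the same 0.5 threshold the paper uses on the sigmoid outputs (X_test, y_test, and model come from the sketches above):

```python
from sklearn.metrics import hamming_loss

y_prob = model.predict(X_test)          # sigmoid output per trait, in [0, 1]
y_pred = (y_prob > 0.5).astype(int)     # threshold each label independently

loss = hamming_loss(y_test, y_pred)     # fraction of individual trait labels predicted wrongly
print(f"hamming loss: {loss:.4f}  ->  per-label accuracy: {1 - loss:.4f}")
# a hamming loss of 0.1496, as reported in the text, corresponds to about 85% label accuracy
```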
Discussion

Different studies have been carried out on the subject of using various means to infer personality. While some studies have used real-life activities pertaining to locations and speech to infer personality, a study by Golbeck et al. (2011) took a different approach, focusing on using social media to infer personality, and discovered a tight relationship between a user's profile and personality. Other studies, by Bachrach et al. (2012) and Kosinski et al. (2013), show some major inferences from demographic attributes and activities to a person's personality.

The above studies provided the background motivation to carry out this study. The influence of social media, and especially Facebook, on society is rapidly increasing, especially during the COVID-19 pandemic. Facebook has become an integral part of our lives, businesses, governments, and much more. It is important to understand what extent of inference Facebook allows about a person's personality so that this can be used to improve and safeguard lives and lead to a better society. In this study, inference models from three different studies were combined:
the features highlighted by Bachrach et al. (2012) and Sumner et al. (2011) together with the features highlighted by Kosinski et al. (2013). The developed framework shows its capacity to predict the Big 5 personality traits, Openness, Agreeableness, Conscientiousness, Extraversion, and Neuroticism, from a user's Facebook data. As shown in this study, this ANN model gives encouraging results, in that 85% of the time the network will correctly classify a Facebook user based on just their activities. Upon trying different methods, the best classification result derived from the model had a prediction accuracy of 85.04%, although the differences between methods were not very distinct. Comparing the results of this study with some other studies, such as Tandera et al. (2017) and Lima and de Castro (2014), this model performs better. The study by Tandera et al. (2017), which used linguistic features on Facebook data to infer personality, was able to reach 70% accuracy with their neural network model, while in the study by Lima and de Castro (2014) a semi-supervised learning approach was used and the outputs were broken down and analyzed separately as five different binary outputs; this method gave a 75% prediction accuracy. In this study, the proposed model, analyzing just Facebook activities and demographics, was able to perform better, with a prediction accuracy of 85%. This shows how the ANN model can be used to learn accurately and faster during the training phase if given more data; however, the generalizability is weakened in different scenarios, which may be a result of the data or the parameters used during the training. Given suitable pre-processing and an adequate amount of data, the present study evinces the viability of ANN models for personality classification and also shows their usefulness.

Conclusion

The purpose of this study was to explore the performance of ANN in classifying and predicting the Big 5 personality traits based on data derived from a user's Facebook account. This study proffers an apt classification model for Big 5 personality prediction that could accurately infer an individual's personality based only on their Facebook data, with a prediction accuracy of 85.04%. The observations showed that an ANN with proper parameter tuning can perform with good accuracy on a complex multi-label task such as personality classification when trained and tested with new data. With the rapid growth in demand among various companies for a better understanding of their clients, the demand for online tools that can help better understand the personality of consumers has increased.

One of the limitations of this study is that a huge amount of data was lost during data pre-processing, but more data can be added to the model to improve the training phase. To improve the model training quality there is a need for more data, and much data can be obtained from an individual's social networks and media accounts. Another limitation of this study is that accuracy was not verified with other methods such as partial least squares and other machine learning methods. A similar study should be carried out on the same participants' other accounts so as to better compare results and improve prediction. Finally, more studies should be carried out in this area of utilizing neural networks to better understand and predict personality, so as to understand ways to make people's lives better. With the prediction accuracy improved further, this model could be implemented on Facebook: users would no longer need to fill in long personal forms to determine their personality type; the personality type could be determined on Facebook just from the user's activities without having to fill in any forms. Users could make the results public and share them on their wall. In business, based on the requirements of a company, organizations could predict the personality of workers to see how they can better improve their service. Advertisers could better know how to target their audience; for instance, advertisers could target people with an openness personality when advertising new products, or target people with a neurotic personality when advertising security products. The data that can be retrieved from Facebook are very rich; further studies can be carried out combining ANN with the Big 5 personality traits to analyze Facebook data to predict depression and suicidal tendencies. Future work will be geared toward improving the accuracy of the model by collecting more data and cross-validating the data with other social media platforms besides Facebook.
Appendix
Figure A1. Sample of input data before one hot encoding and data transformation.
Figure A2. Sample of input data after one hot encoding and data transformation.
Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society—WPES '13 (pp. 261–266). Association for Computing Machinery Press.
Kee, C. Y., Wong, L.-P., Khader, A. T., & Hassan, F. H. (2017). Multi-label classification of estimated time of arrival with ensemble neural networks in bus transportation network. In Proceedings of the 2017 2nd IEEE international conference on intelligent transportation engineering (ICITE) (pp. 150–154). Institute of Electrical and Electronics Engineers.
Kosinski, M., Matz, S. C., Gosling, S. D., Popov, V., & Stillwell, D. (2015). Facebook as a research tool for the social sciences: Opportunities, challenges, ethical considerations, and practical guidelines. American Psychologist, 70(6), 543–556. https://doi.org/10.1037/a0039210
Kosinski, M., Stillwell, D., & Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences of the United States of America, 110(15), 5802–5805. https://doi.org/10.1073/pnas.1218772110
Laleh, A., & Shahram, R. (2017). Analyzing Facebook activities for personality recognition. In 2017 16th IEEE international conference on machine learning and applications (ICMLA) (pp. 960–964). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ICMLA.2017.00-29
Lima, A. C. E. S., & de Castro, L. N. (2014). A multi-label, semi-supervised classification approach applied to personality prediction in social media. Neural Networks, 58, 122–130. https://doi.org/10.1016/J.NEUNET
Liu, S. M., & Chen, J.-H. (2015). A multi-label classification based approach for sentiment classification. Expert Systems with Applications, 42(3), 1083–1093.
Maxwell, A., Li, R., Yang, B., Weng, H., Ou, A., Hong, H., & Zhang, C. (2017). Deep learning architectures for multi-label classification of intelligent health risk prediction. BMC Bioinformatics, 18(S14), Article 523. https://doi.org/10.1186/s12859-017-1898-z
McCrae, R. R., & John, O. P. (1992). An introduction to the Five-Factor Model and its applications. Journal of Personality, 60(2), 175–215. https://doi.org/10.1111/j.1467-6494.1992.tb00970.x
Mohammad, S. M., & Kiritchenko, S. (2013). Using nuances of emotion to identify personality. In Proceedings of ICWSM (pp. 1–4). https://arxiv.org/abs/1309.6352
Nam, J., Kim, J., Mencía, E. L., Gurevych, I., & Fürnkranz, J. (2014). Large-scale multi-label text classification: Revisiting neural networks. In T. Calders, F. Esposito, E. Hüllermeier, & R. Meo (Eds.), Joint European conference on machine learning and knowledge discovery in databases (pp. 437–452). Springer.
Nkoana, R. (2011). Artificial neural network modelling of flood prediction and early warning [Unpublished master's dissertation]. University of the Free State.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830. http://www.jmlr.org/papers/v12/pedregosa11a.html
Sharma, S. (2017). Activation functions: Neural networks—Towards data science. https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6
Song, H. Y., & Kim, S. Y. (2014). Predicting human locations with Big Five personality and neural network. Journal of Economics, 2(4), 273–280.
Statista. (2020). Facebook users worldwide 2020. https://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/
Sterbak, T. (2017). Guide to multi-class multi-label classification with neural networks in python: Depends on the definition. https://www.depends-on-the-definition.com/guide-to-multi-label-classification-with-neural-networks/
Sumner, C., Byers, A., & Shearing, M. (2011). Determining personality traits & privacy concerns from Facebook activity. Black Hat Briefings, 11, 1–29. https://media.blackhat.com/bh-ad-11/Sumner/bh-ad-11-Sumner-Concerns_w_Facebook_WP.pdf
Tandera, T., Hendro Suhartono, D., Wongso, R., & Prasetio, Y. L. (2017). Personality prediction system from Facebook users. Procedia Computer Science, 116, 604–611. https://doi.org/10.1016/j.procs.2017.10.016
Tareaf, R. B., Alhosseini, S. A., Berger, P., Hennig, P., & Meinel, C. (2019). Towards automatic personality prediction using Facebook likes metadata. In Proceedings of the 2019 IEEE 14th international conference on intelligent systems and knowledge engineering (ISKE) (pp. 715–719). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/iske47853.2019.9170375
Tsoumakas, G., & Ioannis, K. (2006). Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
Wald, R., Khoshgoftaar, T. M., Napolitano, A., & Sumner, C. (2012). Using Twitter content to predict psychopathy. In Proceedings of the 2012 11th international conference on machine learning and applications (pp. 394–401). Institute of Electrical and Electronics Engineers.
Wilson, R. E., Gosling, S. D., & Graham, L. T. (2012). A review of Facebook research in the social sciences. Perspectives on Psychological Science, 7(3), 203–220.
Wong, T.-T. (2015). Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognition, 48(9), 2839–2846. https://doi.org/10.1016/J.PATCOG.2015.03.009
Xue, D., Wu, L., Hong, Z., Guo, S., Gao, L., Wu, Z., Zhong, X., & Sun, J. (2018). Deep learning-based personality recognition from text posts of online social networks. Applied Intelligence, 48, 4232–4246. https://doi.org/10.1007/s10489-018-1212-4
Zhang, J. (2016). Deep learning for multi-label scene classification [Unpublished master's dissertation]. University of Adelaide.
Zhang, M.-L., & Zhou, Z.-H. (2006). Multilabel neural networks with applications to functional genomics and text categorization. IEEE Transactions on Knowledge and Data Engineering, 18(10), 1338–1351. https://doi.org/10.1109/TKDE.2006.162
Zhang, M.-L., & Zhou, Z.-H. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048. https://doi.org/10.1016/J.PATCOG.2006.12.019
Zhu, Y. (2020). The prediction model of personality in social networks by using data mining deep learning algorithm and random walk model. The International Journal of Electrical Engineering & Education, 1–14. https://doi.org/10.1177/0020720920936839