A Comparison of Physiological Signal Analysis Techniques and Classifiers
This work focuses on finding the most discriminatory or representative features that allow
commercials to be classified as having negative, neutral or positive effectiveness based on
the Ace Score index. For this purpose, an experiment involving forty-seven participants was
carried out. In this experiment, electroencephalography (EEG), electrocardiography (ECG),
Galvanic Skin Response (GSR) and respiration data were acquired while subjects watched 30 min
of audiovisual content. This content was composed of a submarine documentary and nine
commercials (one of them the ad under evaluation). After signal pre-processing, four sets of
features were extracted from the physiological signals using different state-of-the-art
metrics. These features, computed in the time and frequency domains, are the inputs to several
basic and advanced classifiers. An average of 89.76% of the instances was correctly classified
according to the Ace Score index. The best results were obtained by a classifier combining
AdaBoost and Random Forest with automatic feature selection. The selected features were those
extracted from the GSR and HRV signals. These results are promising for the evaluation of
audiovisual content by means of physiological signal processing.
Keywords: audiovisual content evaluation, effectiveness, electroencephalography (EEG),
electrocardiography (ECG), galvanic skin response (GSR), respiration, feature extraction,
advanced classifiers
Edited by:
Jose Manuel Ferrandez, Universidad Politecnica de Cartagena, Spain
Reviewed by:
Rong Pan, Arizona State University, USA
Antonio Fernández-Caballero, Universidad de Castilla-La Mancha, Spain
*Correspondence:
Adrián Colomer Granero
[email protected]
1. INTRODUCTION
To follow the objective approach, different features of either positive or negative emotions
can be extracted from
TABLE 1 | Commercials involved in this study and grouped taking into account the Ace Score index.
The experimental task consisted of watching a 30-min documentary about the submarine world in
which three blocks of Super Bowl ads were inserted: the first one 7 min after the beginning of
the documentary, the second one in the middle and the last one at the end of the trial. Each
of these blocks was formed by three commercials (Figure 1). The audiovisual content was
randomly distributed to remove the factor “sequence” as a possible confounding effect in the
later analysis.
Two hours after the experiment, users were interviewed using an online test. In this test,
different frames of the proposed ads were presented, and the user had to match the frames with
the correct ad brands. The purpose of this interview was to know which ads were remembered and
which were forgotten by the subjects.
2.2. Signal Recording
2.2.1. Cerebral Recording
The cerebral activity was recorded using an instrument developed by Twente Medical Systems
International (TMSI, Oldenzaal, The Netherlands). This device consists of an amplification
stage and a digitalization stage. The amplifier (model REFA 40-channels, Figure 2A) is
composed of 32 unipolar, 4 bipolar, 4 auxiliary and 8 digital inputs. The TMSI instrument
allows hardware synchronization of its inputs.
All subjects were comfortably seated on a reclining chair 60 centimeters away from the screen,
a 23-inch Full HD display (1920 × 1080 pixels). EEG activity was collected at a sampling rate
of 256 Hz while impedances were kept below 5 kΩ. For the experiment, we used thirty electrodes
(Figure 2B) and a ground bracelet located on the wrist opposite to the subject's dominant
hand. The montage followed the International 10–20 system (Jasper, 1958; Figure 2C).
2.2.2. Autonomic Recordings
Using the TMSI instrument and a software solution for neuroscience experiments (Neurolab from
Bitbrain, Spain), it is possible to acquire biosignals synchronized with the audiovisual
content under evaluation. The cardiac activity of each participant is registered by means of
two bipolar inputs. Two disposable electrodes (Figure 3A) are placed on the upper chest. The
first one, plugged into the positive terminal of the amplifier, is placed below the right
clavicle, and the other one, plugged into the negative terminal of the amplifier, is placed
below the left clavicle (Figure 3D).
To measure the skin's ability to conduct electrical current, which increases with sweating and
other bodily changes, a galvanic skin response sensor is used. This sensor consists of two
cloth strips (with velcro), each with an electrode sewn into it (Figure 3B). The strips are
placed on the fingers of the non-dominant hand. Specifically, the electrode belonging to the
positive terminal is placed on the middle or proximal phalanx of the index finger. In
addition, the electrode belonging to the negative terminal is placed on the middle or proximal
phalanx of the middle finger (Figure 3E). This sensor is plugged into the auxiliary channels
of the amplifier.
A rubber band consisting of two electrodes can be plugged into one of the eight auxiliary
channels in order to measure breathing (Figure 3C). This rubber band is placed on the bottom
of the rib cage (Figure 3F). The sensors measure the deformation of the rubber band produced
by inhalation and exhalation.
2.3. Signal Preprocessing
2.3.1. Cerebral Signal
The baseline of the EEG traces is removed by mean subtraction and the output dataset is
band-pass filtered (0.5–40 Hz). Then, the corrupted data channels are rejected and
interpolated from the neighboring electrodes. A corrupted data channel is identified by
computing the fourth standardized moment (kurtosis) along the signal of each electrode. The
kurtosis is defined as:
$$K(x) = \frac{\mu_4}{\sigma^4} = \frac{E[(x - \mu)^4]}{E[(x - \mu)^2]^2} \tag{1}$$
where \mu_4 is the fourth moment about the mean, \sigma is the standard deviation and E[x] is
the expected value of the signal x. Moreover, a channel is also classified as corrupted if the
registered EEG signal is flat for more than 10% of the total duration of the experiment.
Reference events are integrated into the data structure in order to segment the EEG signal
into epochs of one second. The intra-channel kurtosis level of each epoch is computed in order
to reject the epochs highly damaged by noise.
In the next step, Independent Component Analysis (ICA) (Hyvärinen and Oja, 2000) is applied by
means of the runica algorithm to detect and remove components due to eye movements, blinks and
muscular artifacts. Thirty source signals are obtained (one per electrode). Then, an automatic
and embedded Matlab method (ADJUST) (Mognon et al., 2011) is used to discriminate the artifact
components from EEG signals by combining stereotyped artifact-specific spatial and temporal
features. Components whose features meet certain criteria are marked for rejection
(Figure 4A). See Mognon et al. (2011) for a detailed explanation of ADJUST. Figure 4B shows
the spatial and temporal features extracted by the ADJUST algorithm for a typical eye blink.
In the automatic process of artifact component identification, ADJUST presents several false
negatives; in other words, there exist components composed of a lot of physiological noise and
little useful information (brain activity) that the algorithm does not mark for rejection. For
this reason, a trained expert manually analyzes the features of each component (the
topographic distribution of the signal, the frequency response, the temporal and spatial
features extracted by ADJUST, etc.) in order to discover the remaining artifact components.
The final objective of the preprocessing stage is to guarantee a compromise between removing
brain activity and leaving artifacts in the signal.
Figure 5 shows a diagram of the whole processing stage. After this, the EEG signal is free of
artifacts and it can be analyzed in
FIGURE 3 | (A) ECG, (B) GSR, and (C) RSP sensors and their respective locations (D–F).
the next stage using the feature extraction metrics presented in Section 2.4.
The proposed preprocessing algorithm was developed using the EEGLAB (Delorme and Makeig,
2004) and ADJUST (Mognon et al., 2011) libraries.
2.3.2. Autonomic Signals
Analyzing the electrocardiogram signal requires detecting the QRS complex, so preprocessing
the cardiac signal is a very important step. First, the ECG signal is high-pass filtered to
correct baseline problems such as the baseline wander caused
FIGURE 4 | (A) The 30 IC’s with the artifact components marked in red to be rejected. (B) Spatial and temporal features and the frequency spectrum related to the
first component marked as artifact by ADJUST.
by the effects of electrode impedance, respiration or body movements. A FIR filter (with a
cut-off frequency of 0.5 Hz) is used for this purpose in order to avoid the phase distortion
produced by an IIR filter, which would modify the wave morphology. In addition, the DC
component of the signal is eliminated by subtracting the mean. The next step is to apply a
notch filter to remove the power line interference (the interfering frequency is w0 = 50 Hz).
Muscle noise causes severe problems, such as obscuring low-amplitude waveforms. To eliminate
this noise, a low-pass filter (with a cut-off frequency between 60 and 70 Hz) is applied.
Regarding the GSR and RSP preprocessing, a morphological filter is employed to remove the
signal ripple in order to facilitate the detection of local maxima. This low-pass filter
eliminates the muscle noise (high frequencies) so that the sweating peaks (in the GSR signal)
and the inhalation/exhalation peaks (in the RSP signal) can be detected more accurately.
2.4. Feature Extraction
2.4.1. EEG
Global Field Power
The signal recorded directly from the scalp reflects the intra-cranial synchronous activation
of many neurons. To quantify the amount of cerebral activity, the Global Field Power (GFP)
(Lehmann and Skrandies, 1980) was employed using Equation (2).
$$GFP = \sqrt{\frac{\sum_{i=1}^{N_e} \sum_{j=1}^{N_e} (u_i - u_j)^2}{N_e}} \tag{2}$$
where u_i is the potential at electrode i (over time), u_j is the potential at electrode j
(over time) and N_e is the total number of electrodes employed to compute the GFP.
Frontal areas are the cerebral locations mainly involved in the memorization and pleasantness
phenomena (Vecchiato et al., 2010). Thus, the electrodes Fp1, Fpz, Fp2, F7, F3, Fz, F4, F8,
Fc5, Fc1, Fc2, and Fc6 were taken into account in the calculation.
A GFP signal was then calculated for each frequency band considered in the experiment:
δ (1–3 Hz), θ (4–7 Hz), α (8–12 Hz), β (13–24 Hz), β extended (25–40 Hz), and γ (25–100 Hz).
The blocks of neutral documentary (one before each ad block) are baseline periods taken as a
reference. The purpose of these blocks is to register the basal cerebral activity in order to
remove phenomena such as fatigue or lack of concentration. GFP normalization according to the
baseline periods provides the Zscore index, computed as:
$$Zscore = \frac{GFP_i - GFP_B}{\sigma(GFP_B)} \tag{3}$$
where GFP_i is the Global Field Power during the ad under analysis, GFP_B is the Global Field
Power during a period of 2-min
of the neutral documentary preceding the block of ads in which the ad under analysis is
located (Figure 1).
For each stimulus, the input to the different classifiers is the time-average value of the
GFP, Zscore and log(Zscore) in each frequency band.
Interest Index (II)
The interest index allows the commercial to be assessed in specific time periods in the Theta
and Beta bands (Vecchiato et al., 2010). For each ad and subject, the most significant peaks
of the Zscore variable were obtained, considering as a peak every value that reaches the
threshold Zscore ≥ 3, associated with p < 0.05 in the Gaussian curve fitted over the Zscore
distribution (averaged over all participants).
In this way, two parameters were calculated: the number of peaks during the total duration of
a particular commercial (PN_total) and the number of peaks during the brand exposition periods
of a particular commercial (PN_brand). The interest index is computed for each ad and subject
as:
$$II = \frac{PN_{brand}}{PN_{total}} \tag{4}$$
The input to the different classifiers is the percentage of interest for each commercial.
Memorization Index (MI)
This index measures the capacity of each stimulus to be remembered (Vecchiato et al., 2011b).
First, the GFP in the theta and alpha bands (associated with the human memorization process;
Vecchiato et al., 2011b) is normalized as follows:
$$MI = \frac{GFP_i}{\sum_{i=1}^{M} GFP_i} \tag{5}$$
where GFP_i is the Global Field Power along the duration of the stimulus under analysis i and
M is the number of temporal samples.
In order to extract the memorization index, an on-line survey is carried out following the
method used in Vecchiato et al. (2011c). Two hours after the experiment ends, each participant
has to complete the on-line test designed by specialists in psychology. In this test,
different frames of each commercial are presented and the subject must answer some questions
about these frames. By means of this test we can check which stimuli were remembered and which
were forgotten by each participant.
For each commercial, the population was segmented into two groups. Subjects who remembered the
ad were included in the “remember group” and those who forgot it were included in the “forget
group”.
GFP_Remember is computed as the Global Field Power average of the participants that belong to
the “remember group” for each stimulus. In the same way, GFP_Forget is extracted taking into
account the “forget group.” Finally, the remember and forget indexes are computed by means of
a cubic smoothing of GFP_Remember and GFP_Forget in order to extract the signal envelope.
Figure 6 shows the MI in the theta and alpha bands for the stimulus under study (“The Date”).
The input to the different classifiers is the time-average value of the remember and forget
indexes for each stimulus.
Pleasantness Index (PI)
The pleasantness index is a continuous metric over time that provides information about the
moments of the audiovisual content that are pleasing to the participants (Vecchiato et al.,
2013). The cerebral activity registered by the left-frontal electrodes is compared with the
cerebral activity registered by the right-frontal electrodes, so the Global Field Power in the
Theta and Alpha bands is computed employing asymmetric pairs of electrodes, obtaining GFP_Left
and GFP_Right for each participant and stimulus.
From the on-line survey explained in the previous section, it is possible to know how much the
participants liked each audiovisual content under study. Using this information, the
population is segmented into two groups: “Like” and “Dislike.” GFP_Left and GFP_Right for each
group are obtained by averaging the Global Field Power (along the stimulus under analysis).
The pleasantness index for each group is extracted as:
$$PI = GFP_{Right}(L/D) - GFP_{Left}(L/D) \tag{6}$$
Finally, the like and dislike pleasantness indexes are computed by means of a cubic smoothing
of PI_Like and PI_Dislike in order to extract the signal envelope. Figure 7 shows the PI in
the theta and alpha bands for the stimulus under study (“The Date”).
The input to the different classifiers is the time-average value of the like and dislike
indexes for each stimulus.
Power Spectral Density (PSD)
The amount of power in each frequency band and electrode was computed by means of the Welch
periodogram (Welch, 1967). In particular, the average PSD is computed as:
$$PSD_{c,s} = \frac{1}{N_w} \sum_{i=0}^{N_w - 1} P_{x_i} \tag{7}$$
where N_w is the number of windows along the signal of channel c in stimulus s and P_{x_i} is
the periodogram for the ith window, calculated as:
$$P_{x_i} = \frac{1}{N} \left| FFT_{N,x_i} \right|^2 = \frac{1}{N} \left| \sum_{n=0}^{N-1} x_i(n) e^{-j 2 \pi n k / N} \right|^2 \tag{8}$$
where N is the number of points used to compute the FFT.
The window size used to compute the Welch periodogram was 128 samples, corresponding to half a
second of the EEG signal, and the overlap was 50%. The input to the different classifiers
consists of 6 (δ, θ, α, β, β extended and γ) × N_pe features for each stimulus, N_pe being the
number of asymmetric electrode pairs used in the experiment.
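The Welch average of Equations (7) and (8) can be illustrated with a minimal, dependency-free sketch. It assumes a rectangular window (Equation 8 shows no taper) and uses the 128-sample window and 50% overlap stated above. The function names, the `band_power` helper and the synthetic 10 Hz test signal are illustrative assumptions, not part of the original Matlab-based processing.

```python
import cmath
import math


def periodogram(window):
    """Equation (8): (1/N) * |DFT of the window|^2, one value per frequency bin k."""
    n_points = len(window)
    powers = []
    for k in range(n_points):
        dft_k = sum(
            x * cmath.exp(-2j * math.pi * n * k / n_points)
            for n, x in enumerate(window)
        )
        powers.append(abs(dft_k) ** 2 / n_points)
    return powers


def welch_psd(signal, window_size=128, overlap=0.5):
    """Equation (7): average the periodograms of overlapping windows."""
    step = max(1, int(window_size * (1.0 - overlap)))
    starts = range(0, len(signal) - window_size + 1, step)
    periodograms = [periodogram(signal[s:s + window_size]) for s in starts]
    if not periodograms:
        raise ValueError("signal is shorter than one window")
    n_windows = len(periodograms)
    return [sum(p[k] for p in periodograms) / n_windows for k in range(window_size)]


def band_power(psd, fs, low_hz, high_hz):
    """Hypothetical helper: sum the PSD bins whose frequency k * fs / N is in the band."""
    n_points = len(psd)
    return sum(
        value
        for k, value in enumerate(psd[: n_points // 2 + 1])
        if low_hz <= k * fs / n_points <= high_hz
    )


# Synthetic check: 2 s of a 10 Hz sine sampled at 256 Hz (made-up data, not real EEG).
FS = 256
signal = [math.sin(2 * math.pi * 10 * t / FS) for t in range(2 * FS)]
psd = welch_psd(signal)
alpha_power = band_power(psd, FS, 8, 12)  # alpha band limits taken from the text
theta_power = band_power(psd, FS, 4, 7)   # theta band limits taken from the text
print(alpha_power > theta_power)
```

With a 128-sample window at 256 Hz the frequency resolution is 2 Hz, so the narrow bands contain only a few bins each. In practice a library routine such as `scipy.signal.welch` would normally replace the explicit DFT loop.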
FIGURE 6 | Memorization index in (A) Theta and (B) Alpha bands for “The Date.”
FIGURE 7 | Pleasantness index in (A) Theta and (B) Alpha bands for “The Date.”
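As a rough illustration of the EEG indexes in Section 2.4.1, the sketch below evaluates Equation (2) at a single time sample, the baseline normalization of Equation (3) and the peak ratio of Equation (4). Several choices are assumptions made only for this example: GFP_B in the numerator is taken as the time average of the baseline GFP, the sample standard deviation is used, an ad with no peaks gets an index of 0, and all function names and toy values are hypothetical.

```python
import math
import statistics


def global_field_power(potentials):
    """Equation (2) at one time sample; `potentials` holds one value per electrode."""
    n_e = len(potentials)
    total = sum((u_i - u_j) ** 2 for u_i in potentials for u_j in potentials)
    return math.sqrt(total / n_e)


def zscore(gfp_ad, gfp_baseline):
    """Equation (3), taking GFP_B as the baseline time average (an assumption)."""
    base_mean = statistics.mean(gfp_baseline)
    base_std = statistics.stdev(gfp_baseline)  # sample standard deviation (assumption)
    return [(g - base_mean) / base_std for g in gfp_ad]


def interest_index(z_values, brand_mask, threshold=3.0):
    """Equation (4): peaks during brand exposition divided by peaks in the whole ad."""
    peaks = [z >= threshold for z in z_values]
    pn_total = sum(peaks)
    pn_brand = sum(
        1 for is_peak, on_brand in zip(peaks, brand_mask) if is_peak and on_brand
    )
    # Returning 0.0 when the ad has no peaks is a choice made for this sketch.
    return pn_brand / pn_total if pn_total else 0.0


# Toy values only: three electrodes, three baseline samples, three ad samples.
gfp_one_sample = global_field_power([1.0, 2.0, 3.0])
z = zscore(gfp_ad=[2.0, 5.0, 6.0], gfp_baseline=[1.0, 2.0, 3.0])
ii = interest_index(z, brand_mask=[False, True, False])
print(gfp_one_sample, z, ii)
```

Applying `global_field_power` to every time sample of the band-filtered frontal electrodes would give the GFP time series that the Zscore and the indexes are built from.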
2.4.2. ECG
After ECG preprocessing, the QRS complex is detected, specifically the R wave, by means of the
Pan-Tompkins algorithm (Pan and Tompkins, 1985). The analysis of variations in the
instantaneous heart rate time series using the beat-to-beat RR-intervals (the RR tachogram) is
known as Heart Rate Variability (HRV) analysis (American Heart Association, 1996). The balance
between the effects of the sympathetic and parasympathetic systems, the two opposite acting
branches of the autonomic nervous system, is referred to as the sympathovagal balance and is
believed to be reflected in the beat-to-beat changes of the cardiac cycle (Kamath, 1991). The
HRV analysis is based on feature extraction from the tachogram signal in four domains: time,
frequency, time-frequency and non-linear analysis. All signals were reviewed manually by an
expert after the automatic R wave detection to avoid false positives and false negatives and
to delete extremely noisy sections that could not be analyzed. In this manner, the absence of
artifacts that could alter the signal is assured.
Some parameters extracted in the time domain used in this study were: the maximum (maxRR) and
minimum (minRR), the average (meanRR), the median (medianRR) and the standard deviation
between RR intervals (SDRR), the standard deviation from the RR average interval in
time-windows (SDARR), the
square root of the mean of the squared successive differences between adjacent RR intervals
(RMSSD), the number of successive RR pairs having a difference of more than 50 ms (RR50) and
the ratio between the RR50 and the total RRs (pRR50).
The Power Spectral Density (PSD) analysis provides information about the amount of power in
several frequency ranges of the tachogram signal. The analysis in the frequency domain was
carried out in four frequency bands: ULF (0–0.033 Hz), VLF (0.033–0.04 Hz), LF (0.04–0.15 Hz),
and HF (0.15–0.4 Hz). For this work, the ULF and VLF bands are ignored because these frequency
bands are only important in 24-h recordings. The amount of power in each band is obtained by
integrating the PSD signal between the bounds of the frequency band. The power metrics are
presented in absolute values (aLF, aHF), normalized to the total energy (nLF, nHF) or as a
percentage of the total energy (pLF, pHF). The power ratio between the LF and HF bands
provides information about the sympathetic/parasympathetic balance. The power value of the
peak at the fundamental frequency (peakLF, peakHF) is extracted too.
Combining the analysis in the two domains discussed above, the time-frequency analysis is
performed. In this analysis the same parameters as in the frequency domain were computed in
ECG segments of a given time-length.
Regarding the non-linear analysis, Poincaré graphs and entropy-based measures were extracted
from the HRV signal. Poincaré graphs are a type of plot that tries to represent the
self-similarity of a signal. The graph plots each interval against the previous one (Fishman
et al., 2012). An ellipse is normally fitted along the line of identity, centered in the
middle of the RR intervals. The axes of the ellipse (SD1 for the vertical axis and SD2 for the
horizontal axis) represent the short-term (SD1) and long-term (SD2) variability (Brennan et
al., 2002). Sample Entropy (sampen) is a factor that attempts to quantify the complexity or
degree of new information generated (Richman and Moorman, 2011). This parameter can be
interpreted as follows: if the entropy is 0, consecutive sequences are identical, and the
larger its value, the greater the complexity of the analyzed signal.
For each stimulus and subject, fifty-six parameters are computed by means of the HRVAS tool
(Ramshur, 2010; Guixeres et al., 2014) (Figure 8).
2.4.3. GSR and RSP
Ten features were extracted from the Galvanic Skin Response signal (Figure 9A). The average,
the variance and the standard deviation of the skin conductance along the specific time
periods under analysis (stimuli) were computed. In addition, the number of local maxima and
minima and the mean conductivity difference (G_F − G_B) for each consecutive pair of local
minimum-maximum were calculated. For each stimulus, the global maximum GSR_max and minimum
GSR_min, their difference (GSR_max − GSR_min) and the ratio between the number of maxima and
the stimulus duration (peaks/time) were also extracted from the GSR signals.
Regarding the RSP signal (Figure 9B), six physiological parameters were extracted during each
stimulus: the respiratory rate, the average level of breathing, the longest and shortest time
between consecutive breaths, the deep breathing (RSP_max) and the shallow breathing (RSP_min).
Table 2 shows a summary of the parameters extracted from each physiological signal. It is
important to note that EEG parameters were calculated in each frequency band (excluding the
emotional indexes, calculated as described above).
2.5. Classifiers
The different tested classifiers were Naive Bayes (John and Langley, 1995), Logistic
Regression (Cessie and van Houwelingen, 1992), Multilayer Perceptron (Kohonen, 1988), Support
Vector Machines (Chang and Lin, 2011), Linear Nearest Neighbor search (Weber et al., 1998),
Random Forest (Breiman, 2001), AdaBoost (Freund and Schapire, 1996), Multiclass classifier
(Bishop, 2006) and Bagging (Aslam et al., 2007). The implementations used for these
classifiers are included in Weka (Hall et al., 2009; Witten et al., 2011), a broadly used and
publicly available data mining software (Weka 3, 2009).
In order to reduce dimensionality, AttributeSelectedClassifier, also available in Weka, was
used (Witten and Frank, 2005). It is a meta-classifier that takes a search algorithm and an
evaluator in addition to the base classifier. This makes the attribute selection process
completely transparent, and the base classifier receives only the reduced dataset. It works by
finding a subset of features using the chosen feature selection method. It then uses this
feature subset to train the specified classifier and output a classifier model. In addition, a
wrapper (Kohavi and John, 1987) can be used within the AttributeSelectedClassifier as the
feature selection method. The feature selection method is used to evaluate the accuracy of any
feature subset. The wrapper can take any classifier and use it to perform feature selection.
The advantage of using the wrapper is that the same machine learning algorithm can be used to
evaluate the feature subset and also to train the final classifier, so good results can be
expected.
Once the features are extracted, the dataset must be preprocessed before the classification
step. In the preprocessing, two tasks are carried out: data normalization and data resampling.
This is necessary because the range of values of the raw data varies widely and the dataset is
clearly unbalanced, and most machine learning algorithms would not work properly under those
conditions. In this work, the normalization standardizes all numeric attributes in the given
dataset to have zero mean and unit variance and, for the resampling, the Synthetic Minority
Oversampling TEchnique (SMOTE) (Chawla et al., 2002) was applied.
Classifiers were tested by means of 10-fold stratified cross-validation. In k-fold
cross-validation, the original sample is randomly partitioned into k equal sized subsamples.
Of the k subsamples, a single subsample is retained as the validation data for testing the
model, and the remaining k − 1 subsamples are used as training data. The cross-validation
process is then repeated k times (the folds), with each of the k subsamples used exactly once
as the validation data. The k results from the folds are then averaged to produce a single
estimation. In stratified k-fold cross-validation, the folds are selected so that the mean
response value is approximately equal in all the folds, which for classification means that
each fold has roughly the same class proportions (Schneider, J., 1997).
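The standardization and the stratified 10-fold partition described above can be sketched in plain Python. This is only an illustration of the idea, not the Weka implementation used in the study: the function names, the round-robin assignment of each class to the folds, the use of the population standard deviation and the balanced toy labels are all assumptions, and the SMOTE resampling step is not covered.

```python
import random
import statistics
from collections import Counter, defaultdict


def standardize(values):
    """Rescale one numeric attribute to zero mean and unit variance."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    return [(v - mean) / std for v in values]


def stratified_folds(labels, k=10, seed=0):
    """Assign instance indices to k folds, dealing each class out in turn."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices); each fold is the test set exactly once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Toy labels: 10 instances per class (made-up, balanced for simplicity).
labels = ["positive"] * 10 + ["neutral"] * 10 + ["negative"] * 10
splits = list(cross_validation_splits(labels, k=10))
fold_class_counts = [Counter(labels[i] for i in test) for _, test in splits]
scaled = standardize([1.0, 2.0, 3.0, 4.0])
print(fold_class_counts[0])
```

A classifier would be trained on each `train` index list and scored on the matching `test` list, and the ten scores averaged into a single estimate.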
FIGURE 9 | (A) GSR and (B) RSP physiological signals. The most representative parameters are highlighted.
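Several of the GSR features of Section 2.4.3 (the number of local maxima and minima, the global maximum and minimum, their difference and the peaks/time ratio) reduce to simple operations on the smoothed conductance trace. The sketch below is an illustration only: it covers a subset of the ten features, assumes the signal has already been filtered, counts only strict local extrema, and uses hypothetical function names, feature keys and a made-up toy trace.

```python
import statistics


def local_extrema(signal):
    """Return indices of strict local maxima and minima (plateaus are ignored)."""
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima


def gsr_features(signal, fs):
    """Compute a subset of the GSR features for one stimulus sampled at fs Hz."""
    maxima, minima = local_extrema(signal)
    duration_s = len(signal) / fs
    return {
        "mean": statistics.mean(signal),
        "variance": statistics.pvariance(signal),
        "std": statistics.pstdev(signal),
        "n_maxima": len(maxima),
        "n_minima": len(minima),
        "gsr_max": max(signal),
        "gsr_min": min(signal),
        "range": max(signal) - min(signal),
        "peaks_per_second": len(maxima) / duration_s,
    }


# Toy, already-smoothed conductance trace sampled at 1 Hz (made-up values).
toy_gsr = [0.0, 1.0, 0.5, 2.0, 1.0, 3.0, 0.0]
features = gsr_features(toy_gsr, fs=1.0)
print(features["n_maxima"], features["peaks_per_second"])
```

The same extrema detection could be reused on the respiration signal to locate inhalation and exhalation peaks.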
TABLE 2 | Summary table showing all the parameters extracted from each biosignal used in this study.
Signal         Category                              Parameters used
EEG features   Metrics based on Global Field Power   GFP, Zscore, log(Zscore)
               Emotional indexes                     Interest Index (II), Memorization Index (MI), Pleasantness Index (PI)
               Frequency domain metrics              Power Spectral Density (PSD), Brainrate
ECG features   Time domain metrics                   MaxRR, MinRR, MeanRR, MedianRR, SDRR, SDARR, RMSSD, RR50, pRR50
               Frequency domain metrics              aLF, aHF, nLF, nHF, pLF, pHF, peakLF, peakHF
               Time-frequency domain metrics         The same parameters extracted in frequency domain
               Non-linear analysis metrics           Poincaré Graphs (SD1, SD2), Entropy-based measures
GSR features   Time domain metrics                   Average, Variance, Standard deviation, Number of local minima
3. RESULTS
In order to find which physiological signal was the best one, different datasets and
combinations of them were used to perform the classification:
• EEG_GFP-ZSCORE: dataset with the GFP and Zscore metrics extracted from the EEG signal (18
features).
• EEG_PSD: dataset with the PSD metrics extracted from the EEG signal (72 features).
• EEG_IND: dataset with the Pleasantness, Memorization and Interest indexes’ metrics extracted
from the EEG signal (8 features).
• EEG_ALL: dataset with all the before mentioned metrics extracted from the EEG signal (98
features).
• HRV: all the metrics extracted from the HRV signal (56 features).
• GSR: all the metrics extracted from the GSR signal (10 features).
• RSP: all the metrics extracted from the Respiration signal (6 features).
Combination of signals:
• GSR + HRV: all the metrics extracted from GSR and HRV datasets (66 features).
• GSR + HRV + EEG_IND: all the metrics extracted from GSR, HRV and EEG_IND (Pleasantness,
Memorization and Interest) datasets (74 features).
• GSR + HRV + EEG_ALL: all the metrics extracted from GSR, HRV and EEG (164 features).
Combination of signals using only features selected by AttributeSelectedClassifier:
• GSR_SEL + HRV_SEL: only selected metrics chosen by the best classifier with attribute
selection from GSR and HRV datasets.
• GSR_SEL + HRV_SEL + EEG_IND_SEL: only selected metrics chosen by the best classifier with
attribute selection from GSR, HRV and EEG_IND datasets.
• GSR_SEL + HRV_SEL + EEG_ALL_SEL: only selected metrics chosen by the best classifier with
attribute selection from GSR, HRV and EEG_ALL datasets.
A list of the features included in each dataset can be found in Table 2. When combining
signals, the instances (users) chosen to conform the dataset were those corresponding to the
dataset
Percentage of positive, neutral and negative ads correctly classified and average.
Bold values indicate the best performance obtained in each test and highlight the optimal combination of features for each dataset.
metrics (87.07%). The other two combinations obtained 80.00 and 81.90%, respectively.
Lastly, in the third round, the datasets tested were GSR, GSR + HRV and GSR + HRV + EEG_IND.
These datasets contained only the features selected by the attribute selection classifier. The
highest accuracy was obtained with Random Forest using the selected attributes from the
dataset combining GSR and HRV signals (87.62%), and the GSR dataset alone obtained an accuracy
of 85.00% using Multi-Class, Bagging and Random Forest.
The selected attributes for each dataset combination were:
• GSR: “Number of Peaks” and “Peaks/Time.”
• GSR + HRV: “Number of Peaks,” “Peaks/Time” and “t_SDANN.”
• GSR + HRV + EEG_IND dataset: “Number of Peaks,” “Peaks/Time,” “t_SDANN,” “mean_Theta,”
“mean_Alfa,” “Index_Theta,” “Index_Beta,” “Peaks_Brand_Theta,” “Peaks_Theta,”
“Peaks_Brand_Beta_Ext” and “Peaks_Beta_Ext.”
3.2. Comparison of Classifiers
In light of the results presented previously, we decided to test two more classifiers with the
best 3 datasets. The two new classifiers introduced were AdaBoost.M1 (AB) with Random Forest,
and Multi-Class with AdaBoost.M1 and Random Forest. Table 4 shows how these two new
classifiers improved the accuracy by about 2 percentage points, reaching the best result with
the dataset formed by the selected attributes from the GSR and HRV signals, with an accuracy
of 89.76%.
The configuration of this combination of classifiers is as follows: the MultiClass
metaclassifier was employed using a 1-against-all strategy. The classifier used by MultiClass
was AdaBoost.M1 with Random Forest as base classifier, 10 iterations, 100 as weight pruning
threshold and reweighting. The RandomForest classifier was configured to generate 100 trees
with unlimited depth and unlimited number of features to be used in random selection.
As a final test, the best dataset was classified with 11 more basic and advanced classifiers,
but none was able to beat the current accuracy of 89.76% (Table 5).
Regarding “The Date”, the commercial under study, the model obtained by training the best
classifier on the best dataset (i.e., Number of maxima and Peaks/Time from GSR and SDANN from
HRV, trained with MultiClass, AdaBoostM1 and RandomForest) always classified “The Date” as
positive.
4. DISCUSSION
In this work we intended to build an algorithm able to classify commercials automatically. To
achieve this, we built a model using the best possible data available and off-the-shelf
algorithms. Specifically, the main objective of this work was to find the most discriminatory
or representative features that allow audiovisual content to be classified into 3 groups
(positive, neutral and negative) with the highest possible accuracy. To accomplish this, we
used EEG, GSR, HRV and respiration signals acquired from a group of 47 subjects while they
were watching nine commercials. These commercials (excluding the ad under analysis) had been
classified and labeled previously according to their Ace Score.
The tests performed show that the best classification was achieved using features extracted
from GSR and HRV signals, namely “Number of maxima” and “Peaks/Time” from GSR, and “SDANN”
from HRV, with an accuracy of 89.76%. On the other hand, the best accuracy with the EEG signal
was 84.07%, attained with the dataset formed by the interest and pleasantness indexes.
However, datasets with metrics extracted from the EEG signal were the best only at classifying
negative instances.
It is important to note that with just two features extracted from the GSR signal (“Number of
maxima” and “Peaks/Time”)
Classifiers applied to the best datasets using only the features selected previously. Percentage of positive, neutral, negative and average results for the instances classified correctly.
Bold values indicate the best performance obtained in each test and highlight the optimal combination of features for each dataset.
and a combined classifier consisting of Multi-Class, Bagging and Random Forest we were able to
correctly classify 85% of the instances. This means that it is possible to obtain an accuracy
very similar to the highest one (only 4.76% below the highest accuracy obtained in this study
and almost 1% above the best accuracy attained with EEG signals) with only two features from a
single signal, which makes the approach very simple, usable and portable.
The implication of these results is that GSR and HRV signals provide more relevant information
to classify an ad. That could mean that the Autonomic Nervous System is more useful for
emotion classification than the Central Nervous System. This is supported by some other
authors who state that GSR and HRV signals are able to accurately distinguish a user’s emotion
(Yoo et al., 2005; Li and h. Chen, 2006).
Validation of the model was performed using cross-validation in a first step, and the
commercial under evaluation (“The Date”), which had not been used before to train or
cross-validate the model, was used in the test stage. Our model was able to classify this ad
as positive.
Our method has two main shortcomings. On the one hand, the attributes of the subjects could
have been taken into account in the classification stage in order to evaluate commercials
according to a specific population. On the other hand, a more comprehensive and exhaustive
validation with more data could have been performed to get even more reliable results.
Regarding the practical meaningfulness, these promising
TABLE 5 | Results for different classifiers applied to the best dataset: GSR (Number of
maxima, Peaks/Time) + HRV (t_SDANN).
Algorithm                                   Positive   Neutral   Negative   Average
SVM                                            75.00     55.71      71.43     67.38
Multilayer Perceptron                          75.00     49.29      67.14     63.81
Simple Logistic                                75.00     36.43      85.71     65.71
Naive Bayes                                    75.00     20.00      75.00     56.67
Decision Table                                 75.00     22.86      83.57     60.48
Zero Rule                                     100.00      0.00       0.00     33.33
One Rule                                       57.86     56.43      60.00     58.10
Hoeffding Tree                                 74.29     52.14      47.86     58.10
Linear NN search                               95.00     88.57      85.00     89.52
AdaBoostM1, Linear NN search                   95.00     88.57      85.00     89.52
MultiClass, AdaBoostM1, Linear NN search       95.00     88.57      85.00     89.52
Random Forest                                  95.00     89.29      78.57     87.62
AdaBoostM1, Random Forest                      95.00     91.43      80.71     89.05
MultiClass, AdaBoostM1, Random Forest          95.00     91.43      82.86     89.76
The best accuracy using the following classifiers was obtained with the default parameters in
the Weka software (Weka 3, 2009). Percentage of positive, neutral, negative and average
results for the instances classified correctly.
Bold values indicate the best performance obtained in each test and which classifier provides
the most accurate classification.
creation and evaluation of commercials focused on a particular audience.
results could help to the development of an automatic system able In future works, voting majority could be used to improve the
to evaluate the quality of the commercials. This system could be accuracy of each class independently, which could lead to better
helpful for the companies reducing the cost of their advertising global results. In addition, other signals could be used as well to
design. Also, this kind of software would make possible the try to better discriminate among ads, such as Face Reader.
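The accuracy comparisons made in the discussion can be checked with a few lines of arithmetic. The following Python sketch is only illustrative (the study itself used Weka). It assumes that the Average column of Table 5 is the unweighted mean of the positive, neutral and negative percentages, which is consistent with the reported values. The names `TABLE_5`, `average_accuracy`, `GSR_ONLY` and `BEST_EEG` are hypothetical and introduced here only for the example.

```python
# Per-class accuracies (%) copied from Table 5: (positive, neutral, negative).
TABLE_5 = {
    "SVM": (75.00, 55.71, 71.43),
    "Random Forest": (95.00, 89.29, 78.57),
    "AdaBoostM1, Random Forest": (95.00, 91.43, 80.71),
    "MultiClass, AdaBoostM1, Random Forest": (95.00, 91.43, 82.86),
}


def average_accuracy(per_class):
    """Unweighted mean of the per-class accuracies, rounded to two decimals.

    Assumption: this is how the Average column is derived.
    """
    return round(sum(per_class) / len(per_class), 2)


averages = {name: average_accuracy(acc) for name, acc in TABLE_5.items()}
best_name = max(averages, key=averages.get)
best_overall = averages[best_name]

# Accuracies quoted in the discussion (%).
GSR_ONLY = 85.00  # two GSR features with the MultiClass, Bagging, Random Forest classifier
BEST_EEG = 84.07  # EEG dataset formed by the interest and pleasantness indexes

# Differences expressed in percentage points.
gap_to_best = round(best_overall - GSR_ONLY, 2)
gain_over_eeg = round(GSR_ONLY - BEST_EEG, 2)

print(best_name, best_overall)
print(gap_to_best, gain_over_eeg)
```

Under that assumption, the best average is 89.76% for the MultiClass, AdaBoostM1, Random Forest combination, and the two-feature GSR model lies 4.76 percentage points below it and 0.93 percentage points above the best EEG result.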
REFERENCES

American Heart Association (1996). Task force of the european society of cardiology and the north american society of pacing and electrophysiology. Eur. Heart J. 17, 354–381.
Appelhans, B., and Luecken, L. (2006). Heart rate variability as an index of regulated emotional responding. Rev. Gen. Psychol. 10, 229–240. doi: 10.1037/1089-2680.10.3.229
Aslam, J., Popa, R., and Rivest, R. (2007). "On estimating the size and confidence of a statistical audit," in Proceedings of the USENIX Workshop on Accurate Electronic Voting Technology (Boston, MA: EVT'07).
Bishop, C. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Secaucus, NJ: Springer-Verlag New York, Inc.
Breiman, L. (2001). Random forests. Mach. Learn. 45, 5–32. doi: 10.1023/A:1010933404324
Brennan, M., Palaniswami, M., and Kamen, P. (2002). Poincaré plot interpretation using a physiological model of HRV based on a network of oscillators. Am. Physiol. Soc. 283, 1873–1886. doi: 10.1152/ajpheart.00405.2000
Cessie, L., and van Houwelingen, J. (1992). Ridge estimators in logistic regression. Appl. Stat. 41, 191–201. doi: 10.2307/2347628
Chang, C., and Lin, C. (2011). LIBSVM: a library for support vector machines. Intel. Syst. Technol. ACM Trans. 2, 27:1–27:27. doi: 10.1145/1961189.1961199
Chawla, N., Bowyer, K., Hall, L., and Kegelmeyer, W. (2002). Smote: Synthetic minority over-sampling technique. J. Artif. Intel. Res. 16, 321–357.
Christoforou, C., Christou-Champi, S., Constantinidou, F., and Theodorou, M. (2015). From the eyes and the heart: a novel eye-gaze metric that predicts video preferences of a large audience. Front. Psychol. 6:579. doi: 10.3389/fpsyg.2015.00579
Critchley, E. (2002). Electrodermal responses: what happens in the brain. Neuroscientist 8, 132–142. doi: 10.1177/107385840200800209
Delorme, A., and Makeig, S. (2004). Eeglab: an open source toolbox for analysis of single-trial eeg dynamics. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009
Fernández-Delgado, M., Cernadas, E., Barro, S., and Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15, 3133–3181.
Fishman, M., Jacono, F. J., Park, S., Jamasebi, R., Thungtong, A., Loparo, K. A., et al. (2012). A method for analyzing temporal patterns of variability of a time series from Poincaré plots. J. Appl. Physiol. 29, 1290–1297. doi: 10.1152/japplphysiol.01377.2010
Frantzidis, C., Bratsas, C., Klados, M., Konstantinidis, E., Lithari, C., Vivas, A., et al. (2010a). On the classification of emotional biosignals evoked while viewing affective pictures: an integrated data-mining-based approach for healthcare applications. Inf. Technol. Biomed. IEEE Trans. 14, 309–318. doi: 10.1109/TITB.2009.2038481
Frantzidis, C., Bratsas, C., Papadelis, C., Konstantinidis, E., Pappas, C., and Bamidis, P. (2010b). Toward emotion aware computing: an integrated approach using multichannel neurophysiological recordings and affective visual stimuli. Inf. Technol. Biomed. IEEE Trans. 14, 589–597. doi: 10.1109/TITB.2010.2041553
Freund, Y., and Schapire, R. (1996). "Experiments with a new boosting algorithm," in Machine Learning, Proceedings of the Thirteenth International Conference on (ICML 1996), ed L. Saitta (Bari: Morgan Kaufmann), 148–156.
Guixeres, J., Redon, P., Saiz, J., Alvarez, J., Torr, M. I., Cantero, L., et al. (2014). Cardiovascular fitness in youth: association with obesity and metabolic abnormalities. Nutr. Hospital. 29, 1290–1297. doi: 10.3305/nh.2014.29.6.7383
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. (2009). The weka data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18. doi: 10.1145/1656274.1656278
Hyvärinen, A., and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Netw. 13, 411–430. doi: 10.1016/S0893-6080(00)00026-5
Jasper, H. (1958). Report of the committee on methods of clinical examination in electroencephalography: 1957. Electroencephalogr. Clin. Neurophysiol. 10, 370–375. doi: 10.1016/0013-4694(58)90053-1
John, G., and Langley, P. (1995). "Estimating continuous distributions in bayesian classifiers," in Uncertainty in Artificial Intelligence, Proceedings of the Eleventh Conference on, UAI'95 (Pasadena, CA: Morgan Kaufmann Publishers Inc.), 338–345.
Kamath, M. V. (1991). Effects of steady state exercise on the power spectrum of heart rate variability. Med. Sci. Sports Exe. 23, 428–434. doi: 10.1249/00005768-199104000-00007
Kohavi, R., and John, G. (1987). Wrappers for feature subset selection. Artif. Intel. 97, 273–324. doi: 10.1016/S0004-3702(97)00043-X
Kohonen, T. (1988). An introduction to neural computing. Neural Netw. 1, 3–16. doi: 10.1016/0893-6080(88)90020-2
Lang, P., Greenwald, M., Bradley, M., and Hamm, A. (1993). Looking at pictures: affective, facial, visceral, and behavioral reactions. Psychophysiology 30, 261–273. doi: 10.1111/j.1469-8986.1993.tb03352.x
Lehmann, D., and Skrandies, W. (1980). Reference-free identification of components of checkerboard-evoked multichannel potential fields. Electroencephalogr. Clin. Neurophysiol. 48, 609–621. doi: 10.1016/0013-4694(80)90419-8
Li, L., and Chen, J-h. (2006). "Emotion recognition using physiological signals from multiple subjects," in 2006 International Conference on Intelligent Information Hiding and Multimedia (San Francisco, CA), 355–358. doi: 10.1109/IIH-MSP.2006.265016
Mognon, A., Jovicich, J., Bruzzone, L., and Buiatti, M. (2011). Adjust: an automatic eeg artifact detector based on the joint use of spatial and temporal features. Psychophysiology 48, 229–240. doi: 10.1111/j.1469-8986.2010.01061.x
Ohme, R., Matukin, M., and Pacula-lesniak, B. (2011). Biometric measures for interactive advertising. J. Interact. Adv. 11, 60–72. doi: 10.1080/15252019.2011.10722185
Pan, J., and Tompkins, W. J. (1985). A real-time qrs detection algorithm. IEEE Trans. Biomed. Eng. 32, 230–236. doi: 10.1109/TBME.1985.325532
Ramshur, J. T. (2010). Design, Evaluation, and Application of Heart Rate Variability Software (HRVAS). Master's thesis, The University of Memphis.
Richman, J. S., and Moorman, J. R. (2011). Physiological time-series analysis using approximate entropy and sample entropy. Cardiovasc. Res. 278, 2039–2049.
Schneider, J. (1997). Cross Validation. Available online at: http://www.cs.cmu.edu/~schneide/tut5/node42.html. Last accessed on 21st December 2014.
Soleymani, M., Chanel, G., Kierkels, J., and Pun, T. (2008). "Affective ranking of movie scenes using physiological signals and content analysis," in Proceedings of the 2nd ACM Workshop on Multimedia Semantics, Vol. 1 (New York, NY: ACM Press), 32–39.
Teixeira, R., Yamasaki, T., and Aizawa, K. (2012). Determination of emotional content of video clips by low-level audiovisual features. Multim. Tools Appl. 61, 21–49. doi: 10.1007/s11042-010-0702-0
Vecchiato, G., Astolfi, L., De Vico Fallani, F., Cincotti, F., Mattia, D., Salinari, S., et al. (2010). Changes in brain activity during the observation of tv commercials by using eeg, gsr and hr measurements. Brain Topogr. 23, 165–179. doi: 10.1007/s10548-009-0127-0
Vecchiato, G., Astolfi, L., De Vico Fallani, F., Toppi, J., Aloise, F., Bez, F., et al. (2011a). On the use of eeg or meg brain imaging tools in neuromarketing research. Intell. Neurosci. 2011:643489. doi: 10.1155/2011/643489
Vecchiato, G., Babiloni, F., Astolfi, L., Toppi, J., Jounging, D., Wanzeng, K., et al. (2011b). Enhance of theta eeg spectral activity related to the memorization of commercial advertisings in chinese and italian subjects. Biomed. Eng. Inf. 11, 1491–1494. doi: 10.1109/bmei.2011.6098615
Vecchiato, G., Cherubino, P., Trettel, A., and Babiloni, F. (2013). Neuroelectrical Brain Imaging Tools for the Study of the Efficacy of TV Advertising Stimuli and Their Application to Neuromarketing, Volume 3 of Biosystems and Biorobotics. Springer. doi: 10.1007/978-3-642-38064-8
Vecchiato, G., Flumeri, G., Maglione, A. G., Cherubino, P., Kong, W., Trettel, A., et al. (2014). An electroencephalographic peak density function to detect memorization during the observation of tv commercials. Conf. Proc. IEEE Eng. Med. Biol. Soc. 969, 6969–6972. doi: 10.1109/embc.2014.6945231
Vecchiato, G., Toppi, J., Astolfi, L., Vico Fallani, F., Cincotti, F., Mattia, D., et al. (2011c). Spectral eeg frontal asymmetries correlate with the experienced pleasantness of tv commercial advertisements. Med. Biol. Eng. Comput. 49, 579–583. doi: 10.1007/s11517-011-0747-x
Wang, J., Pohlmeyer, E., Hanna, B., Jiang, Y.-G., Sajda, P., and Chang, S.-F. (2009). "Brain state decoding for rapid image retrieval," in Proceedings of ACM International Conference on Multimedia (Beijing).
Weber, R., Schek, H., and Blott, S. (1998). "A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces," in Proceedings of the 24th International Conference on Very Large Data Bases, VLDB '98 (San Francisco, CA: Morgan Kaufmann Publishers Inc.), 194–205.
Weka 3 (2009). Data Mining Software in Java. Available online at: http://www.cs.waikato.ac.nz/ml/weka/
Welch, P. D. (1967). The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 15, 70–73. doi: 10.1109/TAU.1967.1161901
Witten, I., and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. 2nd Edn. San Francisco, CA: Morgan Kaufmann.
Witten, I., Frank, E., and Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques. The Morgan Kaufmann Series in Data Management Systems. Morgan Kaufmann, 3rd Edn.
Yoo, S. K., Lee, C. K., Park, Y. J., Kim, N. H., Lee, B. C., and Jeong, K. S. (2005). "Neural network based emotion estimation using heart rate variability and skin resistance," in Proceedings of the First International Conference on Advances in Natural Computation - Volume Part I (Changsha: ICNC'05), 818–824.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Colomer Granero, Fuentes-Hurtado, Naranjo Ornedo, Guixeres Provinciale, Ausín and Alcañiz Raya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.