IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 57, NO. 4, APRIL 2010 979

Classifying Affective States Using Thermal Infrared Imaging of the Human Face
Brian R. Nhan∗ and Tom Chau, Senior Member, IEEE

Abstract: In this paper, time, frequency, and time–frequency features derived from thermal infrared data are used to discriminate between self-reported affective states of an individual in response to visual stimuli drawn from the International Affective Picture System. A total of six binary classification tasks were examined to distinguish baseline and affect states. Affect states were determined from subject-reported levels of arousal and valence. Mean adjusted accuracies of 70% to 80% were achieved for the baseline classification tasks. Classification accuracies between high and low ratings of arousal and valence were between 50% and 60%. Our analysis showed that facial thermal infrared imaging data of baseline and other affective states may be separable. The results of this study suggest that classification of facial thermal infrared imaging data coupled with affect models can be used to provide information about an individual’s affective state for potential use as a passive communication pathway.

Index Terms: Access technology, emotion, genetic algorithm, infrared, pattern classification, thermography.

Manuscript received July 4, 2009; revised September 17, 2009. First published November 17, 2009; current version published March 24, 2010. This work was supported by the Natural Sciences and Engineering Research Council of Canada, the Barbara and Frank Milligan Fellowship, the Hilda and William Courtney Clayton Paediatric Research Fund, the Bloorview Children’s Hospital Foundation, and by the Canada Research Chairs Program. Asterisk indicates corresponding author.
∗ B. R. Nhan is with the Bloorview Research Institute and the Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON M4G 1R8, Canada (e-mail: [email protected]).
T. Chau is with the Bloorview Kids Rehab, Toronto, ON M4G 1R8, Canada, and also with the Bloorview Research Institute and the Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON M4G 1R8, Canada.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TBME.2009.2035926
0018-9294/$26.00 © 2009 IEEE

I. INTRODUCTION

In many cases, for individuals with disability, meaningful interaction with their environment becomes a nontrivial task. The modalities most conducive to interactions, i.e., verbal communication and body language or motion, may be hindered or altered in such a way that communication, as we know it, cannot be achieved. In access research, we strive to discover ways to facilitate those interactions by developing devices and strategies for individuals with profound disabilities. The communication paradigm in access research involves the individual with disability, the access solution, and the individual’s environment [1].

To ensure accurate and reliable deployment of the access solution, one must establish a robust link between the individual and the access technology, i.e., the access pathway. However, establishing this access pathway is often the most challenging aspect of access research [1]–[3]. This is largely because common target modalities (e.g., residual motions and verbalizations) are often hindered by confounding factors leading to poor signal-to-noise ratios (SNR) [3], [4]. In addition to the challenges of obtaining usable signals for access technologies, researchers also have to grapple with the limitations of the synchronous communication paradigm, popularized by augmentative and alternative communication (AAC) systems. This paradigm relies exclusively on intentional and explicit user responses (e.g., switch activation) to periodically presented visual and/or auditory stimuli, but overlooks potential instances of natural physiological reactions, which may serve as passive communication cues.

As a first step in exploring passive communication cues for access, we endeavor to classify natural changes in facial skin temperature (FST), heart rate, and respiration in response to visual stimuli of varying emotional content in the able-bodied population. Recently, attempts have been made to correlate individual emotions with changes in physiological signals (e.g., [5]). However, such studies have generally involved many contact sensors that are invasive in nature. For our intended population of interest, namely individuals lacking an established access pathway, it would be ideal to establish one that is noninvasive, to minimize the risk of infection and discomfort to the individual over long periods of usage. Furthermore, the sensors employed should be portable to allow mobility with the individual while not interfering with other monitoring, communication, or life support systems. We feel that thermal infrared imaging systems satisfy the aforementioned requirements, and propose their use in measuring FST as a passive access pathway.

Recent studies with thermal infrared imaging have shown that affective states like extreme stress [6], startle [7], fear [8], arousal [9], and happiness [10] are related to certain properties of FST. However, to our knowledge this technology has yet to be investigated for the purposes of access outside of motion detection applications (e.g., as in [11]). In addition, we believe this to be one of the first concerted attempts at classifying general states of affect as described by affect dimensions (namely arousal and valence) using features derived from thermal infrared imaging data.

In our experiments, we use stimuli drawn from the International Affective Picture System (IAPS) [12] and attempt to classify the natural responses in terms of subject-indicated levels of arousal and valence. We conducted our experiments with able-bodied individuals, so that arousal and valence ratings could be obtained as indicators of affect. Arousal and valence dimensions have been used extensively in research to model the qualitative variability of affect (e.g., [13]). This approach to classifying emotional responses to stimuli from the IAPS has been explored in recent fMRI research [14]–[16]. Though these studies build


confidence that detecting natural responses at the brain activity level is possible, current fMRI technology is too expensive and cumbersome for continual usage as an access technology for our target population. Hence, we look to the noninvasive and portable technology of thermal infrared imaging to investigate whether or not these emotion-related activations in the brain also manifest themselves as measurable thermal changes on the skin surface level.

The remainder of this paper outlines our approach to classifying facial infrared imaging data according to affect-relevant dimensions. Section II reviews our experimental setup and protocol. Our classification strategy and approach are described in Section III and the results are presented in Section IV. The implications of this paper, as well as future steps to extend these experiments to individuals with disability, are discussed in Section V.

II. EXPERIMENTS

A. Subjects

We recruited a convenience sample of 12 able-bodied asymptomatic adults (nine females) of varying ethnicity (mean age 24.0 ± 2.9 years) from the university and research laboratory community. Participants had no known cardiovascular or respiratory conditions and were not taking medications at the time of the experiments. Each subject provided informed written consent prior to data acquisition. The experiment protocol was approved by the Research Ethics Boards of Bloorview Kids Rehab and the University of Toronto.

B. Equipment

Facial thermal infrared data, blood volume pulse (BVP), and respiratory effort were acquired in synchrony using a custom LabVIEW virtual instrument. Thermal infrared imaging data were streamed through a firewire connection to a data acquisition laptop. BVP and respiratory effort signals were acquired using a National Instruments BNC-2110 shielded connector block and a National Instruments DAQCard-6036E 16-bit PCMCIA card. In addition to the aforementioned physiological signals, experiment room relative humidity (RH) and temperature were measured with a Precon HS-2000D sensor via RS-232 input. These measurements were used to ensure the experiment room was a thermal-neutral environment. The experimental setup and typical positioning of the equipment are shown in Fig. 1.

Fig. 1. Typical experimental setup and equipment positions. (a) Respiratory belt. (b) Thermal infrared camera. (c) Photoelectric BVP sensor.

1) Thermal Infrared Imaging: We used an FLIR Systems ThermaCAM (Model SC640) long wavelength infrared (LWIR) camera operating between 7.5 and 13 µm (sensitivity <65 mK at 30 °C). The camera was situated at eye-level, approximately 1 m from the participant. The camera captured the frontal plane of the subject’s face using its automatic focus. Thermal infrared imaging data were captured at 30 frames/s. Camera object parameters (i.e., object emissivity, object distance, atmospheric temperature, and RH) were set to reflect our experimental conditions. We set the object emissivity to the widely accepted value for skin of 0.98 [17]. Atmospheric temperature and RH were adjusted to the stable values as measured by the HS-2000D sensor. For each recording, various time series were calculated from the regions of interest shown in Fig. 2. These regions were selected based on reports of salient affect-related information [6], [7], [9], [10]. The calculation of these time series is explained in Section III-A.

Fig. 2. Regions of interest used to calculate various statistical time series: left supraorbital (LFH), right supraorbital (RFH), left periorbital (LPO), right periorbital (RPO), and nasal (NSP). Adapted with permission from [18].

2) Other Signals: BVP was measured using a photoelectric pulse sensor (Model PPS, Grass Technologies) secured to the index finger of the subject’s left hand using a flexible Velcro strap. We used a piezo crystal respiratory effort sensor belt (Model 1370G, Grass Technologies). The respiratory effort sensor was secured around the subjects’ thoracic abdomen using a fabric Velcro belt. The pre-amplified BVP and respiratory effort signals were acquired at a 60 Hz sampling rate. We extracted beat-to-beat heart rate and respiration rate using the algorithm outlined in [19].

C. Experiment Protocol

The experiment was conducted in a quiet, temperature- and humidity-controlled room. Only the subject and the researcher were present in the room during data collection. The subjects were seated comfortably in an armchair in front of a laptop screen as shown in Fig. 1. Visual stimuli were presented to the subjects on the laptop screen. After being seated, the subjects were asked to relax for a minimum of 20 min prior to commencing data collection. This relaxation time was implemented to allow the subjects’ cardiovascular and respiratory activity to return to equilibrium, as well as to allow the subjects’ skin surface to reach thermal neutrality with the experiment room [20]. During the experiments, the subjects were asked to refrain from excessive movement. Additionally, they were asked to maintain contact with a headrest situated behind their head to minimize
above-neck movement. This also acted to ensure a consistent distance between the camera and the user for proper focus of the thermal infrared camera. Prior to each experimental trial, the thermal infrared camera was calibrated to an internal set point to ensure comparable equipment performance throughout the experiment.

For each subject, the experiment consisted of 16 trials, each 90 s in length. In total 192 trials were collected; eight trials were discarded because of excessive head turn and tilt. Each trial was split into three segments of equal length in time: prestimuli baseline (T1), stimuli presentation (T2), and poststimuli baseline (T3) segments. In the T1 segment, the subject was shown a black image with a centered white fixation dot and instructed to relax. For the T2 segment, subjects were instructed to focus on the affect caused by the presented visual stimuli. Like the T1 segment, during the T3 segments subjects were again shown the black image with the fixation dot and instructed to relax. Thermal infrared data, BVP, respiratory effort, and room RH and temperature were measured simultaneously and continuously for all trials. In attempts to achieve a range of affect in subject responses, we chose to include visual stimuli with varying arousal content (IAPS rating >6) and with varying valence content ranging from low (IAPS rating <3) to high (IAPS rating >6). Subjects were instructed to rate each visual stimulus in terms of the arousal and valence dimensions on a nine-point scale as outlined in [12].

III. DATA ANALYSIS

We extracted various time, time–frequency, and frequency features from time series calculated from facial thermal infrared imaging data. We identified six binary classification tasks: high arousal (HA) versus baseline (BASE), low arousal (LA) versus BASE, high valence (HV) versus BASE, low valence (LV) versus BASE, HA versus LA, and HV versus LV. The procedure and rationale used to label the data, i.e., the HA, LA, HV, LV, and BASE groups used in our supervised classification strategy, are described in Section III-B. Extracted features are elucidated in Section III-C followed by a description of our classification strategy in Section III-D.

A. Preprocessing

The five regions of interest (ROIs), i.e., the periorbital, supraorbital, and nasal regions, were specified relative to a tracking point located on the top of the subjects’ head (see Fig. 2). The tracking point was evaporatively cooled to approximately 10 °C below skin temperature. The aforementioned point was automatically located in each frame by radiometric thresholding; the coolest pixel blob possessing the expected morphological features was identified as the tracking point. Loss of tracking was manually checked at every frame and corrected as necessary by the researcher.

Pixels within the ROIs were filtered to retain the hottest 50%. These warmer pixels have been shown to correspond to the underlying blood vessels mainly responsible for changes in skin temperature [21]. Time series for each thresholded ROI were obtained by calculating summary statistics (i.e., mean, variance, skewness, kurtosis, and entropy) for each frame of a given trial. The mean temperature time series were further denoised using a wavelet-based approach; specifically a five-level discrete wavelet transform using the Daubechies wavelet (db4) with soft thresholding.

B. Classification Labels

Arousal and valence are two independent dimensions that represent most of the qualitative variability of affect [13]. The arousal dimension ranges from unexcited to excited and the valence dimension ranges from unpleasant to pleasant. For each T2 segment response, the associated subject-reported ratings of arousal and valence level were considered the ground truth measures of level of affect. The level of affect was labeled as either a high or low level by applying Otsu thresholding to the subject-reported arousal and valence ratings independently. We chose to use an Otsu threshold to separate the groups to allow for an objective means of finding the most natural partitioning (as opposed to subjectively defining an arbitrary threshold value) [22]. Since the arousal and valence dimensions are independent [12], we applied the Otsu threshold independently for each dimension; that is, once for subject-reported arousal levels, and once for subject-reported valence levels. Subsequently, the partitioned group with the greater mean was labeled as high level of affect (i.e., HA or HV) and the group with the lesser mean was labeled as low level of affect (i.e., LA or LV). Hence each T2 response segment was labeled as either HA or LA in terms of arousal, and HV or LV in terms of valence, resulting in four possible situations for each T2 response: HA and HV; HA and LV; LA and HV; and LA and LV. The 184 T2 segment responses were grouped into 127 HA and 57 LA patterns, and again into 114 HV and 70 LV patterns. The T1 segments were labeled as the corresponding BASE patterns.

C. Feature Extraction

A total of 402 features were extracted from each pattern. The feature pool consisted of: 78 features extracted from each ROI using the calculations detailed below (5 ROIs × 78 features = 390 features); ten features involving pairs of ROIs (specifically correlation coefficients); and two features derived from heart rate and respiration rate signals.

For each individual ROI, the following features were extracted from the pre-processed mean temperature time series, which we denote as $X = \{x_1, x_2, \ldots, x_N\}$ for segments of $N$ frames, where $N = 900$ for all T1 and T2 segments.

1) Consider $M \in \{1, 2, 3, 6\}$ nonoverlapping intervals $\{I_i,\ i = 1, \ldots, M\}$, where each interval contains exactly $n$ frames, i.e., $|I_i| = n$. The mean amplitude $A_i$ for the $i$th interval was computed as

$$A_i = \frac{1}{n} \sum_{k \in I_i} x_k, \quad i = 1, \ldots, M \tag{1}$$

where $k$ specifies the frame index. This feature was also extracted from skewness, variance, kurtosis, and entropy time series for each ROI with $M = 1$ only. Changes in amplitude in mean FST were shown to be correlated to
joy, arousal, and stress affect states in [7], [9], and [10], respectively.

2) Slope for the same nonoverlapping windows $I_i$ was estimated using linear least squares

$$m_i = \frac{\sum t_k \sum x_k - n \sum t_k x_k}{\left(\sum t_k\right)^2 - n \sum t_k^2} \tag{2}$$

where the indices $k$ and the variable $n$ are the same as in the mean amplitude calculations, and all sums run over $k \in I_i$. $t_k$ denotes the time corresponding to each observation $x_k$. Since relevant literature suggests mean amplitude change correlates with certain affective states, this feature was chosen to explore whether the rate of that change could be used to discriminate affect.

3) Zero-crossing count was used as a primitive estimate of signal frequency content. Consistent frequencies in FST signals were shown to exist in the baseline state. This feature provides a simple representation of frequency to explore if deviations from the baseline frequencies described in [18] occur when an individual experiences different affect. The zero crossings were counted relative to the mean of the given segment (i.e., $y_k = x_k - \mathrm{mean}[X]$) as follows:

$$ZC = \sum_{k=1}^{N-1} H_A(y_k) \tag{3}$$

where $H_A$ is the following indicator function:

$$H_A(y_k) = \begin{cases} 1, & \text{if } (y_k)(y_{k+1}) < 0 \\ 0, & \text{if } (y_k)(y_{k+1}) \geq 0 \end{cases} \quad k = 1, \ldots, N - 1. \tag{4}$$

4) Dispersion ratio [23] is the ratio between mean absolute deviation and the interquartile range for the given segment. This feature was selected for reasons similar to those for the mean amplitude feature. Mean absolute deviation (MAD), interquartile range ($I_{qr}$), and dispersion ratio (DR) are computed as follows:

$$\mathrm{MAD} = \mathrm{mean}\left(|y_k|\right)_{k=1}^{N} \tag{5}$$

$$I_{qr} = q_{0.75} - q_{0.25} \tag{6}$$

where $q_{0.75}$ and $q_{0.25}$ are the 75% and 25% quartiles of the mean amplitude time series’ amplitude distribution, respectively,

$$\mathrm{DR} = \frac{\mathrm{MAD}}{I_{qr}}. \tag{7}$$

5) The chi-squared goodness-of-fit statistic was used to estimate adherence to a normal distribution. It was computed for a given segment as follows:

$$\chi^2 = \sum_{j=1}^{10} \frac{(O_j - E_j)^2}{E_j} \tag{8}$$

where ten even sized bins were used, $O_j$ is the observed number of counts in bin $j$, and $E_j$ is the expected number of counts in bin $j$ given a normal distribution. Nhan et al. suggest that in baseline the FST signals are highly nonstationary [18]. This feature explores whether the distribution of the FST signals is different in baseline and varying affective states.

6) Features shown to be correlated with declines in arousal level [9] were employed with varying interval sizes measured in number of frames $\Delta n \in \{30, 60, 90, \ldots, 450\}$

$$\Delta T_{\max} = \max\left(|y_k|\right)_{k=1}^{N} \tag{9}$$

$$\mathrm{grad}\,T_1 = \max\left(\frac{x_{k+\Delta n} - x_k}{\Delta n}\right)_{k=1}^{N-\Delta n} \tag{10}$$

$$\mathrm{grad}\,T_2 = \min\left(\frac{x_{k+\Delta n} - x_k}{\Delta n}\right)_{k=1}^{N-\Delta n} \tag{11}$$

$$\mathrm{grad}\,T_3 = \max\left(\left|\frac{x_{k+\Delta n} - x_k}{\Delta n}\right|\right)_{k=1}^{N-\Delta n}. \tag{12}$$

More detailed descriptions of the features as applied to thermal infrared imaging can be found in [9].

7) The stationarity statistic ($z_A$) of the reverse arrangements test was employed as a feature. This stationarity test compares the observed number of reverse arrangements with that expected from the realization of a stationary random process (with an approximately normal distribution of the number of reverse arrangements). Calculation details can be found in [24].

8) Signal energy was estimated using short time Fourier transform (STFT) coefficients; specifically the 90%, 95%, and 99% signal energy cutoffs as described in [18], where we showed consistent ranges of signal energy for individuals in a baseline state. This feature will explore whether or not this frequency content varies with affective state.

9) Pairwise Pearson’s product moment correlation coefficients (R) between mean temperature time series from pairs of ROIs were computed for all possible pairings. Correlations exist in FST signals of juxtaposed and contralateral regions of healthy individuals at rest, e.g., [18], [25]. Deviations from those correlations could indicate a change from baseline.

Mean heart rate and respiration rate for a given segment were also included in our feature pool. Heart rate and respiration rate have been shown to be correlated to changes in affect, e.g., [5].

D. Classification Procedure

Six binary classification tasks were explored to provide information about affect. These tasks were specified according to the circumplex model of affect. The circumplex model allows one to draw conclusions about the affect of an individual based on the degree of arousal and valence. As opposed to using a continuum of arousal and valence levels as initially proposed in [13], in our approach we classify the responses as high or low levels. With this simplification of the arousal and valence dimensions, we reduce the circumplex model to four possible affect quadrants (plus baseline) as shown in Fig. 3. This discrimination of the arousal–valence dimensions is similar in spirit to that reported in [26].
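As an illustration of the interval features defined in items 1) to 4) of Section III-C, the following minimal sketch computes the mean amplitude (1), the least-squares slope (2), the zero-crossing count (3)–(4), and the dispersion ratio (5)–(7) for a single time series. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, frame indices stand in for the times t_k, and the quartiles use the default method of Python's `statistics.quantiles`, which the paper does not specify.

```python
import statistics


def split_intervals(x, m):
    """Split x into m nonoverlapping intervals of n = len(x) // m frames each."""
    n = len(x) // m
    return [x[i * n:(i + 1) * n] for i in range(m)]


def mean_amplitudes(x, m):
    """Eq. (1): mean amplitude A_i of each of the m intervals."""
    return [sum(seg) / len(seg) for seg in split_intervals(x, m)]


def slopes(x, m):
    """Eq. (2): least-squares slope m_i per interval.

    Assumption: t_k is taken to be the frame index of each observation.
    """
    out = []
    for i, seg in enumerate(split_intervals(x, m)):
        n = len(seg)
        t = [i * n + k for k in range(n)]
        st, sx = sum(t), sum(seg)
        stx = sum(tk * xk for tk, xk in zip(t, seg))
        stt = sum(tk * tk for tk in t)
        out.append((st * sx - n * stx) / (st ** 2 - n * stt))
    return out


def zero_crossings(x):
    """Eqs. (3)-(4): sign changes of y_k = x_k - mean[X] between neighbours."""
    mu = sum(x) / len(x)
    y = [xk - mu for xk in x]
    return sum(1 for a, b in zip(y, y[1:]) if a * b < 0)


def dispersion_ratio(x):
    """Eqs. (5)-(7): mean absolute deviation divided by interquartile range.

    Assumption: quartiles follow statistics.quantiles' default method.
    """
    mu = sum(x) / len(x)
    mad = sum(abs(xk - mu) for xk in x) / len(x)
    q25, _, q75 = statistics.quantiles(x, n=4)
    return mad / (q75 - q25)
```

For a 900-frame segment, calling `mean_amplitudes(x, 6)` and `slopes(x, 6)` would yield six values each, one per 150-frame interval.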
Fig. 3. Adaptation of the circumplex model of Russell [13] applied in this paper.

Fig. 4. Flow of data through our classification procedure. Left: Entire classification procedure. Right: Detailed flow through the genetic algorithm (shaded box in the left panel).

The flow of our classification strategy is summarized in Fig. 4. We used a tenfold cross validation (the external cross validation) to estimate classification accuracy of a Fisher LDA classifier trained with feature subsets selected with a standard genetic algorithm (GA). The GA was used for feature selection because of its ability to identify near optimal feature subsets from large feature pools [27]. GAs have been employed with success in feature selection applications with large feature pools without prior knowledge of individual feature discriminatory ability (e.g., [27]–[29]). In general, when using GAs in feature selection the evolution of each generation is driven by fitness functions dependent on both a fitness value (usually the classification error) and a cost function for the feature subset dimensionality. This approach allows the GA to select both the feature subset dimensionality as well as the features themselves. However, we chose to define a fitness function depending solely on the fitness value while prescribing the feature subset dimensionalities of interest. This was done for two reasons. First, we wanted to limit the size of our feature subsets to reduce search time (fewer possible subset combinations to search) since we performed many repetitions of the GA and classification. Second, we wanted to limit the maximum feature subset size explored in order to maintain an appropriate training sample size (n) to feature subset dimensionality (d) ratio (at least n/d > 5) to avoid the “curse of dimensionality” [30]. We limited our search to feature subsets sized 2–12; with this approach we repeat the classification procedure outlined in Fig. 4 for populations initialized with a dimensionality of 2, then again with 3, and so on to a maximum dimensionality of 12. The remaining parameters used in our genetic algorithm are listed in Table I.

TABLE I
GA SETTINGS FOR CLASSIFICATION TASKS OUTLINED IN SECTION III-D

For each fold of the external cross validation, five independent runs of the genetic algorithm were conducted to evaluate whether or not the same partition/training data would result in consistent feature subset selections. In the GA, we employed the Fisher LDA as the fitness function, where the adjusted error rate, as calculated in (13) and (14) [31] and estimated with an internal tenfold cross validation, was the fitness value. We chose to use adjusted error rate as opposed to the standard error rate calculation because of the imbalance in classification groups in our data. This measure of error has been shown to help minimize bias in cases of unbalanced data (e.g., [31]–[33]). For similar reasons, we also used adjusted accuracy to estimate the performance of the linear classifier on the unseen test data in the external cross validation. This measure is calculated as follows:

$$\text{Adjusted accuracy} = \frac{\text{sensitivity} + \text{specificity}}{2} \tag{13}$$

$$\text{Adjusted error} = 1 - (\text{adjusted accuracy}) \tag{14}$$

where sensitivity and specificity conform to their traditional definitions.

In each fold of the internal cross validation, the training data were resampled to 1000 samples using a technique similar to smooth bootstrapping (see [34]) assuming independence and normality for each feature. We ignored the correlations among the features and did not attempt multivariate density estimations
TABLE II
CLASSIFICATION ACCURACIES FOR AROUSAL CLASSIFICATION TASKS

TABLE III
CLASSIFICATION ACCURACIES FOR VALENCE CLASSIFICATION TASKS

for resampling due to associated computational costs. This resampling was applied to balance the training classes and ensure an adequate amount of training data. Similar bootstrap-based resampling strategies for estimating classification accuracies with small sample sizes have been shown to perform well (e.g., [35]–[37]). The original training data of the features selected by the genetic algorithm were also resampled to 1000 samples (with the same assumptions of independence and normality of each individual feature), again to provide balanced training data for classification of the unseen test data.
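One way to read the resampling step described above is as a parametric draw per feature: for each class, fit an independent normal distribution to every feature and sample the desired number of synthetic patterns. The sketch below follows that reading only. The function names, the equal per-class split (500 + 500 = 1000), and the use of the sample standard deviation are assumptions; the paper states only that the technique is similar to smooth bootstrapping [34] and assumes independence and normality for each feature.

```python
import random
import statistics


def resample_class(patterns, n_out, rng):
    """Draw n_out synthetic patterns for one class.

    Each feature is sampled independently from a normal distribution fitted
    to that feature (correlations among features are ignored, as in the text).
    """
    columns = list(zip(*patterns))  # one tuple of observed values per feature
    params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n_out)]


def balanced_resample(class_a, class_b, n_total=1000, seed=0):
    """Resample two classes to n_total patterns, split evenly (assumption)."""
    rng = random.Random(seed)
    half = n_total // 2
    return resample_class(class_a, half, rng), resample_class(class_b, half, rng)
```

Because both classes come back the same size, a classifier trained on the output sees balanced data even when the original groups are unequal (for example, 127 HA versus 57 LA patterns).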
Fig. 5. Distribution of selected features across ROI. Data shown for each classification task is for the highest mean adjusted accuracy.

In order to test the validity of our classification accuracy estimates, we repeated the aforementioned procedure with randomized classification labels. For each binary classification task, we performed the classification procedure for three independent randomizations of the associated classification labels. The resulting adjusted classification accuracies were compared against those obtained with correct labels. For a given feature subset dimensionality, one-sided two-sample t-tests (at a significance level of 0.05) were used to test a null hypothesis of equal means against an alternate hypothesis that the mean adjusted classification accuracy obtained with the correct labels is greater than that obtained with randomized labels.
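The comparison against randomized labels can be sketched as follows: adjusted accuracy is computed as in (13), and a pooled-variance two-sample t statistic compares the accuracies obtained with correct and with randomized labels. Whether the authors pooled variances is not stated, so the pooled form, the function names, and any example numbers are assumptions for illustration. The returned statistic would be compared with the one-sided critical value at the 0.05 level for the returned degrees of freedom.

```python
import math
import statistics


def adjusted_accuracy(y_true, y_pred, positive=1):
    """Eq. (13): mean of sensitivity and specificity for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def pooled_t_statistic(correct, randomized):
    """Two-sample t statistic with pooled variance (assumption) and its df.

    A large positive value supports the one-sided alternative that the mean
    accuracy with correct labels exceeds that with randomized labels.
    """
    na, nb = len(correct), len(randomized)
    va, vb = statistics.variance(correct), statistics.variance(randomized)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    diff = statistics.mean(correct) - statistics.mean(randomized)
    t = diff / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Unlike plain accuracy, `adjusted_accuracy` stays at 0.5 for a classifier that always predicts the majority class, which is why it suits the unbalanced groups in this study.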
IV. RESULTS

The results for the classification tasks are shown in Tables II and III. Overall, classification between an affective state and baseline yielded higher mean adjusted accuracies than classification between high versus low arousal or valence. In general, feature subsets of higher dimensionality yielded better mean adjusted accuracies, with the exception of the HV versus LV classification.

For HA versus BASE and HV versus BASE, the best adjusted accuracies achieved were above 80%. LA versus BASE and LV versus BASE classifications performed slightly worse, with best adjusted accuracies in the 70% range. HA versus LA and HV versus LV classifications resulted in adjusted accuracies of 50–60%. Though these values were low, in almost all cases¹ the adjusted accuracies obtained with the correct classification labels were significantly greater than those obtained when the labels were randomly assigned. For each of the classifications using randomized classification labels, the effects of feature subset dimensionality and the randomization of labels were examined using two-way ANOVAs. The main effect of dimensionality was not significant.² However, the main effect of the randomization was significant (which we would expect as the assignment of the classification labels is directly related to the classification itself). Interaction effects were only significant for the HA versus LA and LV versus BASE classification tasks.

¹ Exceptions: adjusted classification accuracies for randomized classification labels in the HA versus LA task (dimensionalities of seven and ten) were not significantly different from those obtained with correct classification labels.
² For the two-way ANOVA, significance of main and interaction effects was tested with p = 0.05.

The features selected for classification were more highly concentrated within the periorbital (LPO, RPO) and nasal/maxillary (NSP) regions. As shown in Fig. 5, for the best performing feature subset dimensionalities for each classification (as denoted in Tables II and III, respectively) nearly 70% of the chosen features were associated with the periorbital and nasal regions. An exception to this result was observed with the
HV versus LV classification task where the supraorbital regions (mainly the left) provided the most discriminatory features. The distribution of features between ROIs for the randomizations of classification labels did not present a succinct pattern.

V. DISCUSSION

The main finding in this paper is that facial thermal infrared imaging data can provide features to distinguish subject-reported levels of dimensional affect from baseline. Using a standard GA for feature selection and a linear classification scheme, we were able to classify between subject-reported arousal, valence, and baseline affective states with promising accuracies. Further, the accuracies were significantly higher than those obtained by chance (i.e., with randomized labels).

A. Classification Performance

We observed greater separability between the subject-reported levels of dimensional affect and baseline, as compared to subject-reported high and low levels of dimensional affect. Subject-reported affective states and the baseline state appear to be linearly separable with strategically chosen features. Levels of affect seem to be less separable linearly, although the achieved accuracies were significantly higher than chance. We attribute this poor performance to several factors relating to both the underlying physiology and the experimental protocol.

It has been shown that areas of activation in the brain are very different during times of rest (i.e., baseline) and during times of heightened emotions (e.g., [14]–[16]). Furthermore, instances of mental activation due to emotional stimulation have been shown to manifest at the skin surface (e.g., as emotional sweating [38] and changes in skin temperature [7]). Hence, one would expect that the skin surface as measured with thermal infrared imaging would yield very different temperature profiles when an individual is at a baseline state as compared to an emotionally stimulated state. Specifically, we would expect responses to be homeostatic in nature during baseline and sympathetic in nature during increased affect, thus lending to more separable data.

In contrast, areas of activation in the brain are largely the same regardless of high or low levels of dimensional affect, the difference being only in the intensity of the activation in the given areas [14]. If we assume that the changes in intensity of brain activation also modulate associated sympathetic skin responses (e.g., the emotional sweating and skin temperature responses mentioned earlier), we would expect affect level to change the intensity of the skin surface manifestations. Given such hypotheses, the manifestations due to high and low arousal and valence levels may not have been different enough to be detectable with thermal infrared imaging. Additionally, the stimuli we presented may not have been sufficiently distinct to facilitate classification between high and low levels despite the natural high and low separation in the subject ratings. In accordance with this point, postexperiment discussions revealed that many subjects felt little difference between the intended high and low arousal and valence images, contrary to the different IAPS ratings they had reported. Last, the high and low classes may not have been linearly separable given the features we chose to explore. To effectively explore other classifiers (e.g., k-NN [5], SVM [39], or neural networks [19], which have been used with some success in affect-related classifications of physiological signals), we would require more training data than that available in this paper.

B. Physiological Relevance

Generally, the most discriminatory features were derived from the periorbital and nasal areas. Recent reports of baseline characteristics showed that these regions were most information rich in the baseline state [18]. In studies attempting to correlate FST changes with components of affect, periorbital areas were shown to be indicative of arousal and stress [7], and the nasal area of arousal and joy [9], [10]. We hypothesize that these thermal changes manifesting on the skin surface are associated with the cortical processing of visual stimuli used in our experiments. fMRI studies utilizing IAPS stimuli to localize emotion-induced activation in the brain have indicated correlating responses in the limbic lobe (namely amygdala, thalamus) and hypothalamus, among other regions [14]–[16], to subjective reports of arousal and valence. We postulate that these activations in the brain, specifically the hypothalamus, result in changes in the skin surface temperature because of the hypothalamus' role in homeostasis, thermal and emotional sweating [38], and fight or flight responses [40].

C. Limitations

The processes of the body, including control of skin temperature, vary throughout the day with circadian rhythms [41]. Hence, one would expect a fluctuating baseline as the day progresses. The effect of these fluctuations on our classification tasks would have to be explored by conducting recordings throughout the course of the day. This would yield a more representative baseline description for classification (i.e., multiple, time-dependent descriptions) to better compare against affective states.

Though thermal infrared imaging can provide very accurate and repeatable measures of skin temperature distribution [42], [43], the level of accuracy is dependent on properly acquired and focused recordings. Our experiments consisted of highly controlled conditions in which the subjects were restricted in movement to limit motion artifacts and loss of focus. However, for individuals in the target population, such restrictions may be undesirable or impossible. Algorithms developed specifically to track thermal infrared regions of interest could be employed to limit the effect of motion (e.g., [44]). However, even these specialized algorithms encounter problems when dealing with loss of focus. To rectify this, one would have to find an appropriate method of limiting movements, or employ some means of real-time automatic focusing. One should note that for a subset of the target population, namely those individuals diagnosed with LIS, the challenges of motion artifacts and loss of focus are not detrimental, since these individuals rarely exhibit either voluntary or involuntary movements.
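As a rough illustration of the time-dependent baseline descriptions proposed above under Limitations, the sketch below groups baseline recordings by time of day and returns the averaged baseline features from the bin nearest a given recording time. It is not part of the study: the function names, the 4-h bin width, the nearest-bin rule, and the example temperatures are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from statistics import mean


def build_time_binned_baseline(recordings, bin_hours=4):
    """Average baseline feature vectors within time-of-day bins.

    recordings: iterable of (hour_of_day, feature_vector) pairs taken
    during baseline sessions (hypothetical input format).
    Returns a dict mapping bin index -> mean feature vector.
    """
    if bin_hours <= 0 or 24 % bin_hours != 0:
        raise ValueError("bin_hours must be a positive divisor of 24")
    bins = defaultdict(list)
    for hour, features in recordings:
        bins[int(hour % 24) // bin_hours].append(features)
    # Column-wise mean over all recordings that fell into the same bin.
    return {b: [mean(col) for col in zip(*vecs)] for b, vecs in bins.items()}


def baseline_for(hour, baselines, bin_hours=4):
    """Return the baseline of the populated bin closest to `hour`.

    Distance between bins is measured around the 24-h clock, so a
    late-evening recording can fall back on an early-morning baseline.
    """
    n_bins = 24 // bin_hours
    target = int(hour % 24) // bin_hours

    def circular_distance(b):
        d = abs(b - target)
        return min(d, n_bins - d)

    return baselines[min(baselines, key=circular_distance)]


if __name__ == "__main__":
    # Hypothetical mean ROI temperatures (deg C) from three baseline sessions.
    sessions = [(9, [34.0, 33.0]), (10, [34.4, 33.2]), (15, [35.0, 33.8])]
    baselines = build_time_binned_baseline(sessions)
    print(baseline_for(14, baselines))
```

Features from an affective-state recording could then be compared against the baseline matched to its time of day rather than against a single pooled baseline; whether this improves classification would still need to be tested experimentally.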
Since the experiments were conducted in thermal neutral conditions, additional experiments in different environments need to be conducted to explore the effect of environmental fluctuations. Changes in temperature, humidity, and ambient lighting can potentially cause thermoregulatory responses as well as affect the accuracy of the thermal infrared imaging system [45]. For experiments in uncontrolled thermal conditions, data from ambient temperature and humidity sensors would aid in detecting instances where environmental changes may trigger thermoregulatory responses in the facial skin surface. Additionally, the presence of intense ambient lighting (e.g., sunlight from windows or bright fluorescent lights) may necessitate the use of light sensors to correct for potential effects on measurement accuracy due to reflected radiation.

In this paper, we simplified the circumplex model of affect to four quadrants, thereby limiting the resolution levels of arousal and valence. However, even within a quadrant the model indicates the possibility of emotions with very different implications (e.g., the quadrant of HA and HV encompasses both being peppy and being aroused [13]). To increase the resolution of affect dimensions, one would require additional trials with a wider range of arousal and valence stimuli to provide enough training data to more finely partition the arousal and valence dimensions.

In this paper, we have only studied physiological responses to visual stimuli. Future work will explore the possibility of classifying affect with different modalities of stimuli presentation. Generalizability across modalities is necessary; in practical situations an individual's emotions can be affected by many different combinations of stimuli modalities. In addition, some individuals in the target population can only perceive stimuli from certain modalities.

Last, we have only examined able-bodied subjects in this experiment. Future studies must examine the discriminatory ability of infrared thermal imaging data in individuals with disability. The diagnostic heterogeneity of individuals in the target population may introduce differences in manifestations of emotion at the skin surface. Further, many medications common to the target population (e.g., analgesics, antidepressants, antihypertensives, antispasmodics, melatonin, and niacin) affect the physiological signals measured in this paper. We would therefore require individual-specific classifiers to ensure good performance, increasing the data acquisition requirements. In addition, without an established communication channel to indicate the level of the affect dimensions, it would be difficult to establish ground truth measures with which to evaluate classifier accuracy.

D. Implications for Development as Access Pathway

Several key findings provide support for the use of thermal infrared imaging in practical applications of access. First, the resulting adjusted classification accuracies for certain affective states against baseline were above 80%. In addition, these accuracies were obtained across multiple subjects, where no specific instructions were given for interpreting the stimuli. This suggests the selected features may be robust across individuals and representative of natural responses to stimuli. For classification, features using up to 30 s of data were required. This time frame is comparable to that of other physiological pathways investigated for applications in access, such as electrodermal activity [2] and cortical hemodynamics [46]. However, to further verify the results of these classification experiments and the aforementioned claims, it would be necessary to repeat the experiment with a larger sample size.

Thermal infrared imaging of FST could be applied as a secondary access pathway that provides constant information regarding affect. Such an application would complement outputs from a primary access pathway and would not necessitate updates much faster than our observed timescale (i.e., 30 s), as we would not expect mood to change drastically over short periods of time. The additional information of affect would also provide a means of detecting malfunctions of the primary access pathway; the individual may react negatively to the incorrect operation of the primary pathway, and this reaction may be measurable via thermal imaging. Lastly, because this technology is portable, noninvasive, and robust to environmental conditions, it lends itself to use by individuals who are mobile between environments without encumbering their existing lifestyle circumstances.

VI. CONCLUSION

We classified between different affective states (as represented by differing degrees of subject-reported arousal and valence) and the baseline state. A genetic algorithm was used to search for the best performing combinations of various time, frequency, and time–frequency features extracted from facial thermal infrared imaging data, blood volume pulse data, and respiration data. In classifying between baseline and high arousal and valence levels, we achieved adjusted classification accuracies around 80%. With baseline versus low arousal and valence levels, we achieved adjusted classification accuracies of approximately 75%. The results we present provide evidence that classifying affect with FST features derived from thermal infrared imaging is plausible. In addition, we provide a conceptual foundation on which thermal infrared imaging data can be developed for use as a primary or secondary access pathway. Additional work to improve accuracy and resolution of affect dimensions, as well as experiments with individuals with disability, is necessary before practical applications can be realized.

ACKNOWLEDGMENT

The authors would like to thank Dr. B. Kavanagh and Dr. A. Martel for their comments and suggestions. The authors would also like to thank their colleagues in the PRISM Laboratory for the invaluable discussions and assistance provided throughout the preparation of this manuscript. The authors would like to give special thanks to N. Alves-Kotzev, J. Lee, and Dr. E. Sejdić.

REFERENCES

[1] K. Tai, S. Blain, and T. Chau, “A review of emerging access technologies for individuals with severe motor impairments,” Assist. Technol., vol. 20, pp. 204–219, 2008.
[2] S. Blain, A. Mihailidis, and T. Chau, “Assessing the potential of electrodermal activity as an alternate access pathway,” Med. Eng. Phys., vol. 30, pp. 498–505, 2008.
[3] N. Memarian, A. Venetsanopoulos, and T. Chau, “Mutual information as a measure of contextual effects on single switch use,” Open Rehabil. J., vol. 2, pp. 1–10, 2009.
[4] N. Birbaumer and L. G. Cohen, “Brain-computer interfaces: Communication and restoration of movement in paralysis,” J. Physiol., vol. 579, no. 3, pp. 621–636, 2007.
[5] R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: analysis of affective physiological state,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 10, pp. 1175–1191, Oct. 2001.
[6] I. Pavlidis, J. Dowdall, N. Sun, C. Puri, J. Fei, and M. Garbey, “Interacting with human physiology,” Comput. Vis. Image Understanding, vol. 108, no. 1–2, pp. 150–170, 2007.
[7] D. Shastri, A. Merla, P. Tsiamyrtzis, and I. Pavlidis, “Imaging facial signs of neurophysiological responses,” IEEE Trans. Biomed. Eng., vol. 56, no. 2, pp. 477–484, Feb. 2009.
[8] J. A. Levine, I. Pavlidis, and M. Cooper, “The face of fear,” The Lancet, vol. 357, no. 9270, pp. 1757–1757, 2001.
[9] A. Nozawa and M. Tacano, “Correlation analysis on alpha attenuation and nasal skin temperature,” J. Stat. Mech.: Theory Exp., vol. P01007, pp. 1–10, 2009.
[10] R. Nakanishi and K. Imai-Matsumura, “Facial skin temperature decreases in infants with joyful expression,” Infant Behav. Dev., vol. 31, pp. 137–144, 2008.
[11] N. Memarian, A. Venetsanopoulos, and T. Chau, “Infrared thermography as an access pathway for individuals with severe motor impairments,” J. Neuroeng. Rehabil., vol. 6, no. 11, pp. 1–8, 2009.
[12] P. J. Lang, M. M. Bradley, and B. N. Cuthbert, “International affective picture system (IAPS): Affective ratings of pictures and instruction manual. Technical report A-6,” Univ. Florida, Gainesville, FL, Tech. Rep. 6, 2005.
[13] J. A. Russell, “A circumplex model of affect,” J. Pers. Soc. Psychol., vol. 39, pp. 1161–1178, 1980.
[14] J. Britton, K. Phan, S. Taylor, R. Welsh, K. Berridge, and I. Liberzon, “Neural correlates of social and nonsocial emotions: An fMRI study,” Neuroimage, vol. 31, no. 1, pp. 397–409, 2006.
[15] S. Anders, M. Lotze, M. Erb, W. Grodd, and N. Birbaumer, “Brain activity underlying emotional valence and arousal: a response-related fMRI study,” Human Brain Mapp., vol. 23, no. 4, pp. 200–209, 2004.
[16] F. Dolcos, K. S. LaBar, and R. Cabeza, “Dissociable effects of arousal and valence on prefrontal activity indexing emotional evaluation and subsequent memory: An event-related fMRI study,” Neuroimage, vol. 23, no. 1, pp. 64–74, 2004.
[17] J. Steketee, “Spectral emissivity of the skin and pericardium,” Phys. Med. Biol., vol. 18, no. 5, pp. 686–694, 1973.
[18] B. R. Nhan and T. Chau, “Infrared thermal imaging as a physiological access pathway: A study of the baseline characteristics of facial skin temperatures,” Physiol. Meas., vol. 30, pp. N23–N35, 2009.
[19] A. Haag, S. Gronzy, P. Schaich, and J. Williams, “Emotion recognition using bio-sensors: First steps towards an automatic system,” Lecture Notes Comput. Sci., vol. 3068, pp. 36–48, 2004.
[20] E. F. Ring and K. Ammer, The Biomedical Engineering Handbook, 3rd ed. Boca Raton, FL: CRC Press, 2006, vol. 2, pp. 1–9, ch. 36.
[21] N. Charkoudian, “Skin blood flow in adult human thermoregulation: how it works, when it does not, and why,” Mayo Clinic Proc., vol. 78, pp. 603–612, 2003.
[22] N. Otsu, “A threshold selection method for gray-level histogram,” IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[23] J. Lee, S. Blain, and T. Chau, “A radial basis classifier for automatic detection of aspiration in children,” J. Neural Eng. Rehabil., vol. 3, pp. 1–17, 2006.
[24] T. Chau, D. Chau, M. Casas, G. Berall, and D. J. Kenny, “Investigating the stationarity of paediatric aspiration signals,” IEEE Trans. Neural Syst. Rehabil. Eng., vol. 13, no. 1, pp. 99–105, Mar. 2005.
[25] S. Uematsu, “Symmetry of skin temperature comparing one side of the body to the other,” Thermology, vol. 1, pp. 4–7, 1986.
[26] J. Kim and E. André, “Emotion recognition based on physiological changes in music listening,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 12, pp. 2067–2083, Dec. 2008.
[27] J. Yang and V. Honavar, “Feature subset selection using a genetic algorithm,” IEEE Intell. Syst. Appl., vol. 13, no. 2, pp. 44–49, Mar./Apr. 1998.
[28] J. Hong and S. Cho, “Efficient huge-scale feature selection with speciated genetic algorithm,” Pattern Recog. Lett., vol. 27, pp. 143–150, 2006.
[29] I. Oh, J. Lee, and B. Moon, “Hybrid genetic algorithms for feature selection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1424–1437, Nov. 2004.
[30] S. J. Raudys and A. K. Jain, “Small sample size effects in statistical pattern recognition: Recommendations for practitioners,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 13, no. 3, pp. 252–264, Mar. 1991.
[31] G. Tzanis, C. Berberidis, A. Alexandridou, and I. Vlahavas, “Improving the accuracy of classifiers for the prediction of translation initiation sites in genomic sequences,” Lecture Notes Comput. Sci., vol. 3746, pp. 426–436, 2005.
[32] S. C. Manoharan, M. Veezhinathan, and S. Ramakrishnan, “Comparison of two ANN methods for classification of spirometer data,” Meas. Sci. Rev., vol. 8, no. 3, pp. 53–57, 2008.
[33] F. Zeng, R. Yap, and L. Wong, “Using feature generation and feature selection for accurate prediction of translation initiation sites,” Genome Inf., vol. 13, pp. 192–200, 2002.
[34] B. Efron, “Bootstrap methods: Another look at the jackknife,” Ann. Stat., vol. 7, no. 1, pp. 1–26, 1979.
[35] H. Xiong, Y. Zhang, and X. Chen, “Data-dependent kernel machines for microarray data classification,” IEEE/ACM Trans. Comput. Biol. Bioinformatics, vol. 4, no. 4, pp. 583–595, Oct.–Dec. 2007.
[36] W. Fu, R. Carroll, and S. Wang, “Estimating misclassification error with small samples via bootstrap cross-validation,” Bioinformatics, vol. 21, no. 9, pp. 1979–1986, 2005.
[37] A. K. Jain, R. Dubes, and C. Chen, “Bootstrap techniques for error estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-9, no. 5, pp. 628–633, Sep. 1987.
[38] R. Vetrugno, R. Liguori, P. Cortelli, and P. Montagna, “Sympathetic skin response: Basic mechanisms and clinical applications,” Clin. Auton. Res., vol. 13, no. 4, pp. 256–270, 2003.
[39] L. Li and J. Chen, “Emotion recognition using physiological signals,” Lecture Notes Comput. Sci., vol. 4282, pp. 437–446, 2006.
[40] C. L. Stanfield and W. J. Germann, Principles of Human Physiology, 3rd ed. San Francisco, CA: Pearson Benjamin Cummings, 2007.
[41] K. Krauchi and A. Wirz-Justice, “Circadian rhythm of heat production, heart rate, and skin and core temperature under unmasking conditions in men,” Amer. J. Appl. Physiol.: Regul., Integr. Comparative Physiol., vol. 267, no. 3, pp. 819–829, 1994.
[42] N. Zaproudina, V. Varmavuo, O. Airaksinen, and M. Närhi, “Reproducibility of infrared thermography measurements in healthy individuals,” Physiol. Meas., vol. 29, pp. 515–524, 2008.
[43] E. F. J. Ring, “The historical development of thermal imaging in medicine,” Rheumatology, vol. 43, no. 6, pp. 800–802, 2004.
[44] I. T. Pavlidis, “Coalitional tracking,” Comput. Vis. Image Understanding, vol. 106, no. 2–3, pp. 205–219, 2007.
[45] B. F. Jones, “A reappraisal of the use of infrared thermal image analysis in medicine,” IEEE Trans. Med. Imag., vol. 17, no. 6, pp. 1019–1027, Dec. 1998.
[46] M. Naito, Y. Michioka, K. Ozawa, Y. Ito, M. Kiguchi, and T. Kanazawa, “A communication means for totally locked-in ALS patients based on changes in cerebral blood volume measured with near-infrared light,” IEICE Trans. Inf. Syst., vol. 90, pp. 1028–1037, 2007.

Brian R. Nhan received the B.A.Sc. degree in mechanical engineering from the University of Windsor, Windsor, ON, Canada, in 2007. He is currently working toward the M.A.Sc. degree in biomedical engineering from the Bloorview Research Institute and Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, ON.
His current research interest includes infrared thermography and its application in developing alternative communication technologies for individuals with disabilities.

Tom Chau (S’92–M’97–SM’03) received the B.A.Sc. degree in engineering science and the M.A.Sc. degree in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 1992 and 1994, respectively, and the Ph.D. degree in systems design engineering from the University of Waterloo, Waterloo, ON, in 1997.
He was with IBM Markham, ON. Currently, he is a Scientist at Bloorview Kids Rehab, Toronto, and an Assistant Professor and Clinical Engineering Program Director at the Bloorview Research Institute and Institute of Biomaterials and Biomedical Engineering, University of Toronto. He holds a Canada Research Chair in Pediatric Rehabilitation Engineering. His research interests include the development and investigation of intelligent technologies and analytical techniques to decode functional intention of individuals with disabilities.
