Decision Tree Based Depression Classification from Audio, Video and Language Information
ABSTRACT

In order to improve the recognition accuracy of the Depression Classification Sub-Challenge (DCC) of AVEC 2016, in this paper we propose a decision tree for depression classification. The decision tree is constructed according to the distribution of the multimodal predictions of the PHQ-8 scores and the participants' characteristics (PTSD/Depression Diagnostic, sleep-status, feeling and personality) obtained via analysis of the participants' transcript files. The proposed gender-specific decision tree provides a way of fusing the upper level language information with the results obtained using low level audio and visual features. Experiments are carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database. The results show that the proposed depression classification schemes obtain very promising results on the development set, with the F1 score reaching 0.857 for the class depressed and 0.964 for the class not depressed. Despite the over-fitting problem in training the models that predict the PHQ-8 scores, the classification schemes still obtain satisfying performance on the test set: the F1 score reaches 0.571 for the class depressed and 0.877 for the class not depressed, with an average of 0.724, which is higher than the baseline result of 0.700.

Keywords

Depression classification, decision tree, multi-modal

1. INTRODUCTION

Nowadays, depression and anxiety disorders are highly prevalent worldwide, causing burden and disability for individuals, families and society. According to the World Health Organization (WHO), depression will be the fourth leading mental disorder by 2020. The affective computing community has shown a growing interest in developing various systems, using audio and video features, to assist psychologists in the prevention and treatment of clinical depression.

Considering the audio features, researchers have found that depressed subjects are prone to possess a low dynamic range of the fundamental frequency, a slow speaking rate, a slightly shorter speaking duration, and a relatively monotone delivery [13], [5], [23], [16], [1]. Moreover, compared with healthy controls, the Harmonic-to-Noise Ratio (HNR) values of depressed subjects are higher [14]. Consequently, researchers formulated subtle changes in speech characteristics (e.g., differences in pitch, loudness, speaking rate, articulation, etc.) as indicators of depression. Features based on low-level descriptors (LLDs), such as energy, spectrum, and Mel-frequency cepstral coefficients (MFCC), were used as baseline audio features in the Audio/Visual Emotion Challenge and Workshops (AVEC2013 and AVEC2014) [22], [21]. In [18] and [15], an i-vector based representation
was computed to convert the frame-level features to a global representation. Experimental results revealed that i-vector level fusion of low-level features can result in more accurate systems for depression recognition.

Video features, such as body movements and gestures, subtle expressions and periodical muscular movements, have also been widely explored for depression analysis. To describe the dynamics of facial appearance, AVEC2013 [22]
adopted the Local Phase Quantisation (LPQ) features, while AVEC2014 adopted the Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) features as baseline visual features. Girard et al. [8] investigated the relationship between nonverbal behavior and severity of depression using Facial Action Coding System (FACS) action units (AUs) and head pose. They found that when symptom severity was high, participants made fewer affiliative facial expressions (AU12 and AU15), more non-affiliative facial expressions (AU14), and diminished head motion. Head pose analysis was also made in [12] and [2]. In [17], Scherer et al. proposed the vertical (head and eye) gaze directionality, smile intensity and average duration, as well as self-adaptors and leg fidgeting, as nonverbal behavior descriptors. Spatio-temporal interest point (STIP) features, describing the spatio-temporal changes by taking into account the movements of the facial area, hands, shoulders, head, etc., have also been employed in [12, 6] for depression classification. Typical symptoms of depression can be well described by global variation information; therefore most approaches in depression analysis extract global feature vectors from the complete video by aggregating a large set of local descriptors. In [10], Motion History Histogram (MHH), bag of words (BOW) and Vector of Locally Aggregated Descriptors (VLAD) representations have been computed on the LGBP or STIP features to obtain such global features.

Apart from the audio and visual cues, some researchers analyzed depression from text/language information. In [7], the authors explored the potential of using social media to detect and diagnose major depressive disorder in individuals. To characterize the topical language of individuals detected positively with depression, the authors built a lexicon of terms that are likely to appear in postings from individuals discussing depression or its symptoms in online settings. Using the frequency of depression terms, a Support Vector Machine (SVM) classifier was built to provide estimates of the risk of depression before the reported onset. An application of topic and sentiment modeling was presented in [11] for online therapy for depression and anxiety. It was found that besides the discussion topic and sentiment, style and/or dialogue structure is also important for measuring the patient's progress. Asgari et al. [3] explored the information from "what is said" (content) and "how it is said" (prosody). To extract features from text, they used a published table to tag each word in an utterance with an arousal and a valence rating. Finally, the speech prosody features and text features were fused to detect depression with an SVM classifier.

In [19], Stratou et al. showed that gender plays an important role in the automatic assessment of psychological conditions such as depression and PTSD, and that a gender dependent approach significantly improves the performance over a gender agnostic one.

In this paper, we target the Depression Classification Sub-Challenge (DCC) task of AVEC2016 [20], and inspired by [19], we build classification models for females and males, respectively. First, a gender-specific multimodal framework, combining audio features, visual features as well as AU evidences and emotion evidences, is proposed for the prediction of PHQ-8 scores. Then, a depression classification decision tree is constructed according to the distribution of the multimodal predicted PHQ-8 scores and participants' characteristics obtained via the analysis of their transcript files. Four criteria, namely, PTSD/Depression Diagnostic, sleep-status, feeling and personality, have been defined via content analysis of the participants' transcripts. The proposed decision tree provides a way of fusing the upper level language information with the predicted PHQ-8 scores obtained using low level audio and visual features. Experiments are carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database [9].

The remainder of this paper is organized as follows. The audio and video features, as well as the multimodal prediction framework of the PHQ-8 scores, are addressed in Section 2. Section 3 describes the approach we used for content analysis of the transcripts to characterize the participants. Using the training set and based on basic summary statistics of the considered participants' characteristics and the PHQ-8 scores, we introduce in Section 4 the proposed decision tree based depression classification schemes for females and males, respectively. Section 5 analyzes the experimental results, and finally conclusions are drawn in Section 6.

2. PREDICTION OF THE PHQ-8 SCORES

2.1 Audio and Video Features

For the audio and video features, we make use of the baseline features provided by AVEC 2016. The baseline audio features consist of 5 formant features and 74 prosodic and voice quality features, denoted hereafter as "covarep" features. From the video, based on the OpenFace [4] framework, AVEC 2016 provides histogram of oriented gradients (HOG) features, eye gaze features, and head pose features. In our implementation the eye gaze and head pose features have been concatenated into a "Gaze-pose" feature vector. AVEC2016 also provides (i) emotion evidence measures for the set {Anger, Contempt, Disgust, Joy, Fear, Neutral, Sadness, Surprise, Confusion, Frustration}, where the evidence for an expression channel is a number (typically between -5 and +5) that represents the odds, on a logarithmic (base 10) scale, of the target expression being present; and (ii) AU evidences for the set {AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU10, AU12, AU14, AU15, AU17, AU18, AU20, AU23, AU24, AU25, AU26, AU28, AU43}.
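As an illustration of this evidence scale (our reading of it, not part of the challenge documentation), a log10-odds value e maps to the probability of the expression being present as follows:

    def evidence_to_probability(e):
        """Map a log10-odds evidence value e to the probability that the
        target expression is present: odds = 10**e, p = odds / (1 + odds)."""
        odds = 10.0 ** e
        return odds / (1.0 + odds)

    # evidence_to_probability(0.0) -> 0.5; evidence_to_probability(2.0) -> ~0.99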
Using the provided 68 2D facial landmarks, we follow the approach of [20] to extract geometric features. We first calculate the mean shape of 51 stable points, for females and males respectively, using the samples of the training, development and test sets. Then, the feature points are aligned with the mean shape, and the differences between the coordinates of the aligned landmarks and those of the mean shape, as well as between the aligned landmark locations in the previous and the current frame, are computed, resulting in 204 features. The Euclidean distance between the median of the stable landmarks and each aligned landmark in a video frame is also calculated, resulting in 51 features. Finally, the facial landmarks are split into three groups corresponding to different regions: the left eye and left eyebrow, the right eye and right eyebrow, and the mouth. For each of these groups, the Euclidean distances and the angles between the points are computed, providing 75 features. In total, we obtain 330 geometric features.
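A minimal per-frame sketch of the 204 offset features and the 51 median-distance features described above; the translation-plus-scale alignment is our simplifying assumption (the alignment method is not specified here), and the 75 region-wise distance/angle features are omitted for brevity:

    import numpy as np

    def align_to_mean(shape, mean_shape):
        """Translation + scale alignment of a 51x2 landmark shape to the
        mean shape (a simplification; a full Procrustes fit could be used)."""
        s = shape - shape.mean(axis=0)
        m = mean_shape - mean_shape.mean(axis=0)
        scale = np.linalg.norm(m) / (np.linalg.norm(s) + 1e-8)
        return s * scale + mean_shape.mean(axis=0)

    def frame_geometric_features(curr, prev, mean_shape):
        """curr, prev: 51x2 landmarks of the current and previous frame,
        assumed already aligned with align_to_mean()."""
        d_mean = (curr - mean_shape).ravel()           # 102: offsets to the mean shape
        d_prev = (curr - prev).ravel()                 # 102: frame-to-frame motion
        median = np.median(curr, axis=0)               # median of the stable landmarks
        d_med = np.linalg.norm(curr - median, axis=1)  # 51: distances to the median
        return np.concatenate([d_mean, d_prev, d_med]) # 255 of the 330 features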
As the dimensions of the HOG feature vector and the geometric feature vector are high, we apply PCA (retaining 99.9% of the variance) and obtain the following reduced feature sets: 43 HOG-PCA features and 43 GEO-PCA features for females, and 62 HOG-PCA features and 62 GEO-PCA features for males, respectively.

Finally, for each type of feature, we take its average over the entire screening interview as the global feature of the considered video.
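A sketch of this reduction and pooling step, assuming the PCA is fitted on the pooled per-frame features of one gender (whether PCA precedes or follows the temporal averaging is not stated above):

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_and_pool(interviews):
        """interviews: list of (n_frames_i x D) per-frame feature arrays,
        one array per participant of the considered gender."""
        pca = PCA(n_components=0.999)  # keep components covering 99.9% of variance
        pca.fit(np.vstack(interviews))
        # average the reduced frames over each interview -> one global vector each
        return np.array([pca.transform(x).mean(axis=0) for x in interviews])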
2.2 Multimodal Prediction

According to [19], females and males show different symptoms of depression; therefore, in our work, the PHQ-8 score prediction is done separately for males and females. For each input feature stream, i.e. formant, covarep, HOG-PCA and GEO-PCA, we train a separate Support Vector Regression (SVR) model with a Radial Basis Function (RBF) kernel to predict the PHQ-8 score. The parameters cost and gamma of the SVR were optimised in the range [2^-8, 2^8].
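A sketch of this per-stream regressor, shown with scikit-learn (a tooling assumption on our part) and with the grid evaluated by cross-validation on the training set, whereas above the parameters are selected on the development set:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    # cost and gamma searched over [2^-8, 2^8], as stated above
    param_grid = {"C": 2.0 ** np.arange(-8, 9),
                  "gamma": 2.0 ** np.arange(-8, 9)}
    stream_svr = GridSearchCV(SVR(kernel="rbf"), param_grid,
                              scoring="neg_root_mean_squared_error", cv=5)
    # X_train: one global feature vector per participant (e.g. GEO-PCA);
    # y_train: the reported PHQ-8 scores
    # stream_svr.fit(X_train, y_train); phq8_dev = stream_svr.predict(X_dev)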
For the emotion and AU measures, we train several SVRs, considering as input features all the combinations of 1 to 4 evidences, and select the combination producing the lowest root mean square error (RMSE) and mean absolute error (MAE) between the predicted and reported PHQ-8 scores, averaged over all sequences. In our experiments, the (disgust, fear, sadness) emotion evidence combination and the (AU5, AU17, AU25) AU evidence combination obtained the lowest RMSE and MAE among all the combinations for females, while for males, (joy, baseline, confusion) and (AU5, AU20, AU25) obtained the best prediction performance on the development set.
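The exhaustive search over evidence subsets can be sketched as follows; ranking candidates by the sum of RMSE and MAE on the development set is our assumption, since only "lowest RMSE and MAE" is stated, and fit_svr is a hypothetical helper that trains an RBF-kernel SVR:

    import numpy as np
    from itertools import combinations
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    def best_evidence_combination(channels, train, dev, y_train, y_dev, fit_svr):
        """channels: evidence names; train/dev: dicts mapping a name to its
        per-participant averaged values; fit_svr: trains an RBF-kernel SVR."""
        best_combo, best_err = None, float("inf")
        for r in range(1, 5):                          # combinations of 1..4 evidences
            for combo in combinations(channels, r):
                X_tr = np.column_stack([train[c] for c in combo])
                X_dv = np.column_stack([dev[c] for c in combo])
                pred = fit_svr(X_tr, y_train).predict(X_dv)
                rmse = mean_squared_error(y_dev, pred) ** 0.5
                err = rmse + mean_absolute_error(y_dev, pred)
                if err < best_err:
                    best_combo, best_err = combo, err
        return best_combo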
The outputs of the 7 unimodal SVRs (GEO-PCA, Gaze-pose, HOG-PCA, covarep, formant, best emotion evidence combination, best AU evidence combination) are input to a second level SVR model, or a local linear regression (LLR) model, for the final (multimodal) PHQ-8 score prediction, as follows. As the AU evidence stream and the emotion evidence stream provide promising prediction results on the training and development sets, we use these two streams as inputs to the second level SVR model, and select among the other 5 streams (GEO-PCA, Gaze-pose, HOG-PCA, covarep, and formant) the ones providing the lowest RMSE and MAE (see Table 10).
3. PARTICIPANT CHARACTERISTICS

Apart from the above described features, we conducted a content analysis of the transcripts to characterize the participants following four criteria: PTSD/Depression Diagnostic (Yes/No), sleep-status (Normal/Abnormal), Feeling (Bad/Good) and Personality (Shy/Extrovert). The analysis has been made as follows (a keyword-matching sketch is given after this list):

• Sleep Status. If the participant answers the "easy sleep" question with positive words such as "no problem", "pretty good", "get a good night's sleep", "pretty easy", "easy", "I'm ok", "fairly easy", etc., or he/she does not answer this question, the sleep status is marked as "normal". In case the answer contains words such as "not had a good sleep", "really hard", "kinda difficult", "never easy", etc., the sleep status is marked as "abnormal". Moreover, when the reason for not having a good sleep is given, with phrases such as "disturbing thought", "mind will be racing a lot", "thoughts running through my mind", "hard to keep my thoughts", etc., the sleep status is further marked as "sleep abnormal/mind reason". If there is no information about the reason, the sleep status is considered as "sleep abnormal/other reason".

• PTSD/Depression Diagnosed. The value of this criterion is "yes" or "no" according to the "depression diagnosed" and "ptsd diagnosed" transcriptions.

• Feeling. This attribute takes the value "Bad" or "Good" following the transcript of the "feel lately" question. The value "Bad" is given when the transcript contains negative words such as "feeling depressed", "little depressed", "tired", "sad", "depressed blue", "not okay", "frustrated", "angry", "down". If it contains positive words like "fine", "good", "pretty good", "great", "okay", or the participant does not answer this question, the feeling status is considered as "Good".

• Personality. This criterion takes the value "Shy" if the transcript contains words like "shy", "introvert", "more shy" and "probably shy". If words like "outgoing", "extrovert", "mostly outgoing" are used, we mark the participant as "outgoing". If the answer contains "the middle", "a little bit of both", "depends on the situation", the personality is considered "extrovert".
4. DECISION TREE BASED DEPRESSION CLASSIFICATION

The research results of [19] have shown that the contributions of different behavioral indicators to depression and PTSD differ between males and females. This finding implies that a decision tree based classification method may improve the recognition accuracy of depression. Most methods that generate decision trees for a specific problem use examples of data instances in the decision tree generation process. To this aim, we examined the statistics of the above defined participant characteristics, which are summarized in the following sections.
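To make the shape of such a manually constructed, gender-specific tree concrete, a purely illustrative sketch follows; the split order, the branch conditions and the PHQ-8 cut-off of 10 (the standard threshold defining the depressed class in the DCC) are our assumptions, not the trees actually built in this work:

    def classify_female(phq8_pred, diagnosed, sleep, feeling):
        """Illustrative only: one possible tree over the multimodal PHQ-8
        prediction and the transcript-derived characteristics."""
        if diagnosed == "yes" and sleep.startswith("abnormal"):
            return "depressed"
        if diagnosed == "no" and sleep == "normal" and feeling == "Good":
            return "not depressed"
        # ambiguous branches fall back to the multimodal PHQ-8 prediction
        return "depressed" if phq8_pred >= 10 else "not depressed"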
4.1 Females

Based on the training set, we computed basic summary statistics for each of the defined characteristics:

• Sleep Status. From Table 1, one can notice that most (67.74%) of the not depressed females are marked as "sleep normal", while 84.62% of the depressed females are marked as "sleep abnormal", among which 61.54% because of the "mind reason", showing that depressed females think a lot when trying to sleep.

Table 1: Sleep Status - Females

classes         sleep normal (%)   sleep abnormal (%)
                                   mind reason   other reason
not depressed   21 (67.74)         1 (3.23)      9 (29.03)
depressed       2 (15.38)          8 (61.54)     3 (23.08)

• PTSD/Depression Diagnosed. Statistics on whether the females have been diagnosed with depression or PTSD are listed in Table 2, which indicates that almost all (92.31%) of the depressed females have been diagnosed with either depression or PTSD before, or even both, while only 25.81% of the not depressed females have been so diagnosed.

Table 2: PTSD/Depression - Females

classes         no ptsd/depression (%)   ptsd/depression (%)
not depressed   23 (74.19)               8 (25.81)
depressed       1 (7.69) (no answer)     12 (92.31)
Table 6: Audio/Visual Prediction - Female

Features        Dataset   RMSE    MAE
GEO-PCA (43)    Train     5.778   4.705
                Dev.      6.387   5.105
Gaze-pose (9)   Train     5.800   4.727
                Dev.      6.362   5.105
HOG-PCA (43)    Train     5.891   4.886
                Dev.      6.391   5.158
covarep (74)    Train     5.560   4.545
                Dev.      6.224   4.842
formant (5)     Train     5.778   4.705
                Dev.      6.320   5.000

[Figure 3: PHQ-8 Scores of the males]
Table 8: Evidence Based Prediction - Female

          Evidence                  Dataset   RMSE    MAE
Emotion   disgust                   Train     6.094   5.000
                                    Dev.      5.699   4.579
          sadness, frustration      Train     5.811   4.182
                                    Dev.      5.161   3.895
          disgust, fear, sadness    Train     3.519   1.932
                                    Dev.      4.377   3.368
          anger, joy, fear,         Train     4.026   2.432
          frustration               Dev.      4.894   3.737
          All                       Train     3.908   2.273
                                    Dev.      5.943   4.579
AU        AU10                      Train     4.975   3.341
                                    Dev.      5.201   4.211
          AU17, AU25                Train     5.379   3.477
                                    Dev.      4.322   3.526
          AU5, AU17, AU25           Train     3.879   2.046
                                    Dev.      3.974   3.263
          AU9, AU17, AU25, AU28     Train     4.647   2.955
                                    Dev.      4.383   3.421
          All                       Train     5.796   4.818
                                    Dev.      6.279   4.895

Table 9: Evidence Based Prediction - Male

          Evidence                  Dataset   RMSE    MAE
Emotion   confusion                 Train     4.595   3.206
                                    Dev.      6.093   5.250
          contempt, joy             Train     4.271   2.429
                                    Dev.      5.534   4.250
          joy, baseline, confusion  Train     3.462   1.952
                                    Dev.      5.466   4.500
          contempt, joy, sadness,   Train     4.106   2.127
          confusion                 Dev.      5.673   4.563
          All                       Train     4.483   3.365
                                    Dev.      6.942   5.688
AU        AU23                      Train     4.595   2.921
                                    Dev.      5.511   4.125
          AU4, AU14                 Train     3.581   2.095
                                    Dev.      4.323   3.188
          AU5, AU20, AU25           Train     3.625   1.492
                                    Dev.      4.294   3.188
          AU1, AU10, AU17, AU18     Train     0.756   0.222
                                    Dev.      4.191   3.563
          All                       Train     4.832   3.825
                                    Dev.      6.982   5.750
Table 12: Depression Classification Results

Gender   Data              F1 Score        Precision       Recall
Female   Dev.              0.857 (0.968)   0.750 (1.000)   1.000 (0.938)
Male     Dev.              0.857 (0.960)   1.000 (0.923)   0.750 (1.000)
All      Dev. (proposed)   0.857 (0.964)   0.857 (0.964)   0.857 (0.964)
All      Dev. (baseline)   0.58 (0.86)     0.47 (0.94)     0.78 (0.79)
All      test (proposed)   0.571 (0.877)   0.500 (0.914)   0.667 (0.842)
All      test (baseline)   0.50 (0.90)     0.60 (0.87)     0.43 (0.93)

(Values are for the class depressed, with those for the class not depressed in parentheses.)

6. CONCLUSIONS

In this paper, with the purpose of improving the recognition accuracy of the Depression Classification Sub-Challenge (DCC) of AVEC 2016, we proposed a decision tree approach for depression classification. Two decision trees have been proposed, one for males and one for females. The decision trees have been constructed according to the distribution of the multimodal predictions of the PHQ-8 scores and the participants' characteristics (PTSD/Depression Diagnostic, sleep-status, feeling and personality) obtained via the analysis of the transcript files of the participants. The proposed gender-specific decision trees provide a way of fusing the upper level language information with the results obtained using low level audio and visual features.

In our current implementation we considered a manual decision tree generation process; in future work we plan to investigate automatic approaches, and other regression approaches will also be investigated for the prediction of the PHQ-8 scores.

7. ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (grant 61273265), the Research and Development Program 863 of China (No. 2015AA016402), and the VUB Interdisciplinary Research Program through the EMO-App project.

8. REFERENCES

[1] S. Alghowinem. From joyous to clinically depressed: mood detection using multimodal analysis of a person's appearance and speech. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, pages 648-654. IEEE, 2013.

[2] S. Alghowinem, R. Goecke, M. Wagner, G. Parker, and M. Breakspear. Head pose and movement analysis as an indicator of depression. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, pages 283-288. IEEE, 2013.

[3] M. Asgari, I. Shafran, and L. B. Sheeber. Inferring clinical depression from speech and spoken utterances. In Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on, pages 1-5, 2014.

[4] T. Baltrušaitis, P. Robinson, L.-P. Morency, et al. OpenFace: an open source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1-10. IEEE, 2016.

[5] N. Cummins, J. Epps, M. Breakspear, and R. Goecke. An investigation of depressed speech detection: features and normalization. In Interspeech, pages 2997-3000, 2011.

[6] N. Cummins, J. Joshi, A. Dhall, V. Sethu, R. Goecke, and J. Epps. Diagnosis of depression by behavioural signals: a multimodal approach. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 11-20. ACM, 2013.

[7] M. Gamon, M. D. Choudhury, S. Counts, and E. Horvitz. Predicting depression via social media. In AAAI, 2013.

[8] J. M. Girard, J. F. Cohn, and M. H. Mahoor. Nonverbal social withdrawal in depression: evidence from manual and automatic analyses. Image and Vision Computing, 32(10):641-647, 2014.

[9] J. Gratch, R. Artstein, G. Lucas, G. Stratou, S. Scherer, A. Nazarian, R. Wood, B. Boberg, D. DeVault, S. Marsella, D. Traum, S. Rizzo, and L.-P. Morency. The Distress Analysis Interview Corpus of human and computer interviews. In Proceedings of the Language Resources and Evaluation Conference (LREC), pages 3123-3128, 2014.

[10] L. He, D. Jiang, and H. Sahli. Multimodal depression recognition with dynamic visual and audio cues. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on, pages 260-266. AAAC, 2015.

[11] C. Howes, M. Purver, and R. McCabe. Linguistic indicators of severity and progress in online text-based therapy for depression. In ACL Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 7-16, 2014.

[12] J. Joshi, R. Goecke, G. Parker, and M. Breakspear. Can body expressions contribute to automatic depression analysis? In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1-7. IEEE, 2013.

[13] L.-S. A. Low, N. C. Maddage, M. Lech, L. Sheeber, and N. Allen. Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents. In Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 5154-5157. IEEE, 2010.

[14] L.-S. A. Low, N. C. Maddage, M. Lech, L. B. Sheeber, and N. B. Allen. Detection of clinical depression in adolescents' speech during family interactions. IEEE Transactions on Biomedical Engineering, 58(3):574-586, 2011.

[15] V. Mitra, E. Shriberg, M. McLaren, A. Kathol, C. Richey, D. Vergyri, and M. Graciarena. The SRI AVEC-2014 evaluation system. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 93-101. ACM, 2014.

[16] J. C. Mundt, P. J. Snyder, M. S. Cannizzaro, K. Chappie, and D. S. Geralts. Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. Journal of Neurolinguistics, 20(1):50-64, 2007.

[17] S. Scherer, G. Stratou, M. Mahmoud, J. Boberg, J. Gratch, R. Albert, and L.-P. Morency. Automatic audiovisual behavior descriptors for psychological disorder analysis. Image and Vision Computing, 32(10):648-658, 2013.

[18] M. Senoussaoui, M. Sarria-Paja, J. F. Santos, and T. H. Falk. Model fusion for multimodal depression classification and level detection. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 57-63. ACM, 2014.

[19] G. Stratou, S. Scherer, J. Gratch, and L.-P. Morency. Automatic nonverbal behavior indicators of depression and PTSD: the effect of gender. Journal on Multimodal User Interfaces, 9(1):1-13, 2014.

[20] M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. T. Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic. AVEC 2016 - depression, mood, and emotion recognition workshop and challenge. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, 2016.

[21] M. Valstar, B. Schuller, K. Smith, T. Almaev, F. Eyben, J. Krajewski, R. Cowie, and M. Pantic. AVEC 2014: 3D dimensional affect and depression recognition challenge. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 3-10. ACM, 2014.

[22] M. Valstar, B. Schuller, K. Smith, F. Eyben, B. Jiang, S. Bilakhia, S. Schnieder, R. Cowie, and M. Pantic. AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 3-10. ACM, 2013.

[23] J. R. Williamson, T. F. Quatieri, B. S. Helfer, R. Horwitz, B. Yu, and D. D. Mehta. Vocal biomarkers of depression based on motor incoordination. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 41-48. ACM, 2013.