Decision Tree Based Depression Classification from Audio, Video and Language Information
ABSTRACT

In order to improve the recognition accuracy of the Depression Classification Sub-Challenge (DCC) of AVEC 2016, in this paper we propose a decision tree for depression classification. The decision tree is constructed according to the distribution of the multimodal predictions of the PHQ-8 scores and the participants' characteristics (PTSD/Depression Diagnostic, sleep-status, feeling and personality) obtained via analysis of the participants' transcript files. The proposed gender-specific decision tree provides a way of fusing the upper level language information with the results obtained using low level audio and visual features. Experiments are carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database. The results show that the proposed depression classification schemes obtain very promising results on the development set, with the F1 score reaching 0.857 for the class depressed and 0.964 for the class not depressed. Despite the over-fitting problem in training the models that predict the PHQ-8 scores, the classification schemes still obtain satisfying performance on the test set: the F1 score reaches 0.571 for the class depressed and 0.877 for the class not depressed, with an average of 0.724, which is higher than the baseline result of 0.700.

Keywords

Depression classification, decision tree, multi-modal

1. INTRODUCTION

Nowadays, depression and anxiety disorders are highly prevalent worldwide, causing burden and disability for individuals, families and society. According to the World Health Organization (WHO), depression will be the fourth leading mental disorder by 2020. The affective computing community has shown a growing interest in developing various systems, using audio and video features, to assist psychologists in the prevention and treatment of clinical depression.

Considering the audio features, researchers have found that depressed subjects are prone to possess a low dynamic range of the fundamental frequency, a slow speaking rate, a slightly shorter speaking duration, and a relatively monotone delivery [13], [5], [23], [16], [1]. Moreover, compared with healthy controls, the Harmonic-to-Noise Ratio (HNR) values of depressed subjects are higher [14]. Consequently, researchers formulated subtle changes in speech characteristics (e.g., differences in pitch, loudness, speaking rate, articulation, etc.) as indicators of depression. Features based on low-level descriptors (LLDs), such as energy, spectrum, and Mel-frequency cepstral coefficients (MFCC), were used as baseline audio features in the Audio/Visual Emotion Challenge and Workshops (AVEC2013 and AVEC2014) [22], [21]. In [18] and [15], an i-vector based representation
was computed to convert the frame-level features to a global representation. Experimental results revealed that i-vector level fusion of low-level features can result in more accurate systems for depression recognition.

Video features, such as body movements and gestures, subtle expressions and periodical muscular movements, have also been widely explored for depression analysis. To describe the dynamics of facial appearance, AVEC2013 [22]
adopted the Local Phase Quantisation (LPQ) features, while AVEC2014 adopted the Local Gabor Binary Patterns from Three Orthogonal Planes (LGBP-TOP) features as baseline visual features. Girard et al. [8] investigated the relationship between nonverbal behavior and severity of depression using Facial Action Coding System (FACS) action units (AUs) and head pose. They found that when symptom severity was high, participants made fewer affiliative facial expressions (AU12 and AU15), more non-affiliative facial expressions (AU14), and diminished head motion. Head pose analysis was also made in [12] and [2]. In [17], Scherer et al. proposed the vertical (head and eye) gaze directionality, smile intensity and average duration, as well as self-adaptors and leg fidgeting, as nonverbal behavior descriptors. Spatio-temporal interest point (STIP) features, describing the spatio-temporal changes by taking into account the movements of the facial area, hands, shoulders, head, etc., have also been employed in [12, 6] for depression classification. Typical symptoms of depression can be well described by global variation information; therefore most approaches in depression analysis extract global feature vectors from the complete video by aggregating a large set of local descriptors. In [10], Motion History Histogram (MHH), bag of words (BOW) and Vector of Locally Aggregated Descriptors (VLAD) representations have been computed on the LGBP or STIP features to obtain such global features.

Apart from the audio and visual cues, some researchers analyzed depression from text/language information. In [7], the authors explored the potential of using social media to detect and diagnose major depressive disorder in individuals. To characterize the topical language of individuals detected positively with depression, the authors built a lexicon of terms that are likely to appear in postings from individuals discussing depression or its symptoms in online settings. Using the frequency of depression terms, a Support Vector Machine (SVM) classifier was built to provide estimates of the risk of depression before the reported onset. An application of topic and sentiment modeling was presented in [11] for online therapy for depression and anxiety. It was found that besides the discussion topic and sentiment, style and/or dialogue structure is also important for measuring the patient's progress. Asgari et al. [3] explored the information from "what is said" (content) and "how it is said" (prosody). To extract features from text, they used a published table to tag each word in an utterance with an arousal and a valence rating. Finally, the speech prosody features and text features were fused to detect depression with an SVM classifier.

In [19], Stratou et al. showed that gender plays an important role in the automatic assessment of psychological conditions such as depression and PTSD, and that a gender dependent approach significantly improves the performance over a gender agnostic one.

In this paper, we target the Depression Classification Sub-Challenge (DCC) task of AVEC2016 [20], and inspired by [19], we build classification models for females and males, respectively. First, a gender-specific multimodal framework, combining audio features, visual features as well as AU evidences and emotion evidences, is proposed for the prediction of PHQ-8 scores. Then, a depression classification decision tree is constructed according to the distribution of the multimodal predicted PHQ-8 scores and participants' characteristics obtained via the analysis of their transcript files. Four criteria, namely, PTSD/Depression Diagnostic, sleep-status, feeling and personality, have been defined via content analysis of the participants' transcripts. The proposed decision tree provides a way of fusing the upper level language information with the predicted PHQ-8 scores obtained using low level audio and visual features. Experiments are carried out on the Distress Analysis Interview Corpus - Wizard of Oz (DAIC-WOZ) database [9].

The remainder of this paper is organized as follows. The audio and video features, as well as the multimodal prediction framework of the PHQ-8 scores, are addressed in Section 2. Section 3 describes the approach we used for content analysis of the transcripts to characterize the participants. Using the training set and based on basic summary statistics of the considered participants' characteristics and the PHQ-8 scores, we introduce in Section 4 the proposed decision tree based depression classification schemes for females and males, respectively. Section 5 analyzes the experimental results, and finally conclusions are drawn in Section 6.

2. PREDICTION OF THE PHQ-8 SCORES

2.1 Audio and Video Features

For the audio and video features, we make use of the baseline features provided by AVEC 2016. The baseline audio features consist of 5 formant features and 74 prosodic and voice quality features, denoted hereafter as "covarep" features. From the video, based on the OpenFace [4] framework, AVEC 2016 provides histogram of oriented gradients (HOG) features, eye gaze features, and head pose features. In our implementation the eye gaze and head pose features have been concatenated into a "Gaze-pose" feature vector. AVEC2016 also provides (i) emotion evidence measures for the set {Anger, Contempt, Disgust, Joy, Fear, Neutral, Sadness, Surprise, Confusion, Frustration}, where the evidence for an expression channel is a number (typically between -5 and +5) that represents the odds, on a logarithmic (base 10) scale, of the target expression being present; and (ii) AU evidences for the set {AU1, AU2, AU4, AU5, AU6, AU7, AU9, AU10, AU12, AU14, AU15, AU17, AU18, AU20, AU23, AU24, AU25, AU26, AU28, AU43}.
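As an illustration of this evidence scale (our reading of it, not part of the challenge documentation), a log10-odds value e maps to the probability of the expression being present as follows:

    def evidence_to_probability(e):
        """Map a log10-odds evidence value e to the probability that the
        target expression is present: odds = 10**e, p = odds / (1 + odds)."""
        odds = 10.0 ** e
        return odds / (1.0 + odds)

    # evidence_to_probability(0.0) -> 0.5; evidence_to_probability(2.0) -> ~0.99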
Using the provided 68 2D facial landmarks, we follow the approach of [20] to extract geometric features. We first calculate the mean shape of 51 stable points, for females and males respectively, using the samples of the training, development and test sets. Then, the feature points are aligned with the mean shape, and the differences between the coordinates of the aligned landmarks and those of the mean shape, as well as between the aligned landmark locations in the previous and the current frame, are computed, resulting in 204 features. The Euclidean distance between the median of the stable landmarks and each aligned landmark in a video frame is also calculated, resulting in 51 features. Finally, the facial landmarks are split into three groups corresponding to different regions: the left eye and left eyebrow, the right eye and right eyebrow, and the mouth. For each of these groups, the Euclidean distances and the angles between the points are computed, providing 75 features. In total, we obtain 330 geometric features.
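A minimal per-frame sketch of the 204 offset features and the 51 median-distance features described above; the translation-plus-scale alignment is our simplifying assumption (the alignment method is not specified here), and the 75 region-wise distance/angle features are omitted for brevity:

    import numpy as np

    def align_to_mean(shape, mean_shape):
        """Translation + scale alignment of a 51x2 landmark shape to the
        mean shape (a simplification; a full Procrustes fit could be used)."""
        s = shape - shape.mean(axis=0)
        m = mean_shape - mean_shape.mean(axis=0)
        scale = np.linalg.norm(m) / (np.linalg.norm(s) + 1e-8)
        return s * scale + mean_shape.mean(axis=0)

    def frame_geometric_features(curr, prev, mean_shape):
        """curr, prev: 51x2 landmarks of the current and previous frame,
        assumed already aligned with align_to_mean()."""
        d_mean = (curr - mean_shape).ravel()           # 102: offsets to the mean shape
        d_prev = (curr - prev).ravel()                 # 102: frame-to-frame motion
        median = np.median(curr, axis=0)               # median of the stable landmarks
        d_med = np.linalg.norm(curr - median, axis=1)  # 51: distances to the median
        return np.concatenate([d_mean, d_prev, d_med]) # 255 of the 330 features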
As the dimensions of the HOG feature vector and the geometric feature vector are high, we apply PCA (retaining 99.9% of the variance) and obtain the following reduced feature sets: 43 HOG-PCA features and 43 GEO-PCA features for females, and 62 HOG-PCA features and 62 GEO-PCA features for males, respectively.

Finally, for each type of feature, we take its average over the entire screening interview as the global feature of the considered video.
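A sketch of this reduction and pooling step, assuming the PCA is fitted on the pooled per-frame features of one gender (whether PCA precedes or follows the temporal averaging is not stated above):

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_and_pool(interviews):
        """interviews: list of (n_frames_i x D) per-frame feature arrays,
        one array per participant of the considered gender."""
        pca = PCA(n_components=0.999)  # keep components covering 99.9% of variance
        pca.fit(np.vstack(interviews))
        # average the reduced frames over each interview -> one global vector each
        return np.array([pca.transform(x).mean(axis=0) for x in interviews])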
2.2 Multimodal Prediction

According to [19], females and males show different symptoms of depression; therefore, in our work, the PHQ-8 score prediction is done separately for males and females. For each input feature stream, i.e. formant, covarep, HOG-PCA and GEO-PCA, we train a separate Support Vector Regression (SVR) model with a Radial Basis Function (RBF) kernel to predict the PHQ-8 score. The parameters cost and gamma of the SVR were optimised in the range [2^-8, 2^8].
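A sketch of this per-stream regressor, shown with scikit-learn (a tooling assumption on our part) and with the grid evaluated by cross-validation on the training set, whereas above the parameters are selected on the development set:

    import numpy as np
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    # cost and gamma searched over [2^-8, 2^8], as stated above
    param_grid = {"C": 2.0 ** np.arange(-8, 9),
                  "gamma": 2.0 ** np.arange(-8, 9)}
    stream_svr = GridSearchCV(SVR(kernel="rbf"), param_grid,
                              scoring="neg_root_mean_squared_error", cv=5)
    # X_train: one global feature vector per participant (e.g. GEO-PCA);
    # y_train: the reported PHQ-8 scores
    # stream_svr.fit(X_train, y_train); phq8_dev = stream_svr.predict(X_dev)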
For the emotion and AU measures, we train several SVRs, considering as input features all the combinations of 1 to 4 evidences, and select the combination producing the lowest root mean square error (RMSE) and mean absolute error (MAE) between the predicted and reported PHQ-8 scores, averaged over all sequences. In our experiments, the (disgust, fear, sadness) emotion evidence combination and the (AU5, AU17, AU25) AU evidence combination obtained the lowest RMSE and MAE among all the combinations for females, while for males, (joy, baseline, confusion) and (AU5, AU20, AU25) obtained the best prediction performance on the development set.
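The exhaustive search over evidence subsets can be sketched as follows; ranking candidates by the sum of RMSE and MAE on the development set is our assumption, since only "lowest RMSE and MAE" is stated, and fit_svr is a hypothetical helper that trains an RBF-kernel SVR:

    import numpy as np
    from itertools import combinations
    from sklearn.metrics import mean_absolute_error, mean_squared_error

    def best_evidence_combination(channels, train, dev, y_train, y_dev, fit_svr):
        """channels: evidence names; train/dev: dicts mapping a name to its
        per-participant averaged values; fit_svr: trains an RBF-kernel SVR."""
        best_combo, best_err = None, float("inf")
        for r in range(1, 5):                          # combinations of 1..4 evidences
            for combo in combinations(channels, r):
                X_tr = np.column_stack([train[c] for c in combo])
                X_dv = np.column_stack([dev[c] for c in combo])
                pred = fit_svr(X_tr, y_train).predict(X_dv)
                rmse = mean_squared_error(y_dev, pred) ** 0.5
                err = rmse + mean_absolute_error(y_dev, pred)
                if err < best_err:
                    best_combo, best_err = combo, err
        return best_combo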
The outputs of the 7 unimodal SVRs (GEO-PCA, Gaze-pose, HOG-PCA, covarep, formant, best emotion evidence combination, best AU evidence combination) are input to a second level SVR model, or a local linear regression (LLR) model, for the final (multimodal) PHQ-8 score prediction, as follows. As the AU evidence stream and the emotion evidence stream provide promising prediction results on the training and development sets, we use these two streams as inputs to the second level SVR model, and select among the other 5 streams (GEO-PCA, Gaze-pose, HOG-PCA, covarep, and formant) the ones providing the lowest RMSE and MAE (see Table 10).
3. PARTICIPANT CHARACTERISTICS

Apart from the above described features, we conducted a content analysis of the transcripts to characterize the participants following four criteria: PTSD/Depression Diagnostic (Yes/No), sleep-status (Normal/Abnormal), Feeling (Bad/Good) and Personality (Shy/Extrovert). The analysis has been made as follows (a keyword-matching sketch is given after this list):

• Sleep Status. If the participant answers the "easy sleep" question with positive words such as "no problem", "pretty good", "get a good night's sleep", "pretty easy", "easy", "I'm ok", "fairly easy", etc., or he/she does not answer this question, the sleep status is marked as "normal". In case the answer contains words such as "not had a good sleep", "really hard", "kinda difficult", "never easy", etc., the sleep status is marked as "abnormal". Moreover, when the reason for not having a good sleep is given, with phrases such as "disturbing thought", "mind will be racing a lot", "thoughts running through my mind", "hard to keep my thoughts", etc., the sleep status is further marked as "sleep abnormal/mind reason". If there is no information about the reason, the sleep status is considered as "sleep abnormal/other reason".

• PTSD/Depression Diagnosed. The value of this criterion is "yes" or "no" according to the "depression diagnosed" and "ptsd diagnosed" transcriptions.

• Feeling. This attribute takes the value "Bad" or "Good" following the transcript of the "feel lately" question. The value "Bad" is given when the transcript contains negative words such as "feeling depressed", "little depressed", "tired", "sad", "depressed blue", "not okay", "frustrated", "angry", "down". If it contains positive words like "fine", "good", "pretty good", "great", "okay", or the participant does not answer this question, the feeling status is considered as "Good".

• Personality. This criterion takes the value "Shy" if the transcript contains words like "shy", "introvert", "more shy" and "probably shy". If words like "outgoing", "extrovert", "mostly outgoing" are used, we mark the participant as "outgoing". If the answer contains "the middle", "a little bit of both", "depends on the situation", the personality is considered "extrovert".
4. DECISION TREE BASED DEPRESSION CLASSIFICATION

The research results of [19] have shown that the contributions of different behavioral indicators to depression and PTSD differ between males and females. This finding implies that a decision tree based classification method may improve the recognition accuracy of depression. Most methods that generate decision trees for a specific problem use examples of data instances in the decision tree generation process. To this aim, we examined the statistics of the above defined participant characteristics, which are summarized in the following sections.
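To make the shape of such a manually constructed, gender-specific tree concrete, a purely illustrative sketch follows; the split order, the branch conditions and the PHQ-8 cut-off of 10 (the standard threshold defining the depressed class in the DCC) are our assumptions, not the trees actually built in this work:

    def classify_female(phq8_pred, diagnosed, sleep, feeling):
        """Illustrative only: one possible tree over the multimodal PHQ-8
        prediction and the transcript-derived characteristics."""
        if diagnosed == "yes" and sleep.startswith("abnormal"):
            return "depressed"
        if diagnosed == "no" and sleep == "normal" and feeling == "Good":
            return "not depressed"
        # ambiguous branches fall back to the multimodal PHQ-8 prediction
        return "depressed" if phq8_pred >= 10 else "not depressed"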
4.1 Females

Based on the training set, we computed basic summary statistics for each of the defined characteristics:

• Sleep Status. From Table 1, one can notice that most (67.74%) of the not depressed females are marked as "sleep normal", while 84.62% of the depressed females are marked as "sleep abnormal", among which 61.54% because of the "mind reason", showing that depressed females think a lot when trying to sleep.

Table 1: Sleep Status - Females

classes         sleep normal (%)   sleep abnormal (%)
                                   mind reason   other reason
not depressed   21 (67.74)         1 (3.23)      9 (29.03)
depressed       2 (15.38)          8 (61.54)     3 (23.08)

• PTSD/Depression Diagnosed. Statistics on whether the females have been diagnosed with depression or PTSD are listed in Table 2, which indicates that almost all (92.31%) of the depressed females have been diagnosed with either depression or PTSD before, or even both, while only 25.81% of the not depressed females have been so diagnosed.

Table 2: PTSD/Depression - Females

classes         no ptsd/depression (%)   ptsd/depression (%)
not depressed   23 (74.19)               8 (25.81)
depressed       1 (7.69) (no answer)     12 (92.31)
Table 6: Audio/Visual Prediction - Female

Features        Dataset   RMSE    MAE
GEO-PCA (43)    Train     5.778   4.705
                Dev.      6.387   5.105
Gaze-pose (9)   Train     5.800   4.727
                Dev.      6.362   5.105
HOG-PCA (43)    Train     5.891   4.886
                Dev.      6.391   5.158
covarep (74)    Train     5.560   4.545
                Dev.      6.224   4.842
formant (5)     Train     5.778   4.705
                Dev.      6.320   5.000

[Figure 3: PHQ-8 Scores of the males]
Table 8: Evidence Based Prediction - Female

          Evidence                  Dataset   RMSE    MAE
Emotion   disgust                   Train     6.094   5.000
                                    Dev.      5.699   4.579
          sadness, frustration      Train     5.811   4.182
                                    Dev.      5.161   3.895
          disgust, fear, sadness    Train     3.519   1.932
                                    Dev.      4.377   3.368
          anger, joy, fear,         Train     4.026   2.432
          frustration               Dev.      4.894   3.737
          All                       Train     3.908   2.273
                                    Dev.      5.943   4.579
AU        AU10                      Train     4.975   3.341
                                    Dev.      5.201   4.211
          AU17, AU25                Train     5.379   3.477
                                    Dev.      4.322   3.526
          AU5, AU17, AU25           Train     3.879   2.046
                                    Dev.      3.974   3.263
          AU9, AU17, AU25, AU28     Train     4.647   2.955
                                    Dev.      4.383   3.421
          All                       Train     5.796   4.818
                                    Dev.      6.279   4.895

Table 9: Evidence Based Prediction - Male

          Evidence                  Dataset   RMSE    MAE
Emotion   confusion                 Train     4.595   3.206
                                    Dev.      6.093   5.250
          contempt, joy             Train     4.271   2.429
                                    Dev.      5.534   4.250
          joy, baseline, confusion  Train     3.462   1.952
                                    Dev.      5.466   4.500
          contempt, joy, sadness,   Train     4.106   2.127
          confusion                 Dev.      5.673   4.563
          All                       Train     4.483   3.365
                                    Dev.      6.942   5.688
AU        AU23                      Train     4.595   2.921
                                    Dev.      5.511   4.125
          AU4, AU14                 Train     3.581   2.095
                                    Dev.      4.323   3.188
          AU5, AU20, AU25           Train     3.625   1.492
                                    Dev.      4.294   3.188
          AU1, AU10, AU17, AU18     Train     0.756   0.222
                                    Dev.      4.191   3.563
          All                       Train     4.832   3.825
                                    Dev.      6.982   5.750
Table 12: Depression Classification Results

Gender   Data              F1 Score        Precision       Recall
Female   Dev.              0.857 (0.968)   0.750 (1.000)   1.000 (0.938)
Male     Dev.              0.857 (0.960)   1.000 (0.923)   0.750 (1.000)
All      Dev. (proposed)   0.857 (0.964)   0.857 (0.964)   0.857 (0.964)
All      Dev. (baseline)   0.58 (0.86)     0.47 (0.94)     0.78 (0.79)
All      test (proposed)   0.571 (0.877)   0.500 (0.914)   0.667 (0.842)
All      test (baseline)   0.50 (0.90)     0.60 (0.87)     0.43 (0.93)

(Values are for the class depressed, with those for the class not depressed in parentheses.)

6. CONCLUSIONS

In this paper, with the purpose of improving the recognition accuracy of the Depression Classification Sub-Challenge (DCC) of AVEC 2016, we proposed a decision tree approach for depression classification. Two decision trees have been proposed, one for males and one for females. The decision trees have been constructed according to the distribution of the multimodal predictions of the PHQ-8 scores and the participants' characteristics (PTSD/Depression Diagnostic, sleep-status, feeling and personality) obtained via the analysis of the transcript files of the participants. The proposed gender-specific decision trees provide a way of fusing the upper level language information with the results obtained using low level audio and visual features.

In our current implementation we considered a manual decision tree generation process; in future work we plan to investigate automatic approaches, and other regression approaches will also be investigated for the prediction of the PHQ-8 scores.

7. ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (grant 61273265), the Research and Development Program 863 of China (No. 2015AA016402), and the VUB Interdisciplinary Research Program through the EMO-App project.

8. REFERENCES

[1] S. Alghowinem. From joyous to clinically depressed: mood detection using multimodal analysis of a person's appearance and speech. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, pages 648-654. IEEE, 2013.

[2] S. Alghowinem, R. Goecke, M. Wagner, G. Parker, and M. Breakspear. Head pose and movement analysis as an indicator of depression. In Affective Computing and Intelligent Interaction (ACII), 2013 Humaine Association Conference on, pages 283-288. IEEE, 2013.

[3] M. Asgari, I. Shafran, and L. B. Sheeber. Inferring clinical depression from speech and spoken utterances. In Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on, pages 1-5, 2014.

[4] T. Baltrušaitis, P. Robinson, L.-P. Morency, et al. OpenFace: an open source facial behavior analysis toolkit. In 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1-10. IEEE, 2016.

[5] N. Cummins, J. Epps, M. Breakspear, and R. Goecke. An investigation of depressed speech detection: features and normalization. In Interspeech, pages 2997-3000, 2011.

[6] N. Cummins, J. Joshi, A. Dhall, V. Sethu, R. Goecke, and J. Epps. Diagnosis of depression by behavioural signals: a multimodal approach. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 11-20. ACM, 2013.

[7] M. Gamon, M. D. Choudhury, S. Counts, and E. Horvitz. Predicting depression via social media. In AAAI, 2013.

[8] J. M. Girard, J. F. Cohn, and M. H. Mahoor. Nonverbal social withdrawal in depression: evidence from manual and automatic analyses. Image and Vision Computing, 32(10):641-647, 2014.

[9] J. Gratch, R. Artstein, G. Lucas, G. Stratou, S. Scherer, A. Nazarian, R. Wood, B. Boberg, D. DeVault, S. Marsella, D. Traum, S. Rizzo, and L.-P. Morency. The Distress Analysis Interview Corpus of human and computer interviews. In Proceedings of the Language Resources and Evaluation Conference (LREC), pages 3123-3128, 2014.

[10] L. He, D. Jiang, and H. Sahli. Multimodal depression recognition with dynamic visual and audio cues. In Affective Computing and Intelligent Interaction (ACII), 2015 International Conference on, pages 260-266. AAAC, 2015.

[11] C. Howes, M. Purver, and R. McCabe. Linguistic indicators of severity and progress in online text-based therapy for depression. In ACL Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, pages 7-16, 2014.

[12] J. Joshi, R. Goecke, G. Parker, and M. Breakspear. Can body expressions contribute to automatic depression analysis? In Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on, pages 1-7. IEEE, 2013.

[13] L.-S. A. Low, N. C. Maddage, M. Lech, L. Sheeber, and N. Allen. Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents. In Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 5154-5157. IEEE, 2010.

[14] L.-S. A. Low, N. C. Maddage, M. Lech, L. B. Sheeber, and N. B. Allen. Detection of clinical depression in adolescents' speech during family interactions. IEEE Transactions on Biomedical Engineering, 58(3):574-586, 2011.

[15] V. Mitra, E. Shriberg, M. McLaren, A. Kathol, C. Richey, D. Vergyri, and M. Graciarena. The SRI AVEC-2014 evaluation system. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 93-101. ACM, 2014.

[16] J. C. Mundt, P. J. Snyder, M. S. Cannizzaro, K. Chappie, and D. S. Geralts. Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. Journal of Neurolinguistics, 20(1):50-64, 2007.

[17] S. Scherer, G. Stratou, M. Mahmoud, J. Boberg, J. Gratch, R. Albert, and L.-P. Morency. Automatic audiovisual behavior descriptors for psychological disorder analysis. Image and Vision Computing, 32(10):648-658, 2013.

[18] M. Senoussaoui, M. Sarria-Paja, J. F. Santos, and T. H. Falk. Model fusion for multimodal depression classification and level detection. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 57-63. ACM, 2014.

[19] G. Stratou, S. Scherer, J. Gratch, and L.-P. Morency. Automatic nonverbal behavior indicators of depression and PTSD: the effect of gender. Journal on Multimodal User Interfaces, 9(1):1-13, 2014.

[20] M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. T. Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic. AVEC 2016 - depression, mood, and emotion recognition workshop and challenge. In Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, 2016.

[21] M. Valstar, B. Schuller, K. Smith, T. Almaev, F. Eyben, J. Krajewski, R. Cowie, and M. Pantic. AVEC 2014: 3D dimensional affect and depression recognition challenge. In Proceedings of the 4th International Workshop on Audio/Visual Emotion Challenge, pages 3-10. ACM, 2014.

[22] M. Valstar, B. Schuller, K. Smith, F. Eyben, B. Jiang, S. Bilakhia, S. Schnieder, R. Cowie, and M. Pantic. AVEC 2013: the continuous audio/visual emotion and depression recognition challenge. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 3-10. ACM, 2013.

[23] J. R. Williamson, T. F. Quatieri, B. S. Helfer, R. Horwitz, B. Yu, and D. D. Mehta. Vocal biomarkers of depression based on motor incoordination. In Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge, pages 41-48. ACM, 2013.