Psychometric Properties of The Theory of Mind Assessment Scale in A Sample of Adolescents and Adults




Article  in  Frontiers in Psychology · April 2016


DOI: 10.3389/fpsyg.2016.00566



ORIGINAL RESEARCH
published: 09 May 2016
doi: 10.3389/fpsyg.2016.00566

Psychometric Properties of the Theory of Mind Assessment Scale in a Sample of Adolescents and Adults

Francesca M. Bosco 1,2, Ilaria Gabbatore 3*, Maurizio Tirassa 1 and Silvia Testa 1

1 Department of Psychology, University of Turin, Turin, Italy
2 Neuroscience Institute of Turin, University of Turin, Turin, Italy
3 Faculty of Humanities, Research Unit of Logopedics, Child Language Research Center, University of Oulu, Oulu, Finland

This research aimed at the evaluation of the psychometric properties of the Theory of Mind Assessment Scale (Th.o.m.a.s.). Th.o.m.a.s. is a semi-structured interview meant to evaluate a person’s Theory of Mind (ToM). It is composed of several questions organized in four scales, each focusing on one of the areas of knowledge in which such faculty may manifest itself: Scale A (I-Me) investigates first-order first-person ToM; Scale B (Other-Self) investigates third-person ToM from an allocentric perspective; Scale C (I-Other) again investigates third-person ToM, but from an egocentric perspective; and Scale D (Other-Me) investigates second-order ToM. The psychometric properties of Th.o.m.a.s. were evaluated in a sample of 156 healthy persons: 80 preadolescents and adolescents (aged 11–17 years, 42 females) and 76 adults (aged from 20 to 67 years, 35 females). Th.o.m.a.s. scores show good inter-rater agreement and internal consistency; the scores increase with age. Evidence of criterion validity was found as Scale B scores were correlated with those of an independent instrument for the evaluation of ToM, the Strange Stories task. Confirmatory factor analysis (CFA) showed good fit of the four-factor theoretical model to the data, although the four factors were highly correlated. For each of the four scales, Rasch analyses showed that, with few exceptions, items fitted the Partial credit model and their functioning was invariant for gender and age. The results of this study, along with those of previous research with clinical samples, show that Th.o.m.a.s. is a promising instrument to assess ToM in different populations.

Keywords: Theory of Mind, Th.o.m.a.s., validation of ToM tests, social cognition, metacognition

Edited by: Claire Marie Fletcher-Flinn, University of Auckland, New Zealand
Reviewed by: Giancarlo Dimaggio, Centro di Terapia Metacognitiva Interpersonale, Italy; Sally Olderbak, Universität Ulm, Germany
*Correspondence: Ilaria Gabbatore, [email protected]; [email protected]
Specialty section: This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology
Received: 26 August 2015
Accepted: 05 April 2016
Published: 09 May 2016
Citation: Bosco FM, Gabbatore I, Tirassa M and Testa S (2016) Psychometric Properties of the Theory of Mind Assessment Scale in a Sample of Adolescents and Adults. Front. Psychol. 7:566. doi: 10.3389/fpsyg.2016.00566

INTRODUCTION

The aim of this study was to investigate the psychometric properties of the Theory of Mind Assessment Scale (Th.o.m.a.s.; Bosco et al., 2009), a semi-structured interview developed for the assessment of Theory of Mind (ToM) in adolescents and adults (healthy and with clinical pathologies). ToM is the capacity to ascribe mental states like emotions, intentions, desires, and beliefs to oneself and to others and to use this knowledge to predict, interpret, and explain the relevant actions and behaviors (Premack and Woodruff, 1978).

The classic tests for the assessment of ToM, the false belief tasks, were created in the domain of developmental psychology (Wimmer and Perner, 1983; Baron-Cohen et al., 1985). They require the subject to recognize another person’s beliefs when they differ from those of the subject herself, under the assumption that this is the only certain proof of the availability of a theory of mind

Frontiers in Psychology | www.frontiersin.org 1 May 2016 | Volume 7 | Article 566


Bosco et al. Psychometric Properties of TH.O.M.A.S

(Dennett, 1978). False belief tasks investigate first- or second-order ToM. The former (Wimmer and Perner, 1983) is the ability to understand a person’s beliefs about a state of the world, whereas the latter is the ability to ascribe nested mental states, i.e., to understand a person’s beliefs about someone else’s beliefs (Perner and Wimmer, 1985). Empirical data have shown that children and clinical populations find second-order ToM tasks more difficult to solve than first-order ones (Mazza et al., 2001; Wellman and Liu, 2004). Due to the poor test-retest reliability of the scores obtained on false-belief questions, initial attempts to validate false belief tasks did not give fully satisfactory results (Mayes et al., 1996).

Only a few studies have explored the psychometric properties of ToM tests. One is the Theory of Mind (TOM) test (Muris et al., 1999). This test, which was devised for children of 5–12 years, is an interview composed of vignettes, stories, and drawings about which the child is asked to answer several questions. The test was administered to a sample of children with developmental disorders and a healthy one, showing that it is able to discriminate between the two conditions and that its scores have good internal consistency and inter-rater reliability, and sufficient test-retest stability.

Other ToM tasks, like the Strange Stories (Happé, 1994) and the Faux pas (Baron-Cohen et al., 1999), were created to evaluate more sophisticated aspects of ToM in children older than four years of age. The Strange Stories task (Happé, 1994) assesses the comprehension of complex mental states like misunderstanding and double bluffing, which require understanding social contexts. It has been used with children, both in healthy (e.g., Devine and Hughes, 2013) and pathological conditions (e.g., Charman et al., 2001; Kaland et al., 2002; Velloso et al., 2013), as well as in adolescents and adults with Autism Spectrum Disorders (ASD) and Asperger syndrome (Jolliffe and Baron-Cohen, 1999; Kaland et al., 2005).

Although the tests discussed so far were first created for use in developmental psychology, they have often been employed, possibly with some adaptation, in adults with clinical disorders like schizophrenia (see for example Mazza et al., 2001; Pickup and Frith, 2001) in addition to other specific tests, mostly involving picture sequencing tasks (see for example Langdon et al., 2001; Brüne and Bodenstein, 2005; Brüne et al., 2016). To our knowledge, however, only a few psychometric evaluations of these tests in healthy adults have been provided. One is the widely used Reading the Mind in the Eyes task (RME; Baron-Cohen et al., 2001), originally created to assess ToM in children with Asperger Syndrome. It consists of photographs of the eye region: the subject is asked to match each picture with the semantic definition of a specific emotion (e.g., “worried,” “annoyed”). Several studies of the psychometric properties of the RME were conducted with healthy adults in different countries, but not with unanimous results: some studies found a low level of internal coherence (Voracek and Dressler, 2006; Harkness et al., 2010; Olderbak et al., 2015) whereas others found an acceptable one (Serafin and Surian, 2004; Vellante et al., 2013). Reports of the test-retest reliability of RME scores range from acceptable (Yildirim et al., 2011) to good (Vellante et al., 2013). RME is commonly used to assess ToM; however, because judgments are based only on eye expression, it focuses only on a specific kind of mental state, namely recognition of emotions, and therefore is able to assess only one facet of ToM.

Recently, more attention has been paid to psychometric properties in the creation of novel ToM tests. These tests have mainly been designed to investigate ToM in children with ASD. For example, the Animated Theory of Mind Inventory for Children (ATOMIC; Beaumont and Sofronoff, 2008) was created to assess ToM in children with Asperger Syndrome. The tool consists of cartoons depicting a range of themes, each followed by two multiple-choice questions. The ATOMIC has proved capable of discriminating between clinical and control groups and appears to be significantly correlated with the Strange Stories task (Happé, 1994). The Theory of Mind Inventory (Hutchins et al., 2012) was also developed to assess ToM in individuals with ASD. It works by asking the parents to complete a questionnaire consisting of statements toward which the interviewee expresses agreement or disagreement on a continuous metric. The instrument appears to have excellent test-retest reliability and internal consistency. Another recently developed tool, created for children with high functioning ASD, is the Comic Strip Task (CST; Sivaratnam et al., 2012). It consists of vignettes investigating the child’s comprehension of other persons’ beliefs, intentions, and emotional states, and it appears to have moderate internal consistency and good discriminant validity.

A different set of clinical tools, specifically created for adults, investigates different, albeit related, cognitive abilities, namely self-reflection (Fonagy et al., 1991) and metacognition (Semerari et al., 2003). Self-reflection is the capacity to understand and reason upon one’s own and others’ states like feelings, thoughts, fantasies, beliefs, and desires (Gergely et al., 2002). Fonagy et al. (1998) developed the Reflective Functional scale (RF) to study the subjects’ ability to reflect upon their childhood experience in mentalizing terms. The coding for the RF is based on the interviewee’s ability to reflect on several relevant passages of the Adult Attachment Interview (Main and Goldwyn, 1990). Despite the possible theoretical similarity between the notion of self-reflection, as investigated by the RF scale, and that (or those) of ToM, a study by Taylor et al. (2008) conducted with persons with autism failed to find significant correlations between their performance on the RF and ToM, at least as assessed with the RME test discussed above (Baron-Cohen et al., 2001). Most studies available in the literature that use the RF are based on the Adult Attachment Interview; however, recent research has applied the RF to other clinical interviews, e.g., the Brief Reflective Functioning Interview (BRFI; Rudden et al., 2005) and the Reflective Functioning Rating Scale (RFRS; Meehan et al., 2009). Moreover, in a recent review on Reflective Functioning, Katznelson (2014, p. 115) concluded that “more research regarding reliability and validity of these measures - BRFI and RFRS - is necessary to qualify these more thoroughly.” Still another limitation of the RF is that it yields a single total score, thus underestimating the complexity of mentalizing activities (Choi-Kain and Gunderson, 2008; Gullestad and Wilberg, 2011).


Metacognition is a wider construct. In Flavell’s (1979) original definition, it includes any thought process that has as its object the mind itself in its various interpersonal, emotional, and cognitive dimensions. Examples of metacognition are memory, perception, or motivation. To study it, Semerari et al. (2012) developed the Metacognition Assessment Interview (MAI), a semi-structured interview aimed at investigating different aspects of metacognition; MAI is an adaptation of the Metacognition Assessment Scale (MAS; Semerari et al., 2003). Semerari et al. (2012) investigated the psychometric properties of the MAI on a sample of non-clinical subjects. Factor analysis showed a two-factor hierarchical structure corresponding to the two main metacognitive functions, the “self domain,” which is the ability to monitor and integrate mental aspects and the way in which a person is aware of her mental state in relation to her behavior, and the “other domain,” which is the ability to adopt another person’s perspective and to differentiate between different forms of representations, such as imagination, expectations, and reality. The inter-rater reliability and the internal consistency of MAI in these two domains were acceptable (Semerari et al., 2012).

Despite being obviously related to ToM, metacognition is a wider construct that includes more sophisticated mental functions (Semerari et al., 2003, 2012). ToM was originally considered by Premack and Woodruff (1978) as a unitary faculty. Accordingly, most available tools for assessing it have embedded this assumption into their methodological approach and material structure. In time, however, it has been argued that ToM has a much more complex nature, thus opening the way to the possibility of decomposing it into different aspects or components.

A first such operation is the distinction between third-person ToM, i.e., the ability to attribute mental states to another person, and first-person ToM, i.e., the ability to attribute mental states to oneself (Nichols and Stich, 2003; Dimaggio et al., 2008). To understand oneself and to understand another person appear to be different activities, mediated by different processes and recruiting different kinds of knowledge. Within the domain of third-person ToM a further distinction, proposed by Frith and De Vignemont (2005), is drawn between an egocentric and an allocentric perspective. In the former, the mental states of other agents are represented in relation to the self, while in the latter they are represented independently from the self. Still another difference occurs between first-order and second-order ToM. First-order ToM is the ability to grasp someone’s mental states (Wimmer and Perner, 1983), while second-order ToM is the ability to infer what someone thinks about a third person’s mental states (Perner and Wimmer, 1985). Studies in the developmental (Wellman and Liu, 2004) and in the clinical domains (e.g., in patients with schizophrenia, Mazza et al., 2001) show that first-order tasks are easier to solve than second-order ones.

Further differences may be drawn between different types of mental states that can be dealt with by the agent. It is commonly theorized in other areas of cognitive science that at least three such types, namely beliefs, desires, and intentions, are needed to capture an agent’s mind (see e.g., Rao and Georgeff, 1992; Tirassa, 1999; Tirassa and Bosco, 2008), and theories in developmental psychology also point to the idea that the comprehension of volitional and epistemic states may be acquired at different ages (e.g., Wellman, 1991; Wellman and Liu, 2004). Furthermore, it might be sensible to distinguish between different ways in which ToM may be put to use, e.g., in understanding or predicting another agent’s behavior, in attempting to affect it, and so on.

The Th.o.m.a.s. (Bosco et al., 2009) is a semi-structured open-question interview devised to capture these various facets of ToM, namely first vs. third person, first vs. second order, egocentric vs. allocentric, different kinds of mental states and different uses that can be made of them, and thus to provide a broad assessment of ToM abilities both in healthy (adolescents and adults) and clinical conditions. Having a single instrument capable of assessing several different facets or components of ToM makes it possible to compare directly how they function in the same individual or clinical sample.

Th.o.m.a.s. has been used in patients with a diagnosis of schizophrenia (Bosco et al., 2009), preadolescents and adolescents (Bosco et al., 2014b), sex offenders (Castellino et al., 2011), persons with alcohol use disorder (Bosco et al., 2014a), persons with congenital heart disease (Chiavarino et al., 2015), and persons with bulimia (Laghi et al., 2014). In all these types of subjects Th.o.m.a.s. has systematically proved a useful clinical tool, capable of discriminating between healthy control and non-healthy participants. Furthermore, it takes into account that different kinds of patients may in principle, and actually do in practice, show different patterns of performance on the various ToM components mentioned above. In particular, persons with a diagnosis of schizophrenia, persons with alcohol use disorder, and sex offenders (Bosco et al., 2009, 2014a; Laghi et al., 2014), in comparison to healthy controls, were impaired on all the ToM dimensions investigated. Persons with bulimia showed impairment in third-person ToM in the allocentric perspective and in second-order ToM, but not in third-person ToM in the egocentric perspective or in first-order ToM. Finally, persons with congenital heart disease showed impairment in third-person ToM, both in the egocentric and the allocentric perspective, but not in first-person or second-order ToM (Chiavarino et al., 2015). Globally, these studies testify to the need for a tool able to investigate different ToM dimensions separately in clinical samples.

With the aim of verifying whether the results from Th.o.m.a.s. could be explained merely by differences in communicative-pragmatic abilities, Bosco et al. (2014b) created a second set of criteria for the evaluation of the participants’ performance. The findings showed that communicative-pragmatic abilities, at least for the level required to answer Th.o.m.a.s., do not affect performance.

The goal of this research is to further investigate the validity of Th.o.m.a.s. by assessing its reliability, its dimensional structure, and some aspects of item functioning and criterion validity in a sample of healthy people. In particular, we expect to find a fair to good inter-rater reliability and a good internal consistency. We also expect to find a correlation between Th.o.m.a.s. Scale B and another ToM task, the Strange Stories (Happé, 1994; Mazzola and Camaioni, 2002), because both tasks investigate third-person ToM in an allocentric perspective. As for the dimensional structure, we expect to find four dimensions


corresponding to the four scales, namely first-person ToM (Scale A), third-person allocentric ToM (Scale B), third-person egocentric ToM (Scale C) and second-order ToM (Scale D), and an invariant functioning of items across gender and across age groups.

MATERIALS AND METHODS

Participants

Two nearly equal-sized samples of preadolescent/adolescent and adult volunteers, all native speakers of Italian, were recruited in a number of local schools, university faculties, social organizations, and sports clubs in two Italian cities (Torino and Asti). All the participants took part voluntarily in the study; all of them, as well as their parents when underage, were informed about the procedures and gave their informed consent. The study was approved by the Bio-ethical Committee of the University of Turin.

None of the participants had a history of significant neurological and/or psychiatric disorders or drug or alcohol abuse. During the recruitment phase, a research assistant (with a degree in Psychology) handed the prospective participants an informative letter explaining the goal of the research. The letter also asked the subjects to withdraw from the study if they did not feel like participating or in the event of a past history of neurological or psychiatric disease, current or past history of alcohol or drug abuse, and current or past history of psychotherapy.

The preadolescent and adolescent sample was composed of 80 participants (42 females), ranging in age from 11 to 17 (M = 14.0; SD = 2.25), with an education ranging from 5 to 12 years (M = 8.53; SD = 2.3). The adult sample consisted of 76 individuals (35 females), ranging in age from 20 to 67 years (M = 40.72; SD = 11.93), with an education ranging from 5 to 18 years (M = 12.16; SD = 4.27).

Two participants were excluded from the analysis due to technical problems with the audio recording of the interview.

MATERIALS

Theory of Mind Assessment Scale (Th.o.m.a.s.)

Th.o.m.a.s. (see the references above) consists of 37¹ open-ended questions that ask the interviewee to present and discuss her reflections about the functioning of ToM in everyday life (see Appendix A in Supplementary Material for the complete list of items), also with the aid of examples that she may provide spontaneously or after a specific request from the interviewer.

¹ Previous versions of the tool included 39 questions; in the final version, two were dropped because they turned out to be redundant.

The architecture underlying the interview groups the questions in four scales that focus on the various internal or social domains in which ToM plays a role.

• Scale A (I–Me): First-order first-person ToM. It focuses on how the interviewee (I) reflects on her own mental states (Me).
• Scale B (Other–Self): Allocentric third-person ToM. These questions focus on how the interviewee thinks that other persons (Other) reflect on their mental states (Self), independently of her own position. This scale is akin to classic third-person ToM tasks.
• Scale C (I–Other): Egocentric third-person ToM. These questions focus on how the interviewee (I) reflects on the mental states of other actors (Other). While both scales B and C investigate third-person ToM, the difference is that here it is the interviewee’s position that is highlighted, thus providing a sort of bridge between first- and third-person ToM.
• Scale D (Other–Me): Second-order first-person ToM. These questions focus on how the interviewee conceives of the knowledge that the others may have of her mental states, that is, how they (Other) reflect on her mental states (Me). The abstract structure of these questions thus is akin to classic second-order tasks.

The four scales are each divided into three subscales investigating Awareness, Relation, and Realization, that is, respectively, how the interviewee perceives different types of mental states, how she recognizes the causal relations that hold between these mental states and between them and an agent’s visible behaviors, and how she conceives of the possibility of affecting her own mental states and those of others. The types of mental states investigated are the most basic that must be comprised in a complex cognitive architecture (Olson et al., 2006; Tirassa et al., 2006a,b; Tirassa and Bosco, 2008), namely positive and negative emotions, volitional states like desires and intentions, and epistemic states like knowledge and beliefs.

The replies given by the interviewee are organized into a grid (Table 1) of which the scales and subscales are the columns and the types of mental states investigated are the rows. Each cell is thus located at the intersection of two of the dimensions considered, and each question, focusing on a specific aspect of the features of ToM, refers to one cell of the table.

For example, question [3]: “When you feel bad, do you understand the reason why you feel like that?” explores how the interviewee reflects on her own negative feelings (dimensions investigated: Awareness and Negative emotions); question [18]: “Do the others try to fulfill their desires?” asks the interviewee to reason about how the others’ desires and feelings are interconnected (dimensions investigated: Relation and Desires); and so on for each question.

Strange Stories

In addition to Th.o.m.a.s., the participants were also administered a selection of six items from the Italian version of the Strange Stories (Mazzola and Camaioni, 2002), originally devised by Happé (1994). Each story contains two test questions: the comprehension question (e.g., “Was what X said true?”), and the justification question(s) (e.g., “Why did X say that?”). The latter question requires an inference about the speaker’s/actor’s intentions; correct performance requires attribution of mental states such as desires, beliefs or intentions, and sometimes higher-order mental states such as one character’s belief about what another character knows.
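To make the two-question structure of each Strange Stories item concrete, the sketch below encodes the scoring rule reported in the Procedure section (0 for an incorrect answer, 1 only when both the comprehension and the justification question are answered correctly) and averages it over a participant's stories. The function names and the input layout are hypothetical and chosen for illustration only; this is not the authors' scoring software.

```python
from statistics import mean


def score_story(comprehension_correct, justification_correct):
    """Return 1 only if both questions of a story were answered correctly, else 0."""
    return int(comprehension_correct and justification_correct)


def strange_stories_mean(ratings):
    """Average the 0/1 story scores of one participant.

    `ratings` is an assumed layout: one (comprehension, justification)
    pair of booleans per administered story.
    """
    return mean(score_story(c, j) for c, j in ratings)


# Example with six stories, three of which have both answers correct.
example = [(True, True), (True, False), (True, True),
           (False, False), (True, True), (False, True)]
participant_mean = strange_stories_mean(example)
```

Averaging rather than summing mirrors the Data Analysis section, which states that average scores were entered in the dataset.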


TABLE 1 | A graphic representation of the structure of Th.o.m.a.s.

Scale A (I–Me): First-order first-person ToM; Scale B (Other–Self): Allocentric third-person ToM

                     Scale A                               Scale B
Mental state         Awareness   Relation    Realization   Awareness   Relation    Realization
Beliefs              x           5           10            x           15 (15a)    20
Desires              7 (7a)      8 (8a)      9             17 (17a)    18 (18a)    19
Positive emotions    1 (1a)      2           6 (6a)        11 (11a)    12          16 (16a)
Negative emotions    3 (3a)      4           x             13 (13a)    14          x

Scale C (I–Other): Egocentric third-person ToM; Scale D (Other–Me): Second-order first-person ToM

                     Scale C                               Scale D
Mental state         Awareness   Relation    Realization   Awareness   Relation    Realization
Beliefs              x           25 (25a)    28            x           35 (35a)    38
Desires              29          26          x             39          x           x
Positive emotions    21 (21a)    22          27            31 (31a)    32          37
Negative emotions    23 (23a)    24          x             33 (33a)    34          x

Numbers in the table (e.g., 1) refer to the same-numbered question; numbers in parentheses (e.g., 1a) refer to the “Why not?” version of the same question; for example, if the subject responds negatively to question [1]: “Do you ever feel emotions that make you feel good?” the interviewer poses question [1a]: “Why not?”. Some cells contain an (x) because not all the intersections between two dimensions have a relevant question, since in some cases this would sound contrived. For example, no question asking whether the interviewee is aware of his own beliefs is posed, as it may be assumed that if one were not, one would just be unable to talk about them. Adapted from Bosco et al. (2009).
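The grid in Table 1 can also be read as a small lookup structure. The following sketch is one possible encoding, assuming a nested dictionary keyed by scale and mental state with `None` for the cells marked (x); the names `GRID`, `questions_in_scale`, `locate`, and `scale_means` are hypothetical and do not come from the authors' materials. It shows how a question number maps to its scale, mental state, and subscale, and how per-scale averages of the 0 to 4 ratings could be obtained.

```python
from statistics import mean

SUBSCALES = ("Awareness", "Relation", "Realization")

# Question numbers per scale and mental state, listed in SUBSCALES order.
# None marks a cell shown as "x" in Table 1 (no question for that cell).
GRID = {
    "A": {"Beliefs": (None, 5, 10), "Desires": (7, 8, 9),
          "Positive emotions": (1, 2, 6), "Negative emotions": (3, 4, None)},
    "B": {"Beliefs": (None, 15, 20), "Desires": (17, 18, 19),
          "Positive emotions": (11, 12, 16), "Negative emotions": (13, 14, None)},
    "C": {"Beliefs": (None, 25, 28), "Desires": (29, 26, None),
          "Positive emotions": (21, 22, 27), "Negative emotions": (23, 24, None)},
    "D": {"Beliefs": (None, 35, 38), "Desires": (39, None, None),
          "Positive emotions": (31, 32, 37), "Negative emotions": (33, 34, None)},
}


def questions_in_scale(scale):
    """Return the sorted question numbers that belong to one scale."""
    return sorted(q for row in GRID[scale].values() for q in row if q is not None)


def locate(question):
    """Return (scale, mental state, subscale) for a question number."""
    for scale, rows in GRID.items():
        for state, cells in rows.items():
            if question in cells:
                return scale, state, SUBSCALES[cells.index(question)]
    raise KeyError(f"question {question} is not in the grid")


def scale_means(scores):
    """Average the 0-4 ratings within each scale.

    `scores` maps question number -> rating and is expected to
    contain a rating for every question of every scale.
    """
    return {s: mean(scores[q] for q in questions_in_scale(s)) for s in GRID}
```

For instance, `locate(3)` returns the Awareness cell for negative emotions in Scale A, which matches the example given in the text for question [3], and the four scales together contain 37 questions.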

Procedure

The participants completed the Th.o.m.a.s. interview and the Strange Stories task individually with a research assistant. The material was administered at school (adolescents) or at home (adults); the session generally took about 1 h. Three research assistants took part in the research. They all had a degree in psychology and were trained by two of the authors (I.G. and F.M.B.) on how to administer the interviews. First they received an oral explanation of the aim and the procedure for the administration of Th.o.m.a.s. by I.G. or F.M.B. They then practiced the administration of Th.o.m.a.s. with a test subject (not included in the experimental sample) and transcribed the interview. The transcription was then examined by I.G. or F.M.B.: if it was not satisfactory (e.g., because the interviewer had suggested one or more answers), the error was demonstrated and explained and another test interview was conducted (again the subject was not included in the sample). The procedure was repeated until the interview was conducted satisfactorily (two or three test interviews were always sufficient).

With the authorization of the interviewees or of their parents, all the interviews were tape-recorded and then transcribed to enable offline scoring. The participants were informed that their participation was voluntary and that the aim and contents of the research would be explained at the end of the session.

The responses both to Th.o.m.a.s. and to the Strange Stories were rated by another research assistant, blind to the aims of the study; moreover, 29% of the sessions were rated by a second independent judge, again blind to the aim and the scope of the research, in order to evaluate the inter-rater agreement. In rating Th.o.m.a.s. the judges were instructed to assign each answer a score from 0 to 4, according to given rating criteria (see Appendix B in Supplementary Material), and to insert it in the relevant cell of the scoring grid.

In rating the Strange Stories task the judges followed the criterion originally proposed by Happé (1994), namely to assign 0 to an incorrect answer and 1 to a correct one. A score of 1 was attributed when the individual replied correctly to both the comprehension and the justification question.

The inter-rater reliability among the scores assigned by the two independent judges on the Strange Stories task was calculated using the Intraclass Correlation Coefficient; the ICC was 0.94, indicating a very high agreement between raters.

Data Analysis

The averages of the scores on each Th.o.m.a.s. scale and those of the Strange Stories task were inserted in the dataset and used for part of the analysis.

In order to assess the inter-rater agreement, an Intraclass Correlation Coefficient² (ICC) was calculated on the 29% of the sample for which the Th.o.m.a.s. interviews had been encoded by two judges. As a rule of thumb, values between 0.41 and 0.60 stand for fair reliability, those between 0.61 and 0.80 for moderate reliability, and those between 0.81 and 0.90 for substantial reliability (Shrout, 1998). Cronbach’s alpha was used to evaluate the internal consistency of the scores on the four scales.

² In particular, the ICC type C1, which measures the absolute agreement in a two-way random analysis of variance model for average measures, was adopted (McGraw and Wong, 1996).

Confirmatory factor analysis (CFA) was applied to assess the goodness of fit of the 4-factor model representing the four scales, namely A (I–Me), B (Other–Self), C (I–Other), and D (Other–Me). The analysis was performed on the covariance matrix of the 37 items (Appendix C), using Lisrel 8.72 (Jöreskog and Sörbom, 1996). Because of the small size of the sample, the Maximum Likelihood method (ML) without correcting the chi-square and standard errors was employed even though data violated the multinormality condition (Mardia’s multivariate omnibus test of skewness and kurtosis (2, 154) = 1711.8; p < 0.001). The following criteria were used to evaluate the fit of the model as acceptable: RMSEA < 0.08; CFI > 0.95; SRMR < 0.08 (Browne and Cudeck, 1993; Hu and Bentler, 1995, 1999). In order to assess whether the tasks composing the four scales require different levels of ToM


ability the Friedman test was performed on the four average scores.

A Rasch model for items with ordered response categories, the Partial credit model as implemented in Winsteps (Linacre, 2009), was applied to assess the psychometric properties of each unidimensional scale. Dimensionality was checked by performing Principal component analysis (PCA) on the residuals; scales for which the first eigenvalue was ≤2 were considered unidimensional (Linacre, 2009). Score reliability was evaluated by the Person Separation index (PSEP), where values ≥1.50 are considered acceptable (Boone et al., 2014). Item quality was assessed by Infit and Outfit statistics, and values within the 0.7–1.3 range were considered satisfactory (Wright and Linacre, 1994). Differential item functioning (DIF) for gender and age groups was evaluated: a DIF value > 0.64 logits (in absolute value) with a p < 0.05 was considered indicative of the persistence of a difference in item functioning across gender or age groups, after controlling for differences in person location (Boone et al., 2014).

Criterion validity was assessed as the difference in means between adolescents (whose scores were expected to be lower) and adults (whose scores were expected to be higher) and as the correlation with the independent evaluation of ToM provided by the Strange Stories. Multivariate analysis of variance (MANOVA) was employed to assess the difference in means between adolescents and adults on the four scales. Such an approach was needed because of the high correlation between the dependent variables. Since the two groups were about the same size, both multivariate and univariate tests could be considered robust to departures from normality and from homogeneous covariance matrices conditions3. To assess the correlation between the Th.o.m.a.s. scores and the Strange Stories scores, the Pearson coefficient was calculated both unpartialized and partialized for age and years of education.

With the exception of CFA and Rasch analysis, all the analyses were performed with SPSS 20.

3 As an additional check on the validity of the results of the parametric analysis a non parametric MANOVA (Finch, 2005) and the Mann-Whitney test were performed.

RESULTS

Inter-Rater Agreement and Internal Consistency
Overall, the inter-rater agreement was acceptable (Table 2). In particular, scale A (first-person ToM: I–Me) and scale D (second-order ToM: Other–Me) displayed fair reliability (0.59 and 0.49 respectively) whereas scales B and C, respectively investigating allocentric third-person ToM (Other–Self) and egocentric third-person ToM (I–Other), showed moderate reliability (0.65 and 0.71, respectively). All four scales provided good results for internal consistency. Cronbach’s alpha ranged from 0.86 to 0.89 (Table 2).

TABLE 2 | Inter-rater agreement (N = 45) and internal consistency (N = 154) of the four Th.o.m.a.s. scales.

Scale                                            ICC     alpha
A (I–Me) First-order first-person ToM            0.59    0.89
B (Other–Self) Allocentric third-person ToM      0.65    0.88
C (I–Other) Egocentric third-person ToM          0.71    0.89
D (Other–Me) Second-order (first-person) ToM     0.49    0.86

Factorial Structure
The theoretical model consisting of four latent variables representing the four Th.o.m.a.s. scales fitted the data quite well: χ2(623) = 1138.4, p < 0.001; RMSEA = 0.073 (CI 90% = 0.066–0.080); CFI = 0.97 and SRMR = 0.058. All the loadings were high and statistically significant (Figure 1). The correlations between the four dimensions were very high, ranging from 0.94 to 1.00. In order to assess which of the four scales can be discriminated in a healthy sample, several more parsimonious models with 3, 2, and 1 factors were estimated; the chi-square difference test and the Consistent Akaike Information Criterion (CAIC) were used to compare nested and non-nested models respectively. The bidimensional model that isolated Scale B (Other–Self) resulted in a statistically nonsignificant χ2 difference test when compared to the four-factor solution [χ2(5) = 9.6; p > 0.05] and a lower CAIC value; consequently, it was chosen as the model which fitted the data best. In this solution with χ2(628) = 1148.0, p < 0.001; RMSEA = 0.073 (CI 90% = 0.066–0.080); CFI = 0.97, and SRMR = 0.058, all the loadings were statistically significant and high, ranging from 0.59 to 0.81 (with a mean of 0.66 and 0.67 for the two factors) and the standardized covariance was equal to 0.96. The unidimensional model, albeit adequate in terms of fit indices, was not acceptable because it exhibited a significant χ2 difference test when compared to the four-factor solution [χ2(6) = 20.8; p < 0.01]. Thus, in a healthy sample, only two factors seem distinguishable: the one belonging to Scale B (Other–Self) and a broader one composed of the other three scales.

In order to investigate whether differences in the performance at the different scales were detectable, we analyzed the means of each scale. On average, the sample performed better on scale A (I-Me: M = 3.50; SD = 0.49) than B (Other-Self: M = 3.32; SD = 0.55), C (I-Other: M = 3.31; SD = 0.58) or D (Other-Me: M = 3.30; SD = 0.55): the Friedman test resulted in a significant overall effect [χ2(3) = 77.2; p < 0.001] and the post-hoc analysis, with a Bonferroni correction applied, showed that only the pairwise comparisons involving scale A (I-Me) were statistically significant (p < 0.001).

Rasch Analysis
Considering the high correlations between the four factors yielded by the CFA analysis, the Partial credit model was estimated on the whole pool of 37 items. The PCA on residuals signaled that more than one dimension was present, as the eigenvalues of the first three components were >2. Excluding scale B, which emerged as a separate factor in the previous CFA analysis, the eigenvalue criterion was still not met since the


FIGURE 1 | Standardized solution of the four-factors CFA model of Th.o.m.a.s. (N = 154).
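
The fit statistics reported for the CFA models in the Factorial Structure section can be re-derived by hand. The following is a minimal sketch, not the authors' LISREL procedure: it assumes the common point-estimate formula RMSEA = sqrt(max(χ2 − df, 0) / (df · (N − 1))) (some programs divide by N instead) and a standard χ2 difference test for nested models. The function names `chi2_sf`, `rmsea`, and `chi2_difference` are hypothetical helpers introduced only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def rmsea(chi2, df, n):
    """RMSEA point estimate (assumed formula with N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Chi-square difference test between a restricted model and a fuller one."""
    d_chi2 = chi2_restricted - chi2_full
    d_df = df_restricted - df_full
    return d_chi2, d_df, chi2_sf(d_chi2, d_df)


N = 154                        # sample size given in the Figure 1 caption
four_factor = (1138.4, 623)    # chi-square and df of the four-factor model
two_factor = (1148.0, 628)     # chi-square and df of the two-factor model

rmsea_four = rmsea(four_factor[0], four_factor[1], N)
d_chi2, d_df, p_two_vs_four = chi2_difference(*two_factor, *four_factor)
p_one_vs_four = chi2_sf(20.8, 6)   # reported difference for the one-factor model
```

Under these assumptions the recomputed RMSEA agrees with the reported 0.073 to within 0.001, the two-factor comparison gives a p of roughly 0.09 (nonsignificant at 0.05), and the one-factor comparison falls below 0.01, in line with the text.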


first and the second eigenvalues were still >2. Therefore, the four scales were analyzed separately, which yielded the results summarized in Table 3.

TABLE 3 | Summary of the Partial credit model results.

Sub-scales  PCA (a)  PSEP (b)  Infit (c)  Outfit (c)  DIF (d) sex  DIF (d) age
A  1.9  1.83  Item 2  Item 2  Item 4
Item 6  Item 6
Item 1
B  2.1  1.89  Item 16  Item 16  Item 20  Item 16
Item 20  Item 20  Item 17
Item 18
C  1.7  1.99  –  –  –  Item 23
Item 26
D  2.1  1.82  –  –  –  Item 38

(a) First eigenvalue of the Principal component analysis on residuals; (b) Person separation index; (c) Items with infit/outfit statistics out of the range 0.7–1.3; (d) Items with differential functioning for sex or age (adolescents vs. adults).

Eigenvalue criteria were respected for scales A and C, and slightly above the cut-off for scales B and D. Reliability was good for all the scales, giving scores with PSEP values > 1.5. All the items of scales C and D showed acceptable values for Infit and Outfit statistics and their functioning was invariant for gender. The two scales proved partially invariant with respect to age groups: two items of scale C and one of scale D had a non-negligible DIF value between adolescents and adults. Some misfitting items were present in scales A and B; these scales exhibited partial invariance for both gender and age groups. Overall, two items (item 16 and item 20 of scale B) were unsatisfactory on both infit/outfit and DIF statistics; in each scale there were 6 or 7 well-performing items. A content analysis of unfitting items was performed, but since problematic items were few and they were crucial to the instrument, all were retained.

Criterion Validity
The Strange Stories (administered to 115 subjects, i.e., 74% of the total sample) scores were used as an independent ToM measure to assess the criterion validity of Th.o.m.a.s. In terms of percentage of correct answers to all the six tasks, the adults performed better than the adolescents. The difference between the two percentages (68.6% for the adults, 48.8% for the adolescents) was statistically significant [t-test for unequal variances, t(63) = −2.03, p = 0.046].

As shown in Table 4, only Scale B (Other–Self), i.e., the scale investigating third-person ToM in an allocentric perspective, correlated positively with the Strange Stories. This correlation was statistically significant both when the unpartialized coefficient was used and when the correlation was adjusted for age and education.

As regards the difference between the means of preadolescents/adolescents and those of the adults, the MANOVA analysis yielded statistical significance for both the omnibus F statistics and the four univariate F tests (Table 5)4.

4 The nonparametric MANOVA and the Mann-Whitney test statistics also resulted in statistically significant differences.

DISCUSSION

The Th.o.m.a.s. (Bosco et al., 2009; see Appendix A in Supplementary Material) is a semi-structured interview investigating Theory of Mind (ToM). Its 37 open-ended questions are organized in four scales, called A (I–Me), B (Other–Self), C (I–Other), and D (Other–Me), each focusing on one of the knowledge domains in which ToM manifests itself. The questions leave the interviewee free to articulate her thoughts; she is also invited to propose examples taken from her own biography or anyway from the real world, and thus to make her understanding of the mental states both of her own and of the others explicit and to reflect upon them. Th.o.m.a.s. has been administered to persons with a diagnosis of schizophrenia (Bosco et al., 2009), sex offenders (Castellino et al., 2011), persons suffering from alcohol abuse (Bosco et al., 2014a), persons with congenital heart disease (Chiavarino et al., 2015), and persons with bulimia (Laghi et al., 2014). In each of these cases Th.o.m.a.s. has proved a useful clinical tool able to discriminate between healthy control and non-healthy participants.

The aim of this study was to assess the validity and the reliability of the Th.o.m.a.s. scores. In particular, inter-rater agreement, internal consistency, dimensional structure, items’ functioning, and criterion validity were evaluated in a sample of 156 healthy adolescents and adults.

Internal consistency of the scores in the four scales composing Th.o.m.a.s. ranged from good to very good as defined in the literature (De Vellis, 1991). Reliability was also satisfactory when evaluated by the Partial credit model. The inter-rater agreement was acceptable, ranging from fair to moderate (Shrout, 1998).

The dimensional structure of the Th.o.m.a.s. scores was explored with both CFA and Rasch analysis, yielding divergent results. The CFA model representing the four theoretical scales fitted the data very well, but factors were highly correlated and a more parsimonious two-factor model fitted the data equally well. Correlation was also very high (0.96) in the latter model, which might suggest that a single broader ToM dimension existed. By contrast, the PCA of model residuals in the Partial credit model analysis showed that the 37 items of the instrument were not indicators of a single latent construct, but belonged to four distinct scales, corresponding to those that were theoretically expected.

As reported in the literature, factor analysis and Rasch modeling can produce divergent results in terms of dimensionality under specific conditions regarding, for example, the proportion of items per dimension, the level of correlation between dimensions, and a non-linear relationship between item scores and the latent dimension (McDonald, 1965; Smith, 1996; Waugh and Chapman, 2005; Yu et al., 2007). The reason for the discrepancy


TABLE 4 | Pearson correlations between Th.o.m.a.s. scales and the Strange Stories scores.

Scale                                             Unpartialized    Partialized for age and education
A (I–Me), First-order first-person ToM            0.136            0.071
B (Other–Self), Allocentric third-person ToM      0.229*           0.191*
C (I–Other), Egocentric third-person ToM          0.126            0.056
D (Other–Me), Second-order (first-person) ToM     0.119            0.056

*p < 0.05.
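
The significance pattern in Table 4 can be checked with the usual t transformation of a correlation coefficient, t = r · sqrt(df) / sqrt(1 − r²). The sketch below rests on assumptions that the table itself does not state: that all correlations are based on the 115 participants who completed the Strange Stories, that partialling out age and education removes two further degrees of freedom, and that 1.98 is an adequate approximation of the two-tailed 5% critical value for about 110 to 115 degrees of freedom. The helper names are hypothetical.

```python
import math

N_STRANGE_STORIES = 115   # assumption: subsample that completed the Strange Stories
T_CRITICAL = 1.98         # assumption: approximate two-tailed 5% cut-off, df near 110-115

# Coefficients copied from Table 4.
unpartialized = {"A": 0.136, "B": 0.229, "C": 0.126, "D": 0.119}
partialized = {"A": 0.071, "B": 0.191, "C": 0.056, "D": 0.056}


def t_from_r(r, n, n_covariates=0):
    """t statistic and degrees of freedom for a (partial) Pearson correlation."""
    df = n - 2 - n_covariates
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df


def significant_scales(correlations, n, n_covariates=0):
    """Return the scales whose |t| exceeds the approximate critical value."""
    flagged = []
    for scale, r in correlations.items():
        t, _ = t_from_r(r, n, n_covariates)
        if abs(t) > T_CRITICAL:
            flagged.append(scale)
    return flagged


flagged_raw = significant_scales(unpartialized, N_STRANGE_STORIES)
flagged_partial = significant_scales(partialized, N_STRANGE_STORIES, n_covariates=2)
```

Under these assumptions only Scale B is flagged in both rows, which matches the asterisks in the table.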

TABLE 5 | MANOVA results on preadolescents/adolescents vs. adults difference in means on the four Th.o.m.a.s. scales.

Scale                                            Preadolescents and adolescents(a) (N = 80)    Adults(a) (N = 74)    Univariate F statistics and η2
A (I–Me), First-order first-person ToM           3.21 (0.48)                                   3.82 (0.25)           F(1, 152) = 93.73; p < 0.0001, 0.38
B (Other–Self), Allocentric third-person ToM     2.97 (0.46)                                   3.70 (0.36)           F(1, 152) = 116.41; p < 0.0001, 0.43
C (I–Other), Egocentric third-person ToM         2.94 (0.51)                                   3.71 (0.35)           F(1, 152) = 117.98; p < 0.0001, 0.43
D (Other–Me), Second-order first-person ToM      2.98 (0.49)                                   3.66 (0.35)           F(1, 152) = 95.70; p < 0.0001, 0.38

a Mean and (standard deviation); Multivariate F statistics associated to Pillai’s trace: F(4, 149) = 35.57; p < 0.001.
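
The effect sizes in Table 5 can be approximately recovered from the univariate F statistics. This is a minimal sketch that assumes the reported η2 is the (partial) eta squared of a two-group comparison, for which η2 = F · df_effect / (F · df_effect + df_error); the function name is a hypothetical helper, and small discrepancies are expected because the F values in the table are rounded.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, reported eta squared) for each scale, copied from Table 5; all F have df (1, 152).
reported = {
    "A": (93.73, 0.38),
    "B": (116.41, 0.43),
    "C": (117.98, 0.43),
    "D": (95.70, 0.38),
}

recomputed = {
    scale: eta_squared_from_f(f_value, 1, 152)
    for scale, (f_value, _) in reported.items()
}

# Largest absolute gap between recomputed and reported values.
max_gap = max(abs(recomputed[s] - reported[s][1]) for s in reported)
```

With these inputs every recomputed value lies within 0.01 of the tabulated one, so the F statistics and the η2 column are mutually consistent up to rounding.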

between CFA and Partial credit model in our study lies most likely in the high correlation between the four scale scores. As shown in a simulation study by Smith (1996), Rasch analysis works better than factor analysis when dimensions are highly correlated and worse when correlations are low. Moreover, Rasch analysis, which does not rely upon correlations, is preferable to factor analysis when the variables are not continuous (Boone et al., 2014). In the light of these remarks, and according to Rasch results, Th.o.m.a.s. can be considered an instrument assessing four distinct, even if highly correlated, dimensions of ToM.

The high level of correlation between the dimension scores deserves further consideration. A certain amount of correlation is theoretically expected, since the four dimensions are components of a broader construct, namely ToM abilities; however, the level of correlation was probably inflated due to some methodological features: (i) the uniformity of the test structure, which is entirely composed of open-ended questions; (ii) the persistence of the same persons as raters; and (iii) the uniformity of the contents investigated (all the scales assess mental states related to beliefs, emotions and desires). Furthermore, in healthy adults these different dimensions of ToM are substantially well integrated (which may not be the case in clinical populations), producing high scores across all four scales. Younger people obtained lower scores, which might also have contributed to the inflation of correlations (Bewick et al., 2003).

Overall, the performance of the Partial credit models in each of the four scales was satisfactory. Only six items out of 37 showed poor fitting and the scales proved partially invariant with respect to age and gender. In fact, only in a few cases item locations (difficulties) were not the same between adolescents and adults with the same person location (ability) or between males and females with the same person location.

Regarding the comparison between mean scale scores, the only significant difference found was that the sample performed better on Scale A (I–Me) than on B (Other–Self) and C (I–Other). This is in line with other studies in the literature to the effect that first-person ToM is generally the easiest to handle (see also Lysaker et al., 2005). The sample also performed better on Scale A (I–Me) than D (Other–Me). This is again in line with the literature, according to which healthy children find first-order ToM tasks easier than second-order ones (Perner and Wimmer, 1985; the two types of tasks are respectively explored in Scales A and D of Th.o.m.a.s.). This is also the case in clinical samples: first-order ToM is easier than second-order to persons with a diagnosis of schizophrenia, both when evaluated with Th.o.m.a.s. (Bosco et al., 2009) and with other classic false-belief tasks (e.g., Mazza et al., 2001). Instead, the difference between allocentric and egocentric third-person ToM has remained quite unexplored in the literature about mentalizing. A previous study using Th.o.m.a.s. in sex offenders (Castellino et al., 2011) found that they performed worse on Scale B (allocentric) than C (egocentric third-person ToM), showing that the comparison between the two perspectives may be interesting in some cases. However, further studies with clinical samples are necessary in order to investigate this issue.

We employed the Strange Stories task as an independent ToM measure to analyze criterion validity. Statistical analysis showed that it correlated positively and significantly with Scale B (Other–Self: allocentric third-person ToM), but not significantly with the other three Th.o.m.a.s. scales. This is as expected, since the Strange Stories task measures third-person, allocentric ToM.

Finally, MANOVA results confirmed the expectation that the scores would increase from adolescents to


adults, thus adding further evidence to the idea that the development of ToM continues during childhood, through adolescence (Choudhury et al., 2006; Bosco et al., 2014b; Brizio et al., 2015) and into adulthood (Maylor et al., 2002; Dumontheil et al., 2010).

In conclusion, our results supported the theoretical distinction among the four scales. Despite the strong correlations between them in the present sample of healthy people, they should not be considered secondary dimensions of a broader but homogeneous ToM factor or treated as a source of noise in the data. Actually, at least two theoretically sound features emerged, namely that Scale A (I–Me) is easier than the others and that only Scale B (Other–Self) was correlated with a third-person ToM test, the Strange Stories task. This conclusion is also supported by previous research finding different patterns of performance on the four scales in different clinical samples (Laghi et al., 2014; Chiavarino et al., 2015).

Future research directions basically coincide with the attempts to overcome the current limitations of Th.o.m.a.s. and its use. First, the size of the healthy sample ought to be steadily increased from the current figure of 156. Furthermore, it will be necessary to provide the normative data for the Italian population and to administer additional ToM tests beyond the Strange Stories to provide further empirical evidence on construct validity (see for example Brüne et al., 2016).

It will also be necessary to understand the cultural properties of Th.o.m.a.s., that is, the extent to which it is embedded in how native Italians, or Europeans, or Westerners conceive of ToM, or in universal features of human social cognition. Of course, this might then yield modifications either in the instrument itself or at least in how scores would be given to members of different cultures.

Still another direction of development which can be expected to yield interesting results is the use of Th.o.m.a.s. with different types of clinical populations. Those in which it has already been employed (namely, to repeat, schizophrenia, criminal sexual behaviors, alcohol abuse, congenital heart disease, and bulimia) do exhibit differences in their respective profiles of ToM (mal)functioning. Given the importance of ToM in our species, its delicacy, and its dependence on individual and contextual factors, this comes as no surprise; it is analogously reasonable to expect further differences to be found in other conditions of clinical interest.

ACKNOWLEDGMENTS

The project was funded by the University of Turin, Funding for Local Research, project years 2014 and 2015.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2016.00566

REFERENCES

Baron-Cohen, S., Leslie, A. M., and Frith, U. (1985). Does the autistic child have a “theory of mind”? Cognition 21, 37–46. doi: 10.1016/0010-0277(85)90022-8
Baron-Cohen, S., O’Riordan, M., Stone, V., Jones, R., and Plaisted, K. (1999). Recognition of faux pas by normally developing children and children with Asperger syndrome or high-functioning autism. J. Autism Dev. Disord. 29, 407–418.
Baron-Cohen, S., Wheelwright, S., Spong, A., Scahill, V., and Lawson, J. (2001). Are intuitive physics and intuitive psychology independent? A test with children with Asperger Syndrome. J. Dev. Learn. Disord. 5, 47–78.
Beaumont, R. B., and Sofronoff, K. (2008). A new computerized advanced theory of mind measure for children with Asperger syndrome: the ATOMIC. J. Autism Dev. Disord. 38, 249–260. doi: 10.1007/s10803-007-0384-2
Bewick, V., Cheek, L., and Ball, J. (2003). Statistics review 7: correlation and regression. Crit. Care 7, 451–459. doi: 10.1186/cc2401
Boone, W. J., Staver, J. R., and Yale, M. S. (2014). Rasch Analysis in the Human Sciences. Dordrecht: Springer. doi: 10.1007/978-94-007-6857-4
Bosco, F. M., Capozzi, F., Colle, L., Marostica, P., and Tirassa, M. (2014a). Theory of mind deficit in subjects with alcohol use disorder: an analysis of mindreading processes. Alcohol Alcohol. 49, 299–307. doi: 10.1093/alcalc/agt148
Bosco, F. M., Colle, L., De Fazio, S., Bono, A., Ruberti, S., and Tirassa, M. (2009). Th.o.m.a.s.: an exploratory assessment of Theory of Mind in schizophrenic subjects. Conscious. Cogn. 18, 306–319. doi: 10.1016/j.concog.2008.06.006
Bosco, F. M., Gabbatore, I., and Tirassa, M. (2014b). A broad assessment of theory of mind in adolescence: the complexity of mindreading. Conscious. Cogn. 24, 84–97. doi: 10.1016/j.concog.2014.01.003
Brizio, A., Gabbatore, I., Tirassa, M., and Bosco, F. M. (2015). “No more a child, not yet an adult”: studying social cognition in adolescence. Front. Psychol. 6:1011. doi: 10.3389/fpsyg.2015.01011
Browne, M. W., and Cudeck, R. (1993). “Alternative ways of assessing model fit,” in Testing Structural Equation Models, eds K. A. Bollen and J. S. Long (London: Sage), 132–162.
Brüne, M., and Bodenstein, L. (2005). Proverb comprehension reconsidered - ‘theory of mind’ and the pragmatic use of language in schizophrenia. Schizophr. Res. 75, 233–239. doi: 10.1016/j.schres.2004.11.006
Brüne, M., Walden, S., Edel, M. A., and Dimaggio, G. (2016). Mentalization of complex emotions in borderline personality disorder: the impact of parenting and exposure to trauma on the performance in a novel cartoon-based task. Compr. Psychiatry 64, 29–37. doi: 10.1016/j.comppsych.2015.08.003
Castellino, N., Bosco, F. M., Marshall, W. L., Marshall, L. E., and Veglia, F. (2011). Mindreading abilities in sexual offenders: an analysis of theory of mind processes. Conscious. Cogn. 20, 1612–1624. doi: 10.1016/j.concog.2011.08.011
Charman, T., Carroll, F., and Sturge, C. (2001). Theory of mind, executive function and social competence in boys with ADHD. Emot. Behav. Diff. 6, 31–49. doi: 10.1080/13632750100507654
Chiavarino, C., Bianchino, C., Brach-Prever, S., Riggi, C., Palumbo, L., Bara, B. G., et al. (2015). Theory of mind deficit in adult patients with congenital heart disease. J. Health Psychol. 20, 1253–1262. doi: 10.1177/1359105313510337
Choi-Kain, L. W., and Gunderson, J. G. (2008). Mentalization: ontogeny, assessment, and application in the treatment of borderline personality disorder. Am. J. Psychiatry. 165, 1127–1135. doi: 10.1176/appi.ajp.2008.07081360
Choudhury, S., Blakemore, S. J., and Charman, T. (2006). Social cognitive development during adolescence. Soc. Cogn. Affect. Neurosci. 1, 165–174. doi: 10.1093/scan/nsl024
Dennett, D. C. (1978). Beliefs about beliefs. Behav. Brain Sci. 1, 568–570. doi: 10.1017/S0140525X00076664
De Vellis, R. F. (1991). Scale Development: Theory and Application. Thousand Oaks, CA: Sage.


Devine, R. T., and Hughes, C. (2013). Silent films and strange stories: theory Langdon, R., Coltheart, M., Ward, P. B., and Catts, S. V. (2001).
of mind, gender, and social experiences in middle childhood. Child Dev. 84, Mentalising, executive planning and disengagement in schizophrenia.
989–1003. doi: 10.1111/cdev.12017 Cogn. Neuropsychiatry 6, 81–108. doi: 10.1080/13546800042000061
Dimaggio, G., Lysaker, P. H., Carcione, A., Nicolò, G., and Semerari, Linacre, J. M. (2009). Winsteps (Version 3.68.0) [Computer Software]. Chicago, IL:
A. (2008). Know yourself and you shall know the other . . . to a Winsteps.com.
certain extent: multiple paths of influence of self-reflection on Lysaker, P. H., Carcione, A., Dimaggio, G., Johannesen, J. K., Nicol`, G.,
mindreading. Conscious. Cogn. 17, 778–789. doi: 10.1016/j.concog.2008. Procacci, M., et al. (2005). Metacognition amidst narratives of self and
02.005 illness in schizophrenia: associations with neurocognition, symptoms, insight
Dumontheil, I., Apperly, I. A., and Blakemore, S. J. (2010). Online usage of theory and quality of life. Acta Psychiat. Scand. 112, 64–71. doi: 10.1111/j.1600-
of mind continues to develop in late adolescence. Dev. Sci. 13, 331–338. doi: 0447.2005.00514.x
10.1111/j.1467-7687.2009.00888.x Main, M., and Goldwyn, R. (1990). “Adult attachment rating and classification
Finch, H. (2005). Comparison of the performance of nonparametric and systems,” in A Typology of Human Attachment Organization Assessed in
parametric MANOVA test statistics when assumptions are violated. Discourse, Drawings and Interviews, ed M. Main (New York: Cambridge
Methodology 1, 27–38. doi: 10.1027/1614-1881.1.1.27 University Press), 36–51.
Flavell, J. H. (1979). Metacognition and cognitive monitoring: a new area of Mayes, L. C., Klin, A., Tercyak, K. P. Jr, Cicchetti, D. V., and Cohen, D. J.
cognitive–developmental inquiry. Am. psychol. 34, 906–911. doi: 10.1037/0003- (1996). Test-retest reliability for false-belief tasks. J. Child Psychol. Psychiatry
066X.34.10.906 37, 313–319. doi: 10.1111/j.1469-7610.1996.tb01408.x
Fonagy, P., Steele, M., Steele, H., Moran, G. S., and Higgitt, A. C. (1991). The Maylor, E. A., Moulson, J. M., Muncer, A. M., and Taylor, L. A. (2002). Does
capacity for understanding mental states: the reflective self in parent and child performance on theory of mind tasks decline in old age? Br. J. Psychol. 93,
and its significance for security of attachment. Infant Ment. Health J. 12, 465–485. doi: 10.1348/000712602761381358
201–218. Mazza, M., De Risio, A., Surian, L., Roncone, R., and Casacchia, M. (2001).
Fonagy, P., Target, M., Steele, H., and Steele, M. (1998). Reflective-Functioning Selective impairments of theory of mind in people with schizophrenia.
Manual, Version 5.0, for Application to Adult Attachment Interviews. London: Schizophr. Res. 47, 299–308. doi: 10.1016/S0920-9964(00)00157-2
University College London. Mazzola, V., and Camaioni, L. (2002). Strane Storie: Versione Italiana a Cura
Frith, U., and De Vignemont, F. (2005). Egocentrism, allocentrism, and Asperger di Mazzola e Camaioni. Department of Dynamic and Clinical Psychology,
syndrome. Conscious. Cogn. 14, 719–738. doi: 10.1016/j.concog.2005.04.006 Università “La Sapienza”, Roma.
Gergely, G., Fonagy, P., Jurist, E., and Target, M. (2002). Affect Regulation, McDonald, R. P. (1965). Difficulty Factors and Non Linear Factor Analysis1. Br. J.
Mentalization, and the Development of the Self. New York, NY: Other Press. Math. Stat. Psychol. 18.1, 11–23.
Gullestad, F. S., and Wilberg, T. (2011). Change in reflective functioning McGraw, K. O., and Wong, S. P. (1996). Forming inferences about some
during psychotherapy—A single-case study. Psychother. Res. 21, 97–111. doi: intraclass correlation coefficients. Psychol. Methods 1, 30–46. doi: 10.1037/1082-
10.1080/10503307.2010.525759 989X.1.1.30
Happé, F. G. (1994). An advanced test of theory of mind: understanding of Meehan, K. B., Levy, K. N., Reynoso, J. S., Hill, L. L., and Clarkin, J. F. (2009).
story characters’ thoughts and feelings by able autistic, mentally handicapped, Measuring reflective function with a multidimensional rating scale: comparison
and normal children and adults. J. Autism Dev. Disord. 24, 129–154. doi: with scoring reflective function on the AAI. J. Am. Psychoanal. Assoc. 57,
10.1007/BF02172093 208–213. doi: 10.1177/00030651090570011008
Harkness, K. L., Jacobson, J. A., Duong, D., and Sabbagh, M. A. (2010). Mental state Muris, P., Steerneman, P., Meesters, C., Merckelbach, H., Horselenberg, R., van
decoding in past major depression: effect of sad versus happy mood induction. den Hogen, T., et al. (1999). The TOM test: A new instrument for assessing
Cogn. Emot. 24, 497–513. doi: 10.1080/02699930902750249 theory of mind in normal children and children with pervasive developmental
Hu, L., and Bentler, P. M. (1995). “Evaluating model fit,” in Structural Equation disorders. J. Autism Dev. Disord. 29, 67–80. doi: 10.1023/A:1025922717020
Modelling: Concepts, Issues, and Applications, ed R. H. Hoyle (Thousand Oaks, Nichols, S., and Stich, S. P. (2003). Mindreading. An Integrated Account of Pretence,
CA: Sage), 76–99. Self-Awareness, and Understanding Other Minds. Oxford: Clarendon Press. doi:
Hu, L., and Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance 10.1093/0198236107.001.0001
structure analysis: conventional criteria versus new alternatives. Struct. Equat. Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., and Roberts,
Model. 6, 1–55. doi: 10.1080/10705519909540118 R. D. (2015). A psychometric analysis of the reading the mind in the eyes test:
Hutchins, T. L., Prelock, P. A., and Bonazinga, L. (2012). Psychometric evaluation toward a brief form for research and applied settings. Front. Psychol. 6:1503.
of the theory of mind inventory (ToMI): a study of typically developing doi: 10.3389/fpsyg.2015.01503
children and children with autism spectrum disorder. J. Autism Dev. Disord. Olson, D. R., Antonietti, A., Liverta-Sempio, O., and Marchetti, A. (2006). “The
42, 327–341. doi: 10.1007/s10803-011-1244-7 mental verbs in different conceptual domain and in different cultures,” in
Jolliffe, T., and Baron-Cohen, S. (1999). The strange stories test: a replication with Theory of Mind and Language in Developmental Contexts, eds A. Antonietti, O.
high-functioning adults with autism or Asperger syndrome. J. Autism Dev. Liverta Sempio, and A. Marchetti (Berlin: Springer). doi: 10.1007/0-387-24997-
Disord. 29, 395–406. doi: 10.1023/A:1023082928366 4_2
Jöreskog, K. G., and Sörbom, D. (1996). LISREL 8: User’s Reference Guide. Chicago, Perner, J., and Wimmer, H. (1985). “John thinks that Mary thinks that . . . ”:
IL: Scientific Software International. attribution of second-order beliefs by 5- to 10-year-old children. J. Exp. Child
Kaland, N., Møller−Nielsen, A., Callesen, K., Mortensen, E. L., Gottlieb, D., Psychol. 39, 437–471. doi: 10.1016/0022-0965(85)90051-7
and Smith, L. (2002). A newadvanced’test of theory of mind: evidence from Pickup, G. J., and Frith, C. D. (2001). Theory of mind impairments in
children and adolescents with Asperger syndrome. J. Child Psychol. Psychiatry schizophrenia: symptomatology, severity and specificity. Psychol. Med. 31,
43, 517–528. doi: 10.1111/1469-7610.00042 207–220. doi: 10.1017/S0033291701003385
Kaland, N., Møller-Nielsen, A., Smith, L., Mortensen, E. L., Callesen, K., and Premack, D., and Woodruff, G. (1978). Does the chimpanzee have a theory of
Gottlieb, D. (2005). The strange stories test. A replication study of children and mind? Behav. Brain Sci. 1, 515–526. doi: 10.1017/S0140525X00076512
adolescents with Asperger syndrome. Eur. Child Adoles. Psychiatry 14, 73–82. Rao, A., and Georgeff, M. (1992). “An abstract architecture for rational agents,”
doi: 10.1007/s00787-005-0434-2 in Proceedings of KR 92: The 3rd International Conference on Knowledge
Katznelson, H. (2014). Reflective functioning: a review. Clin. Psychol. Rev. 34, Representation and Reasoning, eds B. Nebel, C. Rich, and W. Swartout (San
107–117. doi: 10.1016/j.cpr.2013.12.003 Mateo, CA: Morgan Kaufmann), 439–449.
Laghi, F., Cotugno, A., Cecere, F., Sirolli, A., Palazzoni, D., and Bosco, F. Rudden, M. G., Milrod, B., and Target, M. (2005). The Brief Reflective Functioning
M. (2014). An exploratory assessment of theory of mind and psychological Interview. New York, NY: Weill Cornell Medical College.
impairment in patients with bulimia nervosa. Br. J. Psychol. 105, 509–523. doi: Semerari, A., Carcione, A., Dimaggio, G., Falcone, M., Nicolo, G., Procacci, M.,
10.1111/bjop.12054 et al. (2003). How to evaluate metacognitive functioning in psychotherapy? The


metacognition assessment scale and its applications. Clin. Psychol. Psychother. 10, 238–261. doi: 10.1002/cpp.362
Semerari, A., Cucchi, M., Dimaggio, G., Cavadini, D., Carcione, A., Battelli, V., et al. (2012). The development of the metacognition assessment interview: instrument description, factor structure and reliability in a non-clinical sample. Psychiatry Res. 200, 890–895. doi: 10.1016/j.psychres.2012.07.015
Serafin, M., and Surian, L. (2004). Il Test degli Occhi: uno strumento per valutare la “teoria della mente”. Giornale Ital. Psicol. 31, 839–862. doi: 10.1421/18849
Shrout, P. E. (1998). Measurement reliability and agreement in psychiatry. Stat. Methods Med. Res. 7, 301–317. doi: 10.1191/096228098672090967
Sivaratnam, C. S., Cornish, K., Gray, K. M., Howlin, P., and Rinehart, N. J. (2012). Brief report: assessment of the social-emotional profile in children with autism spectrum disorders using a novel comic strip task. J. Autism Dev. Disord. 42, 2505–2512. doi: 10.1007/s10803-012-1498-8
Smith, R. M. (1996). A comparison of methods for determining dimensionality in Rasch measurement. Struct. Equat. Model. 3, 25–40. doi: 10.1080/10705519609540027
Taylor, E. L., Target, M., and Charman, T. (2008). Attachment in adults with high-functioning autism. Attach. Hum. Dev. 10, 143–163. doi: 10.1080/14616730802113687
Tirassa, M. (1999). Communicative competence and the architecture of the mind/brain. Brain Lang. 68, 419–441. doi: 10.1006/brln.1999.2121
Tirassa, M., and Bosco, F. M. (2008). On the nature and role of intersubjectivity in human communication. Emerg. Commun. Stud. New Technol. Pract. Commun. 10, 81–95.
Tirassa, M., Bosco, F. M., and Colle, L. (2006a). Rethinking the ontogeny of mindreading. Conscious. Cogn. 15, 197–217. doi: 10.1016/j.concog.2005.06.005
Tirassa, M., Bosco, F. M., and Colle, L. (2006b). Sharedness and privateness in human early social life. Cogn. Syst. Res. 7, 128–139. doi: 10.1016/j.cogsys.2006.01.002
Vellante, M., Baron-Cohen, S., Melis, M., Marrone, M., Petretto, D. R., Masala, C., et al. (2013). The “Reading the Mind in the Eyes” test: systematic review of psychometric properties and a validation study in Italy. Cogn. Neuropsychiatry 18, 326–354. doi: 10.1080/13546805.2012.721728
Velloso, R. D. L., Duarte, C. P., and Schwartzman, J. S. (2013). Evaluation of the theory of mind in autism spectrum disorders with the Strange Stories test. Arq. Neuropsiquiatr. 71, 871–876. doi: 10.1590/0004-282X20130171
Voracek, M., and Dressler, S. G. (2006). Lack of correlation between digit ratio (2D:4D) and Baron-Cohen’s “Reading the Mind in the Eyes” test, empathy, systemising, and autism-spectrum quotients in a general population sample. Pers. Individ. Dif. 41, 1481–1491. doi: 10.1016/j.paid.2006.06.009
Waugh, R. F., and Chapman, E. S. (2005). An analysis of dimensionality using factor analysis (true-score theory) and Rasch measurement: what is the difference? Which method is better? J. Appl. Measur. 6, 80–99.
Wellman, H. M. (1991). “From desires to beliefs: acquisition of a theory of mind,” in Natural Theories of Mind. Evolution, Development and Simulation of Everyday Mindreading, ed A. Whiten (Oxford: Blackwell), 19–38.
Wellman, H. M., and Liu, D. (2004). Scaling of Theory-of-Mind tasks. Child Dev. 75, 523–541. doi: 10.1111/j.1467-8624.2004.00691.x
Wimmer, H., and Perner, J. (1983). Beliefs about beliefs: representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition 13, 103–128. doi: 10.1016/0010-0277(83)90004-5
Wright, B. D., and Linacre, J. M. (1994). Reasonable Mean-Square Fit Values. Rasch Measurement Transactions 8:370. Available online at: http://www.rasch.org/rmt/rmt83b.htm (Accessed 15 April, 2015).
Yildirim, E. A., Kasar, M., Güdük, M., Ateş, E., Küçükparlak, İ., and Özalmete, E. O. (2011). Investigation of the reliability of the “reading the mind in the eyes test” in a Turkish population. Turk. J. Psychiatry 22, 177–186.
Yu, C. H., Popp, S. O., DiGangi, S., and Jannasch-Pennell, A. (2007). Assessing unidimensionality: a comparison of Rasch modeling, parallel analysis, and TETRAD. Pract. Assess. Res. Evaluat. 12, 1–18.

Conflict of Interest Statement: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Bosco, Gabbatore, Tirassa and Testa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
