The ability to attribute mental states (e.g., desires, beliefs) to other individuals (‘Theory of Mind’, ToM) is often assumed to be central to efficient social cognition. Indeed, not knowing what others know can lead to difficulty when interacting socially. A number of studies (e.g., [1, 2]) have suggested that one particular ToM attribute can occur automatically such that it is rapid and does not require controlled processing. Specifically, these authors posit that humans spontaneously represent the viewpoint or visual perspective of other individuals.
This notion has primarily found support from results obtained in the “dot-perspective” paradigm ([1]; see also [3]). In the basic procedure, participants are shown an image of a human avatar positioned in the centre of a virtual room that looks towards a left or right-hand wall. Positioned on the walls are a variable number of dots. For instance, there may be one on the left and two on the right, none on the left and one on the right, or one on each wall. Participants are required to rapidly judge how many dots are present from either their own perspective or that of the avatar. The important manipulation is whether the avatar and participant can see the same number of dots or a different number. For example, when the room contains two dots (only) located on the left wall and the avatar looks towards the left, both the participant and the avatar see the same number (i.e., two). If, however, an additional dot is added to the right-hand wall, the participant and avatar now see a different number of dots; the avatar sees the two on the left she is facing, but the participant can see all three. Results typically show that reaction time (RT) to determine the number of dots is shorter when the participant and avatar can see the same number of dots compared to when they see a different number. Importantly, this occurs when participants are asked to perform the dot-number judgment from their own perspective, i.e., when the avatar’s perspective is irrelevant. It is these results that have led researchers to suggest that the perspective of another individual is spontaneously computed by an observer.
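The consistency manipulation described above can be made concrete with a short sketch. The class, field, and function names are hypothetical and are not taken from the original experimental materials; the sketch assumes only what the text states, namely that the participant sees the dots on both walls while the avatar sees the dots on the wall it faces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One dot-perspective display (hypothetical representation)."""
    dots_left: int     # number of dots on the left wall
    dots_right: int    # number of dots on the right wall
    avatar_faces: str  # "left" or "right"


def participant_count(trial: Trial) -> int:
    # The participant can see both walls.
    return trial.dots_left + trial.dots_right


def avatar_count(trial: Trial) -> int:
    # Assumption: the avatar sees only the wall it is facing.
    if trial.avatar_faces not in ("left", "right"):
        raise ValueError("avatar_faces must be 'left' or 'right'")
    return trial.dots_left if trial.avatar_faces == "left" else trial.dots_right


def is_consistent(trial: Trial) -> bool:
    # Consistent trials: both viewpoints yield the same number of dots.
    return participant_count(trial) == avatar_count(trial)


# The two examples from the text: two dots on the left wall with the avatar
# facing left, then the same display with one extra dot on the right wall.
same_view = Trial(dots_left=2, dots_right=0, avatar_faces="left")
different_view = Trial(dots_left=2, dots_right=1, avatar_faces="left")
print(is_consistent(same_view), is_consistent(different_view))
```

Under this coding, the consistency effect is the RT difference between trials for which `is_consistent` is false and trials for which it is true.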
Samson and colleagues [1] also suggested that the mechanisms responsible for the classic gaze cueing effect [4, 5], in which attention is shifted to a gazed-at location, could provide a mechanism with which involuntary perspective taking occurs. As Samson et al. stated, “It is likely that similar attention cueing effects produced by the avatar’s gaze, head, and/or body orientation contributed to the ease with which the avatar’s visual experience was computed” [1] (p. 1264). Indeed, if the avatar acts as a directional cue, this would facilitate responses when the only discs in the display are located on the wall looked at by the avatar. By contrast, responses would be slowed when a disc appears on the wall not looked at because now attention would have to reorient away from the gazed-at wall to the other side of the display. However, rather than being seen as a mechanism, any attentional shift induced by the avatar can be considered a confound; Samson and colleagues’ [1] ‘consistency’ manipulation (of the avatar and participant’s viewpoint) maps directly on to the ‘validity’ manipulation used in standard gaze cueing experiments. In other words, a reflexive shift of attention could explain Samson et al.’s [1] data without any mentalistic attribution processes taking place. This alternative “directional hypothesis” has recently been examined by Santiesteban et al. [6]. The authors suggested “that it is the directional, rather than the agentive, features of the avatar that are important, and that they modulate a process that represents the number of dots on one side of the screen, rather than the number that an agent can see” ([6], p. 930). Santiesteban and colleagues [
6] go on to suggest that the shift of attention is induced by the ‘front features’ of the avatar such as the forehead, eyes, and nose. In support of this assertion, the authors presented participants with two sets of trial types in which either the avatar or an arrow was shown in the centre of the display. They found that both stimuli resulted in a consistency/cueing effect. Since an arrow cannot have a perspective or mental state, the authors suggested that results from the dot-perspective task do not show spontaneous perspective taking. Rather, the authors argued, the consistency effect is due to a domain-general process that facilitates the representation of one side of the display, the side gazed towards by the avatar. However, the demonstration that arrows generate a consistency effect does not falsify the claim that spontaneous perspective taking also generates a consistency effect. Put simply, showing that avatars can shift attention will not tell us anything about whether their perspective is taken. This can be no more true than saying that we take the arrow’s perspective in an arrow cueing experiment. Indeed, Santiesteban et al. [
6] effectively argued that a replication of the classic central/arrow cueing effect challenges the spontaneous perspective taking hypothesis. Cole et al. ([7]; see also [8]) did attempt to falsify the theory that humans spontaneously compute the perspective of others by placing a physical barrier between the gazing agent and a target. Cole et al. adopted this barrier technique from work examining whether chimpanzees know what another individual can see (e.g., [
9]). Clearly, when this method is employed in the dot-perspective paradigm, no perspective taking-like effect should be observed because the agent cannot see the same thing as the participant. However, Cole et al. [
7] found the same dot-perspective-like data irrespective of whether the avatar could see the dots or not. A related problem for the attentional shift account (although not the spontaneous perspective taking theory) is that any shift induced by the avatar could itself be due to visual perspective taking. Indeed, a number of authors have argued that the classic gaze cueing effect can itself be modulated according to what the gazing agent can see. For instance, Nuku and Bekkering ([3]; see also [10, 11]) showed that the size of the gaze cueing effect was smaller when the gaze cue had its eyes closed versus open, or when its vision was blocked out by a dark rectangle versus wearing sunglasses.
The principal aim of the present work was twofold. First, we tested the perspective taking theory by using two different effects as dependent measures. In Experiment 1 observers undertook a classic flanker task [12], and in Experiment 2 they performed a Simon task [13]. The spontaneous perspective taking theory predicts that the effects of flankers and spatial compatibility should be modulated according to whether an avatar can see the stimuli that induce these effects or not. In Experiment 3, the avatar was replaced by a gazing agent typical of those used in gaze following experiments and we again manipulated what the agent could see via a physical barrier. This was motivated by the contrasting findings of previous studies in which some report that the perspective of a gaze cue influences gaze following but others do not. Our second aim was to examine the attentional cueing account of Samson et al. [1] and Santiesteban et al. [6]. In Experiments 4 and 5, therefore, we employed the Samson et al. [1] avatars and assessed whether these stimuli are able to shift an observer’s attention laterally as these authors have suggested.
1.1. Experiment 1 Introduction
The implicit assumption of the ‘mentalising’ account proposed by Samson et al. [1] is that the RT cost on inconsistent trials occurs because there is interference between the internal representation of the number of dots the participant can see, and the internal representation of the number of dots the avatar can see. It is, therefore, reasonable to assume that this should manifest itself in other paradigms in which, crucially, the avatar either sees the same stimuli as the participant or does not. To this end, Experiment 1 required participants to perform a task based on the ‘flanker’ effect. It is well established that when a central letter has to be discriminated, flanking letters influence RT [12]. The flanker effect is particularly pronounced on the letter congruency version of the task in which the identity of a flanking letter may be different to the target but is part of the participant’s current response set, meaning that it is sometimes a target itself (on other trials). For instance, a participant may be asked to press a left-hand button when the target is an ‘A’ and a right-hand button when the target is a ‘B’. RTs are particularly slow if the target is an ‘A’ and a flanking letter is a ‘B’. Conversely, RTs are particularly fast if both the target and the distractor are the same.
In the present experiment, observers were required to determine the identity of a central letter in the presence of a single peripheral letter. The target was positioned on the shoulder of an avatar located in the centre of the room (see Figure 1). Crucially, either the avatar faced towards the flanking letter or faced towards the opposite wall. The rationale for this manipulation is that if the perspective of the avatar is spontaneously computed, the flanker effect should be magnified when the avatar is looking at the distractor relative to when the avatar is looking away from the distractor.
Experiment 1 Results
Figure 2 (overleaf) shows mean RTs for each of the four conditions. Outliers (2 SDs) accounted for 4.8% of responses and were omitted from further analysis. An ANOVA with congruency and consistency as within-participants factors revealed a significant main effect of congruency, F(1, 25) = 374, p < 0.0001, ηp² = 0.94, but no significant main effect of consistency, F(1, 25) < 1. The interaction was not significant, F(1, 25) < 1. Analysis of the error data using the same factors and levels revealed a significant main effect of congruency, F(1, 25) = 27.2, p < 0.001, ηp² = 0.52, but no significant main effect of consistency, F(1, 25) < 1. The interaction was also not significant, F(1, 25) < 1.
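The effect sizes reported with these ANOVAs (partial eta squared) can be recovered from each F ratio and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The helper below is an illustrative sketch, not the authors' analysis code, and its name is hypothetical. Small discrepancies from published values can arise because the reported F ratios are themselves rounded.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df values positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Congruency main effects from Experiment 1 (RTs, then errors).
print(round(partial_eta_squared(374, 1, 25), 2))   # 0.94
print(round(partial_eta_squared(27.2, 1, 25), 2))  # 0.52
```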
Experiment 1 has revealed a classic flanker effect; RTs and accuracy were compromised when a peripheral (distractor) letter was incongruent with a central target. However, this effect was no smaller when the avatar faced away from the distracting element compared to when it faced towards it. This result does not, therefore, support the view that another person’s perspective is spontaneously taken. At best, these results suggest that the spontaneous perspective taking effect is not sufficiently reliable to generalise beyond the canonical perspective taking task when the stimuli and/or task are slightly modified.
1.2. Experiment 2 Introduction
One of the most robust phenomena of visual cognition is the finding that RTs are reduced when the stimulus to be responded to shares a spatial property with the effector used to respond to that stimulus (the stimulus/response compatibility, or ‘Simon effect’; e.g., [13]). For instance, responses to a target requiring a left-hand button press will be quicker if the target occurs on the left side of a display as opposed to the right. The stimulus location (i.e., left and right) is, of course, relative to the viewpoint of the observer. However, if observers spontaneously take the perspective of other individuals, then the representation of a stimulus location should be affected by the viewpoint of that individual. For instance, a stimulus located to the left of an observer could be located to the right of an avatar (see Figure 3, right panel, overleaf). Thus, as with Experiment 1, it is reasonable to assume that the spontaneous perspective taking theory predicts that the effect should occur in other paradigms in which the avatar either sees the same stimuli as the participant or does not. In the present experiment participants undertook a variant of a standard Simon task. Importantly, in half of the trials the stimulus to be discriminated appeared on the same side of the display with respect to both the participant and the avatar (e.g., left side for both). On the other half, the stimulus was on one side relative to the participant (e.g., left), but on the other side (e.g., right) relative to the avatar. The spontaneous perspective taking theory predicts that RTs will be shorter in the former condition relative to the latter because the position (and, thus, perspective) of the critical stimulus with respect to left and right is the same for the avatar and participant.
Experiment 2 Results
Figure 4 presents mean RTs for each condition. Outliers (2 SDs; 4.1%) were again omitted from further analysis. An ANOVA with compatibility and consistency as within-participants factors revealed a significant main effect of compatibility, F(1, 25) = 14.8, p < 0.001, ηp² = 0.37, but no significant main effect of consistency, F(1, 25) < 1. The interaction was not significant, F(1, 25) < 1.
Error data revealed a small, although non-significant, main effect of compatibility, F(1, 25) = 3.9, p < 0.06, ηp² = 0.13, and consistency, F(1, 25) = 3.3, p < 0.08, ηp² = 0.11. The interaction was not significant, F(1, 25) < 1. Overall, Experiment 2 has shown a classic Simon effect; RTs were reduced when the target was located on the same side as the response required. However, this effect was not influenced by the location of the target with respect to the avatar’s viewpoint. As with Experiment 1, this is not consistent with the hypothesis that observers spontaneously took the perspective of the avatar.
1.3. Experiment 3 Introduction
Support for the notion of spontaneous perspective taking has not only come from the dot-perspective task. Results from the classic gaze cueing paradigm have also been argued to be due to, or at least modulated by, visual perspective taking. For instance, Nuku and Bekkering [3] pointed out that previous studies had not examined whether gaze cueing occurs as “a consequence of observing the others’ gaze direction or a consequence of inferring the others’ attended location” (p. 340). By manipulating whether the gazing agent’s eyes were open or closed, the authors went on to show that the gaze cueing effect only occurs “where the agent is believed to be attending to the object” (p. 340). Thus, ‘believing’ what the agent can or cannot see is clearly invoking ToM processes. However, using the barrier technique described in the Introduction above, Cole et al. [8] showed that the gaze following effect still occurs when the gaze cue cannot see the targets. This was observed when the gazing agent was both a real, physically present, person who was sat opposite the participant and when it was a photograph of a person presented on a monitor. Given the contradictory findings of previous work, the present Experiment 3 again examined the visual perspective account, this time with specific reference to the gaze cueing effect. We employed a schematic representation of a face (Figure 5), typically employed in this paradigm, together with the barrier manipulation. In half of the trials, the gazing agent could see the two lateral walls and, hence, targets. However, on the other half, the window-like structures were blocked, thus preventing visibility of the walls. As previously, if the computation of the gazing agent’s visual perspective underlies the gaze cueing effect, no such effect should occur when the agent cannot see the targets.
Experiment 3 Results
Of the responses, 4.2% were outliers (2 SDs) and were omitted from further analysis.
Figure 6 shows mean RTs for each of the six conditions. An ANOVA with validity (valid, invalid, or neutral) and visibility (seeing or non-seeing) as within-participants factors revealed a significant main effect of validity, F(2, 82) = 3.5, p < 0.05, ηp² = 0.078, but no significant main effect of visibility, F(1, 41) < 1. The interaction was not significant, F(2, 82) < 1.
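The 2 SD outlier criterion applied in each experiment can be sketched as follows. The text does not state how the criterion was applied, so this sketch makes two labelled assumptions: that trimming is symmetric around the mean, and that it is carried out separately on each set of RTs passed to the function (for example, one participant's RTs in one condition). The function name is hypothetical.

```python
from statistics import mean, stdev


def trim_outliers(rts, n_sd=2.0):
    """Drop RTs more than n_sd sample SDs from the mean of the supplied set.

    Assumption: symmetric trimming within whatever cell `rts` represents.
    Returns the retained RTs and the proportion of responses removed.
    """
    rts = list(rts)
    if len(rts) < 2:
        # An SD cannot be estimated from fewer than two observations.
        return rts, 0.0
    centre = mean(rts)
    spread = stdev(rts)
    kept = [rt for rt in rts if abs(rt - centre) <= n_sd * spread]
    return kept, 1 - len(kept) / len(rts)


# Hypothetical RTs (ms) containing one very slow response.
kept, proportion_removed = trim_outliers([500, 510, 490, 505, 495, 500, 1500])
print(kept, round(proportion_removed, 3))
```

Averaging `proportion_removed` over all cells would give a percentage comparable to the outlier rates reported for each experiment, under the assumptions stated above.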
The first notable aspect of these results is the presence of an attention cueing effect. Participants were faster to identify the target when it appeared in the cued relative to uncued location. This replicates the many previous reports of eye gaze triggering a shift in an observer’s attention (e.g., [4, 5]). Crucial, however, is the finding that this effect was not influenced by what the gazing agent could see. That is, a gaze-following effect was observed even when the agent could not see the targets. As with Experiments 1 and 2, this does not support the spontaneous perspective taking theory. In sum, Experiments 1–3 manipulated consistency between what the participant and a gazing agent could see. Results have shown no modulation of a basic visual cognition phenomenon (i.e., the flanker effect; the Simon effect; gaze-following) based on whether a gazing agent could see the critical stimuli or not. In our final two experiments, we examine an alternative explanation for the results that have been attributed to automatic perspective taking.
1.4. Experiment 4 Introduction
Experiments 4 and 5 examined the claim of Samson et al. [1] and Santiesteban et al. [6] that avatar-induced shifts of attention (in the dot-perspective paradigm) contribute to or, indeed, generate the consistency effect. Recall that, for Samson et al. [1], such a shift provides a mediating mechanism with which spontaneous perspective taking occurs, whereas for Santiesteban et al. [6] it is the explanation. In Experiment 4 we carried out a close replication of a standard central cueing experiment in which the avatar employed previously (i.e., the present Experiments 1 and 2; [1, 6]) was used as the cueing stimulus. Thus, a single target appeared in either the looked-at direction or in the opposite hemifield (see Figure 7). As with Experiment 3, we again manipulated whether the avatar could see the lateral walls or not with the use of barriers. This manipulation enabled us to again examine the perspective taking theory, in addition to the attentional shift hypothesis, since no effect should occur when the avatar cannot see the target. Since the effects of central cues are thought to require some time to build up [14], we additionally employed a cue-target interval of 100 ms, allowing a relatively liberal test of the directional hypothesis. Note that this interval may still be too short to allow any shift to occur. For instance, Bukowski et al. [2] and Gardner et al. [15] showed that an interval of 300 ms or longer may be needed. However, intervals of this magnitude increase the likelihood of top-down processes modulating any effect. This, by definition, could mean that the phenomenon is not ‘automatic’ or ‘spontaneous’.
Experiment 4 Results
Outliers (2 SDs) were removed, accounting for 4.1% of the data. One observer was removed from further analysis due to an error rate of more than 20%. Figure 8 (overleaf) shows mean RTs. An ANOVA with validity (valid or invalid) and visibility (seeing or non-seeing) as within-participant factors revealed no significant main effect of validity, F(1, 31) = 1.1, p > 0.3, or visibility, F(1, 31) = 2.2, p > 0.14. The interaction was also not significant, F(1, 31) = 0.59, p > 0.44. With respect to the error data, there were no significant main effects of validity, F(1, 31) = 0.7, p > 0.4, or visibility, F(1, 31) = 2.5, p > 0.11. The interaction was also not significant, F(1, 31) = 0.7, p > 0.38.
Overall, the results from Experiment 4 reveal that the avatars employed in the present and previous works are not able to shift attention to the side. Furthermore, the absence of a cueing effect was apparent in both visibility conditions. Not only do these data fail to support the perspective taking account (i.e., RTs were not facilitated when the avatar could see the target), but they also fail to support the existence of a suggested mechanism [1, 6] with which automatic perspective taking could occur, i.e., attentional cueing.
1.5. Experiment 5 Introduction
Although Experiment 4 did not provide evidence that the kind of avatar previously employed shifts attention, one could argue that this conclusion is weak because it rests on a null effect; perhaps our experimental setup was not sensitive enough to reveal an attentional shift if it exists. Furthermore, although Experiment 4 (and Experiment 3) included an avatar-target interval of 100 ms in an attempt to assist the generation of a cueing effect (see Methods below), one could argue that no such interval should be included since Samson et al. [1] and Santiesteban et al. [6] did not include one. Moreover, these authors did not include barriers in their displays.
In our final experiment we directly compared the cueing ability of the avatar used previously with that of a stimulus known to induce attentional shifts. An abundance of work has demonstrated that a luminance change, and/or object onset, that occurs shortly before a target, is particularly effective at marshalling attention [14, 16]. If our method is not sensitive enough to induce/measure attentional shifts, we should find a cueing effect for neither the avatar nor the luminance cue. If, by contrast, we observe attentional cueing with at least one of our cues, we can be confident that our paradigm is indeed sensitive enough to index such a shift.
Experiment 5 Results
Outliers (2 SDs) were removed, accounting for 4.3% of the data. Figure 9 shows the mean RTs. An ANOVA with validity and cue-type as within-participant factors revealed a significant main effect of validity, F(1, 27) = 15.9, p < 0.001, ηp² = 0.37, and cue type, F(1, 27) = 22.6, p < 0.001, ηp² = 0.46. The interaction was also significant, F(1, 27) = 12.1, p < 0.002, ηp² = 0.31. Simple analyses revealed that the interaction was due to a cueing effect occurring in the luminance-cue condition, t(27) = 4.4, p < 0.001, but not in the avatar-cue condition, t(27) = 0.89, p > 0.38. With respect to errors, an ANOVA using the same factors and levels described above revealed a non-significant main effect of validity, F(1, 27) = 2.2, p < 0.16, ηp² = 0.08, and cue type, F(1, 27) = 2.8, p < 0.12, ηp² = 0.1. The interaction was, however, significant, F(1, 27) = 6.2, p < 0.02, ηp² = 0.19. Simple analyses revealed a significant reduction of errors in the luminance-cue condition, t(27) = 2.4, p < 0.05, but not in the avatar-cue condition, t(27) = 0.69, p > 0.49.
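The simple analyses reported here compare valid and invalid trials within each cue type. A minimal sketch of that logic is given below, assuming one mean RT per participant per condition; the data and function names are hypothetical and do not reproduce the reported values. The cueing effect is computed as invalid minus valid RT, and the paired t statistic is the mean of those differences divided by its standard error.

```python
from math import sqrt
from statistics import mean, stdev


def cueing_effects(valid_rts, invalid_rts):
    """Per-participant cueing effect (ms): invalid minus valid mean RT."""
    if len(valid_rts) != len(invalid_rts):
        raise ValueError("need one valid and one invalid mean per participant")
    return [inv - val for val, inv in zip(valid_rts, invalid_rts)]


def paired_t(differences):
    """t statistic and degrees of freedom for differences tested against zero."""
    n = len(differences)
    if n < 2:
        raise ValueError("need at least two participants")
    sd = stdev(differences)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    return mean(differences) / (sd / sqrt(n)), n - 1


# Hypothetical participant means (ms) for a single cue type.
valid = [400, 420, 410, 430]
invalid = [420, 430, 440, 440]
t_value, df = paired_t(cueing_effects(valid, invalid))
print(round(t_value, 2), df)
```

In a 2 × 2 within-participants design, the interaction can be examined in the same way by applying `paired_t` to the per-participant difference between the two cueing effects (e.g., luminance minus avatar); for a one-degree-of-freedom effect, the square of that t equals the interaction F.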
Overall, these data confirm the results of many previous attentional orienting studies; the onset of a luminance cue is effective at marshalling attention. By contrast, the avatar was not able to shift attention, thus supporting the results of Experiment 4. The reliable luminance cueing effect, in turn, shows that our procedure is sensitive enough to index attentional orienting when it occurs.