LinguaPix Database 2021
https://doi.org/10.3758/s13428-021-01651-0
Abstract
The major aim of the present megastudy of picture-naming norms was to address the shortcomings of the available picture data
sets used in psychological and linguistic research by creating a new database of normed colour images that researchers from
around the world can rely upon in their investigations. In order to do this, we employed a new form of normative study, namely a
megastudy, whereby 1620 colour photographs of items spanning 42 semantic categories were named and rated by a group
of German speakers. This was done to establish the following linguistic norms: speech onset times (SOT), name agreement,
accuracy, familiarity, visual complexity, valence, and arousal. The data, including over 64,000 audio files, were used to create the
LinguaPix database of pictures, audio recordings, and linguistic norms, which, to our knowledge, is the largest available research
tool of its kind (http://linguapix.uni-mannheim.de). In this paper, we present the tool and the analysis of the major variables.
Keywords: Picture-naming norms · Picture database · Speech onset times (SOT) · Familiarity · Visual complexity · Valence · Arousal
* Agnieszka Ewa Krautz
[email protected]

Emmanuel Keuleers
[email protected]

1 Department of English Linguistics, University of Mannheim, Schloss EW 274, 68161 Mannheim, Germany
2 Department of Cognitive Science and Artificial Intelligence, Tilburg University, Warandelaan 2, 5037 AB Tilburg, the Netherlands

Behav Res

Introduction

Pictures are very often utilised as stimuli in psychological and linguistic research. They are used in a wide variety of experimental tasks, such as picture naming, translation, and the visual world paradigm. In a picture-naming paradigm, for instance, participants are asked to name, as quickly and accurately as possible, pictures that are shown in succession on a computer screen, while their reaction times and error rates are recorded. This type of task appears to be very simple, yet it is very useful, as pictures are believed to activate underlying semantic information (Altarriba & Basnight-Brown, 2009). In other words, the picture-naming task allows for drawing conclusions about the way in which semantic information is processed and represented in memory. Given that 'semantic memory is one of our most defining human traits, encompassing all the declarative knowledge we acquire about the world', and that it is the basis for almost all human activity (Binder & Desai, 2011, p. 1), the importance of pictures as experimental stimuli in psycholinguistic research cannot be overstated.

Existing picture databases, however, have several limitations: usually, the number of items included in the data sets is relatively small; the most commonly used items are black and white line drawings (which, when used in experiments, can be informative only about a part of human visual processing); the response data are limited to response times; and most of the picture databases have norms in a single language. In order to go beyond these limitations, we developed an entirely new database of colour photographs, with audio naming response data and norms on four attributes.

We opted for colour photographs, as these have been shown to influence cognitive naming processes at the initial stage of visual identification. In this regard, both Rossion and Pourtois (2004) and Bonin et al. (2019), who compared line drawings, pictures with added grey-level texture, and colourised images, demonstrated that colour information significantly improves accuracy and speeds up naming by approximately 100 ms.

Moreover, besides commonly collected norming information on familiarity, visual complexity, or name agreement, we chose to collect ratings of valence and arousal. This was motivated by the fact that the existing affective picture databases, e.g. the International Affective Picture System (Lang et al., 1997) or the Geneva Affective Picture Database (Dan-Glauser & Scherer, 2011), include only limited norming information on neutral images, which have the power to induce non-negative emotions. By including valence and arousal ratings of common everyday objects in the current study, we aimed at establishing a useful baseline comparison for images preliminarily defined as positive or negative.

Furthermore, the importance of providing reliable norms on familiarity, visual complexity, valence, and arousal relates to the fact that these four variables have also been shown to influence image processing. That is, familiarity has been reported to correlate negatively with naming speed (e.g. Johnston et al. (2010) reported r = −.433). Regarding visual complexity, Snodgrass and Vanderwart (1980) and Rossion and Pourtois (2004) noted that a higher degree of image complexity might slow down image processing. However, this finding was not confirmed by Perret and Bonin (2019) in their Bayesian meta-analysis. Finally, the impact of affective variables on image processing is well established. For instance, according to the Automatic Vigilance Hypothesis (Pratto & John, 1991), negative stimuli lead to delayed disengagement and thus slower responses in recognition tasks (Estes & Adelman, 2008).

In what follows, we first review the existing picture data sets. Next, we move on to a discussion of the relevant megastudies which inspired the methodology used in this project. Finally, we present the experimental tasks that were administered as well as the initial findings established on the basis of the German data.

Picture data sets

The rise in popularity of pictures as a research tool in psycholinguistics has not been matched by an increase in the quantity and quality of available stimuli. Many studies still rely on the black and white line drawings that were first developed by Snodgrass and Vanderwart (1980). This set of pictures, with norms for naming agreement, image agreement, familiarity, and visual complexity, consists of just 260 images. A salient characteristic of these pictures is that they are black and white drawings, which may be processed differently than images that are more realistic. The images from Snodgrass and Vanderwart (1980) were given a makeover by Rossion and Pourtois in 2004. They were coloured, and a new archive, which includes 24-bit colour images of 209 objects, was created. In addition, normative data regarding the same four variables as in the original investigation were included. The comparison of the two data sets allowed Rossion and Pourtois (2004) to demonstrate that black and white line drawings attract lower recognition rates in comparison to colour images. Despite the quality of the pictures having been improved, the number of images in Rossion and Pourtois' set is still relatively small.

An alternative set of pictures, providing a more realistic and ecologically valid representation of real-life objects, was created by Moreno-Martínez and Montoro (2012). It consists of 360 high-quality colour images that belong to 23 semantic subcategories, e.g. fruit, animals, vehicles, clothes, etc. The normative data include information about age of acquisition, familiarity, manipulability, name agreement, typicality, and visual complexity. Nevertheless, the norms were only collected in Spanish, and overall, the number of images is still quite low.

To address some of the limitations of the smaller data sets, the open-source Multilingual Picture (MultiPic) database (Duñabeitia et al., 2018) was recently released with 750 drawings that were normed across six languages, including British English, Spanish, French, Dutch (from Belgium and the Netherlands), Italian, and German. Over 600 native speakers were requested to name the pictures (by typing) and rate their visual complexity on a Likert scale. The researchers established a high degree of convergence for naming in both within- and between-language conditions. Currently, however, MultiPic provides only two norms and includes colour drawings of objects, which again restricts its usability in experimental settings.

In our view, the most comprehensive database of pictures that is currently available is the Bank of Standardised Stimuli (BOSS), with norms in American English (Brodeur et al., 2010; Brodeur et al., 2014) as well as a subset of items available in Canadian French (Brodeur et al., 2012). BOSS includes 1410 photo stimuli normed for name, semantic category, familiarity, visual complexity, object agreement, viewpoint agreement, and manipulability. Furthermore, the images are available in several versions, including greyscale, blurred, scrambled, and line drawings. This large set of images is an excellent source of experimental stimuli, but it is currently limited to two languages.

Finally, it is important to acknowledge the state-of-the-art platforms in object recognition, such as the Microsoft COCO: Common Objects in Context database (Lin et al., 2014) or the ImageNet database (Deng et al., 2009). They contain millions of annotated entries with images of varied quality embedded in the context of a visual scene. Certainly, in comparison to COCO or ImageNet, the current study and the LinguaPix database are small-scale. However, the fact that images in these two databases are embedded in a context and are of varying quality, while very useful for artificial image recognition, makes them less appropriate for experimental research.

Megastudy as a research tool

In the current study, 64,000 audio responses were recorded in German and the speech onset times (SOT) of these responses have been made available in the database. The quantity and
scope of collected response data, in conjunction with its purpose of maximising utility and reusability, would qualify this as a megastudy (Keuleers & Balota, 2015; Keuleers & Marelli, 2020).

Seidenberg and Waters (1989) were the first to use the term megastudy, referring to the voice onset times that they collected for 3000 monosyllabic English words. The studies that followed substantially increased the number of stimuli and the amount of data being collected. One of the first important examples of a megastudy was the English Lexicon project (Balota et al., 2007), which involved compiling lexical decision and naming data for over 40,000 words. This initial investigation gave rise to a number of variants: the French Lexicon project (Ferrand et al., 2010), the Malay Lexicon project (Yap et al., 2010), the Dutch Lexicon project (Keuleers et al., 2010), and the British Lexicon project (Keuleers et al., 2012), each providing data about several thousand words and pseudowords. The megastudy approach, however, has not been limited to word recognition. In recent years, the approach has been applied to semantic priming (Hutchison et al., 2013), masked priming (Adelman et al., 2014), and even the processing of sentences by monolingual and bilingual speakers (GECO database by Cop et al., 2017). For the present study, the number of stimuli (1620) was comparatively small, but the responses elicited from 40 German-speaking participants resulted in a very large data set of audio files, and thus we have grounds to classify it as a megastudy. Before the data set is presented, the overall aims and the methodology used are described in the section below.

Present study

The present study aims to address the limitations of the above-discussed picture data sets. Not only are many images, in the form of colour photographs, evaluated, but also—and importantly—the audio recordings of the naming data are used to establish SOT. The naming data are also used to derive the measures of naming agreement and accuracy. Finally, the rating data regarding familiarity, visual complexity, valence, and arousal are used to establish four linguistic norms. The resulting database of pictures, audio recordings, and linguistic norms will serve as a resource for the psycholinguistic research community.

Method

Participants

A group of 40 German native speakers took part in the study, all university students between the ages of 18 and 26 (M = 22.2, SD = 2.8). The majority (29) were female. They were born in Germany and resided in this country at the time of the data collection. For all of them, German was their first and native language; however, they all spoke at least one foreign language. In addition, 15 of them reported speaking two foreign languages fluently, and four reported having knowledge of three.

Stimuli

The initial stage of stimulus preparation involved creating lists of items from different semantic categories that could be photographed. We opted for stimuli that were concrete and imageable. Abstract notions, actions, and properties were initially considered, but were not included in the final list due to difficulties in capturing such items in a photograph. We arrived at over 1600 items spanning 42 semantic categories including, inter alia, animals, plants, toys, professions, musical instruments, food, furniture, clothing and accessories, vehicles, buildings, stationery, and mythical creatures (Table 1). Next, over several months, a student photographer took photos of the requested items. Each object was photographed on its own on a homogeneous background, either green or white, at a resolution of 300 dpi. Subsequently, each photograph was edited. First, the ClippingMagic tool (https://clippingmagic.com) was used to remove the initial background and situate the object on a consistent white background. Then, the GIMP image editor (https://www.gimp.org/) was employed to remove any visible brand names, logos, or text, adjust the light, and resize the images. Part of the photo editing process is depicted in Fig. 1. The above-described procedure resulted in an initial set of 1220 photographs, examples of which are included in Fig. 2. It was not possible, however, to photograph several target items, e.g. different animals, sea creatures, or fairy tale characters. To address this issue, we purchased a set of 400 images from 123Rf (https://www.123rf.com/), a stock photo provider. To ensure the highest level of copyright protection, the legal department of the university where the data were collected drew up an individualised agreement with the image provider. The final list of items included 1620 photographs.

Picture-naming experiment

Once the images had been prepared, we used them to design a picture-naming experiment. Stimulus display and recording of responses were performed using EPrime 2.0 software (Schneider et al., 2002). Given the large number of items that had to be named, the experiment was split into five smaller sub-experiments. Each experiment started with a short practice session, which included three items. Next, the experimental part began, whereby each image was presented individually on the screen, in a randomised order, for a duration of 3000 ms. The participants were instructed to provide a single word for each picture as soon as possible or to refrain from
Table 1 List of the semantic categories and the numbers of items within each category that were photographed, including information about the main variables. Mean values and SD, in brackets, on a 6-point Likert scale are given for familiarity, visual complexity, valence, and arousal. Accuracy and name agreement are presented in percentages.

No. | Semantic category | No. of photos | Familiarity | Visual complexity | Valence | Arousal | Accuracy | Name agreement
providing the name if they could not recognise or were not familiar with the depicted object. They were also advised to avoid articles (e.g. the apple), adjectives (e.g. green apple), or full sentences (e.g. It is an apple.). Furthermore, since all responses were audio recorded, to extract the information about SOT, the participants were requested not to use
a way that the participants could navigate through the entire task by themselves. They could save parts of their responses and return to the task at a time or location convenient to them, as long as they had access to the Internet. For their effort and time, each participant was reimbursed €60 after completing all parts of the study.

Results

From the 40 complete response sets, data from 38 participants were submitted for the final analysis. Data from two participants had to be removed: one of them had a very high percentage of incorrect responses or no responses given in the naming task, and the data from the second participant contained lengthy and descriptive responses rather than actual naming of individual objects. Furthermore, three individual data sets, i.e. Ex. 4 p. 118, Ex. 4 p. 134, and Ex. 3 p. 137, were not considered due to technical problems during the data collection process which prevented EPrime from saving the files correctly. Finally, the following items were removed, as a large proportion of participants found them especially difficult to name: wine stopper (20 speakers), walking stick (20), tofu (16), soba noodles (15), seaweed (16), ring binding (18), razor (17), powder (16), pipe (19), pipe brush (15), pencil case (17), paper stand (18), paper clip (19), paper clip remover (16), milk frother (17), luggage scale (15), lemon peeler (17), inhalator (15), hinge (18), heater (16), fringe (16), fish and chips (15), durian (10), dragon fruit (16), diablo (15), couscous (17), cone (18), cocktail stirrer (18), clips (18), chisel (15), and camping gas (15). The lack of responses in these cases might have been related to genuine unfamiliarity with the item or difficulties in recognising it due to problems with its depiction, e.g. an image of tofu. All further analyses were performed on the truncated data.

Name agreement and accuracy

To establish the measures of name agreement and accuracy, we drew a random sample of the audio data from 10 participants from each experiment (16,000 .wav files), which were then manually transcribed and coded by two research assistants. The following two codes were used: 1 stood for a correct and complete word, and 0 was entered for incorrect answers, incomplete ones, or no answer. Synonyms, near-synonyms (e.g. Klebeband or Kreppband for adhesive tape), and the superordinate of the category (e.g. flower instead of rose) were accepted as correct. This information allowed computing the modal name for each image, i.e. the most frequently reported name for a particular image. That is, if the name agreement value was equal to 80%, eight participants out of ten (based on the amount of transcribed data) had provided the same word for the image. In many cases, however, two target names were most prominent, and therefore both were included in the database, as a target name and an alternative one. The overall level of name agreement between the participants was relatively high: it was equal to 79% (± 23%). This level of name agreement is higher than that reported, for example, in the BOSS databases, standing at 64% for the first set and 59.5% for the second. The level we elicited resembles the information from normative data sets of line drawings, which reported agreement between 72% and 85% (Bates et al., 2003). Next, entropy (H) was calculated on the probability distribution of alternative names. On average, entropy was 0.69 (SD = 0.70), reflecting a relatively high level of naming agreement between the German participants. These levels mirror those reported, for example, by Snodgrass and Vanderwart (1980), 0.56 (±0.53), or by Bates et al. (2003), from 0.67 (±0.61) to 1.16 (±0.79). Because H increases with the number of alternatives supplied, which crucially depends on the number of participants, we also included a normalised entropy measure, in which H is divided by the maximum entropy (Hmax) for a given number of alternatives (n), as shown in the equation below:

\[ \frac{H}{H_{\max}} = -\sum_{i=1}^{n} \frac{p(x_i)\log_b\big(p(x_i)\big)}{\log_b(n)} \]

A histogram capturing the distribution of normalised entropy is shown in Fig. 3.

The accuracy refers to the proportion of correct responses provided for each photograph. For example, an image of a hand mixer elicited the following labels: Handmixer, Mixer, Handrührer, or Handrührgerät, all of which were considered correct, but only the most frequently used ones were treated as the modal names, in this case Handmixer and Handrührer. The accuracy rate across the final 1547 images was equal to 80% (± 22%). The semantic categories of shape and colour returned the lowest accuracy rates (64% and 69%, respectively), despite the fact that in the colour category we also treated focal colour terms as correct. That is, the semantic category of colour comprised 70 unique hues presented as coloured stains.

Fig. 3 The distribution of H of answers divided by Hmax
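As an illustration, the modal name, name agreement, and normalised entropy measures described above can be computed as in the following sketch (this is not the authors' code; the function name and the example responses are invented):

```python
from collections import Counter
import math

def name_agreement_measures(responses):
    """Modal name, proportion agreement, and normalised entropy (H / Hmax)
    for the list of names produced for one picture."""
    counts = Counter(responses)
    total = len(responses)
    modal_name, modal_count = counts.most_common(1)[0]
    agreement = modal_count / total
    n_alternatives = len(counts)
    if n_alternatives == 1:
        return modal_name, agreement, 0.0  # a single name: zero entropy
    # H = -sum p(x_i) * log2 p(x_i), normalised by Hmax = log2(n)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return modal_name, agreement, h / math.log2(n_alternatives)

# e.g. eight 'Handmixer' and two 'Mixer' responses give 80% agreement
modal, agreement, h_norm = name_agreement_measures(
    ["Handmixer"] * 8 + ["Mixer"] * 2)
```

For this example, normalised entropy is about 0.72, well below the maximum of 1 that an even two-way split would produce.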
Hardly any participant made a distinction between peripheral terms, such as crimson, ruby, or red, but rather referred to all these shades as red, which was scored as a correct answer. The categories of nuts (71%), vegetables (72%), and tools (73%) also had relatively low accuracy rates. On the other hand, the categories of insects, professions, animals, marine creatures, and vehicles returned above 90% accuracy.

Familiarity, visual complexity, valence, and arousal ratings

The rating data on familiarity, visual complexity, valence, and arousal were aggregated across the participants and items. The overall distribution of each variable, excluding outliers comprising 0.5%, which were replaced with mean values, is presented in Fig. 4. The mean familiarity rating was equal to 4.63 (SD = .02). Converted to a five-point scale, this becomes 3.9, which is higher than the score reported by Snodgrass and Vanderwart (1980), i.e. 3.3 (SD = 1.0), but closer to the average scores from the first BOSS database, i.e. 4.0 (SD = .4) (Brodeur et al., 2010), as well as the second BOSS, 4.16 (SD = 0.55) (Brodeur et al., 2014). A Kolmogorov–Smirnov test for normality returned a statistically significant result, D = .09, p < .001, which does not confirm the normality of the data. The distribution was negatively skewed. The average visual complexity rating in our study was 2.86 (SD = .01). Converted to a five-point scale, this becomes 2.48, which is lower than what was reported by Snodgrass and Vanderwart (1980), 3.0 (SD = .9), and similar to the mean ratings from BOSS parts one and two, i.e. 2.4 (SD = .4). A Kolmogorov–Smirnov test showed that the visual complexity variable was not normally distributed, D = .05, p < .001; it was positively skewed. The mean valence rating was equal to 3.76 (SD = .01) and the mean arousal rating to 2.58 (SD = .01). Converted to a seven-point scale, the mean valence rating was 4.31 and the mean rating for arousal was 2.89, which allows for comparison to the Open Affective Standardised Image Set (OASIS) (Kurdi et al., 2017). They reported a similar mean value of 4.33 (SD = 1.10) for valence, but a higher mean value of 3.66 (SD = 1.68) for arousal. Two Kolmogorov–Smirnov tests performed on the variables of valence and arousal revealed that both factors are not normally distributed, Dvalence = .04, p < .001 and Darousal = .07, p < .001.

Fig. 4 Distribution of mean familiarity, visual complexity, valence, and arousal ratings. The 1 to 6 scales correspond to: 1 - unfamiliar, 6 - familiar; 1 - very simple, 6 - very complex; 1 - negative emotion, 6 - positive emotion; 1 - not intense, 6 - very intense

As an estimate of the reliability of the average ratings for items, we computed the intraclass correlation coefficient ICC(C, k) for each of the variables (McGraw & Wong, 1996; Shrout & Fleiss, 1979). For all of the rated variables, reliability was high: 0.94 for familiarity, 0.89 for visual complexity, 0.92 for valence, and 0.90 for arousal. The relationships between all four variables are shown in Fig. 5. Furthermore, significant linear relationships were found between all the variables investigated. A weak but significant negative
Fig. 5 Relationships between familiarity, visual complexity, valence, and arousal ratings
correlation was shown between familiarity ratings and visual complexity ratings, r = −.170, p < .001, implying that more familiar images are also less visually complex. This finding confirms what Snodgrass and Vanderwart (1980) demonstrated: their analysis, based on 260 line drawings, returned a significant negative correlation of r = −.466. Furthermore, Pearson's correlations between familiarity and valence as well as familiarity and arousal returned statistically significant positive values, respectively r = .508, p < .001 and r = .430, p < .001. This implies that photos that were judged as more familiar were also seen as more positive and more arousing. Next, the comparison of the visual complexity rating with valence and arousal proved to be statistically significant, with weak positive correlations reported in both cases, r = .134, p < .001 and r = .327, p < .001. More visually complex images were judged as slightly more positive on the valence variable and more arousing. Finally, a moderate positive correlation can be seen between valence and arousal, r = .569, p < .001. Rather counterintuitively, images that are more positive were rated as more arousing. This finding is in conflict with that reported by, for example, Kurdi et al. (2017), who showed a lack of statistical relationship between valence and arousal, r = .06, p = .081. However, Warriner et al. (2013) found a positive correlation between arousal and valence for positive words and a negative correlation for negative ones. Since the proportion of negatively valenced photographs in the present data set is relatively small, the present finding could be attributed to undersampling of low-arousal positive and negative images.

Rating scales, name agreement, and accuracy

An analysis of the four rating scales (familiarity, visual complexity, valence, arousal) and the name agreement and accuracy values returned statistically significant positive correlations at the 0.01 level (two-tailed) between all but one pair of factors, that of visual complexity and accuracy (r = −.013, p = .616). The correlation coefficients of the pairwise relations are given in Table 2 below. The results demonstrate that name agreement and accuracy were higher for images that participants were familiar with, that were visually more complex, and that had evoked positive emotions of higher intensity.

Speech onset times

The detection of SOT was performed with the automated Chronset tool (Roux et al., 2017). Before the SOT were analysed, the data were prepared in the following way. Responses outside of two standard deviations from the participant's mean across all five naming experiments were treated as outliers and were removed from further analysis (5.6%). In addition, items that were not named (11%) and hence produced no SOT were not considered. This procedure allowed for establishing a mean naming speed per participant across the final

Table 2 Correlation coefficients of the pairwise relations between all rating scales, name agreement, and accuracy

                  | Accuracy | Name agreement
Familiarity       | .412**   | .264**
Visual complexity | −.013    | .077**
Valence           | .194**   | .113**
Arousal           | .244**   | .139**
Accuracy          | 1        | .332**
Name agreement    | .332**   | 1

**Correlation is significant at the 0.01 level (two-tailed)
Fig. 7 Relationships between familiarity, visual complexity, valence, arousal ratings, and SOT

Note: The final column indicates the effect size (η²) for each term

Discussion
The rating data and the SOT collected online and in the picture-naming experiment have revealed several interesting patterns. For instance, linear relationships were observed between all rating variables. Familiarity correlated negatively with visual complexity, but had a positive relationship with valence and arousal. The correlations between visual complexity and both valence and arousal turned out to be positive. Finally, when valence and arousal were compared, a positive relationship between the two variables was observed. In addition, the analysis of variance revealed that all rating variables contributed to explaining SOT variance, albeit to varying degrees. Familiarity with the image was most discriminant of the SOT, followed by arousal and visual complexity, and valence to a lesser extent.

Our results show faster picture naming with increasing valence and arousal. In the case of valence, this pattern is consistent with findings presented by e.g. White et al. (2016), who reported slower naming for negative pictures. On the other hand, De Houwer and Hermans (1994) found no difference between positive and negative words in picture naming. Among the few studies that have looked at the effect of valence and arousal on picture naming, Blackett et al. (2017) reported that both positive and negative pictures with high arousal were named more slowly than neutral stimuli with lower arousal.

To the extent that word naming and picture naming can be considered similar, our results for valence are compatible with the analysis of Kuperman et al. (2014), who re-analysed a series of influential studies (Estes & Adelman, 2008; Kousta et al., 2009; Larsen et al., 2008) and showed that, for words within the same frequency range, negative ones are recognised more slowly than positive ones. On the other hand, Kuperman et al. also found that less arousing words are recognised faster than more arousing ones, which is the opposite of the pattern we have demonstrated. These similarities and discrepancies invite more thorough analyses of our results.

The analysis reported in this manuscript is certainly not exhaustive. We focused mainly on the presentation of the major relationships between the variables. Further analysis is planned that will (1) incorporate the demographic variables, (2) compare the cross-linguistic data from the additional four languages, and (3) contrast the available data sets from recognition of photographs with recognition of black and white line drawings, coloured drawings, and the recognition of words. Since line drawings often resemble prototypical representations and photographs are individualised depictions of items, a processing difference is to be expected. Finally, a comparison of the processing times of photographs and words can further aid the discussion regarding the visual and lexico-semantic stages of recognition.

We recognise several limitations that the current study faced. One of the issues relates to the experimental design and the fact that the images were presented on the computer screen for a duration of 3000 ms. In the case of infrequent or unusual items, participants did not manage to retrieve the name in the allowed time, which resulted in 11% of the SOT not being available. In addition, since the images were presented in a random order and the participants were not familiar with the range of items being depicted, this might have influenced the precision of their answers. That is, if, for example, an image of a hazelnut appeared first, it would often attract the name nut. Only when the participants came across peanuts, Brazil nuts, etc., did they start to differentiate between the names, despite the fact that they were instructed to be specific in naming. Finally, items such as mustard, toothpaste, liquid soap, hair spray, and shaving foam proved rather challenging to name without any additional clue regarding the name of the product or the brand. Often, shaving foam was referred to as hair foam, hair spray ended up being a spray paint, and mustard was simply named a tube.

Despite several caveats, we anticipate a variety of use cases for the data collected in this study, adding methodological variety and richness and thus offering new avenues for research. A first area is replication: existing experiments for which picture-naming times were the dependent variable can be reanalysed using the SOTs to photographs from the current study. In a similar way, studies that have used ad hoc ratings for familiarity, visual complexity, valence, and arousal can be re-evaluated using the rating data collected here. A second area is the investigation of new research questions: instead of setting up an experiment to collect new data, researchers
Behav Res
can check whether the data they would want to collect are already available. This applies to both the SOTs and the rating scales. A related application lies in stimulus selection for other fields, such as memory research. In the field of psycholinguistics, the data can also offer insights into the differences in processing photographic and pictorial representations of the same concepts. Finally, researchers in artificial intelligence may be interested in using the data to train picture-to-word recognition models or speaker identification models.

To address the shortcomings of the extant picture-naming databases, we have conducted a megastudy of picture-naming norms. A group of German native speakers named and evaluated over 1600 colour images on measures of familiarity, visual complexity, valence, and arousal. This allowed us to establish norms of name agreement and accuracy and to gather information about SOTs. The resulting LinguaPix database is the largest available tool of its kind, and it is currently being extended to four more languages: Dutch, English, Polish, and Cantonese. Since databases act primarily as resources, we see potential in applying information from LinguaPix in psycholinguistic research, cognitive psychology research, computational linguistics (e.g. training image-recognition algorithms), and language learning and language impairment research (e.g. adapting the photographs into a digital diagnostic tool for receptive vocabulary comprehension with children or aphasic patients). Finally, we would welcome extending the database to other languages which are currently not under investigation.

Appendix 1 – Instructions – picture-naming task

In this experiment, you will be requested to name pictures presented on the screen. One picture will be shown on the screen at a time. The items will change automatically once 5 seconds have elapsed. There is no need for you to press any buttons between the individual trials.

Please speak clearly. The microphone will automatically record all your answers in order to measure speech onset times and name agreement.

Important:

- don't use articles (e.g. the apple) but name the item itself (e.g. apple)
- don't use adjectives to describe the items (e.g. green apple); just name the item (e.g. apple)
- don't use hesitation devices (e.g. hmmm)
- don't use full sentences (e.g. It is an apple) but single words/nouns (e.g. apple)
- try to avoid coughing, yawning, and sneezing, if possible
- the photographs of coloured 'powder' require you to produce the name of the colour
- be specific, but use the first word that comes to your mind
- name the items as fast as possible
- if you don't know or don't remember the name of the item, don't say anything

Appendix 2 – Instructions – German online rating task

In the following survey, we ask you to rate each picture on four scales: (1) familiarity (how common or uncommon the object presented in the picture is to you), (2) visual complexity (the level of detail or intricacy that a given picture displays), (3) emotional valence (the extent to which a given picture triggers a positive or negative emotion), (4) arousal (the intensity or strength of the emotional state associated with a given picture).

Acknowledgments We would like to extend our gratitude to the Fritz Thyssen Foundation, which sponsored the research project. Furthermore, we would like to thank the numerous student research assistants without whom the process of editing photos, collecting data, verifying the accuracy of audio recordings, etc. would simply not have been possible. Thank you: Nora Kreyßig, Annabel Mempel, Paula Schneider, Antonia Hahn, Franziska Cavar, Hanife Ilen, Saveria Toscano, Svea Seidler, Konstantin Weber (our photographer), Waldemar Schauermann (our programmer), and many other student research assistants who have contributed to the project at different stages.

Funding Open Access funding enabled and organized by Projekt DEAL.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Adelman, J. S., Johnson, R. L., McCormick, S. F., McKague, M., Kinoshita, S., Bowers, J. S., Perry, J. R., Lupker, S. J., Forster, K. I., Cortese, M. J., Scaltritti, M., Aschenbrenner, A. J., Coane, J. H., White, L., Yap, M. J.,
Davis, C., Kim, J., & Davis, C. J. (2014). A behavioral database for masked form priming. Behavior Research Methods, 46(4), 1052–1067. https://doi.org/10.3758/s13428-013-0442-y

Alario, F. X., Ferrand, L., Laganaro, M., New, B., Frauenfelder, U. H., & Segui, J. (2004). Predictors of picture naming speed. Behavior Research Methods, Instruments, & Computers, 36(1), 140–155. https://doi.org/10.3758/bf03195559

Altarriba, J., & Basnight-Brown, D. M. (2009). An overview of semantic processing in bilinguals: Methods and findings. The Bilingual Mental Lexicon: Interdisciplinary Approaches, 79–99. https://doi.org/10.21832/9781847691262-006

Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English lexicon project. Behavior Research Methods, 39(3), 445–459. https://doi.org/10.3758/bf03193014

Bates, E., D'Amico, S., Jacobsen, T., Székely, A., Andonova, E., Devescovi, A., Herron, D., Lu, C. C., Pechmann, T., Pléh, C., Wicha, N., Federmeier, K., Gerdjikova, I., Gutierrez, G., Hung, D., Hsu, J., Iyer, G., Kohnert, K., Mehotcheva, T., ... Tzeng, O. (2003). Timed picture naming in seven languages. Psychonomic Bulletin & Review, 10(2), 344–380. https://doi.org/10.3758/bf03196494

Binder, J. R., & Desai, R. H. (2011). The neurobiology of semantic memory. Trends in Cognitive Sciences, 15(11), 527–536. https://doi.org/10.1016/j.tics.2011.10.001

Blackett, D. S., Harnish, S. M., Lundine, J. P., Zezinka, A., & Healy, E. W. (2017). The effect of stimulus valence on lexical retrieval in younger and older adults. Journal of Speech, Language, and Hearing Research, 60(7), 2081–2089.

Bonin, P., Méot, A., Laroche, B., Bugaiska, A., & Perret, C. (2019). The impact of image characteristics on written naming in adults. Reading and Writing, 32(1), 13–31.

Bradley, M. M., & Lang, P. J. (1999). Affective norms for English words (ANEW): Instruction manual and affective ratings (Technical report C-1). University of Florida, Center for Research in Psychophysiology.

Brodeur, M. B., Dionne-Dostie, E., Montreuil, T., & Lepage, M. (2010). The bank of standardized stimuli (BOSS), a new set of 480 normative photos of objects to be used as visual stimuli in cognitive research. PloS One, 5(5), e10773. https://doi.org/10.1371/journal.pone.0010773

Brodeur, M. B., Kehayia, E., Dion-Lessard, G., Chauret, M., Montreuil, T., Dionne-Dostie, E., & Lepage, M. (2012). The bank of standardized stimuli (BOSS): Comparison between French and English norms. Behavior Research Methods, 44(4), 961–970. https://doi.org/10.3758/s13428-011-0184-7

Brodeur, M. B., Guérard, K., & Bouras, M. (2014). Bank of standardized stimuli (BOSS) phase II: 930 new normative photos. PloS One, 9(9), e106953. https://doi.org/10.1371/journal.pone.0106953

Cabitza, F. (2015). Re: What are the implications of using even or odd Likert scales for a research survey? Retrieved 20 May 2021 from https://www.researchgate.net/post/What_are_the_implications_of_using_even_or_odd_Likert_scales_for_a_research_survey/55b7a671614325f38f8b457a/citation/download. Accessed 1 June 2021.

Cop, U., Dirix, N., Drieghe, D., & Duyck, W. (2017). Presenting GECO: An eye-tracking corpus of monolingual and bilingual sentence reading. Behavior Research Methods, 49(2), 602–615. https://doi.org/10.3758/s13428-016-0734-0

Dan-Glauser, E. S., & Scherer, K. R. (2011). The Geneva affective picture database (GAPED): A new 730-picture database focusing on valence and normative significance. Behavior Research Methods, 43(2), 468.

De Houwer, J., & Hermans, D. (1994). Differences in the affective processing of words and pictures. Cognition & Emotion, 8(1), 1–20. https://doi.org/10.1080/02699939408408925

Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255). IEEE. https://doi.org/10.1109/cvpr.2009.5206848

Duñabeitia, J. A., Crepaldi, D., Meyer, A. S., New, B., Pliatsikas, C., Smolka, E., & Brysbaert, M. (2018). MultiPic: A standardized set of 750 drawings with norms for six European languages. The Quarterly Journal of Experimental Psychology, 71(4), 808–816. https://doi.org/10.1080/17470218.2017.1310261

Estes, Z., & Adelman, J. S. (2008). Automatic vigilance for negative words is categorical and general. Emotion, 8(4), 453–457. https://doi.org/10.1037/a0012887

Ferrand, L., New, B., Brysbaert, M., Keuleers, E., Bonin, P., Méot, A., Augustinova, M., & Pallier, C. (2010). The French Lexicon Project: Lexical decision data for 38,840 French words and 38,840 pseudowords. Behavior Research Methods, 42(2), 488–496. https://doi.org/10.3758/brm.42.2.488

Hutchison, K. A., Balota, D. A., Neely, J. H., Cortese, M. J., Cohen-Shikora, E. R., Tse, C. S., Yap, M. J., Bengson, J. J., Niemeyer, D., & Buchanan, E. (2013). The semantic priming project. Behavior Research Methods, 45(4), 1099–1114. https://doi.org/10.3758/s13428-012-0304-z

Johnston, R. A., Dent, K., Humphreys, G. W., & Barry, C. (2010). British-English norms and naming times for a set of 539 pictures: The role of age of acquisition. Behavior Research Methods, 42(2), 461–469.

Keuleers, E., & Balota, D. A. (2015). Megastudies, crowdsourcing, and large datasets in psycholinguistics: An overview of recent developments. Quarterly Journal of Experimental Psychology, 68(8), 1457–1468. https://doi.org/10.1080/17470218.2015.1051065

Keuleers, E., & Marelli, M. (2020). Resources for mental lexicon research: A delicate ecosystem. In V. Pirrelli, I. Plag, & W. U. Dressler (Eds.), Word Knowledge and Word Usage (pp. 167–188). De Gruyter Mouton. https://doi.org/10.1515/9783110440577-005

Keuleers, E., Diependaele, K., & Brysbaert, M. (2010). Practice effects in large-scale visual word recognition studies: A lexical decision study on 14,000 Dutch mono- and disyllabic words and nonwords. Frontiers in Psychology, 1, Article 174, 1–15. https://doi.org/10.3389/fpsyg.2010.00174

Keuleers, E., Lacey, P., Rastle, K., & Brysbaert, M. (2012). The British Lexicon Project: Lexical decision data for 28,730 monosyllabic and disyllabic English words. Behavior Research Methods, 44(1), 287–304. https://doi.org/10.3758/s13428-011-0118-4

Kousta, S.-T., Vinson, D. P., & Vigliocco, G. (2009). Emotion words, regardless of polarity, have a processing advantage over neutral words. Cognition, 112(3), 473–481.

Kuperman, V., Estes, Z., Brysbaert, M., & Warriner, A. B. (2014). Emotion and language: Valence and arousal affect word recognition. Journal of Experimental Psychology: General, 143(3), 1065–1081. https://doi.org/10.1037/a0035669

Kurdi, B., Lozano, S., & Banaji, M. R. (2017). Introducing the open affective standardized image set (OASIS). Behavior Research Methods, 49(2), 457–470. https://doi.org/10.3758/s13428-016-0715-3

Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1997). International affective picture system (IAPS): Technical manual and affective ratings. NIMH Center for the Study of Emotion and Attention, 1, 39–58.

Larsen, R. J., Mercer, K. A., Balota, D. A., & Strube, M. J. (2008). Not all negative words slow down lexical decision and naming speed: Importance of word arousal. Emotion, 8(4), 445–452. https://doi.org/10.1037/1528-3542.8.4.445

Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision (pp. 740–755). Springer. https://doi.org/10.1007/978-3-319-10602-1_48

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46.

Moreno-Martínez, F. J., & Montoro, P. R. (2012). An ecological alternative to Snodgrass & Vanderwart: 360 high quality colour images with norms for seven psycholinguistic variables. PloS One, 7(5), e37527. https://doi.org/10.1371/journal.pone.0037527

Perret, C., & Bonin, P. (2019). Which variables should be controlled for to investigate picture naming in adults? A Bayesian meta-analysis. Behavior Research Methods, 51(6), 2533–2545.

Pratto, F., & John, O. P. (1991). Automatic vigilance: The attention-grabbing power of negative social information. Journal of Personality and Social Psychology, 61(3), 380–391. https://doi.org/10.1037/0022-3514.61.3.380

Rossion, B., & Pourtois, G. (2004). Revisiting Snodgrass and Vanderwart's object pictorial set: The role of surface detail in basic-level object recognition. Perception, 33(2), 217–236. https://doi.org/10.1068/p5117

Roux, F., Armstrong, B. C., & Carreiras, M. (2017). Chronset: An automated tool for detecting speech onset. Behavior Research Methods, 49(5), 1864–1881. https://doi.org/10.3758/s13428-016-0830-1

Russell, J. A. (2003). Core affect and the psychological construction of emotion. Psychological Review, 110(1), 145–172.

Schneider, W., Eschman, A., & Zuccolotto, A. (2002). E-Prime: User's guide. Psychology Software Tools Incorporated.

Seidenberg, M. S., & Waters, G. S. (1989). Reading words aloud – a mega study. Bulletin of the Psychonomic Society, 27, 489.

Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Snodgrass, J. G., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity, and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 174–215. https://doi.org/10.1037/0278-7393.6.2.174

Warriner, A. B., Kuperman, V., & Brysbaert, M. (2013). Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods, 45(4), 1191–1207. https://doi.org/10.3758/s13428-012-0314-x

White, K. K., Abrams, L., LaBat, L. R., & Rhynes, A. M. (2016). Competing influences of emotion and phonology during picture-word interference. Language, Cognition and Neuroscience, 31(2), 265–283. https://doi.org/10.1080/23273798.2015.1101144

Yap, M. J., Liow, S. J. R., Jalil, S. B., & Faizal, S. S. B. (2010). The Malay Lexicon Project: A database of lexical statistics for 9,592 words. Behavior Research Methods, 42(4), 992–1003. https://doi.org/10.3758/brm.42.4.992

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.