The Sound of Color
This study investigated the integration of auditory and visual sensory information. Seventy-one participants were presented sine wave tones along with seven on-screen colored boxes. Participants chose which color fit best with each presented tone. Color choices from ten exposures to each tone across 80 trials indicated an inverse audio-visual sensory processing relationship between wavelength and frequency of light versus wavelength and frequency of sound. Analyses suggest a consistent and symmetrical data pattern revealing a quasi-linear relationship between pitch and color that suggests a natural, stable, and universal auditory/visuo-sensory neurological processing algorithm for simple tones when presented with basic colors.
metaphorically as greater brightness; in turn, brightness expressed itself as greater loudness and as higher pitch. Given the lack of empirical data on color-sound integration, this study examined how eight sine wave tones (32.7, 65.4, 130.81, 261.63, 523.25, 1046.5, 2093.0, and 4186.0 Hz [C1 through C8]) influenced best-fit color choice (colored boxes of black, red, green, yellow, orange, violet, and blue).
METHOD
Participants
Seventy-one graduates and undergraduates (59 females and 12 males) from a large Midwestern university participated for partial course credit.
Integrated processing of visual and auditory stimuli has fascinated both scientists and artists for many centuries. Sir Isaac Newton's (1704) attempt at tonal correspondence with colors (i.e., the visual spectrum ROYGBIV with each octave of the key of C) set off a wave of associative inquiry that has continued for 300 years. Synesthetic-like crossmodal interactions have since been investigated by several researchers (e.g., Marks, 1974, 1987; Hubbard, 1996; Datteri, 1998). Marks (1974) asked participants to match the brightness of gray surfaces with pure tones and found an increase in loudness to be associated with an increase in brightness, with some participants associating it with an increase in darkness. Marks interpreted these results as a synesthesia-like effect, suggesting that most participants will match auditory brightness to visual brightness. Furthering this line of research, Hubbard (1996) investigated this synesthetic-like effect by addressing visual lightness, auditory pitch, and melodic interval. Participants were asked to indicate how visual lightness and auditory pitches fit together. Findings suggested that lighter visual stimuli fit better with higher pitches, and darker visual stimuli fit better with lower pitches. Additionally, Hubbard's investigation found that larger melodic intervals produced more extreme (lighter or darker) choices by participants. In a related study, Datteri (1998) investigated synesthetic-like interactions between a basic black and white figure-ground visual stimulus and high and low tones. Participants rated the white area as standing out when high tones were played, and rated the black area as standing out when low tones were played. In another study, Marks (1982) used synesthetic metaphors (e.g., "the dawn comes up like thunder") and found that language metaphors are able to influence perception of the music/sound color connection.
Across a series of four experiments, participants utilized loudness, pitch, and brightness scales in evaluating the meanings of a variety of synesthetic auditory-visual metaphors. Results indicated loudness and pitch expressed themselves
Auditory stimuli. Audio stimuli consisted of sine wave files with no harmonic content (i.e., C1, C2, C3, C4, C5, C6, C7 and C8; 32.7 Hz, 65.41 Hz, 130.81 Hz, 261.63 Hz, 523.25 Hz, 1046.5 Hz, 2093.0 Hz, and 4186.01 Hz, respectively). They were created as .wav files and subsequently converted to .wma files to decrease file size and reduce any potential latency in the autostart loading process during stimulus presentation. Audio samples were not constructed for equivalent amplitude presentation, and the natural sound energy/amplitude of each individual tone was present. All sine tones were presented ten times each (thus, 80 tones were presented) and were audible for three seconds.
Visual stimuli. The visual stimuli consisted of seven colored boxes (Figure 1) with on-screen visual dimensions of 30 mm wide x 28 mm high (90 x 80 pixels), with the entire row of boxes 96 mm from the top of the interface form. The first box (black) in the row of colored boxes was located 58 mm from the left side of the interface form, and there were five millimeters between each of the boxes. The boxes were presented across all trials in a static sequence (that was randomly determined), with presentation order from left to right being black, red, green, yellow, orange, purple, and blue. The colored boxes were not neutralized for presentation of equal perceived brightness (albedo), thus allowing those perceived as seemingly brighter, e.g., yellow, to be perceived as such. This coincides with the sound energies of the different waveform stimuli, which as sine waves were presented in their natural frequency and amplitude forms.
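The eight tones described above are successive octaves of C, so each frequency is double the one before it. The following Python sketch illustrates that relationship and how a three-second sine tone could be synthesized. The C1 value of 32.7032 Hz is the standard equal-temperament figure (A4 = 440 Hz); the sample rate, amplitude, and function names are assumptions for illustration only, since the source does not report how the original .wav files were produced.

```python
import math

# Equal-temperament frequency of C1 with A4 = 440 Hz (assumed reference value).
C1_HZ = 32.7032

def octave_frequencies(base_hz=C1_HZ, n_octaves=8):
    """Return frequencies for C1..C8; each octave doubles the frequency."""
    return [base_hz * 2 ** k for k in range(n_octaves)]

def sine_samples(freq_hz, duration_s=3.0, sample_rate=44100, amplitude=0.5):
    """Return samples of a pure sine tone (no harmonic content).

    sample_rate and amplitude are illustrative assumptions, not values
    taken from the study.
    """
    n = round(duration_s * sample_rate)
    step = 2 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * i) for i in range(n)]

frequencies = octave_frequencies()
rounded = [round(f, 2) for f in frequencies]
```

Rounded to two decimals, these values agree with the tone frequencies listed for C1 through C8.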
Participants were seated at a computer workstation and were shown the location of the headphones they would use and of the headphone volume control. Participants put on the headphones and read the instructions and directions as presented by the software program. Participants were given five practice trials drawn from the experimental trials and then began the experimental trials if they had no further questions. Within a trial, participants heard a tone and then indicated their colored box choice by clicking on the box directly, which immediately placed the name of the chosen color into a small text box on the screen. Participants then clicked a continue button to report their response choice, which cleared the text box for the next trial. After the continue button was clicked, the tone for the next trial would play. Additionally, a three-second delay was built into the front of each tone so that each tone would play three seconds after the continue button was clicked. This was designed to prevent participants from proceeding too hurriedly through the trials and to space out the tone presentations. After completing all 80 experimental trials, participants filled out a brief demographic survey and were debriefed.
Hz tone, t (70) = .279, p = .781, and the 4186.01 Hz tone, t (70) = 1.024, p = .309, did not differ significantly from the yellow test value, indicating that these tones were statistically representative of the color yellow on participant choices (Figure 1). It is also interesting to note that the frequency located between these two tones, 2093.0 Hz, was in fact significantly different from the yellow test value, t (70) = 2.868, p = .005, which would seem to indicate a strong preference on the part of participants to dissociate the 2093.0 Hz tone from the color yellow, or that this frequency of tone is perhaps encompassed by the two other tone frequencies and cannot be discriminated well in comparison to them. For the second of these three main one-sample tests, a comparison of the means of all tone response scores against a test value of 4, representing green, t (70) = .830, p = .410, indicates that the tone 261.63 Hz (C4) was the only tone that did not differ significantly from this test value representing the choice of the green box (Table 7 and Figure 2). All other tones within this one-sample test did differ significantly from the test value of 4, thus leaving green as the only color across all one-sample tests conducted that was shown to be specifically associated with only one of the eight presented tones. This association of green with 261.63 Hz appears to be a very strong one in that the closest test statistic value to the t (70) = .830 generated for 261.63 Hz was that of 130.81 Hz, t (70) = 5.241, p < .0001. The third of these three critical one-sample t-tests compared the respective means for all the tones against a test value of 5, which represented blue (Table 8). The results here indicated that the mean for all 32.7 Hz tone trials, t (70) = 1.043, p = .301, was not significantly different from a test value of 5 (Figure 3).
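The one-sample comparisons reported here test whether the mean coded color choice for a tone differs from the code of a given color. A minimal Python sketch of that computation follows, assuming one mean score per participant for the tone in question (which would give the reported 70 degrees of freedom with 71 participants). The color codes are those listed with Table 1; the function name and the example scores are hypothetical and are not the study's data. The sketch returns only the t statistic and degrees of freedom; obtaining a p value would additionally require the t distribution.

```python
import math
import statistics

# Color coding as listed with Table 1 in the paper.
COLOR_CODES = {"red": 1, "orange": 2, "yellow": 3, "green": 4,
               "blue": 5, "black": 6, "violet": 7}

def one_sample_t(scores, test_value):
    """Return (t, df) for a one-sample t-test of scores against test_value."""
    n = len(scores)
    mean = statistics.fmean(scores)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    return (mean - test_value) / standard_error, n - 1

# Hypothetical per-participant scores for one tone (illustration only).
example_scores = [3, 4, 3, 5, 3]
t_stat, df = one_sample_t(example_scores, COLOR_CODES["yellow"])
```

A small t statistic, as for the tones reported as not differing from a test value, means the observed mean lies close to that color's code relative to the spread of responses.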
Additionally, the mean for all 130.81 Hz tones, t (70) = -1.357, p = .179, was also not significantly different from a test value of 5 representing the blue box (Table 8 and Figure 5). It is interesting to note that once again the frequency located between these two tones, 65.41 Hz, was significantly different from the blue test value, t (70) = 4.182, p < .001, which again may indicate a strong preference on the part of participants to dissociate 65.41 Hz from the color blue, or again perhaps this frequency is absorbed by the two surrounding tone frequencies and cannot be separately discerned. Another interesting finding of this analysis is the highly symmetrical pattern that emerged in the dataset regarding the values representing level of significance for each tone on each one-sample test (the highlighted values in Tables 6, 7, and 8). It is necessary to present these one-sample tables in close proximity and in sequential order so that the highlighted values and the symmetry can be appreciated. This pattern indicates a highly symmetrical relationship that is well-defined and perhaps linear in its form, a result that will be addressed momentarily with linear regression analysis. The one-sample test comparison tables can be seen here in sequential order with respect to the various one-sample test values utilized. Correlations of all variables were compiled and can be seen in Table 9. One can clearly see a definitive cut-off line at the approximate area between 261.63 Hz (C4) and 523.25 Hz where the valences of the correlation values not only change abruptly, but do so symmetrically and inversely. To further assess any potential linear relationship between tones and colors, linear regression analyses were conducted. A
The analyses proceeded via a series of one-sample t-tests conducted on the means obtained for all eight tones across the experimental trials (Table 1). All seven of the colored boxes were coded with numbers from one to seven so as to generate a mean rating for each colored box choice. For the first one-sample t-test, all tones were tested against a test value of 1, which represented red (Table 2; contact authors for remaining tables not included in proceedings). All test tone responses across all trials for each respective tone differed significantly from the test value of 1 (p < .0001). The largest of these significant differences was for 65.41 Hz, t (0, 70) = 40.614, p < .0001, and the smallest of these significant differences was for 4186.0 Hz, t (0, 70) = 14.616, p < .0001. It can be clearly seen within Table 2 that none of the means for the individual tones even closely approximated a value of 1, suggesting the absence of even marginal consideration for red as being representative of any of these tone frequencies in participant choices. Results using test values of 2, 6, and 7, representing the colors orange (Table 3), black (Table 4), and violet (Table 5), also indicated that the means of all responses across all groups of tones were significantly different from these test values. Again this indicates that, at a high level of significance, none of the participant choices of orange, black, or violet were related to any of the presented tones. The most interesting of these one-sample test comparisons were the three tests conducted for the coded values representing yellow, green, and blue (Tables 6, 7, and 8). The first of these comparisons was conducted against a test value of 3, which represented yellow (Table 6). The result of this comparison showed that the 1046.5
dependent variable representative of high tone presentation choices was constructed from all participant color choices on trials where 523.25, 1046.5, 2093.0, and 4186.0 Hz tones played, and consisted of a compiled mean representing color choices coinciding with these tone presentations. Similarly, a dependent variable representative of low tone presentation choices was constructed from participant color choices on all trials where 32.7, 65.41, 130.81, and 261.63 Hz tones played, and likewise consisted of a compiled mean representing color choices coinciding with these tone presentations. The first of the regression analyses attempted to predict color choice on high tone presentations from actual low tone response color choices, and included the 32.7, 65.41, 130.81, and 261.63 Hz tones as the low tone predictor variable group. The results of this regression, F (4, 66) = 12.336, p < .0001, R = .428, are highly significant and are consistent with a linear model. However, the confidence interval for the beta coefficient of the 65.41 Hz tone within the model contained the value of zero. Thus it was determined that a regression model excluding the 65.41 Hz tone would perhaps account for nearly the same amount of variability. This second regression analysis, F (3, 67) = 16.66, R = .427, p < .0001, did in fact account for a nearly identical amount of variability within the model, and did so by using one less predictor variable, thus increasing parsimony and preserving confidence interval integrity. The results for this regression can be seen in Figure 4 as represented by standardized residuals using a plot of expected vs. observed values. Conversely, the ability of color choice on high tones to predict color choice on low tone presentations was also analyzed.
A linear regression analysis using the 523.25, 1046.5, 2093.0, and 4186.0 Hz tones as the high tone predictor variable group, F (4, 66) = 12.586, R = .433, p < .0001, indicates a linear relationship, thus further supporting the first regression analysis. Upon examination of the confidence intervals for the regression coefficients, it was noted that the 523, 1046, and 2093 predictor variables had coefficient confidence intervals inclusive of zero, thus warranting another analysis to determine whether a more explanatory model existed. Through further analysis, it was determined that eliminating the 523 and 2093 predictor variables yielded a best regression model, F (2, 68) = 22.595, R = .399, p < .0001, which, via only two predictor variables, accounted for an approximately equal amount of variability. More importantly, increased parsimony and coefficient confidence intervals noninclusive of zero were attained. The results of this regression as represented by a standardized residual plot can be seen in Figure 5. The final regression analysis utilized an SPSS curve estimation procedure that, via several different regression models (e.g., linear, quadratic, cubic, logarithmic), analyzed the compiled dependent measures of 1) color choice on low tone presentations; and 2) color choice on high tone presentations, and their potential in predicting one another. Both of these variables were tested as a dependent measure and as a predictor variable in an attempt to determine which variable, within the role of predictor, would account for the most variability, and which model would provide the best fit for the data. The results of this regression analysis were strongest via a quadratic model (Figure 6), F (2, 68) = 33.712, R = .497, p < .0001, indicating that nearly half of the variability was accounted for by this model. Additionally, confidence intervals of B coefficient values were devoid of zero and thus supported the significance level as well. Regression model results for all analyses were scrutinized, and any model was eliminated if it failed to account for sufficient variability or if a coefficient confidence interval included a value of zero.
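To make the logic of these regressions concrete, the Python sketch below fits an ordinary least-squares line predicting one compiled mean from the other and reports the proportion of variability accounted for. It is a simplified stand-in, with a single predictor, for the multiple-predictor and curve-estimation models run in SPSS; the function name and the data values are hypothetical and are not the study's data. Note that the proportion of variability explained is conventionally the squared coefficient (R squared), which is what this sketch computes.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Proportion of variability in y accounted for by the fitted line.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared

# Hypothetical per-participant compiled means (illustration only).
low_tone_means = [4.0, 4.5, 5.0, 5.5]
high_tone_means = [4.0, 3.0, 3.5, 2.5]
slope, intercept, r_squared = fit_line(low_tone_means, high_tone_means)
```

With these made-up values the slope is negative, which is the kind of inverse relationship between low tone and high tone color choices that the analyses above describe.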
The highly symmetrical dataset relationship derived from these analyses is extremely unusual, as shown by the consistency with which the valences of the correlation values (Table 9) reverse at exactly the same points, marked by the borderline tones 261.63 Hz and 523.25 Hz. The correlation table clearly reveals the presence of a dividing line that seemingly establishes the area separating high tone/low tone and yellow/green/blue differentiation. Perhaps more striking yet is the theorized relationship of light wavelength to auditory stimuli as per the results of this experiment. Figure 7 presents a graphic that approximates the potential inverse relationship between light wavelength frequency and sound wavelength frequency, as directly construed from these experimental results and analyses. The interest in a relationship such as that alluded to in Figure 9 has been romanticized for centuries if not thousands of years. As mentioned at the outset of this paper, Sir Isaac Newton promulgated the potential of this relationship in such a way that it closely resembles the results of these analyses, albeit preceding them by a mere 300 years. The results of this experiment and analysis, however, put a slightly different turn on Newton's color-wheel. As one can see in Figure 10, the necessary adjustments have been made to reflect the outcome of this study, so as to compare Newton's theoretical color-wheel with an empirically derived one. This experiment presents indications that there is a consistent pattern of responding regarding association of color with tone frequencies. This experiment provides great insight into how this relationship of tones and colors may be founded, but it cannot provide full support for a definitive relationship of tones and colors. Further research should look into other tone sequences to determine exactly where the borderlines of this tone/color relationship effect truly lie (Datteri & Howard, 2004, in progress).
For example, a set of tones that do not reflect natural or obvious tones that one would be exposed to would be of great interest. Perhaps the human ear is naturally tuned to the tones that are indeed reflected on the piano keyboard, and the historical development of the piano keyboard and its tones was unconsciously guided by this natural response attunement. It appears that this series of experiments may be a strong movement in the direction of establishing this natural relationship. In the case of this experiment, one must carefully consider the appearance of this type of data set as the potential revealing of a natural, stable, and universal phenomenon regarding auditory and visuo-sensory neurological processing within a fundamental crossmodality stimulus presentation environment. In fact, Figure 9 is an intended representation of this potential universal sensory processing relationship in its basic form. Additionally, and as important as any pure psychophysical result derived from this experimental sequence, this investigation appears to reveal additional insight into the apparent relationship between the visible light frequency spectrum and the sound frequency spectrum, a finding no doubt of interest to the field of physics. In continuance of this line of investigation, other cross-sensory modality combinations are currently being addressed within this same vein (Howard, 2004, in progress) that will build upon uncovering additional unique multi-sensory processing responses that may exist within human cross-modality neuro-sensory mechanisms. Cross-sensory modality processing and its influence on human behavior via perceptive interpretation is an extremely valuable research endeavor in a future of computerized virtual interaction between machine and man. The importance and urgency of understanding this relationship is underscored by the importance of the technology itself, and its value to science across all endeavors of human exploration.
Figure 3. Direct mapping of Table 6 results to piano keyboard showing C1 (32.7 Hz) and C3 (130.81 Hz) as associated with Blue.
Figure 4. Plot of expected vs. observed cumulative probabilities of regression standardized residuals for dependent measure color choice on HIGH frequency tone presentation trials using predictor variables 32.7, 130.81, and 261.63 Hz.
Dependent variable: color choice on high tone presentations
Figure 5. Plot of expected vs. observed cumulative probabilities of regression standardized residuals for dependent measure color choice on LOW frequency tone presentation trials using predictor variables 1046.5 and 4186.0 Hz.
Dependent variable: color choice on low tone presentations
Table 1. Means of responses for the 8 sine wave tones. Red = 1, Orange = 2, Yellow = 3, Green = 4, Blue = 5, Black = 6, Violet = 7.
Figure 1. Direct mapping of Table 4 results to piano keyboard showing C6 (1046.5 Hz) and C8 (4186.0 Hz) as associated with Yellow.
Figure 2. Direct mapping of Table 5 results to piano keyboard showing C4 (261.63 Hz) as associated with Green.
Figure 6. Quadratic model showing color choice on LOW tone presentations of 32.7, 65.4, 130.81, and 261.63 Hz, as predicted by color choice on HIGH tone presentations of 523.25, 1046.5, 2093.0, and 4186.0 Hz.
1. Datteri, D. L. (1998). Influences of concurrent auditory frequency on the perception of an ambiguous visual stimulus. Unpublished master's thesis. Texas Christian University.
2. Hubbard, T. L. (1996). Synesthesia-like mappings of lightness, pitch, and melodic interval. American Journal of Psychology, 109 (2), 219-238.
3. Marks, L. E. (1982). Bright sneezes and dark coughs, loud sunlight and soft moonlight. Journal of Experimental Psychology: Human Perception and Performance, 8 (2), 177-193.
4. Marks, L. E. (1974). On associations of light and sound: The mediation of brightness, pitch, and loudness. American Journal of Psychology, 87 (1-2), 173-188.
5. Marks, L. E. (1987). On cross-modality similarity: Auditory-visual interactions in speeded discrimination. Journal of Experimental Psychology: Human Perception and Performance, 13 (3), 384-394.
6. Newton, I. (1704). Opticks: or, a treatise of the reflections, refractions, inflections & colours of light. (4th Ed., London) Dover Publications.
Figure 7. Theorized inverse relationship between wavelength frequency of light and wavelength frequency of sound for single tone presentations