Perceived Roughness - A Recent Psycho-Acoustic Measurement
Convention Paper
Presented at the 126th Convention
2009 May 7–10 Munich, Germany
The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have
been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from
the author’s advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes
no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio
Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights
reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the
Journal of the Audio Engineering Society.
Perceived Roughness -
a Recent Psycho-Acoustic Measurement
Robert Mores, Thorsten Smit and Jana-Marie Wiese
University of Applied Science, Hamburg, Germany
ABSTRACT
This paper relates to an investigation on perceived roughness from Aures in 1984 where findings are based on
psycho-acoustic tests with synthetic sounds and a small group of people. The related results have repeatedly
been used for modelling roughness perception since then, for instance in the context of noise perception.
Roughness is again an issue when investigating the perceived quality or timbre of musical sounds. In this
context, roughness is one among some ten mid-level features to be extracted. Here, perceived roughness
is measured again, but on a wider basis than in the earlier investigation. This paper outlines the psycho-
acoustic investigation, basically following the method of Aures, but modifying some of the issues under
question. The results are reasonable and differ from the earlier findings in various aspects.
cations. It also describes the true test conditions for the presented results.

2. METHOD USED AND ALTERATIONS
In principle the method employed here follows the basic idea already used by Aures [2]: in a first
test phase, two tones of the same carrier frequency are presented, and individuals have to adjust
the modulation intensity for specific modulation frequencies until the perceived roughness matches
that of a reference sound of different modulation frequency. This first test phase therefore
delivers the sensitivity of perceived roughness against variations of modulation frequency. In a
second phase, roughness across different carrier frequencies is compared within sound pairs,
maintaining the modulation frequency for both sounds and, again, adjusting the modulation intensity
until the perceived roughness matches. This second phase delivers the sensitivity of perceived
roughness against variations of carrier frequency. Finally, results from the two phases are roughly
brought into a common context, and the perceived roughness is mapped against the three parameters
used. This investigation uses the same sequence of test phases and the same final context map. It
also employs the same test procedure, using the method of adjustment on sound pairs. Most of the
parameters of the synthetic sounds are also identical.

The few modifications of our test approach fully conform to the general recommendations for
psychophysical tests from Hellbrück et al. [5].

a) The number of sound pairs to be adjusted by individuals is 18 for the first test phase and 12
for the second test phase, compared to 124 and 112 sound pairs in the respective Aures test. The
authors felt more confident when requesting only some 20 minutes of attention from individuals,
rather than hours.

b) The group size is more than doubled compared to the former test, 20 instead of eight
individuals. This larger group size should deliver an improved statistical basis.

c) According to Aures' method for the final context mapping, results for sound pairs with a
carrier frequency of 1 kHz are specifically important. Here this circumstance is appreciated in
particular by defining an anchor parameter set. As this set is important for linking the results of
the two phases, the group size has been enlarged to 50 individuals. The larger group on this
parameter set also helps to estimate confidence levels for the other parameter sets with only 20
participants.

Apart from these modifications, there are a few other issues where the Aures publication does not
fully outline details of interest to the reader or of relevance to the procedure:

a) The Aures investigation says nothing about the eight individuals involved in the test. In this
investigation, the 50 individuals are randomly chosen people, mainly from the faculty environment.
None of the individuals has been involved in the research questions behind the test. None of the
individuals has specifically been trained.

b) Individuals have been interviewed on musical skills or training after execution of the test.
Therefore, individuals have not been biased prior to the test, nor have they been under pressure in
terms of expected performance. The subsequent questionnaire allows for classification across
individuals and for meaningful evaluation of data, e.g. perception of musicians vs. non-musicians.

c) The sound pairs are randomly permuted. However, the permutation has been manually re-edited to
maximize options for cross checking the obtained raw data between groups and between parameter
sets.

d) The authors expected that individuals would need the first few sound pairs for adapting to the
test environment and for finding some confidence when adjusting the modulation intensity. Therefore
three sounds were presented to individuals prior to the test phase: a slightly vibrating tone, a
rough tone, and a rather rough tone.

e) In addition, the uncertainty of the early test phase is addressed by allocating a larger group
size for the first few sound pairs. The permutation has been adjusted in a way that parameter sets
with a larger group size are likely to be presented in the early test phase.

3.1. Parameter Sets, Group Size and Allocation
The sounds consist of a carrier with carrier frequency f_c, amplitude-modulated (AM) with
modulation frequency f_m and modulation intensity m. Sounds are synthesized according to

y_m = sin(2π f_m · t)          (1)
y_c = sin(2π f_c · t)          (2)
y_AM = (1 + m · y_m) · y_c     (3)

The sampling rate is always f_s = 44.1 kHz and the signal duration is t_sig = 1 s.

This test uses the same set of carrier frequencies as Aures did for his investigation:

index       a     b     c     d     e     f
f_c in Hz   125   250   500   1000  2000  4000

Table 1: set of carrier frequencies used

The set of modulation frequencies is reduced compared with the Aures approach in order to reduce
the number of parameters and to enhance the test quality for the parameters of interest:

index       1     2     3     4     5     6
f_m in Hz   40    55    65    75    90    120

Table 2: set of modulation frequencies used

Sound pairs always consist of a sound under investigation and a corresponding reference sound,
predefined according to Table 3. The modulation frequency f_m of the reference sound is chosen
roughly in the area of an expected maximum for the perceived roughness.

f_c in Hz   125   250   500   1000  2000  4000
f_m in Hz   50    50    50    70    70    70

Table 3: modulation frequencies of the reference sounds

The 50 individuals have been organized in five test groups (TG). Test groups one to five are
allocated across the different carrier frequencies:

index f_c:  a  b  c  d  e  f
TG1:        x x x
TG2:        x x x
TG3:        x x x
TG4:        x x x
TG5:        x x x
(Each test group is marked for three carrier frequencies; the column positions of the marks are
not preserved in this copy.)

Table 4: Allocation of test groups (TG) across carrier frequencies f_c (see Table 1)

According to this allocation, the carrier frequency 1 kHz is presented to 50 individuals, whereas
all other carriers are presented to only 20 individuals.

3.2. Permutation
Table 5 lists the permutation map for test group two. Letter and number for each entry correspond
to the indices of carrier and modulation frequency according to Tables 1 and 2. For example, the
sound c1 represents a 500 Hz carrier which is modulated with 40 Hz. In the first test phase the
respective reference sound is a signal with the same carrier frequency. Specifically, the carrier
is again f_c = 500 Hz, modulated with f_m = 50 Hz, according to Table 3. Sounds under investigation
are always modulated with m = 0.7, whereas the modulation intensity of the reference sound is to be
adjusted. Sound pairs number four to 18 are randomly permuted with some manual correction to avoid
long sequences of always the same carrier frequencies. The early sound pairs number one to three
employ two distinct sequences to allow for cross checking raw data within and between test groups.

pair  TP2.1  TP2.2  TP2.3  TP2.4  TP2.5  TP2.6  TP2.7  TP2.8  TP2.9  TP2.10
1     d1     d1     d1     d1     d1     d6     d6     d6     d6     d6
2     d5     d5     d5     d5     d5     d3     d3     d3     d3     d3
3     d2     d2     d2     d2     d2     d4     d4     d4     d4     d4
4     e5     b2     b1     e6     e6     b5     b4     e2     b1     e3
5     e4     d6     b3     e1     b5     e5     b5     e6     b4     e2
6     b1     b6     e5     b5     b6     e6     d5     b6     b2     b1
7     b4     b5     b2     b1     b3     b6     e5     d1     d2     b2
8     e6     e4     d3     d6     e5     b4     e3     d2     e2     b3
9     e3     b3     d4     b4     e1     e3     b1     e1     e3     e4
10    e1     d4     e4     e3     e4     d1     b3     e4     d1     d5
11    d3     d3     e2     e2     d3     d2     b2     b2     d5     d2
12    d4     e1     e3     e4     d6     d5     e4     b3     e4     d1
13    b3     e3     b4     d4     e3     e4     e1     b1     e1     e1
14    e2     e6     d6     d3     e2     b3     d2     e3     e5     b4
15    b5     b4     b6     b2     d4     b2     d1     e5     b3     b6
16    b6     b1     b5     e5     b2     b1     b6     d5     b6     e6
17    d6     e2     e1     b3     b4     e2     e6     b5     b5     e5
18    b2     e5     e6     b6     b1     e1     e2     b4     e6     b5

Table 5: permutation map for test group two

In test phase two the reference sound always consists of a 1 kHz carrier using the same
modulation frequency as the sound under investigation
and a modulation intensity m = 0.7. This allows evaluation of perceived roughness across different
carrier frequencies.

3.3. Test Environment and Conditions
Individuals executed the test alone in an acoustically dry and silent room in the morning hours.
They were given enough time to comfortably do the test. Short steady-state sounds of one second
duration were automatically generated on a computer and binaurally presented via an external sound
board (UA-25) and a headset (HD 202). Individuals had the free choice to repeatedly listen to the
sounds and to jump between the reference sound and the sound under investigation in order to
adjust the level of modulation intensity m. Control and adjustment were done via a MATLAB-based
graphical user interface and mouse. The control bar for the modulation intensity allowed for
adjusting m continuously between zero (no modulation at all) and one (maximum depth). Therefore,
the technically limited parameter space was much wider than the space needed for the task. Each
sound pair was presented on a separate page and adjustment results were captured from individuals
with each step through the sequence of pages. There was no way back in reverse direction of the
sequence to lean on earlier decisions. Individuals were always aware of their work progress.

Before the test, individuals were allowed to set their favoured loudness level for comfortable
listening, but they were also instructed to maintain the loudness level over the test. Three
learning sounds were presented to individuals prior to the test: (i) a slightly vibrating tone,
f_c = 1 kHz, f_m = 10 Hz, m = 0.4, (ii) a rough tone d1 with m = 0.6 and (iii) a rather rough tone
d4 with m = 0.9. Prior to the test, individuals were also introduced to the use of the very simple
GUI. They were encouraged to settle all questions before they were left alone with the test.
Questions on the purpose of the test or the related research questions have not been answered.
Individuals were encouraged to take a break between the two test phases.

3.4. Test Log Book
The mean working duration was 13.4 min with a deviation of 4.0 min for the first test phase (18
sound pairs), and was 12.9 min with a 3.9 min deviation for the second test phase (12 sound pairs)
after an average break of 10 min. Of the 50 individuals, nine persons set the loudness to a
slightly lower level and one person to a slightly higher level.
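
The synthesis of Section 3.1 (equations (1) to (3)) and the sound labels of Section 3.2 can be
illustrated with a minimal sketch. It is written in plain Python as an assumption for illustration
only; the original test software was MATLAB-based, and the function names synth_am, decode_label
and phase_one_reference are hypothetical, not taken from the paper. The lookup tables restate
Tables 1 to 3.

```python
import math

FS = 44100      # sampling rate f_s in Hz (Section 3.1)
T_SIG = 1.0     # signal duration t_sig in s (Section 3.1)

# Tables 1 to 3 of the paper, restated as lookups
CARRIERS_HZ = {"a": 125, "b": 250, "c": 500, "d": 1000, "e": 2000, "f": 4000}
MODULATIONS_HZ = {1: 40, 2: 55, 3: 65, 4: 75, 5: 90, 6: 120}
REFERENCE_FM_HZ = {125: 50, 250: 50, 500: 50, 1000: 70, 2000: 70, 4000: 70}


def synth_am(fc, fm, m, fs=FS, duration=T_SIG):
    """Return the samples of y_AM = (1 + m * y_m) * y_c as a list of floats."""
    n = int(round(fs * duration))
    samples = []
    for k in range(n):
        t = k / fs
        y_m = math.sin(2 * math.pi * fm * t)   # equation (1), modulator
        y_c = math.sin(2 * math.pi * fc * t)   # equation (2), carrier
        samples.append((1 + m * y_m) * y_c)    # equation (3), AM signal
    return samples


def decode_label(label):
    """Map a label such as 'c1' to (f_c, f_m) in Hz using Tables 1 and 2."""
    return CARRIERS_HZ[label[0]], MODULATIONS_HZ[int(label[1:])]


def phase_one_reference(label):
    """First test phase: same carrier, reference f_m taken from Table 3."""
    fc, _ = decode_label(label)
    return fc, REFERENCE_FM_HZ[fc]


fc, fm = decode_label("c1")
test_sound = synth_am(fc, fm, m=0.7)
print(fc, fm, phase_one_reference("c1"), len(test_sound))
# prints: 500 40 (500, 50) 44100
```

Note that the unscaled signal of equation (3) can reach an amplitude of 1 + m, so a playback chain
would need headroom or scaling; the paper does not state how this was handled.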
[Figure: two rows of three panels; x-axis f_m with ticks at 20, 60, 100, 140]
Fig. 1: Results from the first test phase - comparison of different modulation frequencies for
individual carriers - graphs represent modulation intensity versus modulation frequency
[Figure: two rows of three panels; panel titles f_c = 125 Hz, f_c = 250 Hz, f_c = 500 Hz; y-axis
from 0 to 1; x-axis f_m with ticks at 20, 60, 100, 140]
Fig. 2: Results from the second test phase - comparison of different carriers for individual
modulation frequencies - graphs represent modulation intensity versus modulation frequency
[Figure: m̃ (axis from 0 to 1.2) versus f_m (axis from 20 to 140), one curve per carrier frequency:
f_c = 125 Hz, 250 Hz, 500 Hz, 1 kHz, 2 kHz and 4 kHz]
Fig. 3: Sensitivity of perceived roughness against carrier frequency and modulation frequency
R = m^α, where α ranges from roughly 1 to 2 in the literature, and is not necessarily considered as
constant over carrier frequency. Therefore the abscissa represents a qualitative measure rather
than a quantitative measure. Reading the figure appropriately, the scaling still reflects the
method from the test, where parameter sets match for equally perceived roughness. Absolute measures
for the perceived roughness would require further investigations.

The main result is that humans are most sensitive in the 1 kHz region. Perceived roughness declines
with higher carrier frequency. It also declines for lower carrier frequencies, and even more
strongly when the modulation frequency rises. For the 1 kHz carrier the maximum roughness is
confirmed at 70 Hz modulation. For lower carrier frequencies, the maximum roughness is perceived at
much lower modulation frequencies.

4.3. Results across Classes
Perceived roughness seems to be independent of whether an individual has actively played a musical
instrument or not. For this comparison, the participants have been interviewed about their musical
practice, instruments and time spent. The classification was done after the test; there was no
pre-selection or call for musicians. Participants without any skills on musical instruments are
classified as non-musicians, whereas all participants with training on at least one musical
instrument are classified as musicians. By pure chance, the two groups have almost equal size: 23
non-musicians and 26 musicians.

Figures 4 and 5 show the results for the group of non-musicians and musicians, respectively. There
are only a few minor differences in the results. Quartiles are slightly smaller for many entries in
the group of musicians. There are minor differences for the 125 Hz and 250 Hz carriers at low
modulation frequencies. However, this seems to be an area of general uncertainty. It is the same
parameter range where the results here differ from Aures' results. Apart from this area, the
difference between medians from the two groups is always significantly smaller than the uncertainty
of the entire approach, expressed in the quartiles. Therefore it seems that there are no large
differences in human perception of roughness, whether musician or not.
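
The comparison described in Section 4.3, the difference between group medians set against the
spread expressed in quartiles, can be sketched as follows. This is only an illustration under
stated assumptions: the values of m below are made up and are not data from the test, the helper
names are hypothetical, and using the smaller interquartile range as the yardstick is an assumed
criterion that the paper does not specify.

```python
import statistics


def median_and_quartiles(values):
    """Return (q1, median, q3) of a list of adjusted modulation intensities."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, q2, q3


def medians_agree(group_a, group_b):
    """True if the medians differ by less than the smaller interquartile range.

    The criterion is an assumption made for this sketch.
    """
    a1, a2, a3 = median_and_quartiles(group_a)
    b1, b2, b3 = median_and_quartiles(group_b)
    spread = min(a3 - a1, b3 - b1)
    return abs(a2 - b2) < spread


# Hypothetical adjusted values of m for one parameter set (NOT measured data)
musicians = [0.50, 0.55, 0.60, 0.62, 0.70]
non_musicians = [0.45, 0.52, 0.58, 0.66, 0.75]

print(median_and_quartiles(musicians))
print(medians_agree(musicians, non_musicians))
```

With these illustrative numbers the medians differ by 0.02 while the smaller interquartile range
is 0.07, so the two groups would count as agreeing under the assumed criterion.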
[Figure: two rows of three panels; x-axis f_m with ticks at 20, 60, 100, 140]
Fig. 4: Results from the first test phase - comparison of different modulation frequencies for
individual carriers - non-musicians
[Figure: two rows of three panels; panel titles f_c = 125 Hz, f_c = 250 Hz, f_c = 500 Hz; y-axis
from 0 to 1; x-axis f_m with ticks at 20, 60, 100, 140]
Fig. 5: Results from the first test phase - comparison of different modulation frequencies for
individual carriers - musicians
Test     Musical       Active   Time spent     Listening      Age   Hearing
person   instrument    years    (hours/week)   to classical         deficiency
TP1.1    -             -        -              y              28    n
TP1.2    guitar        17       abandoned      n              31    n
         bass guitar   17       abandoned
TP1.3    guitar        18       14             n              32    y (1)
         bass guitar   18       14
TP1.4    -             -        -              n              26    n
TP1.5    trombone      8        abandoned      n              27    n
TP1.6    saxophone     12       3-4            n              27    n
         guitar        4        2
         piano         2        abandoned
TP1.7    drums         6        abandoned      n              27    y (2)
         piano         3        abandoned
TP1.8    -             -        -              n              26    n
TP1.9    -             -        -              y              26    n
TP1.10   -             -        -              n              35    n
Test     Musical       Active   Time spent     Listening      Age   Hearing
person   instrument    years    (hours/week)   to classical         deficiency
TP4.1    -             -        -              n              27    n
TP4.2    guitar        15       0.1            y              26    n
         piano         2        0.25
         vocals        1        abandoned
TP4.3    guitar        25       abandoned      y              50    n
         vocals        30       2
TP4.4    -             -        -              n              30    n
TP4.5    -             -        -              n              27    y (6)
TP4.6    -             -        -              n              33    n
TP4.7    piano         20       abandoned      y              27    n
         saxophone     10       3.5
         vocals        15       1
TP4.8    vocals        2.5      3              n              25    y (7)
         violin        8        abandoned
TP4.9    -             -        -              n              28    n
TP4.10   guitar        13       2              n              29    n
Test     Musical       Active   Time spent     Listening      Age   Hearing
person   instrument    years    (hours/week)   to classical         deficiency
TP5.1    -             -        -              n              24    n
TP5.2    drums         16       1              n              27    n
TP5.3    organ         8        abandoned      n              31    n
TP5.4    -             -        -              n              28    n
TP5.5    cello         24       0.015          n              30    n
         piano         12       5
TP5.6    vocals        20       0.5            y              33    n
TP5.7    -             -        -              n              27    n
TP5.8    piano         28       8              y              25    y (8)
         vocals        24       5
         guitar        20       5
         trumpet       6        abandoned
TP5.9    -             -        -              n              27    n
TP5.10   bass guitar   6        4              n              25    n